Short-Term Traffic Forecasting: A Dynamic ST-KNN Model
         Considering Spatial Heterogeneity and Temporal
                         Non-Stationarity
                                        Shifen Cheng, Feng Lu
                   State Key Lab of Resources and Environment Information System
    Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences
                    11A, Datun Road, Chaoyang District, Beijing 100101, P. R. China
                                    {chengsf, luf}@lreis.ac.cn
ABSTRACT                                                                                  categories: parametric models and nonparametric models. A para-
Accurate and robust short-term traffic forecasting is a critical                          metric model uses an explicit parametric function to quantify the
issue in intelligent transportation systems and real-time traffic                         relationship between historical traffic data and predicted traffic
related applications. Existing short-term traffic forecasting ap-                         data. Considering the stochastic and nonlinear characteristics of
proaches are used to adopt global and static model structures and                         traffic, constructing a mathematical model with high accuracy
assume the traffic correlations between adjacent road segments                            for characterizing traffic characteristics in practice is difficult
within assigned time periods. Due to the inherent characteristics                         [1]. Nonparametric models, such as data-driven methods, do not
of spatial heterogeneity and temporal non-stationarity of city                            require a priori knowledge and explicit expression of mechanism;
traffic, it is rather difficult for these approaches to obtain stable                     thus, they are more suitable for short-term traffic forecasting
and satisfying results. To overcome the problems of static model                          problems [22] [23] [7] [31].
structures and quantitatively unclear spatiotemporal dependency                              As a typical nonparametric method, the k-nearest neighbors
relationships, this paper proposes a dynamic spatiotemporal k-                            (KNN) model has received considerable attention. Many scholars
nearest neighbor model, named D-ST-KNN, for short-term traffic                            have successfully applied the traditional KNN model to short-
forecasting. It comprehensively considers the spatial heterogene-                         term traffic prediction [2][17][19][4][16][30][28]. Considering
ity and temporal non-stationarity of city traffic with dynamic                            that the evolution of traffic is a spatiotemporal interaction pro-
spatial neighbors, time windows, spatiotemporal weights and                               cess, traffic conditions of road segments are spatially and mutu-
other parameters. First, the sizes of spatial neighbors and the                           ally affected [6]. Therefore, spatiotemporal relationships between
lengths of time windows for traffic influence are automatically                           multiple road segments in road networks are considered to im-
determined by cross-correlation and autocorrelation functions,                            prove traffic prediction [25][21][20]. Based on the traditional
respectively. Second, dynamic spatiotemporal weights are intro-                           KNN model, [26] realized an enhanced model with the support
duced into the distance functions to optimize the search mecha-                           of spatiotemporal information and argued that it achieves better
nism. Then, dynamic spatiotemporal parameters are established                             performance than the model that employs only temporal informa-
to adapt the continuous change in traffic conditions, including                           tion. [27] considered upstream and downstream traffic informa-
the dynamic number of candidate neighbors and dynamic weight                              tion and proposed a distributed architecture of a spatiotemporal-
allocation parameters. Finally, the D-ST-KNN model is evaluated                           weighted KNN model for short-term traffic prediction. [3] em-
using two vehicular speed datasets collected on expressways in                            ployed a spatiotemporal state matrix instead of the traditional
California, U.S. and city roads in Beijing, China. Four traditional                       time series to describe the traffic state while using a Gaussian
prediction models are compared with the D-ST-KNN model in                                 weight distance to select the nearest neighbor to improve the
terms of the forecasting accuracy and the generalization ability.                         KNN model. However, the disadvantages of these ST-KNNs are
The results demonstrate that the D-ST-KNN model outperforms                               that the spatiotemporal relation cannot be accurately quantified,
existing models in all time periods, especially in the morning                            which is primarily reflected in the modeling process, the size of
period and evening peak period. In addition, the generalization                           the spatial dimension m and the length of time window n of the
ability of the D-ST-KNN model is also proved.                                             state space cannot be automatically determined, and some values
                                                                                          are artificially set. For example, for m=3, three adjacent road
                                                                                          segments are selected; for n=2, the historical data of the first two
1    INTRODUCTION                                                                         time steps of the current time step are used to construct samples.
                                                                                          When the time series problem is transformed into a supervised
Short-term traffic forecasting, which has an important role in
                                                                                          machine learning problem, the values of m and n determine the
intelligent transportation systems, enables traffic managers to
                                                                                          number of selected features. Therefore, manually engineered fea-
formulate reasonable and efficient strategies for alleviating traffic
                                                                                          tures can easily cause dimensional disaster prevent the guarantee
congestion and optimizing traffic assignments. Short-term traffic
                                                                                          of the prediction accuracy of the model [15].
forecasting also enables the public to achieve accurate vehicular
                                                                                             The prediction model is usually static, thus, it cannot describe
path planning [29][10].
                                                                                          the characteristics of the dynamic change in traffic, which are
   In the past few decades, researchers have proposed several
                                                                                          primarily reflected in the following three aspects: 1) existing stud-
short-term traffic forecasting models that can be divided into two
                                                                                          ies usually assume that the spatial neighbors and time windows
                                                                                          are globally fixed, which indicates that once the number of road
© 2018 Copyright held by the owner/author(s). Published in the Workshop
Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna,
                                                                                          segments m associated with the predicted road segment and the
Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted         length of the time window n are determined, they do not change
under the terms of the Creative Commons license CC-by-nc-nd 4.0.


                                                                                    133
in the spatiotemporal range. Considering the dynamic character-               neighbor and the dynamic time window instead of the traditional
istics of an urban road network, traffic flow in the road network             time series or the static spatiotemporal matrix to characterize
is not a static point but is a moving process from one location               the state space. Finally, we introduced the dynamic spatiotempo-
to another location. The spatial neighbors of the road segment                ral weight, dynamic spatiotemporal parameters, and Gaussian
primarily rely on the current traffic conditions. The number of               weight function to improve the KNN model to adapt to the dy-
spatial neighbors is very small if traffic congestion exists but is           namic and heterogeneous characteristics of the traffic.
large during flat peak periods [5]. From the perspective of ur-                  The remainder of this paper is organized as follows: Section 2
ban road network heterogeneity, the number of relevant road                   proposes a D-ST-KNN model that considers the spatial hetero-
segments for different road segments also differs; thus, sharing              geneity and temporal non-stationarity of city road traffic. The
parameter m is difficult in the entire spatial range [29]. The selec-         construction of the dynamic spatiotemporal state matrix, weights,
tion of a time window based on a time series is used to determine             and other parameters are also introduced in this section. In Sec-
the length of the historical traffic data to match similar traffic            tion 3, the dynamic characteristics, prediction performance, and
patterns. The traffic data in the historical time step and the cur-           computational efficiency of the presented model are compre-
rent time step must be relevant in the selection process [18]. Due            hensively validated. The experimental results are also discussed.
to the dynamic and heterogeneous nature of the road network,                  Section 4 concludes the paper and provides an outlook of future
even the same road segment, a significant difference is observed              work.
in the time series of traffic data in different time periods (such as
morning and evening peak periods). That causes the selection of               2     METHODOLOGY
the time window to be dynamic [8]. Thus, the spatial neighbors                In this section, we propose a D-ST-KNN model. Our method is
and time windows that dynamically change over time and space                  divided into five phases: the data bucket partition, state space
are not easily described with globally fixed spatiotemporal state             definition, distance function definition, optimal neighbor selec-
matrices; thus, there is a need for a dynamic spatiotemporal KNN              tion, and prediction function definition, which corresponds to
model to adapt to the characteristics of traffic changes. 2) Existing         Sections 2.1-2.5. First, considering the dynamic nature of traffic,
research considers that different historical data for different time          the original spatiotemporal data sets are partitioned according
periods have different contributions to the prediction of future              to different time periods to form different data buckets. Second,
traffic conditions. When calculating the distance between two                 considering the spatial heterogeneity, each segment of a data
state spaces, the weight distance criterion is usually adopted to             bucket is separately processed, and the appropriate spatial neigh-
assign different weights to each component in the state space.                bors and time windows are selected. The spatiotemporal state
The closer the time window is to the predicted time, the larger the           matrix is constructed to describe the traffic conditions. Then, we
allocated weight; the closer the spatial distance is to the predicted         introduce the spatiotemporal weight matrix to define the dis-
road segment, the greater the assigned weight [3]. However, dy-               tance function and measure the distance between the current
namic changes in the spatial neighbor and the time window not                 spatiotemporal state matrix and the historical spatiotemporal
only affect the dimension of the space-time matrix but also cause             state matrix to select the K nearest neighbors. Finally, we inte-
the intensity of the correlation among different positions to dy-             grate these neighbors to obtain the predicted value of the target
namically change over time. Therefore, the influence of different             road segment.
components of traffic data is difficult to characterize with global
fixed spatiotemporal weight matrix. 3) To determine the value of
                                                                              2.1     Data bucket
the number of similar state spaces K, researchers usually employ
a cross-validation method to select a suitable value, then share in           Considering the non-stationarity and periodicity of traffic data,
the entire range of space and time[26] [28]. Due to the difference            there are significant differences in the traffic characteristics among
in traffic patterns in the different time periods and space loca-             different time periods, such as the morning peak period, inter-
tions, the global fixed value of K cannot adapt to the dynamic                peak period, and evening peak period. In the same period, the
and heterogeneous nature of a road network.                                   traffic data of same road segment has statistical homogeneity
    The key to short-term traffic forecasting models is the effective         and the traffic pattern tends to be stable with periodic changes,
use of the potential spatiotemporal dependencies in the traffic               such as different days for the morning peak period, which results
data. The existing KNN models usually assume that the traffic                 in the spatial neighbor, the time window, and spatiotemporal
change is a static point process and often disregard its important            parameters that can be shared. Therefore, we divide the origi-
                                                                                                    L
dynamics and heterogeneous characteristics. As a result, the                  nal traffic data {volt j , j ∈ [1, N ], t ∈ [t 0 , tc ]} into different time
structure of the prediction model is usually globally fixed in time           periods to describe the homogeneity in same time period and
and space, including the globally fixed spatial neighbor, time                dynamics in different time periods, where t 0 and tc represent the
window, spatiotemporal weights, and spatiotemporal parameters,                start time step and the current time step of the time series, and
such as the traditional KNN model and the spatiotemporal KNN                  L j denotes the jth road segment.
model.                                                                            In the study of urban traffic modeling and prediction, to distin-
    In this paper, we propose a dynamic spatiotemporal KNN                    guish the difference among the traffic characteristics in different
model (D-ST-KNN) for short-term traffic prediction considering                time periods, [24] divided a day into six time periods (period 1:
spatial heterogeneity and temporal non-stationarity of city traf-             midnight-6:30 am; period 2: 6:30-10:00; period 3: 10:00-13:30; pe-
fic. First, we investigated the autocorrelation of road traffic to            riod 4:13:30-17:00; period 5:17:00-20:30; period 6:20:30-midnight).
determine the time window required for the traffic data. Second,              The test reveals that the partition is statistically acceptable. Based
we used the cross-correlation among different road segments to                on this analysis and according to the same strategy, the original
analyze the spatiotemporal dependencies of traffic and build a                traffic data are divided into M different time periods (M= 6) ac-
dynamic spatial neighbor for each road segment. The dynamic                   cording to the time dimension, which corresponds to different
spatiotemporal state matrix is obtained by the dynamic spatial                data buckets. Assuming that the entire traffic data set is BK, the


                                                                        134
                                                                                           Lv
data bucket division must be satisfied:                                          where ψbk      is the lag value that maximizes cross-correlation of
                                                                                             i

                                                                                the surrounding road segment Lv to the predicted road segment
          
          
          
            BK = bk 1 ∪ bk 2 ∪ ... ∪ bk M                                                         Lv
                      L                                                          in bki , and ψbk     describes the maximum impact time range of
            bki = {volt j |1 ≤ j ≤ N , ∀t ∈ [tabki , tbbki )}       (1)
          
                                                                                                    i
           bk ∩ bk = ϕ                                                          the surrounding segments in different buckets on the predicted
             i     o
                                                                                 road segment, which can be employed for efficient selection of
where i ∈ [1, M], o ∈ [1, M], i , o, bki is the ith bucket (i.e.,                spatial neighbors. Consider the predicted road segment L j in bki
                     L                                                           and its predicted time interval ∆t. When the surrounding road
bucket 1), and volt j is the traffic data of road segment L j at
                                                                                 segments deliver the traffic flow to the predicted road segments
time step t. t ∈ [tabki , tbbki ) indicates that time step t is within           within a given time interval, they influence the predicted road
the corresponding time period of the ith bucket (i.e.,[0:00-6:30),               segments, and the road segments beyond this time interval are
[6:30-10:00)). L j denotes the jth road segment (i.e., Link 1), and              excluded. Its formal definition is expressed as
N is the total number of road segments. Note that dividing the                                           n                                o
                                                                                                 L                   Lv
original traffic data into different buckets at the pre-processing                             Rbkj ← Lv |∀0 ≤ ψbk      ≤ ∆t, v ∈ [1, N ]        (4)
                                                                                                  i                       i
stage does not have any impact on the analyses and conclusions
                                                                                        L
in this study because the same partitioning strategy were used                   where Rbkj is the set of spatial neighbors of the jth road segment
for all the algorithms that are evaluated.
                                                                                           i
                                                                                 in the ith bucket.
                                                                                    2.2.2 Dynamic time windows. Considering that the selection
2.2 Dynamic spatiotemporal state matrix                                          of the time window is based on the time series of the predicted
2.2.1 Dynamic spatial neighborhoods. The dynamic spatial neigh-                  road segment, we can select n historical traffic data that have a
borhood is used to determine how the traffic conditions of the                   correlation with the predicted road segment. The autocorrelation
predicted road segment are affected by the surrounding road seg-                 function is usually employed to measure the correlation between
ments in different buckets to determine the correlation among                    the time series and its delayed version; thus, it can be used for the
road segments. The traditional method usually calculates the cor-                selection of the time window, i.e., the lag in which the prediction
relation coefficients between the time series of the predicted road              error is minimized can be set as the window size. Note that the
segments and the time series of other road segments and sets the                 lag in the autocorrelation function describes the delay effect of
threshold to select the relevant road segments [3]. Considering                  the time series, and the lag described in Section 2.2.1 is used to
that a road network has multiple internal and external factors,                  characterize the delay effect between different time series. Given
                                                                                                                                              L
such as the influence of traffic lights, the impact of surround-                 the time series of the jth road segment L j in bki , U = {volt j |∀t ∈
ing road segments on predicted road segments has a certain                               bk                                         L
                                                                                 [tabki , tb j )}, the autocorrelation function ρbkj (δ ) can be defined
degree of lag. Therefore, the delayed spatiotemporal relation-                                                                          i
                                                                                 as follows:
ships cannot be exactly expressed by correlation coefficients. The
cross-correlation function is a delayed version of the correlation                         L           E [(ut − µu ) (ut −δ − µu )]
                                                                                          ρbkj (δ ) =                               , δ = 0, 1, 2, · · · , (5)
coefficient function, which measures the correlation coefficients                             i                    σu2
of two time series at a specific lag [14]; therefore, it is more                     Using the autocorrelation function to set the time window
suitable for describing the spatiotemporal dependence of traffic.                entails three steps. First, consider the computational limitations,
    Assume that bki is the bucket of the predicted road segment                  it is necessary to determine the maximum range of lag. Second,
L j at time step t, and t ∈ [tabki , tbbki ). Given the surrounding              within the range, the parameters of the predictive model are
road segments Lv , the time series of the traffic data for two road              fixed, and cross-validation is performed with different lags. This
                                                L
segments can be expressed as U = {volt j |∀t ∈ [tabki , tbbki )},                strategy is based on the fact that the value of the traffic data has a
                                                                                 significant correlation within the maximum lag range. Finally, the
Z = {voltLv |∀t ∈ [tabki , tbbki )}, j ∈ [1, N ], v ∈ [1, N ], and their
                                                                                 lag that minimizes the prediction error is chosen as the optimal
cross-correlation at lag φ is defined as follows:
                                                                                 time window.
           
           
                              bk
           
                             γ zi (φ)
           
                 bk i
                       (φ) = u,       , φ = 0, ±1, ±2, · · · ,
                                                                                2.3     Dynamic spatiotemporal weights
             cc fu,z
           
                             
                              α u σz                    
            γu,z
                bk i
                      (φ) = E q (ut − µu ) zt +φ − uz                            Considering the traffic conditions have significant differences
                                 Í                                 (2)
           
                                    (ut − µu )2                                 at different time intervals, which results in a change in the spa-
                             q
                         αu =
           
                                Í              2                               tiotemporal weights with time; the historical data of different
           
                       σz =         zt +φ − uz                                  time and space will influence the future traffic conditions by a
        bk i
                                                                                 different degree. The dimension of the spatiotemporal weight
where γu,z   (φ) is the correlation coefficient between time series              is related to the spatiotemporal state matrix, and the dynamic
U and time series Z at lag φ in bucket bki , µu and uz are the mean              change in the spatiotemporal matrix causes the dimension of
values of U and Z, respectively, and σu and σz are the standard                  the spatiotemporal weight matrix to change with different time
deviations of U and Z, respectively.                                             periods. Based on the traditional weight distance function, we
   In this definition, the cross-correlation function can be re-                 introduce a dynamic spatiotemporal weight in the distance func-
garded as a function of lag, and the lag value that makes the                    tion and optimize the weight distance function to adapt the near-
cross-correlation function obtain the maximum value is the av-                   est neighbor similarity measure of the dynamic spatiotemporal
erage delay time of the surrounding segments to the predicted                    matrix.
road segment [29]. The formal definition is expressed as                            In the temporal dimension, we use the time interval length (i.e.,
                                                                               5 min interval) to characterize the contribution of different time
             Lv
            ψbk              bk i
                = arдmax cc fu,z  (φ) , v ∈ [1, N ]                 (3)          steps. In the spatial dimension, the spatial correlation (such as
                 i        φ                                                      cross-correlation) is used to characterize the influence of different


                                                                           135
spatial distances. The construction method is described as follows:                       prediction accuracy of the model. The K value is primarily em-
assume that the predicted road segment L j at the current time                            ployed to determine the number of candidate neighbors. If the K
step tc is in data bucket bki and the dimension of the spatiotempo-                       value is too small, the model becomes more complex and overfit-
                      L     L
ral state matrix is mbkj ×nbkj , which is determined by the method                        ting is possible. If the K value is too large, the model is simpler
                        i      i
provided in Section 2.2. Then, the spatiotemporal                                         and under-fitting is possible. Considering that the selection of the
                                                   state matrix
                                                               of                        K value is significantly influenced by the finite sample nature of
                                                          L       L       L
the current time step can be expressed as χtcj mbkj , nbkj .The
                                                    i     i                               the problem, the assignment of its values is usually performed by
spatiotemporal
              matrix of the historical time step hi can be de-                          cross-validation to select the K value that minimizes the model
           L         L       L            L
fined as χh j mbkj , nbkj , where mbkj is the spatial dimension of                        error [27].
           i      i      i               i
the spatiotemporal state matrix of the jth predicted road segment                            The existing methods usually assume that the K value is glob-
in the ith bucket, which is related to the number of elements                             ally fixed. When the K value is determined, it is shared through-
                                  L                    L
in the set of spatial neighbors Rbkj . Moreover, nbkj is the tem-                         out the entire space and time. In contrast to the existing method,
                                      i                    i
                                                                                          the selection of the K value in the D-ST-KNN model considers the
poral dimension of the spatiotemporal state matrix of the jth
                                                                                          characteristics of dynamic changes of traffic. Instead of setting
predicted road segment in the ith bucket, which is the size of
                                                                                          a global fixed K value, we can select the optimal K value for
the time window. The time-weighted matrix is defined as Wtbki ,
                                                                                          different buckets, i.e., Kbki , bki ∈ BK, i] ∈ [1, M].
and the space-weighted matrix is defined as Wsbki . The corre-                               To verify these assumptions, we use cross-validation to set the
                                                  L                 L
sponding elements are w tbki (ti, t j), ti ∈ [1, nbkj ], t j ∈ [1, nbkj ]                 range of K to [1, 40] and test the effect of different K values on
                                                              i               i
                                 L                    L                                   MAPE of the model in different buckets, as shown in Fig. 1.
and w sbki (si, sj), si ∈ [1, mbkj ], sj ∈ [1, mbkj ], which represent
                                       i                   i
the time weight value and space weight value, respectively, as-
signed to the jth predicted road segment in the ith bucket. The
                                                                                                                       Bucket1                                             Bucket2                                   Bucket3
                                                                                                       12.5                                            20.5                                              16.0


weight distribution is as follows:
                                                                                                       12.0                                            19.5                                              15.0


                                            , Lj
                                                                                                       11.5                                            18.5                                              14.0


                                                                                                                                                 MAPE(%)


                                                                                                                                                                                               MAPE(%)
                                    
                                                                                             MAPE(%)


                                                                                                       11.0


                                    
                                                                                                                                                       17.5                                              13.0


                                             Ínbki
                                                                                                       10.5

                                    
                                    
                                                                                                                                                       16.5                                              12.0
                                                                                                       10.0
                                                                                                                                                       15.5                                              11.0
                                         ti     t i=1 ti, ti = t j
                 w tbki (ti, t j) =
                                                                                                        9.5

                                                                            (6)
                                    
                                                                                                                                                       14.5                                              10.0
                                                                                                        9.0

                                    
                                                                                                                                                                  0   10       20    30   40                    0   10    20   30    40

                                    
                                                                                                              0   10        20    30        40
                                                                                                                                                                                K                                          K


                                                   0, ti , t j
                                                                                                                             K


                                            ,
                          
                                                                                                                       Bucket4                                             Bucket5                                   Bucket6

                          
                                             Ímbki
                                                  Lj
                           cc f si
                          
                                                                                                       19.5                                                22.5                                          15.0
                                                                                                                                                           21.5                                          14.5
                                                           si                                          18.5
                                                                                                                                                                                                         14.0

        w sbki (si, sj) =                      si=1 cc f Lv , L j , si = sj (7)
                                                                                                       17.5                                                20.5                                          13.5
                                   Lv , L j
                          
                                                                                             MAPE(%)


                                                                                                                                                 MAPE(%)


                                                                                                                                                                                               MAPE(%)
                                                                                                                                                           19.5

                          
                                                                                                                                                                                                         13.0
                                                                                                       16.5


                          
                                                                                                                                                           18.5                                          12.5


                                                      0, si , sj
                                                                                                       15.5                                                17.5                                          12.0
                                                                                                                                                                                                         11.5
                                                                                                       14.5                                                16.5                                          11.0
                                                                                                       13.5                                                15.5                                          10.5

    In this definition, the temporal and spatial weights are linearly                                         0   10       20
                                                                                                                            K
                                                                                                                                 30    40                         0   10       20
                                                                                                                                                                                K
                                                                                                                                                                                     30   40                    0   10   20
                                                                                                                                                                                                                          K
                                                                                                                                                                                                                               30   40


distributed according to the proximity of the current time step
and the predicted road segments. cc f Lsi , L is the cross-correlation                    Figure 1: Impact of the number of candidate neighbors
                                                  v j
between the time series of the si spatial neighbor (whose road                            Kbki on the MAPEs of different data buckets.
segment is Lv ) and the predicted road segment L j . The closer
the value is to the predicted time, the greater the weight of                                As the K value increases, the prediction error is gradually
the allocation; the greater the relation to the space of the pre-                         reduced. When the K value attains a certain value, the error
dicted road segment, the greater the weight. By introducing spa-                          of the model begins to stabilize. Thus, the optimal K value for
tiotemporal weights into the original spatiotemporal matrix, the                          each bucket can be determined (i.e., Kbk1 = 27, Kbk2 = 23).
spatiotemporal-weighted state matrices of the current time step                           Compared with different buckets, the K values dynamically vary
  L
Γtc j and the spatiotemporal-weighted state matrices of the his-                          with different time periods. The global fixed K value has difficulty
                         L                                                                describing the dynamic change in traffic. Therefore, the dynamic
torical time step Γh j are denoted by the following:
                                                                                          K value proposed in this paper is reasonable. The parameters
                                             
                    i
               Lj             L    L     L                                                of the D-ST-KNN model also contain the parameters introduced
             Γtc = Wsbki × χtcj mbkj , nbkj × Wtbki                           (8)
                                                                                          by the predicted generation function (refer to Section 2.5). The
                                             
                                     i      i

               L              L    L     L                                                calibration method of the parameter is shown in Section 3.2.
             Γh j = Wsbki × χh j mbkj , nbkj × Wtbki                          (9)
                 i                   i        i       i
                                       L      L
   By calculating the distance dbki (Γtc j , Γh j ) between the histor-                   2.5                 Predictive function
                                               i
ical spatiotemporal state matrix and the current spatiotemporal                           Due to the spatiotemporal state space, the spatiotemporal weight,
state matrix, candidate neighbors can be selected. The formula is                         and the spatiotemporal parameters dynamically change with dif-
expressed as                                                                              ferent buckets; to adapt to this change, the predictive generation
                   r                                                              function should also dynamically change. This paper transforms
           L    L              L     L        L      L ′
  dbki Γtc j , Γh j = trac Γtc j − Γh j × Γtc j − Γh j      (10)                          the four types of traditional weight distribution methods to en-
                                                                                          able them to adapt to the dynamics of traffic, including the inverse
                     i                            i                   i

where trac represents the trace of the matrix.                                            distance weight [23], rank-based weight[11][13], and Gaussian
                                                                                          weight [3]. Selecting the best prediction function by comparing
2.4 Dynamic spatiotemporal parameters                                                     the performance of different predictive functions (refer to Section
In the KNN model, the spatiotemporal parameters include the                               3.2). Note that the weight referred to in this section is expressed
K values and the parameters introduced during the method con-                             as the weight assigned by the candidate neighbor, whereas the
struction (such as the prediction generation functions). The rea-                         weight in Section 2.3 represents the weight matrix of the weights
sonableness of the parameters has substantial influence on the                            assigned to each element in the spatiotemporal state matrix.


                                                                                    136
   Assuming that dbk
                   k is the distance between the kth candi-
                     i
                                                                                     3 EXPERIMENTS
date neighbor and the predicted road segment in the ith bucket                       3.1 Data preparation
                             L
obtained by formula (10), voltcj+1 the predicted value of the pre-                   In this study, two different data sets are used to evaluate the
dicted road segment L j at time step tc + 1 is defined as                            performance of the prediction model. The first data set is PeMS,
                                                                                     which is a high-quality data set with open access. PeMS is exten-
                           ÍKbki         L                  L
               L
                                      volh j +1 (k) × φbkj (k)                       sively applied in the field of traffic prediction. The traffic speed
                               k =1
             voltcj+1 =               ÍKbki L j
                                          i               i
                                                                        (11)         data from 59 consecutive locations on the US 101 freeway from
                                       k =1 bk
                                              φ     (k)
                                                    i
                                                                                     PeMS were downloaded for a total of 60 days; the time period
                                                                                     is August 15, 2016, to October 14, 2016 and time interval is 5
where tc ∈ [tabki , tbbki ] is used to map the current time step into                min (as shown in Table 1). Each detector represents a position;
the corresponding bucket, is used to determine the number of                         the positional distribution is shown in Fig. 2. The second data
candidate neighbors for the corresponding bucket, volh j +1 (k)
                                                               L                     set is the floating car trajectory data obtained from the Beijing
represents the traffic data of the kth candidate neighbor, and
                                                                 i
                                                                                     road network, which is generated from more than 50,000 vehicles
                            L                                                        equipped with GPS. The frequency of data acquisition is 5 min,
hi ∈ [tabki , tbbki ]; and φbkj (k) and represent the weight of the kth
                               i                                                     and the time period is March 1, 2012, to April 30, 2012 (as shown
neighbor of the jth predicted road segment in the ith bucket. The                    in Table 1). In this study, a representative region that contains
form is defined as follows:                                                          30 road segments is used for the experiment with the position
                           
                                             1                                      distribution shown in Fig. 2. In the two data sets, the last ten
                           
                           
                           
                                             Kbki
                                                                                    days are used as the test data to evaluate the accuracy of the
                           
                           
                                              1
                 L         
                                             k
                                             dbk                                     model. The remaining days of data are employed as training data
             φbkj (k) =                         i
                                      (Kbki − rq + 1)2                  (12)         to construct the historical database of the predicting model.
                           
                           
                           
                     i
                           
                           
                                                                2                       In addition, we normalize the original traffic data and use the
                           
                           
                                                  k
                                                dbk
                                    1              i
                                  4π abki exp(− 4abki 2 )
                                                                                     ratio of the average traffic speed to the maximum speed limit of
                           
                                                                                     each road segment to express the traffic conditions of the road
   Formula (12) corresponds to equal weights, inverse distance                       segment. The formal expression is as follows:
weights, the rank-based weight and the Gaussian weight, where                                                vi,t
                                                                                                    vdi,t =        , i ∈ [1, N ], t ∈ [t 0 , tc ]     (16)
rq represents the order of the qth candidate neighbors, and abki                                            fi,max
is the spatiotemporal parameter whose value is similar to the                        where vdl,t is the normalized speed of the ith road segment at
value of the previously discussed spatiotemporal parameter K,                        time step t, vi,t is the real average speed data of the road segment,
which dynamically values with different time periods. The corre-                     and fi,max is the speed limit for the ith road segment.
sponding parameter calibration is shown in Section 3.2.
                                                                                           Table 1: Description of the experimental data sets
2.6 Accuracy metrics
Three criteria are selected to verify the prediction accuracy of                         Data set                 PeMS                    Beijing
the D-ST-KNN model, namely, mean absolute error (MAE), mean                             Time span          8/15/2016-10/14/2016     3/1/2012-4/30/2012
absolute percentage error (MAPE) and root-mean-square error
                                                                                       Time interval              5 min                   5 min
(RMSE). These indicators depict the essential characteristics of
errors from different perspectives. The RMSE indicates a fluctua-                     Number of links               59                      30
tion in the error of the prediction model, and the MAPE indicates
the difference between the predicted and the actual traffic data. In
contrast, the MAE and RMSE provide a measure of the similarity
between the predicted and the actual traffic data [12]. The MAE,
MAPE, and RMSE are defined as follows:

                 1     ÕM Õ  N Õ  S
                                          L              L
    M AE =                            vol tcj+1 (s) − vol tcj+1 (s)     (13)
             M × N × S i =1 j =1 s =1

            v
            u
            u                                                                      Figure 2: Spatial distribution of traffic data in the Beijing
            u
            u                                       
            u
            u
                                       L               L
            u
            t               Õ
                            M Õ
                              N Õ
                                S  vol tcj+1 (s) − vol tcj+1 (s)                     and PeMS data set.
                    1
 MAP E =                                                                (14)
                M × N × S i =1 j=1 s =1                    L                        3.2      Variable estimation
                                                        vol tcj+1 (s)
                                                                                     3.2.1 Determining the optimal distance function. The distance func-
           v
           u
           t
                            Õ
                            M Õ S 
                              N Õ                                2                  tion is used to measure the similarity among the spatiotemporal
                    1                               
 RMS E =
                                       L               L
                                   vol tcj+1 (s) − vol tcj+1 (s)    (15)             state matrices to obtain the historical spatiotemporal matrix,
                M × N × S i =1 j=1 s =1                                              which is similar to the spatiotemporal state matrix of the target
 where M is the number of buckets M = 6, N is the number                             road segment. Fig. 3 shows the performance differences of the dis-
of predicted road segments, S is the number of test samples,                         tance function constructed with different weights. The traditional
   L               L
                                                                                     method directly calculates the Euclidean distance between two
voltcj+1 (s) and voltcj+1 (s) indicate the actual traffic data and the               spatiotemporal state matrices, which treats the elements in the
predicted traffic data at the next time step of the jth predicted                    space state matrix equally. The influence of the historical traffic
road segment at the current time step, and s indicates the sth test                  conditions of different time and space distribution on the pre-
sample in the ith bucket.                                                            diction of future traffic conditions is difficult to describe, which


                                                                               137
yields the lowest performance. The distance function constructed                                                                                performance of the prediction model changes with Kbki (refer
by the Gaussian function assigns weights in the time dimension                                                                                  to Section 2.4). Because the impact of parameter Kbki on the
and space dimension; thus, the performance of the prediction                                                                                    prediction performance was discussed in Section 2.4, this section
model is significantly improved. However, this method requires                                                                                  focuses on the calibration of parameter abki .
additional introduction of the time-weighted parameter α 1 and                                                                                     Fig. 5 shows the impact of changes in abki on the performance
the space-weighted parameter α 2 in the construction process,                                                                                   of the D-ST-KNN model in different buckets. The trend in Fig. 5
which makes calibration of its parameters and the global optimal                                                                                reveals that the value of abki has a significant influence on the
combination of parameters difficult. We adopt a similar strategy                                                                                prediction performance. For the minimum abki , the prediction
that uses the linear time distribution weight in the time dimen-                                                                                error of the model attains the maximum abki . As abki increases,
sion and the spatial correlation between the surrounding road                                                                                   the prediction error gradually decreases and begins to stabilize.
segments and the target road segment to assign weights in spatial                                                                               We compare the variation of the parameters among the different
dimensions. Then, a dynamic spatiotemporal weight assignment                                                                                    buckets. For example, in bucket 1, the optimal value of abk1 is
method is constructed that does not require any additional pa-                                                                                  0.017, whereas the optimal value of abk2 in bucket 2 is 0.015. The
rameters. The dynamic weight distribution has the lowest MAPE,                                                                                  value of abki also changes dynamically over time. Considering
RMSE and MAE, which reflects the high efficiency of the method                                                                                  that Kbki also changes dynamically with time, the parameters of
compared to that of the other two weight distribution methods.                                                                                  the D-ST-KNN model change with time. The calibration results of
                                                                                                                                                the entire model are listed in Table 2, and the values of Kbki are
                                                                                                                                                shown in Fig. 5. In this analysis, setting the global fixed param-
           17.00                                          0.16                                       0.115
           16.50                                                                                     0.110
                                                          0.15

                                                                                                                                                eters is unreasonable when constructing the prediction model.
           16.00                                                                                     0.105
                                                          0.14                                       0.100
           15.50
 MAPE(%)


                                                                                                                                                We propose the concept of the data bucket, and the prediction
                                                                                                     0.095
                                                   RMSE


                                                                                               MAE


           15.00                                          0.13
                                                                                                     0.090
           14.50

                                                                                                                                                model is constructed in different time periods, which causes the
                                                          0.12                                       0.085
           14.00                                                                                     0.080
                                                          0.11
           13.50

                                                                                                                                                model parameters to change with the time period to adapt to the
                                                                                                     0.075
           13.00                                          0.10                                       0.070


                                                                                                                                                dynamic nature of traffic.
            Origin   Gaussian       Space-Time              Origin   Gaussian   Space-Time             Origin   Gaussian   Space-Time


       Figure 3: Comparison of different distance functions                                                                                                 Table 2: Calibration results of the model parameters
   3.2.2 Determining the optimal predictive function. Based on the
discussion in the previous sections, we transform four types of                                                                                                                                                                           Parameters
                                                                                                                                                                             Bucket
weight distribution methods, including equal weight, inverse                                                                                                                                                                    Kbki                                      abki
distance weight, rank-based weight, and Gaussian weight, which                                                                                                               Bucket 1                                            27                                       0.017
are used to integrate the candidate neighbors to obtain the final                                                                                                            Bucket 2                                            23                                       0.015
predicted value. In the process of cross-validation, we fix the other                                                                                                        Bucket 3                                            28                                       0.011
parameters of the model, such as Kbki and abki , and calculate                                                                                                               Bucket 4                                            18                                       0.017
the influence of different weight distribution methods on the                                                                                                                Bucket 5                                            17                                       0.014
prediction accuracy of the D-ST-KNN model to obtain the average                                                                                                              Bucket 6                                            25                                       0.019
error of the entire test data set for different weight distribution
methods. The results are shown in Fig. 4. The MAPE, RMSE and
MAE of the Gaussian weight method are lower than the MAPE,                                                                                                 15.0
                                                                                                                                                                        Bucket1
                                                                                                                                                                                                              18.5
                                                                                                                                                                                                                           Bucket2
                                                                                                                                                                                                                                                               15.0
                                                                                                                                                                                                                                                                             Bucket3


RMSE and MAE of the other three weight distribution methods.                                                                                               14.0                                                                                                14.5
                                                                                                                                                                                                              18.0
                                                                                                                                                                                                                                                               14.0
                                                                                                                                                MAPE(%)


                                                                                                                                                           13.0

In the D-ST-KNN model, we employ the Gaussian function as
                                                                                                                                                                                                                                                     MAPE(%)


                                                                                                                                                                                                              17.5                                             13.5
                                                                                                                                                                                                    MAPE(%)


                                                                                                                                                           12.0                                                                                                13.0
                                                                                                                                                                                                              17.0
                                                                                                                                                                                                                                                               12.5

the weight distribution method for candidate neighbors.
                                                                                                                                                           11.0                                               16.5                                             12.0
                                                                                                                                                           10.0                                                                                                11.5
                                                                                                                                                                                                              16.0
                                                                                                                                                                                                                                                               11.0
                                                                                                                                                            9.0                                               15.5                                             10.5
                                                                                                                                                                  0   0.01   0.02    0.03    0.04                    0   0.01   0.02   0.03   0.04                    0    0.01   0.02   0.03   0.04
           13.75                                          0.116                                      0.0850                                                                    a                                                  a                                                 a
           13.70                                          0.115
                                                                                                     0.0845
           13.65
                                                                                                                                                                        Bucket4                                            Bucket5                                           Bucket6
 MAPE(%)


                                                          0.114
                                                   RMSE


                                                                                               MAE


           13.60                                                                                     0.0840                                                18.0                                               20.5                                             14.5
                                                          0.113                                                                                            17.5                                               20.0
           13.55                                                                                                                                                                                                                                               14.0
                                                                                                     0.0835                                                17.0                                               19.5                                             13.5
                                                          0.112
                                                                                                                                                                                                    MAPE(%)
                                                                                                                                                 MAPE(%)


                                                                                                                                                                                                                                                     MAPE(%)


           13.50                                                                                                                                           16.5                                               19.0
                                                                                                                                                                                                                                                               13.0
                                                                                                                                                           16.0                                               18.5
           13.45                                          0.111                                      0.0830                                                                                                   18.0                                             12.5
                                                                                                                                                           15.5
             Equal              Inverse distance            Equal           Inverse distance           Equal           Inverse distance                    15.0                                               17.5                                             12.0
                                                                                                                                                           14.5                                               17.0                                             11.5
             Rank-based         Gaussian                    Rank-based      Gaussian                   Rank-based      Gaussian
                                                                                                                                                           14.0                                               16.5                                             11.0
                                                                                                                                                                  0   0.01   0.02   0.03    0.04                     0   0.01   0.02   0.03   0.04                    0    0.01   0.02   0.03   0.04
                                                                                                                                                                               a                                                  a                                                 a


Figure 4: Comparison of different weight allocation meth-
ods.                                                                                                                                            Figure 5: Impact of the weight parameter abki on MAPEs
                                                                                                                                                for different data buckets.
   3.2.3 Calibrating hyper-parameters In the D-ST-KNN model,
the hyper-parameters primarily include the number of candidate
neighbors Kbki and the Gaussian weight parameter abki . In the                                                                                  3.3                   Accuracy evaluation
parameter calibration process, to find the best combination of                                                                                  3.3.1 Overall results. Based on the variable estimation, we compare
Kbki and abki that enables the prediction model to obtain the                                                                                   our model with several existing traffic prediction models, includ-
minimum MAPE, we set the range of Kbki to [1, 40] and the range                                                                                 ing the historical average model (HA), Elman neural network
of abki to [0.001, 0.04]. We apply the cross-validation method to                                                                               (Elman-NN) [9], traditional KNN model (Original-KNN), and spa-
obtain the optimal combination of the parameters for each bucket.                                                                               tiotemporal KNN model (ST-KNN). Fig. 6 shows the prediction
The effect of parameter variation on the prediction accuracy of                                                                                 performance of different models. The HA model, the Elman-NN
the D-ST-KNN model can be tested by fixing other parameters of                                                                                  model, and the Original-KNN model regard the problem of the
the model. For example, we can fix the values of abki and test the                                                                              traffic prediction as a simple time series problem and disregard


                                                                                                                                          138
the influence of the spatial factors on the predicted road segment.                                                     31
                                                                                                                        29
Therefore, their prediction performance is lower than the predic-                                                       27
tion performance of the ST-KNN model and the D-ST-KNN model                                                             25
proposed in this paper by comparing the values of MAPE. The ST-                                                         23


                                                                                                              MAPE(%)
KNN model introduces the spatiotemporal state matrix, which                                                             21
                                                                                                                        19
improves the prediction performance of the model. However,                                                              17
this matrix ignores the spatial heterogeneity and the temporal                                                          15
non-stationarity of the road network and cannot describe the                                                            13
                                                                                                                        11
essential characteristics of the traffic dynamics using a static                                                         9
ST- KNN model (including global fixed spatiotemporal matrix                                                                   Bucket1 Bucket2 Bucket3 Bucket4 Bucket5 Bucket6
and global fixed parameters). The D-ST-KNN model constructs
models for different time periods by introducing the concept                                                             HA     Elman-NN                       Original-KNN        ST-KNN              D-ST-KNN
of data buckets. Simultaneously, the dynamic space neighbor,
dynamic time window, dynamic spatiotemporal weight, and dy-                                    Figure 7: Accuracy comparison of the D-ST-KNN model on
namic spatiotemporal parameters are introduced to construct the                                MAPEs for different data buckets.
D-ST-KNN model, which can adequately adapt to the dynamic
changes of traffic conditions. The experimental results indicate
that the D-ST-KNN model proposed in this paper is superior to
                                                                                               3.4                       Generalization ability evaluation
other models.                                                                                  To evaluate the generalization ability of the D-ST-KNN model,
                                                                                               we fix all parameters of the model and compare the performance
           21   HA                    0.22   HA                   0.125   HA                   of the different methods with the test data set from PeMS; the
                                                                                               experimental results are shown in Fig. 8. The results indicate
                Elman-NN                     Elman-NN             0.120   Elman-NN
           20
                Original-KNN          0.20   Original-KNN                 Original-KNN
                                                                  0.115

                                                                                               that the prediction accuracy of the D-ST-KNN model on the
           19   ST-KNN                       ST-KNN                       ST-KNN
                                      0.18                        0.110
           18   D-ST-KNN                     D-ST-KNN                     D-ST-KNN
 MAPE(%)


                                                                                               PeMS data set is significantly improved compared with that of
                                                                  0.105
                               RMSE


                                                            MAE


           17                         0.16
                                                                  0.100

                                                                                               the Beijing floating car data set. The data quality of the PeMS
           16
                                      0.14                        0.095
           15                                                     0.090

                                                                                               data set is relatively complete, and the data collection area is
                                      0.12
           14                                                     0.085


                                                                                               the expressway. Compared with the traffic conditions of the
           13                         0.10                        0.080


                                                                                               urban road network, the traffic mode is relatively simple with
Figure 6: Accuracy comparison of different models in the
                                                                                               minimal changes, which enables the prediction model to easily
Beijing data set.
                                                                                               represent the regular traffic pattern characteristics. However, the
                                                                                               D-ST-KNN model maintains the same prediction trend; in all
   3.3.2 Local results. To further evaluate the performance of the                             predicted models, its MAPE, RMSE, and MAE are lower than the
D-ST-KNN model, we compare the MAPEs of different models in                                    other models, which exhibit excellent predictive performance
different data buckets by averaging the prediction performance                                 and generalization ability.
of different road segments in a single bucket. The experimen-
tal results are displayed in Fig. 7. In terms of overall trends, the
                                                                                                          8                       HA                    0.08              HA                   0.070         HA
                                                                                                                                  Elman-NN                                Elman-NN                           Elman-NN
                                                                                                          7                       Original-KNN          0.07              Original-KNN                       Original-KNN

performance of different models corresponds to the degree of con-
                                                                                                                                                                                               0.060
                                                                                                                                  ST-KNN                                  ST-KNN                             ST-KNN
                                                                                                          6                                             0.06              D-ST-KNN
                                                                                                                                  D-ST-KNN                                                     0.050         D-ST-KNN

gestion of the traffic conditions. For example, in bucket 1, bucket
                                                                                                MAPE(%)


                                                                                                          5                                             0.05
                                                                                                                                                 RMSE


                                                                                                                                                                                         MAE


                                                                                                                                                                                               0.040

3, and bucket 6, all models have a lower MAPE than other buck-
                                                                                                          4                                             0.04
                                                                                                                                                                                               0.030
                                                                                                          3                                             0.03

ets because the time periods that correspond to the three data                                            2                                             0.02                                   0.020


buckets are midnight-6:30 am, 10:00-13:30, and 20:30-midnight.                                            1                                             0.01                                   0.010


The traffic in Beijing during these three time periods belongs
to the flat peak period, and road traffic has low congestion and                               Figure 8: Performance comparison of different models in
exhibits regular changes. In buckets 2, 4, and 5, all models achieve                           the PeMS data set.
a relatively poor performance. Bucket 2 corresponds to the time
period of 6:30-10:00, bucket 4 corresponds to the time period
of 13:30-17:00 and bucket 5 corresponds to the time period of                                  4                   SUMMARY AND FUTURE WORK
17:00-20:30. These time periods correspond to the peak period in                               In this paper, we propose a D-ST-KNN model for short-term traffic
Beijing. The changes of traffic conditions during these time peri-                             prediction. The proposed model considers the spatial heterogene-
ods are more complicated than the traffic conditions of the other                              ity and temporal non-stationarity of road networks to adapt to
buckets. In addition, in terms of the performance of different                                 the dynamic characteristics of traffic, including dynamic spatial
models in a single data bucket, the prediction trend of different                              neighbors, time windows, spatiotemporal weights, and spatiotem-
models was similar to those of the overall results. For example,                               poral parameters. With cross-correlation and autocorrelation
in bucket 1, the ST-KNN and D-ST-KNN models perform better                                     function computation, the automatic selections of spatial neigh-
than the HA, Elman-NN, and Original-KNN models, which is due                                   bors and the time window are realized, which efficiently solve
to the benefits of the introduction of spatial factors. However,                               the dimensionality disaster problem encountered in the existing
the D-ST-KNN model considers the spatial heterogeneity and                                     KNN models. The spatiotemporal weights are integrated into
temporal non-stationarity of road networks to adapt to the dy-                                 a distance function to help identify candidate neighbors. Time
namic characteristics of traffic, making the model performance                                 variable parameters are also introduced, including the dynamic
better than other models in all time periods, especially in the peak                           number of candidate neighbors and dynamic weight allocation
period. This also explains why the D-ST-KNN model is superior                                  parameters, to further adapt to the dynamic and heterogeneous
to the other models in the overall result.                                                     nature of road networks.


                                                                                         139
   Using real traffic data collected from city roads and inter-city                            [11] Kamran Ghavamifar. 2009. A decision support system for project delivery
expressways, we calculate the number of spatial neighbors and                                       method selection in the transit industry. (2009).
                                                                                               [12] Filmon G. Habtemichael and Mecit Cetin. 2016. Short-term traffic flow rate
the time window size of each road segments, which reflects the                                      forecasting based on identifying similar traffic patterns. Transportation Re-
distinct heterogeneity and non-stationarity of urban road traffic.                                  search Part C 66 (2016), 61–78.
                                                                                               [13] Filmon G. Habtemichael, Mecit Cetin, and Khairul A. Anuar. 2015. METHOD-
Then, we validate the performance of the proposed D-ST-KNN                                          OLOGY FOR QUANTIFYING INCIDENT-INDUCED DELAYS ON FREEWAYS
model with comparisons to HA, Elman-NN, traditional KNN and                                         BY GROUPING SIMILAR TRAFFIC PATTERNS. Transportation Research Record
spatiotemporal KNN models. The experimental results indicate                                        Journal of the Transportation Research Board (2015).
                                                                                               [14] James Douglas Hamilton. 1994. Time series analysis. 401–409 pages.
that the D-ST-KNN model has a higher accuracy on short-term                                    [15] Haikun Hong, Wenhao Huang, Xiabing Zhou, Sizhen Du, Kaigui Bian, and
traffic prediction than the existing models. In addition, we ex-                                    Kunqing Xie. 2016. Short-term traffic flow forecasting: Multi-metric KNN
plore the local performance of different models in different data                                   with related station discovery. In International Conference on Fuzzy Systems
                                                                                                    and Knowledge Discovery. 1670–1675.
buckets and find that all models correspond to the degree of traf-                             [16] Xiaoyu Hou, Yisheng Wang, and Siyu Hu. 2013. Short-term Traffic Flow
fic congestion, and the D-ST-KNN model performs better than                                         Forecasting based on Two-tier K-nearest Neighbor Algorithm ąî. Procedia -
                                                                                                    Social and Behavioral Sciences 96 (2013), 2529–2536.
other models in all time periods, especially in the morning pe-                                [17] Myung Jiwon, Dong Kyu Kim, Seung Young Kho, and Chang Ho Park. 2011.
riod and evening peak period. To summarize, compared with the                                       Travel Time Prediction Using k Nearest Neighbor Method with Combined
existing models, the proposed D-ST-KNN model significantly im-                                      Data from Vehicle Detector System and Automatic Toll Collection System.
                                                                                                    Transportation Research Record Journal of the Transportation Research Board
proves the accuracy of short-term traffic prediction. Furthermore,                                  20, 2256 (2011), 51–59.
we compare the performance of different models using the ac-                                   [18] Arief Koesdwiady, Ridha Soua, and Fakhreddine Karray. 2016. Improving
tual traffic data collected from PeMS. The D-ST-KNN model also                                      Traffic Flow Prediction With Weather Information in Connected Cars: A Deep
                                                                                                    Learning Approach. IEEE Transactions on Vehicular Technology 65, 12 (2016),
achieves the best performance, which verifies the generalization                                    9508–9517.
ability of the proposed model.                                                                 [19] Shuangshuang Li, Zhen Shen, and Gang Xiong. 2012. A k-nearest neighbor
                                                                                                    locally weighted regression method for short-term traffic flow forecasting. In
   In the follow-up study, the following problems need to be                                        International IEEE Conference on Intelligent Transportation Systems. 1596–1601.
investigated to further improve the D-ST-KNN model. The D-                                     [20] X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang, and Y. Wang. 2017. Learning Traffic as
ST-KNN model behaves slightly differently in peak and off-peak                                      Images: A Deep Convolutional Neural Network for Large-Scale Transportation
                                                                                                    Network Speed Prediction. Sensors 17, 4 (2017).
time periods. Further improvement of the model performance                                     [21] Wanli Min and Laura Wynter. 2011. Real-time road traffic prediction with
during peak hours will be a constant challenge. Moreover, a multi-                                  spatio-temporal correlations. Transportation Research Part C: Emerging Tech-
threaded approach could be used to improve the efficiency of                                        nologies 19, 4 (2011), 606–616.
                                                                                               [22] Brian Lee Smith. 1997. Forecasting freeway traffic flow for intelligent trans-
D-ST-KNN. A parallel P-D-ST-KNN model on an existing parallel                                       portation systems application. Transportation Research Part A 1, 31 (1997),
computing framework is expected to alleviate the pressure of                                        61.
                                                                                               [23] Brian L Smith, Billy M Williams, and R Keith Oswald. 2002. Comparison of
real-time computation.                                                                              parametric and nonparametric models for traffic flow forecasting. Transporta-
                                                                                                    tion Research Part C: Emerging Technologies 10, 4 (2002), 303–321.
5     ACKNOWLEDGMENTS                                                                          [24] Anthony Stathopoulos and Matthew G. Karlaftis. 2003. A multivariate state
                                                                                                    space approach for urban traffic flow modeling and prediction. Transportation
This research is supported by the Key Research Program of the                                       Research Part C Emerging Technologies 11, 2 (2003), 121–135.
Chinese Academy of Sciences (Grant No. ZDRW-ZS-2016-6-3)                                       [25] Eleni I. Vlahogianni, Matthew G. Karlaftis, and John C. Golias. 2007. Spatio-
                                                                                                    Temporal Short-Term Urban Traffic Volume Forecasting Using Genetically
and the State Key Research Development Program of China                                             Optimized Modular Networks. Computer-Aided Civil and Infrastructure Engi-
(Grant No. 2016YFB0502104). Their supports are gratefully ac-                                       neering 22, 5 (2007), 317ĺC325.
knowledged. And we also thank the anonymous referees for their                                 [26] Shanhua Wu, Zhongzhen Yang, Xiaocong Zhu, and Bin Yu. 2014. Improved k-
                                                                                                    nn for Short-Term Traffic Forecasting Using Temporal and Spatial Information.
helpful comments and suggestions.                                                                   Journal of Transportation Engineering 140, 7 (2014), 04014026.
                                                                                               [27] Dawen Xia, Binfeng Wang, Huaqing Li, Yantao Li, and Zili Zhang. 2016. A
REFERENCES                                                                                          distributed spatialĺCtemporal weighted model on MapReduce for short-term
                                                                                                    traffic flow forecasting. Neurocomputing 179, C (2016), 246–263.
 [1] J Scott Armstrong. 2006. Findings from evidence-based forecasting: Methods                [28] Bin Yu, Xiaolin Song, Feng Guan, Zhiming Yang, and Baozhen Yao. 2016. k-
     for reducing forecast error. International Journal of Forecasting 22, 3 (2006),                Nearest neighbor model for multiple-time-step prediction of short-term traffic
     583–598.                                                                                       condition. Journal of Transportation Engineering 142, 6 (2016), 04016018.
 [2] Brenda I. Bustillos and Yi Chang Chiu. 2011. Real-Time Freeway-Experienced                [29] Yang Yue and Anthony Gar-On Yeh. 2008. Spatiotemporal traffic-flow de-
     Travel Time Prediction Using N -Curve and k Nearest Neighbor Methods.                          pendency and short-term traffic forecasting. Environment and Planning B:
     Transportation Research Record Journal of the Transportation Research Board                    Planning and Design 35, 5 (2008), 762–771.
     2243, -1 (2011), 127–137.                                                                 [30] Lun Zhang, Qian Rao, Wenchen Yang, and Meng Zhang. 2013. An Improved
 [3] Pinlong Cai, Yunpeng Wang, Guangquan Lu, Peng Chen, Chuan Ding, and                            k-NN Nonparametric Regression-Based Short-Term Traffic Flow Forecasting
     Jianping Sun. 2016. A spatiotemporal correlative k -nearest neighbor model                     Model for Urban Expressways. In International Conference on Transportation
     for short-term traffic multistep forecasting. Transportation Research Part C                   Engineering. 1214–1223.
     Emerging Technologies 62 (2016), 21–34.                                                   [31] Zuduo Zheng and Dongcai Su. 2014. Short-term traffic volume forecasting:
 [4] H. Chang, Y. Lee, B. Yoon, and S. Baek. 2012. Dynamic near-term traffic flow                   A k -nearest neighbor approach enhanced by constrained linearly sewing
     prediction: systemoriented approach based on past experiences. Iet Intelligent                 principle component algorithm. Transportation Research Part C Emerging
     Transport Systems 6, 3 (2012), 292–305.                                                        Technologies 43 (2014), 143–157.
 [5] Tao Cheng, James Haworth, and Jiaqiu Wang. 2012. Spatio-temporal autocor-
     relation of road network data. Journal of Geographical Systems 14, 4 (2012),
     389–413.
 [6] Tao Cheng, Jiaqiu Wang, James Haworth, Benjamin Heydecker, and Andy
     Chow. 2014. A Dynamic Spatial Weight Matrix and Localized SpaceĺCTime Au-
     toregressive Integrated Moving Average for Network Modeling. Geographical
     Analysis 46, 1 (2014), 75ĺC97.
 [7] Stephen Clark. 2003. Traffic Prediction Using Multivariate Nonparametric
     Regression. Journal of Transportation Engineering 129, 2 (2003), 161–168.
 [8] Peibo Duan, Guoqiang Mao, Shangbo Wang, Changsheng Zhang, and Bin
     Zhang. 2016. STARIMA-based Traffic Prediction with Time-varying Lags.
     (2016).
 [9] Jeffrey L Elman. 1990. Finding structure in time. Cognitive Science 14, 2 (1990),
     179–211.
[10] Gaetano Fusco, Chiara Colombaroni, and Natalia Isaenko. 2016. Short-term
     speed predictions exploiting big data on large urban road networks. Trans-
     portation Research Part C: Emerging Technologies 73 (2016), 183–201.


                                                                                         140