
Short-term traffic flow forecasting using a distributed spatial-temporal model

A A Agafonov

A S Yumaganov

Samara National Research University, Moskovskoye shosse 34, Samara, Russia, 443086

2018

402-409

In this paper, we consider the problem of short-term traffic flow prediction. We propose a distributed model for short-term traffic flow forecasting based on the k nearest neighbors method that takes into account the spatial and temporal traffic flow distribution. To consider spatial-temporal correlations, we partition the transportation graph into clusters by area and describe traffic flow by a feature vector defined for each cluster. The proposed model is implemented as a MapReduce-based algorithm on the Apache Spark framework. The proposed traffic flow prediction model is tested using actual average traffic speed data for a road network in Samara, Russia.

1. Introduction

Issues related to traffic flow management are common in every major city around the world. Traffic congestion leads to economic, environmental and social problems, which emphasizes the importance of transport planning and logistics. To solve these problems, it is important to obtain accurate and timely traffic flow information. For this reason, road traffic forecasting has been a subject of active research for more than 40 years.

Efforts devoted to mitigating the traffic congestion problem are usually classified into three directions: modification of the transport infrastructure, improvement of the operational quality of public transport, and management of traffic flows. The first and the second directions are often limited by economic or social factors, while traffic flow management has been continuously improving due to the development of traffic data collection and processing technologies.

Recently, much attention has been paid to the data-driven programming paradigm. This interest is explained by the development of new technologies, methods and techniques for massive data processing within the Big Data concept, the availability of multiple data sources for predicting traffic flows, and the "open data" idea that some data should be freely available for everyone to use, without restrictions from copyright, patents or other mechanisms of control.

Short-term traffic flow forecasting considers the traffic prediction problem on the basis of current and archived information about the state of traffic flows. A review of the latest achievements in the road traffic forecasting field, as well as the main unresolved technical challenges, can be found in [1]. Most research on this topic has focused on developing methods for modeling the characteristics of traffic flows (for example, density or speed). An overview of the methods of short-term traffic forecasting is presented in [2]. These methods can be classified into three categories:

1) Parametric methods [2, 3], including time series models [4], state space models, etc.
2) Non-parametric methods [5], including artificial neural network models [6], k-nearest neighbor (kNN) [7], and support vector regression (SVR) [8].
3) Hybrid methods that combine parametric and non-parametric methods [9, 10].

However, these methods have both advantages and disadvantages when working under different conditions and with different datasets, so it is hard to conclude that one method is significantly superior to the others.

In this paper, we propose an approach based on the k nearest neighbor algorithm, one of the main non-parametric techniques for short-term traffic flow prediction. Results presented in [5, 11, 12] showed that kNN outperformed other modern comparable models, including ANN, SARIMA, random forest, and Naive Bayes. However, if the sample size is too large, kNN may not be suitable for real-time prediction due to its computational cost. Despite this issue, relatively few works are devoted to short-term traffic flow forecasting with a focus on processing big traffic data using distributed computations, in particular the MapReduce framework [12, 13].

In this article, we consider the problem of short-term traffic flow forecasting for 10 minutes ahead. We focus on developing a distributed forecasting model based on the weighted kNN algorithm that takes into account the spatial and temporal characteristics of the transport flows in a spatially compact area of the transport network. For distributed data processing, we use the MapReduce processing model implemented in the open source cluster-computing framework Apache Spark. Experimental analysis on real-world traffic datasets allows us to conclude that the proposed model has high prediction accuracy and a reasonable execution time, sufficient for real-time prediction.

The paper is organized as follows. Section 2 contains the formulation of the problem. The proposed model and its distributed implementation are described in detail in Sections 3 and 4, respectively. In Section 5, we provide experimental results for the proposed model and verify the accuracy of the proposed approach. Finally, we conclude the paper and present possible directions for further research.

2. Problem formulation

A road network is considered as a directed graph $G = (V, E)$, with nodes $V$, $N_V = |V|$, representing road intersections and edges $E$, $N_E = |E|$, denoting road segments.

Let $V_t^j$ denote an observed traffic flow characteristic on an edge $j \in E$ at time interval $t$. Travel time, average speed, density or flow can be used as the traffic flow characteristic.

In this work, we use the average traffic speed as the predicted traffic flow characteristic for the experimental study.

The short-term traffic flow forecasting problem can be formulated as follows: given a graph $G(V, E)$ and a sequence $\{V_t^j\}$, $j \in E$, $t = 1, 2, \ldots, T$ of observed traffic flow data, predict the traffic flow characteristic $\hat{V}_{t+\tau}^j$, $j \in E$ at time interval $(t + \tau)$ for a predefined prediction horizon $\tau$.
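To make the formulation concrete, the following minimal sketch builds historical (feature, target) pairs for a single edge from its sequence of observations. It assumes that the horizon is expressed as a whole number of time intervals and that a fixed-length window of past values is used as the feature; the names `build_samples`, `window` and `horizon` are illustrative and do not come from the paper.

```python
def build_samples(series, window, horizon):
    """Turn one edge's observation sequence into (features, target) pairs.

    series  : list of observed values, one per time interval
    window  : number of past intervals kept besides the current one (assumed)
    horizon : prediction horizon in intervals (assumed to be an integer)
    """
    samples = []
    for t in range(window, len(series) - horizon):
        features = series[t - window:t + 1]  # values at t - window, ..., t
        target = series[t + horizon]         # value to be predicted
        samples.append((features, target))
    return samples
```

For example, with 10-minute intervals a horizon of one interval would correspond to the 10-minute forecast considered in this paper; the actual interval length of the dataset is not stated here.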

3. Proposed model

In this paper, we propose a short-term traffic flow forecasting model based on the non-parametric regression k-nearest neighbors algorithm. To apply the kNN method to the traffic flow prediction problem, it is necessary to solve the following tasks:
1. Define a feature vector to describe traffic flow.
2. Define a suitable distance metric to determine the proximity between a feature vector describing current traffic flow characteristics and feature vectors describing historical traffic flow observations.
3. Define a prediction function to forecast a traffic flow characteristic from the selected nearest neighbors.

These challenges are described in the next subsections.

3.1. Feature vector

The choice of a feature vector in the kNN method depends on the particular application of the method in practice. To solve the traffic flow prediction problem, it is reasonable to use a feature vector that takes into account spatial and temporal correlations of the traffic flow characteristics.

In [12], the authors used as a feature vector the traffic flow of the targeted road segment $j$, the downstream road segment $j-1$ and the upstream road segment $j+1$ for $T$ time intervals:

$$\left(V_{t-T}^{j}, \ldots, V_{t-1}^{j}, V_{t}^{j},\; V_{t-T}^{j-1}, \ldots, V_{t-1}^{j-1}, V_{t}^{j-1},\; V_{t-T}^{j+1}, \ldots, V_{t-1}^{j+1}, V_{t}^{j+1}\right). \qquad (1)$$

However, such a feature vector does not consider traffic flow on adjacent segments. In addition, in some cases, the upstream / downstream road segment cannot be uniquely determined. Therefore, to describe traffic flow, we propose to use a feature vector that takes into account the traffic flow characteristics in a spatially compact cluster of the transport network graph.

In this paper, we define the feature vector as follows:
1. The transportation network graph is partitioned into several spatially compact clusters $\{G_i\}$. In each cluster $i$ the feature vector is defined as follows:

$$\{V_t^j\}_i, \quad j \in G_i, \quad t = t_{cur} - T, \ldots, t_{cur}. \qquad (2)$$

2. For the defined feature vector $\{V\}_i$ in the cluster $i$, dimension reduction is performed using principal component analysis. The result of this procedure is a new feature vector $\{X_n\}_i$, $n = 1, \ldots, N$.
3. The proposed feature vector for each road segment $j \in E$ is composed of the initial feature vector of the targeted road segment $j$ and the feature vector of the cluster $i$ such that $j \in G_i$:

$$S^j = \left(\{V_t^j\}, \{X_n\}_i\right), \quad j \in G_i, \quad t = t_{cur} - T, \ldots, t_{cur}, \quad n = 1, \ldots, N. \qquad (3)$$

The graph partitioning algorithm is described in the next subsection.
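The construction of the feature vector can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the principal components are approximated here by power iteration with deflation on the sample covariance matrix, the cluster observations are assumed to be flattened into one list per historical time step, and all function names are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Column means and sample covariance of equal-length observation rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(d)]
    centered = [[r[c] - means[c] for c in range(d)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return means, cov


def top_components(cov, n_components, iters=200):
    """Approximate leading eigenvectors by power iteration with deflation."""
    d = len(cov)
    work = [row[:] for row in cov]
    components = []
    for _ in range(n_components):
        start = [1.0 / (c + 1) for c in range(d)]  # arbitrary start vector
        norm = math.sqrt(sum(x * x for x in start))
        v = [x / norm for x in start]
        for _ in range(iters):
            w = [sum(work[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(work[a][b] * v[b] for b in range(d))
                  for a in range(d))
        components.append(v)
        for a in range(d):  # deflate: remove the direction just found
            for b in range(d):
                work[a][b] -= lam * v[a] * v[b]
    return components


def segment_feature_vector(own_history, cluster_snapshot, means, components):
    """Concatenate a segment's own history with the reduced cluster features."""
    centered = [x - m for x, m in zip(cluster_snapshot, means)]
    reduced = [sum(c * x for c, x in zip(comp, centered)) for comp in components]
    return list(own_history) + reduced
```

In practice a library PCA routine would normally replace the two helper functions; the sketch only shows how the segment history and the reduced cluster description are combined into one vector.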

3.2. Graph partitioning

Let each edge $i \in E$ correspond to a road segment $e_i$ with two terminal points $x^i_{start} = (x^0_{start}, x^1_{start})_i$ and $x^i_{end} = (x^0_{end}, x^1_{end})_i$.

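Then graph partitioning by area can be described as follows:
1. Choose the numbers of clusters $M_0$, $M_1$.
2. The cluster $G_m$ with index $m = m_0 M_1 + m_1$, $m_0 = 0, \ldots, M_0 - 1$, $m_1 = 0, \ldots, M_1 - 1$, contains the edges $i \in E$ for which the coordinates of at least one of the corresponding terminal points are inside the corresponding rectangular area $\Omega_{m_0,m_1}$:

$$G_{m_0 M_1 + m_1} = \left\{ i \in E : x^i_{start} \in \Omega_{m_0,m_1} \lor x^i_{end} \in \Omega_{m_0,m_1} \right\}, \qquad (4)$$

where

$$\Omega_{m_0,m_1} = \left[ x^0_{min} + \frac{m_0}{M_0}\left(x^0_{max} - x^0_{min}\right),\; x^0_{min} + \frac{m_0 + 1}{M_0}\left(x^0_{max} - x^0_{min}\right) \right] \times \left[ x^1_{min} + \frac{m_1}{M_1}\left(x^1_{max} - x^1_{min}\right),\; x^1_{min} + \frac{m_1 + 1}{M_1}\left(x^1_{max} - x^1_{min}\right) \right],$$

$$x^s_{min} = \min_{v \in \{start, end\},\; i \in E} x^s_{v,i}, \qquad x^s_{max} = \max_{v \in \{start, end\},\; i \in E} x^s_{v,i}, \qquad s = 0, 1.$$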

The numbers of clusters along the vertical and horizontal axes, $M_0$ and $M_1$, are chosen empirically. We assume that each edge of the graph belongs to only one cluster.
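The partitioning by area can be sketched as a regular grid over the bounding box of all terminal points. The paper does not state how an edge whose terminal points fall into different cells is assigned to a single cluster, so the sketch below assumes, purely for illustration, that the start point decides; the function name and the data layout are hypothetical.

```python
def assign_clusters(edges, m0_count, m1_count):
    """Map each edge to a grid cluster index m = m0 * M1 + m1.

    edges: dict edge_id -> ((x0, x1) start point, (x0, x1) end point)
    """
    points = [p for start, end in edges.values() for p in (start, end)]
    mins = [min(p[s] for p in points) for s in (0, 1)]
    maxs = [max(p[s] for p in points) for s in (0, 1)]
    counts = (m0_count, m1_count)

    def cell(point):
        idx = []
        for s in (0, 1):
            span = maxs[s] - mins[s]
            frac = (point[s] - mins[s]) / span if span > 0 else 0.0
            # Points on the upper boundary are placed in the last cell.
            idx.append(min(int(frac * counts[s]), counts[s] - 1))
        return idx[0] * m1_count + idx[1]

    # Assumption: the start point resolves edges that span two cells.
    return {edge_id: cell(start) for edge_id, (start, end) in edges.items()}
```

Other tie-breaking rules (for example, using the edge midpoint) would satisfy the one-cluster-per-edge assumption equally well.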

3.3. Proximity measure

To define the proximity between feature vectors, it is necessary to choose a suitable distance metric. Different distance functions between feature vectors are available in the literature, including the Euclidean, Mahalanobis and Hamming distances.

In this paper, we use a weighted Euclidean distance, modified to use the feature vector describing transportation network clusters. The distance is computed separately for the parts of the feature vector describing traffic flow on the current segment $\{V\}$ and in the corresponding cluster $\{X\}$.

$$d(S, S^i) = \sqrt{\sum_{t=1}^{T} \lambda^{T-t+1} \left(V_t - V_t^i\right)^2} + \sqrt{\sum_{n=1}^{N} \left(X_n - X_n^i\right)^2}, \qquad (5)$$

where $0 < \lambda \le 1$, $T$ denotes the total number of time intervals in the feature vector, $N$ denotes the total number of elements in the feature vector describing the graph cluster, $S$ is the feature vector describing the current traffic flow, $S^i$ is the feature vector describing the $i$th historical traffic flow, $V_t$, $V_t^i$ are the feature vector values representing, respectively, the current and historical traffic flows on the selected road segment at time interval $t$, and $X_n$, $X_n^i$ are the $n$th feature vector values representing, respectively, the current and historical traffic flows in the graph cluster.
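A minimal sketch of such a two-part distance is given below. It assumes that the temporal weight is a decay factor (called `lam` here, with 0 < lam <= 1) raised to the power T - t + 1, so that more recent intervals contribute more, and that the cluster part is an unweighted Euclidean distance; the function name and the default value of `lam` are illustrative.

```python
import math


def weighted_distance(v_cur, v_hist, x_cur, x_hist, lam=0.9):
    """Distance between a current and a historical feature vector.

    v_cur, v_hist : segment values for intervals t = 1..T (oldest first)
    x_cur, x_hist : reduced cluster features, n = 1..N
    lam           : assumed decay factor in (0, 1]
    """
    T = len(v_cur)
    temporal = sum(
        lam ** (T - t + 1) * (a - b) ** 2
        for t, (a, b) in enumerate(zip(v_cur, v_hist), start=1)
    )
    spatial = sum((a - b) ** 2 for a, b in zip(x_cur, x_hist))
    return math.sqrt(temporal) + math.sqrt(spatial)
```

With `lam = 1` the temporal part reduces to the ordinary Euclidean distance over the segment history.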

3.4. Prediction function

The traditional approach for estimating the value in k-NN regression is to take the average or the weighted average of the values of the k nearest neighbors [5].

A prediction function by the average has the following form:

$$\hat{X}_{T+1} = \frac{1}{K} \sum_{k=1}^{K} X^k_{T+1}, \qquad (6)$$

where $\hat{X}_{T+1}$ is the predicted traffic flow value at the next time interval $T + 1$, $X^k_{T+1}$ is the traffic flow value of the $k$th nearest neighbor at the time interval $T + 1$, and $K$ is the total number of neighbors.

A prediction function by the weighted average has the following form:

$$\hat{X}_{T+1} = \sum_{k=1}^{K} \frac{d_k^{-1}}{\sum_{l=1}^{K} d_l^{-1}} X^k_{T+1}, \qquad (7)$$

where $d_k$ denotes the distance between the feature vector describing the current traffic data and the $k$th nearest neighbor.

We use the prediction function by the weighted average.
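Both prediction functions can be sketched in a few lines. The small constant `eps` is an assumption added here to avoid division by zero when a historical vector coincides with the current one; it is not part of the paper, and the function names are illustrative.

```python
def predict_average(neighbor_values):
    """Plain average of the neighbors' values at the target interval."""
    return sum(neighbor_values) / len(neighbor_values)


def predict_weighted(neighbor_values, distances, eps=1e-9):
    """Average weighted by inverse distance to each neighbor.

    eps is an assumed safeguard against zero distances.
    """
    weights = [1.0 / (d + eps) for d in distances]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, neighbor_values)) / total
```

For two neighbors with values 10 and 20 at distances 1 and 3, the weighted prediction is close to 12.5, pulled toward the nearer neighbor, while the plain average is 15.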

4. MapReduce implementation

The proposed model of traffic flow prediction uses a large amount of current and historical traffic flow data. To improve the efficiency of the proposed model, we implement it on the basis of the MapReduce model [14] for distributed computing using the Apache Spark engine [15].

MapReduce provides parallel processing of large amounts of data in computing clusters. The MapReduce model usually consists of three main steps: Map, Shuffle and Reduce. Figure 1 illustrates a computation flowchart of the proposed model based on the MapReduce engine.

[Figure 1. Computation flowchart of the proposed model. Preprocessing phase: the training and testing data are split into partitions (train_split_0, ..., train_split_n; test_split_0, ..., test_split_k) and combined by a Cartesian join. Map phase: a Map procedure followed by a sort is applied to each pair of training and testing partitions. Shuffle phase: the resulting key / local top list pairs are grouped by key. Reduce phase: a Reduce procedure produces the output data.]

As illustrated in Figure 1, the first step is the preparation of input data for the Map phase. First, the historical and test data are divided into partitions. The optimal number of partitions depends on the amount of processed data and the number of computing nodes. According to the official Apache Spark documentation, the recommended number of partitions is 2-3 per CPU core in the cluster. Then, ordered pairs of historical and test data partitions are formed using the Cartesian product. Next, in the Map phase, a map function is applied to each pair of partitions. This function returns an intermediate set of key / value pairs: the test element / the local list of k nearest neighbors. In the Shuffle phase, the key / value pairs are grouped and transferred to the reduce functions. In the final Reduce step, for each test data element, the set of local lists of k nearest neighbors is merged into the resulting (global) list of k nearest neighbors. The resulting lists of k nearest neighbors are subsequently used to find the predicted traffic flow value.
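The phases described above can be emulated on a single machine to show the data flow. The sketch below is plain Python rather than Spark code: partitions are ordinary lists, the distance is a simple Euclidean placeholder, and all names are hypothetical. It only illustrates how local lists of k nearest neighbors are produced per partition pair and then merged per test element.

```python
import heapq
from collections import defaultdict
from itertools import product


def distance(a, b):
    """Placeholder Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def map_partition_pair(train_part, test_part, k):
    """Map: emit (test_id, local list of the k nearest (distance, target))."""
    for test_id, test_vec in test_part:
        local = heapq.nsmallest(
            k, ((distance(test_vec, vec), target) for vec, target in train_part)
        )
        yield test_id, local


def reduce_top_lists(local_lists, k):
    """Reduce: merge the local lists of one test element into a global list."""
    return heapq.nsmallest(k, (item for lst in local_lists for item in lst))


def knn_mapreduce(train_parts, test_parts, k):
    grouped = defaultdict(list)  # Shuffle: group local lists by test id
    # Cartesian product of training and testing partitions.
    for train_part, test_part in product(train_parts, test_parts):
        for test_id, local in map_partition_pair(train_part, test_part, k):
            grouped[test_id].append(local)
    return {tid: reduce_top_lists(lists, k) for tid, lists in grouped.items()}
```

Because every local list already holds the k best candidates of its partition pair, merging the local lists is sufficient to obtain the global k nearest neighbors without revisiting the training data.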

The results of evaluating the efficiency of the proposed model based on the MapReduce concept are presented in Section 5.

5. Experiments

In the experimental study, we predict the average traffic speed in the city of Samara, Russia, for a short-term prediction horizon of 10 minutes. The dataset contains records for 34 days. We compare the proposed model with the model described in [12]. That model uses a feature vector of the form (1), which considers the time domain and upstream / downstream road segments (denoted below as "TDUD"). We denote our model as "Clusters" because its feature vector considers spatial-temporal correlations in graph clusters.

During testing, the models are evaluated on each day in turn (test set), with the remaining days used as the historical dataset (training set). Then the average performance across the full dataset is calculated.
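This evaluation scheme is a leave-one-day-out procedure and can be sketched as follows. The callback `evaluate` is a hypothetical placeholder for fitting a model on the training days and returning an error value for the held-out day.

```python
def leave_one_day_out(days):
    """Yield (test_day, training_days) pairs, holding out each day once."""
    for i, test_day in enumerate(days):
        yield test_day, days[:i] + days[i + 1:]


def average_error(days, evaluate):
    """Average the per-day error over all hold-out splits.

    evaluate(test_day, training_days) is an assumed callback returning
    the error measured on test_day.
    """
    errors = [evaluate(test, train) for test, train in leave_one_day_out(days)]
    return sum(errors) / len(errors)
```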

We conduct the experiments on an Apache Spark cluster. The traffic flow was predicted for a small area containing 698 road segments (Figure 2). Each road segment is considered as two edges with different directions. The total size of the dataset was 3.5 GB.

To compare the performance of the proposed model, we use two standard metrics, the mean absolute percentage error (MAPE) and the mean absolute error (MAE), which can be formulated as:

$$MAPE = \frac{1}{n} \sum_{t=1}^{n} \frac{|V_t - \hat{V}_t|}{V_t}, \qquad (8)$$

$$MAE = \frac{1}{n} \sum_{t=1}^{n} |V_t - \hat{V}_t|, \qquad (9)$$

where $V_t$ is the actual value of traffic flow at time interval $t$, $\hat{V}_t$ is the predicted value for the same time interval $t$, and $n$ is the total number of traffic flow observations.
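Both metrics are straightforward to compute; a minimal sketch is shown below. It assumes that the actual values are strictly positive (as is natural for average speed), since MAPE divides by them, and it returns MAPE as a fraction rather than a percentage.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be non-zero."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)
```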

6. Conclusion

The paper presents a distributed spatial-temporal model of short-term traffic forecasting based on the non-parametric regression method of k nearest neighbors. In the model, spatial and temporal characteristics of the transport flow in a compact cluster of the transport network are taken into account in the feature space description.

For distributed Big Data processing, we use the MapReduce processing model implemented in the open source cluster-computing framework Apache Spark. Experimental analysis on real-world traffic datasets allows us to conclude that the proposed model has high prediction accuracy and a reasonable execution time, sufficient for real-time prediction.

[Figure: prediction error by day for the "Clusters" and "TDUD" models; two panels, x-axis: Day, y-axis of one panel: MAE.]

Possible directions of further research include dataset filtering for weekday / weekend traffic data and the development of graph partitioning algorithms based on the traffic flow characteristics during a specific time period.

Acknowledgments

This work was supported by the Russian Foundation for Basic Research (RFBR), grants 18-07-00605 and 18-29-03135.

References

[1] Lana, Del Ser, Velez and Vlahogianni 2018 Road traffic forecasting: Recent advances and new challenges IEEE Intelligent Transportation Systems Magazine 10 93-109

[2] Vlahogianni, Golias and Karlaftis M 2004 Short-term traffic forecasting: Overview of objectives and methods Transport Reviews 24 533-557

[3] Karlaftis and Vlahogianni E 2011 Statistical methods versus neural networks in transportation research: Differences, similarities and some insights Transportation Research Part C: Emerging Technologies 19 387-399

[4] Shekhar and Williams B 2007 Adaptive seasonal time series models for forecasting short-term traffic flow Transportation Research Record 116-125

[5] Smith, Williams and Keith Oswald R 2002 Comparison of parametric and nonparametric models for traffic flow forecasting Transportation Research Part C: Emerging Technologies 10 303-321

[6] Yin, Wong, Xu and Wong C 2002 Urban traffic flow prediction using a fuzzy-neural approach Transportation Research Part C: Emerging Technologies 10 85-98

[7] Zheng and Su D 2014 Short-term traffic volume forecasting: A k-nearest neighbor approach enhanced by constrained linearly sewing principle component algorithm Transportation Research Part C: Emerging Technologies 43 143-157

[8] Wu C H, Ho J M and Lee D 2004 Travel-time prediction with support vector regression IEEE Transactions on Intelligent Transportation Systems 5 276-281

[9] Sun and Zhang C 2007 The selective random subspace predictor for traffic flow forecasting IEEE Transactions on Intelligent Transportation Systems 8 367-373

[10] Agafonov and Myasnikov V 2015 Traffic flow forecasting algorithm based on combination of adaptive elementary predictors Communications in Computer and Information Science 542 163-174

[11] Smith and Demetsky M 1997 Traffic flow forecasting: Comparison of modeling approaches Journal of Transportation Engineering 123 261-266

[12] Xia, Wang, Li and Zhang Z 2016 A distributed spatial-temporal weighted model on MapReduce for short-term traffic flow forecasting Neurocomputing 179 246-261

[13] Lv, Duan, Kang, Li and Wang F Y 2015 Traffic flow prediction with big data: A deep learning approach IEEE Transactions on Intelligent Transportation Systems 16 865-873

[14] Dean and Ghemawat S 2008 MapReduce: Simplified data processing on large clusters Communications of the ACM 51 107-113

[15] Apache Spark 2018 (Access mode: https://spark.apache.org/)