Introduction

Structural Optimization of the Travel Time Prediction Model Based on Hierarchical Regression

Anton Agafonov

ant.agafonov@gmail.com 0 1

Vladislav Myasnikov

vmyas@geosamara.ru 0 1 0 Image Processing Systems Institute of the Russian Academy of Sciences (IPSI RAS) , Samara , Russia 1 Samara State Aerospace University (SSAU) , Samara , Russia

In this paper we consider a problem of public transport arrival time prediction for a large city in real time. We investigate the algorithm based on a model of an adaptive combination of elementary prediction algorithms. Adaptability means that parameters of the constructed combination depend on a number of control parameters of the model. We compare our model with the nonlinear arti cial neural network model. Comparison is performed on data of the public transport network in Samara, Russia.

travel time prediction combination of algorithms neural network hierarchical regression

Introduction

Geoinformation systems ans intelligent transportation systems (ITS) based on automated vehicle location data are widely used by public transportation companies around the world. ITS can be used for the improvement of public transport reliability using di erent operational planning and control strategies. Such strategies can be used to improve the public transport network design, evaluate the schedule reliability and improve schedule planning [ 1 ].

Travel time prediction problem is one of the most common transportation problems. Travel time prediction can be used for solving many other problems in di erent contexts, including management, navigation, planning, monitoring and control problem, mentioned early.

There are much research focused on the travel time prediction problem. All approaches can be divided into the four following categories [ 1 ]. 1. Historical data based models. These approaches predict travel time in a certain period by the averaged over historical values during the same period in the previous days. The simplicity is the main drawback of these models, because models ignore both current road situation (such as congestions or accidents) and complex relationship between travel times and other variables in urban public transport networks. 2. Tra c theory based models [ 2 ]. These models perform travel time prediction based on tra c ow / tra c density estimations. The requirement of an actual tra c ow model is the main drawback of these models. 3. Time-series models [ 3 ]. These approaches assume that the future travel time depends only on the most recent data samples. Accuracy of models decreases for a longer forecasting horizon. 4. Machine learning and regression methods. These methods construct a regression function depending on a set of independent variables, which may include real-time and historical data of the road links travel times, road situation, passenger, weather situation, delays at stops, etc [ 4 ]. Over the last two decades, regression models have been the state of the art for this kind of applications. Arti cial neural networks (ANN) are widely used for the travel time prediction [ 5, 6 ], can represent complex nonlinear relationships between the target travel time value and the set of independent variables. Support vector regression (SVR) is used for travel time prediction in [ 7, 8 ]. Nonlinear regressions methods (such as ANN or SVR) has several drawbacks compared with other regression methods, including high training computational complexity and possibility for over tting.

In this paper we investigate the adaptive model proposed in [ 9 ]. This model is based on adaptive combination of the elementary prediction algorithms. The model uses hierarchical regression for representing complex nonlinear relationships between the target travel time value and the set of independent variables. Main purposes of this paper are the structural optimisation of the hierarchical regression based model and the comparison of the proposed model with an arti cial neural network model.

The structure of this paper is organized as follows: the second section contains a problem statement. The third section contains a description of the base model. The next section describes modi ed structure of the base model, used elementary predictors and aggregated function. In the fth section we present an experimental study of the algorithm. Lastly we present our conclusions and recommendations for further work. 2

Problem statement

We use the following notations: { w is the road link in a transport network; { j is the index of the speci ed vehicle; { s is the transport type (bus, tram, trolley), S is the set of transport types; { j 2 [0; 1] is the relative position of the vehicle j on the road link; { tc is the current time instant (at which predictions are calculated).

The position of any vehicle in the transport network we de ne by the pair (w; ).

Let the vehicle j at the time tc be in the position (w0; 0). We should estimate the arrival time, after which the vehicle be at the stop with position (wK ; K ).

Let the route between two positions (w0; 0) and (wK ; K ) includes the following edges (w0; 0) ! w1 ! w2 ! ! wK 1 ! (wK ; K ). Let Tjw (tc; t) be the travel time along the edge w starting at the time t, assuming that the prediction is calculated at the time tc.

Assuming, that the vehicle j moves monotonically on each road link, the predictive arrival time to the stop (wK ; K ) , calculated at the time tc, can be written as follows:

T (w0; 0)!(wK; K)(tc) = (1 j

0)Tjw0 (tc; tc) k=1

K 1 + X Tjwk tc; tc + Tj(w0; 0)!(wk;0)(tc) +

K TjwK tc; tc + T (w0; 0)!(wK;0)(tc) : j (1) (2) As it can be seen from the above expression, its main components are terms Tjwk ( ) that characterize the travel time of the road links.

For simpli cation, further we consider the travel time prediction problem at the speci ed edge wk and the speci ed time t:

T w (tc; t) :

In next section we brie y describe our prediction model proposed in [ 9 ]. 3

The base model

Calculation of prediction (2) can be performed using di erent information usually established in urban public transport networks. Firstly we describe the different travel time estimations and control parameters that a ect the predictive travel time value (2): 9. 10. 1. nT w(s)(t)o is the averaged historical travel time of the edge w by the vehis cles of the transport type s; 2. Tlwast(tc) is the most recent travel time of the link w; 3. T0w(t) is the normative travel time by the planning schedule; 4. Tvw(j)(tc) is the estimated travel time by the mean speed of the speci ed vehicle; 5. f(t tj ; Tj (tc))gjw is the set of precedents containing time instants and travel times of the edge w before the time tc; 6. w is the time elapsed since the last vehicle has passed the link w; 7. t tc is the required prediction horizon; 8. (t) 2 [0; 1] is the parameter de ning the driving di culty at the time t. This parameter takes into account the light and the weather conditions integrally; w(t) 2 [0; 1] is the parameter de ning the tra c density; w(t) is the parameter de ning dynamic of the tra c ow on the edge w.

These values have di erent meaningful information that a ects the way it is used. Parameters 1-5 present individual or aggregated statistical "precedents" of the travel times. These values can be used (together or separately, directly or indirectly) in the construction of the required estimate T w (tc; t) as some realizations of this assessment. Parameters 6-10 also directly a ect this estimate. However, these parameters cannot be considered as the precedents of the travel times. We use these parameters as the control parameters of the model (further we use notation pi for them).

To predict travel time T w (tc; t) we proposed to use a model based on the combination of the elementary prediction algorithms [ 9 ]. Fig. 1 provides a scheme of the proposed model.

Tlwast(tc); T0w(t); T w(s)(t); Tvw(j)(t) s = 0 f(t tj; Tj(tc))gjw(s)

s = jSwj 1 f(t tj; Tj(tc))gjw(s)

Algorithm 0 Algorithm K 1 Algorithm 0 Algorithm K 1

T 0w;1 tw;1 0 T 0w;K tw;K 0 T jwS;w1j 1 w;1 tjSwj 1 T jwS;wKj 1 w;K tjSwj 1

Aggregation function

Tbsw(tc; t)

We used four kernels: rectangular kernel (k = 0), triangular kernel (k = 1), exponential kernel (k = 2) and rational kernel (k = 3). As the aggregation function we used the following function:

K 1 Tbsw(tc; t) = X

X arksT rw;k k=0 r2S (tc; t)+ (4) + a1Tlwast(tc) + a2T0w(t) + a3T w(s)(t) + a4Tvw(j)(tc) :

Since our algorithm should be adaptive to the control parameters, we proposed to use hierarchical regression similar to regression tree. Parameters of the aggregation function (4) are de ned for each subdomain of de nition of the control parameters. Subdomains are calculated automatically using the hierarchical regression with the following approach: 1) domain of de nition for each control parameter pi is determined: [pimin; pimax]; 2) interval [pimin; pimax] is splitted into two intervals by an average value. To increase accuracy and decrease computation complexity of the proposed model, we modify used elementary prediction algorithms and the aggregation function. Experiments also show that the information, a ected the travel time and described in the previous section, has redundancy. We decide to use only signi cant parameters in our modi ed model.

First of all, we added a new transport type car to an input data of the modi ed model. We decided to use the following most signi cant information that a ect the predictive travel time value (2): o is the averaged historical travel time of the edge w by the vehis cles of the transport type s; 2. f(t tj ; Tj (tc))gjw is the set of precedents containing time instants and travel times of the edge w before the time tc; 3. t tc is the required prediction horizon; 4. (t) 2 [0; 1] is the parameter de nes the driving di culty at the time t. This parameter takes into account the light and the weather conditions integrally; 5. w(t) is the parameter de ning dynamics of the tra c ow on the edge w; 6. w(t) 2 [0; 1] is the parameter de ning the tra c density .

The elementary prediction algorithms (3) based on kernel functions have the similar accuracy and, moreover, have high correlation. To avoid these disadvantages we use the best one algorithm with the exponential kernel.

As an aggregation function we use ridge regression (regression with L2 regularization) of the elementary prediction algorithms. Ridge regression penalizes the euclidian norm of regression coe cients.

We also modify the method for de ning the intervals [pimin; pimax] of the control parameters for the each node in the hierarchical regression: 1) for each interval [pimin; pimax] we calculate the median value pimedian using histogram of the control parameter pi distribution; 2) divide each interval [pimin; pimax] into two using the median value: [pimin; pimax] ! [pimin; pimedian]; [pimedian; pimax]. 5

Test results and analysis

Main purpose of experiments is to compare proposed hierarchical regression based model with other complex nonlinear methods, since in [ 9 ] we compare our model only with elementary prediction algorithms. As such nonlinear method we choose an arti cial neural network, because this method is widely used for the travel time prediction and require not so much time for training procedure as support vector regression. Input dataset of the ANN contains both travel time estimations and control parameter values. As the ANN we use feed forward network with one hidden layer.

Models has been tested with the data of all the bus routes in Samara, Russia. The road network consists of 3387 edges. The number of the vehicles equipped with GPS sensors is more than 1500; coordinates of the vehicles positions are obtained every 30 seconds.

Tests were performed on data of all the public vehicles positions in Samara for one week. A validation set size is about 140 million records; a control set size is about 20 million records.

We compare the modi ed adaptive model with an elementary prediction algorithm based on the exponential kernel, the averaged historical travel time (statistics on the gures) and the neural network model. Number of hidden units were selected by the criterion of the mean absolute error minimization of 10 20 Prediction horizon, m 30 0

10 20 Prediction horizon, m 30

Exponential kernel Statistics Neural network Adaptive model

the ANN on the training set. We used ANN with 3 hidden units in a hidden layer. The exponential kernel width was also selected experimentally.

The prediction accuracy is evaluated by computing the mean absolute error (MAE) and the mean absolute percentage error (MAPE) depending on the prediction horizon. Fig. 3 shows the experimental results on the training set, g. 4 shows the experimental results on the control set. 0.25 0.20 E0.15 P A M0.10 0.05

In terms of considered accuracy measure the proposed adaptive model based on the hierarchical regression and the neural network model show similar results.

Since the prediction model should be run on-line, the computation time of the model is a critical issue. Next experiment shows the computation time of the adaptive and the neural network models for a di erent simultaneously processing number of the vehicles, using laptop computer (Intel Core i5-3740 3.20 GHz, 8 GB RAM) ( g. 5).

150 s m ,e100 m it n o itt a pu50 m o C 0 0 20

40 Number of vehicles

60 Neural network

The processing time for one vehicle takes about 7 ms on average, mainly depends on the required number of travel time predictions and is determined by the route and the position of the vehicle. Computation time of the neural network model is a little bit bigger than computation time of the adaptive model. Nonetheless both models can be used to predict the arrival time of all public vehicles in a large city in real time. 6

Conclusion and future work

In this paper we modify travel time prediction model based on adaptive combination of the elementary prediction algorithms. Our modi cations concern the set of elementary prediction algorithms and the aggregation function. The modi ed model has the same advantages as the base model: { allows to include prediction algorithms of any type in the adaptive combination; { has adaptability to the tra c situation changes; { has adaptability to the factors that has direct or indirect impact on the transport movement and / or prediction result.

In experiments performed on data of all bus routes in Samara (Russia) the modi ed adaptive model provides almost the same results as the neural network. The proposed model doesn't require so much time and memory resources for the training as the neural network. This is the main advantage of our model.

Comparison between the models of di erent classes, including time-series models and nonlinear regression models, will be conducted as our future work. Acknowledgments. The proposed base model and the modi ed model based on hierarchical regression (sections 3 and 4) were developed with support from the Russian Foundation for Basic Research grant 13-07-12103-o -m, grant 1507-01164-a, grant 16-37-00055-mol-a. The experimental results (section 5) were obtained with support from the Russian Science Foundation grant 14-31-00014 Establishment of a Laboratory of Advanced Technology for Earth Remote Sensing.

1. Moreira-Matias , L. , Mendes-Moreira , J. , De Sousa , J.F. , Gama , J.: Improving Mass Transit Operations by Using AVL-Based Systems: A Survey . In: IEEE Transactions on Intelligent Transportation Systems , vol. 16 ( 4 ), pp. 1636 - 1653 ( 2015 )

2. Hoogendoorn , S.P. : State-of-the-art of vehicular tra c ow modeling . In: Proceedings of the Institution of Mechanical Engineers. Part I: Journal of Systems and Control Engineering , vol. 215 ( 4 ), pp. 283 - 303 ( 2001 )

3. Cryer , J. , Chan , K. : Time Series Analysis With Applications in R. Springer-Verlag, New York ( 2008 )

4. Sun , H. , Liu, H.X. , Xiao , H. , He , R.R. , Ran , B. : Use of Local Linear Regression Model for Short-term Tra c Forecasting . In: Transportation Research Record , vol. 1836 , pp. 143150 ( 2003 )

5. Chen , M. , Liu , X. , Xia , J. , Chien , S.I.: A dynamic bus-arrival time prediction model based on APC data . In: Computer-Aided Civil and Infrastructure Engineering , vol. 19 ( 5 ), pp. 364 - 376 ( 2004 )

6. Jeong , R. , Rilett , L.R. : Bus arrival time prediction using arti cial neural network model . In: IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC , pp. 988 - 983 ( 2004 )

7. Bin , Y. , Zhongzhen , Y. , Baozhen , Y. : Bus arrival time prediction using support vector machines . In: Journal of Intelligent Transportation Systems: Technology, Planning, and Operations , vol. 10 ( 4 ), pp. 151 - 158 ( 2007 )

8. Wu , C.-H., Ho , J.-M. , Lee , D.T. : Travel-time prediction with support vector regression . In: IEEE Transactions on Intelligent Transportation Systems , vol. 5 ( 4 ), pp. 276 - 281 ( 2004 )

9. Agafonov , A. , Myasnikov , V. : An Adaptive Algorithm for Public Transport Arrival Time Prediction Based on Hierarhical Regression . In: IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC , pp. 2776 - 2781 ( 2015 )