Performance comparison of machine learning methods in the bus arrival time prediction problem

A A Agafonov1 and A S Yumaganov1

1Samara National Research University, Moskovskoye shosse, 34, Samara, Russia, 443086

e-mail: ant.agafonov@gmail.com, yumagan@gmail.com

Abstract. The problem of predicting the movement of public transport is one of the most popular problems in the field of transport planning due to its practical significance. Various parametric and non-parametric models are used to solve this problem. In this paper, heterogeneous information affecting the prediction value is used to predict the arrival time of public transport, and a comparison of the main machine learning algorithms for public transport arrival time forecasting is given: neural networks, support vector regression. An experimental analysis of the algorithms was carried out on real traffic information about bus routes in Samara, Russia.

1. Introduction
Public passenger transport is an important part of the transport system. Efficient use of passenger transport helps to reduce road congestion by reducing the use of personal vehicles, as well as to cut down fuel consumption and reduce environmental pollution. To improve the quality of passenger transport service, it is necessary, among other things, to provide passengers with information about the exact arrival time of vehicles at stops. This information is important for passengers because it allows them to choose alternative routes and reduce the waiting time for vehicles.

The arrival time of vehicles at stops can be considered as stochastic, since it depends on many factors, including the passing time of road segments, the time spent at stops and the delay time at intersections. Furthermore, such factors as traffic congestion, incidents and weather conditions must be taken into account to predict the arrival time. Thus, the development of a prediction model that takes into account various spatial-temporal factors is a difficult task.
Despite the popularity of the above-mentioned problem, many papers consider only individual factors (for example, the speed of the vehicle on the current and previous road segments) to predict the arrival time at a stop. Moreover, the comparison of algorithms in those papers is carried out on different data sets that often include information about only one or a few routes. In this paper, different public transport arrival time prediction models are compared, including artificial neural networks, support vector regression and linear regression. Heterogeneous information describing the transport situation is used for prediction. The algorithms are compared on the traffic data of the bus network in Samara, Russia.

V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) Data Science A A Agafonov and A S Yumaganov

2. Related works
There are a large number of studies devoted to the problem of public transport arrival prediction. All existing works can be divided into several categories according to the type of models and algorithms used: parametric and non-parametric regression models, Kalman filter based models, artificial neural networks, the support vector machine, hybrid models.

Linear regression models [1, 2] are constructed as regression functions from a set of independent variables. The applicability of these models to transport systems is limited due to the strong correlation of the variables of the regression function.

Nonparametric regression, in particular the k-nearest-neighbor method, was used to solve the prediction problem in the papers [3, 4, 5]. However, the requirement of a large sample size restricts the use of this method in real time. In [6], a clustering algorithm was used to determine the distribution of the travel time of the road segment.
Models based on the Kalman filter [7] make it possible to estimate the future values of the dependent variables using a recursive procedure that takes into account the stochastic nature of the process and the measurement noise.

Artificial neural network (ANN) models [8, 9] are the most commonly used approaches for predicting arrival time. The prediction model presented in [8] combines two neural network models trained on two data sets, respectively: a travel time data set and a data set of arrival times at stops. The authors of [9] used the Bayesian approach to combine several neural networks to build a prediction.

Support vector regression (SVR) belongs to a family of supervised learning algorithms used for classification and regression analysis problems [11, 12]. In [12], the travel time of the current and next road segments was used for prediction. In [11], the authors used a genetic algorithm to select SVR parameters. The authors of [13] used a prediction model that combines two SVR models.

Hybrid models are also used to reduce the forecast error [14, 15, 16]. These models combine several heterogeneous methods and algorithms.

Travel time prediction is also needed to solve other complex problems, such as reliable path finding [17] or autonomous vehicle routing [18].

The results of a comparison of several regression models and machine learning methods are presented in [19], where the best result was shown by the SVR model. The opposite result was obtained in [20], where the best results were shown by the neural network model.

In most works, the best results of the public transport arrival time prediction were shown using machine learning methods: neural network models and SVR. However, the choice of a particular model depends on the input data used.

3. Basic notation and problem formulation
A transport network is considered as a directed graph, the vertices of which correspond to the stops and the edges of which denote segments of the transport network between the stops.
Let $s$ denote a bus stop from the set $S$; $w_{ij}$ denote the segment of the transport network between the stops $i \in S$ and $j \in S$ with length $|w_{ij}|$; $r$ denote a public transport route from the set $R$; $R_{ij}$ denote the set of routes passing through the segment $w_{ij}$; $n$ denote a vehicle from the set $N$; $N_r$ denote the set of vehicles with route $r \in R$.

The problem of arrival time prediction for the vehicle $n \in N$ with route $r \in R$ at the stop $j \in S$ can be formulated as:

$$t_j^{arr,n} = t_i^{dep,n} + T_{ij}^{travel,n}, \qquad (1)$$

where $t_j^{arr,n}$ denotes the arrival time at the stop $j$, $t_i^{dep,n}$ denotes the departure time from the stop $i$, and $T_{ij}^{travel,n}$ denotes the travel time between the stops $i$ and $j$.

Then the problem of arrival time prediction is reduced to the problem of predicting the travel time $T_{ij}^{travel,n}$ or, equivalently, the problem of predicting the vehicle's speed $v_{ij}^n$. The problem can be formulated as follows: using the transport network graph, as well as statistical and real-time data, predict the speed $\hat{v}_{ij}^n(t_c, t)$ at the time $t$, given that the prediction is calculated at the time $t_c$.

4. Proposed model
4.1. Factors of prediction
In order to obtain a speed prediction $\hat{v}_{ij}^n$ of a vehicle $n \in N$ running the route $r \in R$, various factors affecting the predicted value can be taken into account. In contrast to the works known to the authors, this article proposes the use of heterogeneous information describing the transport situation. This information is defined as follows:
• The speed $v_{ij}^n$ of the vehicle $n \in N$ on the segment $w_{ij}$;
• The weighted average speed $v_{ij}^{route,r}$ of vehicles running the route $r \in R$ on the segment $w_{ij}$:

$$v_{ij}^{route,r}(t) = \frac{\sum_{k \in N_r} \omega\left(t - t_i^{dep,k}\right) v_{ij}^k}{\sum_{k \in N_r} \omega\left(t - t_i^{dep,k}\right)},$$

where $\omega(t)$ is a kernel

$$\omega(t) = \begin{cases} \exp(-\alpha t), & t \le \Delta_{max}, \\ 0, & t > \Delta_{max}; \end{cases}$$

$\Delta_{max}$ is a time interval for which estimates of speed are considered.
• The weighted average speed $v_{ij}^{all}$ of vehicles with any route on the segment $w_{ij}$:

$$v_{ij}^{all}(t) = \frac{\sum_{r \in R_{ij}} \sum_{k \in N_r} \omega\left(t - t_i^{dep,k}\right) v_{ij}^k}{\sum_{r \in R_{ij}} \sum_{k \in N_r} \omega\left(t - t_i^{dep,k}\right)};$$

• The average hourly traffic flow speed $v^{hour}$;
• The average daily traffic flow speed $v^{day}$;
• The historical average speed $v_{ij}^{stat}(t)$ of vehicles with any route on the segment $w_{ij}$ at the time interval $t$;
• The average traffic flow speed $v_{ij}^{flow}(t)$ on the segment $w_{ij}$ at the time point $t$;
• The traffic flow speed $v_{ij}^{fNow}$ on the segment $w_{ij}$ at the current time.

It is assumed that the average hourly and average daily speeds indirectly reflect the current seasonal and weather situation, while the average speed of the traffic flow reflects the changes in the traffic situation and the occurrence of congestion.

4.2. The basic model of an artificial neural network
In [20], a neural network model with one hidden layer containing 5 neurons was used as a prediction model. Three factors were used to predict the travel time of the vehicle $n \in N$ with the route $r \in R$ on the road segment $w_{ij}$:
• the weighted speed of vehicles with the same route on the road segment $v_{ij}^{route,r}(t)$;
• the weighted speed of vehicles with any route on the road segment $v_{ij}^{all}(t)$;
• the vehicle speed on the previous segment $v_{i-1,i}^n$.
We denote this model as $ANN_{3,5,1}$.

4.3. Support vector regression model
The support vector regression (SVR) method is a special class of algorithms characterized by the use of kernels. The most common kernels are linear, polynomial, radial basis function and sigmoid. In this work, a radial basis function kernel is used in the following form:

$$k(x, x') = \exp\left(-\gamma \|x - x'\|^2\right),$$

where $\gamma > 0$ is a model parameter, and $x$ and $x'$ are the input data of the model. The three above-mentioned factors are used as input data.

4.4. Extended model of artificial neural network
We propose to use an extended neural network model to predict the speed $\hat{v}_{ij}^n(t_c, t)$ of a vehicle $n \in N$ running the route $r \in R$. The input data includes all the factors described in Section 4.1 and can be written as a vector:

$$V = \left( v_{i-1,i}^{n},\ v_{ij}^{n_1},\ v_{ij}^{n_2},\ v_{ij}^{route,r}(t),\ v_{ij}^{all}(t),\ v_{ij}^{stat}(t_c),\ v_{ij}^{stat}(t),\ v_{ij}^{flow}(t_c),\ v_{ij}^{flow}(t),\ v^{hour}(t),\ v^{day}(t),\ v_{ij}^{fNow} \right),$$

where $n_1$ is the preceding vehicle of the route $r$ that passed the segment $w_{ij}$, and $n_2$ is the preceding vehicle of any route that passed the segment $w_{ij}$.

The neural network model used for prediction has the following form: an input layer (12 neurons), one hidden layer (13 neurons) and an output layer (1 neuron). The Adam method [21] was used for optimization.

4.5. Experiments
Experimental studies of the models were carried out on traffic data of bus routes in the transport network of Samara, Russia, for two months, from August 1, 2018 to September 30, 2018. The forecast was performed for 837 vehicles on 176 routes.

The linear regression model LR, the basic neural network model $ANN_{3,5,1}$, the support vector regression model SVR and the extended neural network model $ANN_{ext}$ were compared.

To evaluate the prediction quality of each model, two standard metrics were used: mean absolute percentage error (MAPE) and mean absolute error (MAE):

$$MAPE = \frac{1}{n} \sum_{t=1}^{n} \frac{|v_t - \hat{v}_t|}{v_t} \times 100\%, \qquad (2)$$

$$MAE = \frac{1}{n} \sum_{t=1}^{n} |v_t - \hat{v}_t|, \qquad (3)$$

where $v_t$ is the real value, $\hat{v}_t$ is the predicted value, and $n$ is the number of predicted values.

Table 1 shows the comparison of prediction models for one of the routes of the analysed transport network.

Table 1. Comparison of prediction models.
           LR      ANN3,5,1   SVR     ANNext
MAPE, %    29.58   29.76      34.75   27.75
MAE        1.76    1.77       2.20    1.60

In this case, the data used for training and forecasting were limited to the data of the selected route. Data obtained on a given day were used as test data, and all the remaining data were used as training data. The table shows the average MAE and MAPE values obtained over 7 days.

From the obtained results it can be seen that the average value of the prediction error for one road segment is quite high. The best result is demonstrated by the extended model of an artificial neural network. However, the results of predicting the arrival time of vehicles at distant stops are of greater interest.

For experimental studies of the dependence of MAPE and MAE on the forecast horizon, the full volume of data on vehicle movement was used. The studies were carried out for one day and all routes, while the data obtained for the entire above-mentioned period except the selected day were used as archival data.

Training the SVR model on such a large amount of input data takes tens of hours, and the results obtained above show the superiority of the other models. Thus, the SVR model was not used in these experimental studies. The dependence of MAPE and MAE on the forecast horizon is shown in Figure 1.

[Figure: two panels, MAPE, % and MAE, s, each plotted against the forecast horizon, min, for the extended model of artificial neural network, linear regression and the basic model of artificial neural network.]

Figure 1. The dependence of MAPE and MAE on the forecast horizon.
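The two error measures used in Table 1 and Figure 1 can be illustrated with a short Python sketch of definitions (2) and (3). This is only a minimal illustration and not the code used in the experiments: the function names `mape` and `mae` and the sample values are assumptions introduced here, and the sketch assumes that all real values are non-zero.

```python
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    """Reject empty or mismatched inputs before computing a metric."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, Eq. (2), returned in percent."""
    _check(actual, predicted)
    # Each term is |v_t - v_hat_t| / v_t, so v_t must be non-zero.
    total = sum(abs(a - p) / a for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, Eq. (3), in the units of the input values."""
    _check(actual, predicted)
    total = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total / len(actual)


# Hypothetical values for illustration only (not data from the experiments).
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]

mape_value = mape(actual, predicted)  # relative error, percent
mae_value = mae(actual, predicted)    # absolute error, same units as inputs
print(f"MAPE = {mape_value:.2f} %, MAE = {mae_value:.2f}")
```

Because MAPE divides each error by the real value, the same absolute error gives a smaller percentage when the real value is larger, so the two measures can change in different directions as the predicted quantity grows.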
Based on the obtained results, it can be concluded that the prediction quality of the extended model of an artificial neural network is higher than that of the other models throughout the forecast horizon. The worst result was obtained using the basic model of the artificial neural network. At the same time, the value of MAPE decreases for all considered models as the forecast horizon increases. Thus, in terms of relative error, the prediction quality of the vehicles' arrival time at distant stops is significantly higher than the prediction quality for the nearest stops.

5. Conclusion
This paper proposed an extended neural network model which takes into account heterogeneous information to predict the arrival time of public transport. The experiments were carried out on real traffic information about bus routes in Samara, Russia. The proposed model showed the best results compared to the linear regression model, the support vector regression model and the basic model of the artificial neural network. The proposed model can be used to predict the arrival time of public transport in real time.

A possible direction of further research is the use of different models for individual routes or periods of the day.

6. References
[1] Agafonov A A, Sergeyev A V and Chernov A V 2012 Forecasting of the motion parameters of city transport by satellite monitoring data Computer Optics 36(2) 453-458
[2] Jeong R and Rilett L 2005 Prediction model of bus arrival time for real-time applications Transportation Research Record 1927 195-204
[3] Chang H, Park D, Lee S, Lee H and Baek S 2010 Dynamic multi-interval bus travel time prediction using bus transit data Transportmetrica 6 19-38
[4] Smith B, Williams B and Keith Oswald R 2002 Comparison of parametric and nonparametric models for traffic flow forecasting Transportation Research Part C: Emerging Technologies 10 303-321
[5] Agafonov A A, Yumaganov A S and Myasnikov V V 2018 Big data analysis in a geoinformatic problem of short-term traffic flow forecasting based on a K nearest neighbors method Computer Optics 42(6) 1101-1111 DOI: 10.18287/2412-6179-2018-42-6-1101-1111
[6] Xu H and Ying J 2017 Bus arrival time prediction with real-time and historic data Cluster Computing 20 3099-3106
[7] Chen M, Liu X, Xia J and Chien S 2004 A dynamic bus-arrival time prediction model based on APC data Computer-Aided Civil and Infrastructure Engineering 19 364-376
[8] Chien S J, Ding Y and Wei C 2002 Dynamic bus arrival time prediction with artificial neural networks Journal of Transportation Engineering 128 429-438
[9] van Hinsbergen C, van Lint J and van Zuylen H 2009 Bayesian committee of neural networks to predict travel times with confidence intervals Transportation Research Part C: Emerging Technologies 17 498-509
[10] Jeong R and Rilett L 2004 Bus arrival time prediction using artificial neural network model Proc. of the 7th International IEEE Conference on Intelligent Transportation Systems 1 988-993
[11] Yang M, Chen C, Wang L, Yan X and Zhou L 2016 Bus arrival time prediction using support vector machine with genetic algorithm Neural Network World 26 205-217
[12] Bin Y, Zhongzhen Y and Baozhen Y 2006 Bus arrival time prediction using support vector machines Journal of Intelligent Transportation Systems: Technology, Planning, and Operations 10 151-158
[13] Yu B, Yang Z Z and Yu B 2009 Hybrid model for multi-stop arrival time prediction Neural Network World 19 321-332
[14] Agafonov A and Myasnikov V 2015 Traffic flow forecasting algorithm based on combination of adaptive elementary predictors Communications in Computer and Information Science 542 163-174
[15] Yu B, Yang Z Z, Chen K and Yu B 2010 Hybrid model for prediction of bus arrival times at next station Journal of Advanced Transportation 44 193-204
[16] Zheng W, Lee D H and Shi Q 2006 Short-term freeway traffic flow prediction: Bayesian combined neural network approach Journal of Transportation Engineering 132 114-121
[17] Agafonov A A and Myasnikov V V 2016 Method for the reliable shortest path search in time-dependent stochastic networks and its application to GIS-based traffic control Computer Optics 40(2) 275-283 DOI: 10.18287/2412-6179-2016-40-2-275-283
[18] Agafonov A A and Myasnikov V V 2018 Numerical route reservation method in the geoinformatic task of autonomous vehicle routing Computer Optics 42(5) 912-920 DOI: 10.18287/2412-6179-2018-42-5-912-920
[19] Yu B, Lam W and Tam M 2011 Bus arrival time prediction at bus stop with multiple routes Transportation Research Part C: Emerging Technologies 19 1157-1170
[20] Yin T, Zhong G, Zhang J, He S and Ran B 2017 A prediction model of bus arrival time at stops with multi-routes Transportation Research Procedia 25 4627-4640
[21] Kingma D P and Ba J L 2014 Adam: A Method for Stochastic Optimization Computing Research Repository 15

Acknowledgments
This work was supported by the RFBR (research projects No. 18-29-03135-mk and No. 18-07-00605 A).