Structural Optimization of the Travel Time
     Prediction Model Based on Hierarchical
                   Regression

                   Anton Agafonov1,2 and Vladislav Myasnikov1,2
             1
             Samara State Aerospace University (SSAU), Samara, Russia,
     2
         Image Processing Systems Institute of the Russian Academy of Sciences
                            (IPSI RAS), Samara, Russia
                  ant.agafonov@gmail.com, vmyas@geosamara.ru


         Abstract. In this paper we consider a problem of public transport ar-
         rival time prediction for a large city in real time. We investigate the
         algorithm based on a model of an adaptive combination of elementary
         prediction algorithms. Adaptability means that parameters of the con-
         structed combination depend on a number of control parameters of the
         model. We compare our model with the nonlinear artificial neural net-
         work model. Comparison is performed on data of the public transport
         network in Samara, Russia.

         Keywords: travel time prediction, combination of algorithms, neural
         network, hierarchical regression


1   Introduction
Geoinformation systems ans intelligent transportation systems (ITS) based on
automated vehicle location data are widely used by public transportation com-
panies around the world. ITS can be used for the improvement of public trans-
port reliability using different operational planning and control strategies. Such
strategies can be used to improve the public transport network design, evaluate
the schedule reliability and improve schedule planning [1].
    Travel time prediction problem is one of the most common transportation
problems. Travel time prediction can be used for solving many other problems
in different contexts, including management, navigation, planning, monitoring
and control problem, mentioned early.
    There are much research focused on the travel time prediction problem. All
approaches can be divided into the four following categories [1].
1. Historical data based models. These approaches predict travel time in a
   certain period by the averaged over historical values during the same period
   in the previous days. The simplicity is the main drawback of these models,
   because models ignore both current road situation (such as congestions or
   accidents) and complex relationship between travel times and other variables
   in urban public transport networks.
2         Structural Optimization of the Travel Time Prediction Model

 2. Traffic theory based models [2]. These models perform travel time prediction
    based on traffic flow / traffic density estimations. The requirement of an
    actual traffic flow model is the main drawback of these models.
 3. Time-series models [3]. These approaches assume that the future travel time
    depends only on the most recent data samples. Accuracy of models decreases
    for a longer forecasting horizon.
 4. Machine learning and regression methods. These methods construct a re-
    gression function depending on a set of independent variables, which may
    include real-time and historical data of the road links travel times, road sit-
    uation, passenger, weather situation, delays at stops, etc [4]. Over the last
    two decades, regression models have been the state of the art for this kind of
    applications. Artificial neural networks (ANN) are widely used for the travel
    time prediction [5, 6], can represent complex nonlinear relationships between
    the target travel time value and the set of independent variables. Support
    vector regression (SVR) is used for travel time prediction in [7, 8]. Nonlinear
    regressions methods (such as ANN or SVR) has several drawbacks compared
    with other regression methods, including high training computational com-
    plexity and possibility for overfitting.
    In this paper we investigate the adaptive model proposed in [9]. This model
is based on adaptive combination of the elementary prediction algorithms. The
model uses hierarchical regression for representing complex nonlinear relation-
ships between the target travel time value and the set of independent variables.
Main purposes of this paper are the structural optimisation of the hierarchi-
cal regression based model and the comparison of the proposed model with an
artificial neural network model.
    The structure of this paper is organized as follows: the second section con-
tains a problem statement. The third section contains a description of the base
model. The next section describes modified structure of the base model, used
elementary predictors and aggregated function. In the fifth section we present
an experimental study of the algorithm. Lastly we present our conclusions and
recommendations for further work.


2      Problem statement
We use the following notations:
    – w is the road link in a transport network;
    – j is the index of the specified vehicle;
    – s is the transport type (bus, tram, trolley), S is the set of transport types;
    – ∆j ∈ [0, 1] is the relative position of the vehicle j on the road link;
    – tc is the current time instant (at which predictions are calculated).
    The position of any vehicle in the transport network we define by the pair
(w, ∆).
    Let the vehicle j at the time tc be in the position (w0 , ∆0 ). We should estimate
the arrival time, after which the vehicle be at the stop with position (wK , ∆K ).
                 Structural Optimization of the Travel Time Prediction Model          3

    Let the route between two positions (w0 , ∆0 ) and (wK , ∆K ) includes the
following edges (w0 , ∆0 ) → w1 → w2 → · · · → wK−1 → (wK , ∆K ). Let Tjw (tc , t)
be the travel time along the edge w starting at the time t, assuming that the
prediction is calculated at the time tc .
    Assuming, that the vehicle j moves monotonically on each road link, the
predictive arrival time to the stop (wK , ∆K ) , calculated at the time tc , can be
written as follows:
            (w ,∆0 )→(wK ,∆K )
          Tj 0                   (tc ) = (1 − ∆0 )Tjw0 (tc , tc )
                                     K−1                                        
                                                           (w ,∆ )→(wk ,0)
                                     X
                                 +         Tjwk tc , tc + Tj 0 0           (tc )     (1)
                                     k=1
                                                                          
                                                      (w ,∆ )→(wK ,0)
                                 + ∆K TjwK tc , tc + Tj 0 0           (tc ) .

As it can be seen from the above expression, its main components are terms
Tjwk (· · · ) that characterize the travel time of the road links.
    For simplification, further we consider the travel time prediction problem at
the specified edge wk and the specified time t:

                                           T w (tc , t) .                            (2)

    In next section we briefly describe our prediction model proposed in [9].


3    The base model

Calculation of prediction (2) can be performed using different information usu-
ally established in urban public transport networks. Firstly we describe the dif-
ferent travel time estimations and control parameters that affect the predictive
travel time value (2):
    n             o
         w
 1. TΣ(s)     (t) is the averaged historical travel time of the edge w by the vehi-
                    s
    cles of the transport type s;
      w
 2. Tlast  (tc ) is the most recent travel time of the link w;
 3. T0w (t) is the normative travel time by the planning schedule;
      w
 4. Tv(j)  (tc ) is the estimated travel time by the mean speed of the specified
    vehicle;
                         w
 5. {(t − tj , Tj (tc ))}j is the set of precedents containing time instants and travel
    times of the edge w before the time tc ;
 6. τ w is the time elapsed since the last vehicle has passed the link w;
 7. t − tc is the required prediction horizon;
 8. γ(t) ∈ [0, 1] is the parameter defining the driving difficulty at the time t. This
    parameter takes into account the light and the weather conditions integrally;
 9. ρw (t) ∈ [0, 1] is the parameter defining the traffic density;
10. η w (t) is the parameter defining dynamic of the traffic flow on the edge w.
4         Structural Optimization of the Travel Time Prediction Model

    These values have different meaningful information that affects the way it is
used. Parameters 1-5 present individual or aggregated statistical ”precedents”
of the travel times. These values can be used (together or separately, directly
or indirectly) in the construction of the required estimate T w (tc , t) as some
realizations of this assessment. Parameters 6-10 also directly affect this estimate.
However, these parameters cannot be considered as the precedents of the travel
times. We use these parameters as the control parameters of the model (further
we use notation pi for them).
    To predict travel time T w (tc , t) we proposed to use a model based on the
combination of the elementary prediction algorithms [9]. Fig. 1 provides a scheme
of the proposed model.


                                 w
                               Tlast (tc ), T0w (t), TΣ(s)
                                                      w          w
                                                           (t), Tv(j) (t)


                     s=0                                               s = |S w | − 1
                                 w                                                       w
             {(t − tj , Tj (tc ))}j(s)         ···    ···           {(t − tj , Tj (tc ))}j(s)


      Algorithm 0      ···    Algorithm K − 1                 Algorithm 0        ···   Algorithm K − 1


      w,1                        w,K                                  w,1                         w,K
     T0                       T0                                    T |S w |−1                  T |S w |−1
      w,1                        w,K                                  w,1                         w,K
     t0                        t0                                    t|S w |−1                   t|S w |−1
                                                                                                             τw
                                                                                                             t − t0
                                         Aggregation function                                                γ(t)
                                                                                                             ρw (t)
                                                                                                             η w (t)
                                              Tbsw (tc , t)

                                         Fig. 1. The model scheme.


    The elementary prediction algorithms take as input a set of precedents
                     w                            w,k
{(t − tj , Tj (tc ))}j and return predictions T (tc , t) For this mapping we used
the following kernel functions:
                                          P
                             w,k            j ωk (t − tj )Tj (tc )
                           T s (tc , t) =    P                     ,
                                                j ωk (t − tj )
                                          P                                   (3)
                             w,k           j ωk (t − tj )tj (tc )
                            ts (tc , t) =   P                      .
                                               j ωk (t − tj )

   We used four kernels: rectangular kernel (k = 0), triangular kernel (k = 1),
exponential kernel (k = 2) and rational kernel (k = 3).
                Structural Optimization of the Travel Time Prediction Model              5

    As the aggregation function we used the following function:
                            K−1
                            XX           w,k
          Tbsw (tc , t) =          akrs T r (tc , t)+
                        k=0 r∈S                                                         (4)
                             w
                      + a1 Tlast (tc ) + a2 T0w (t) + a3 TΣ(s)
                                                          w              w
                                                               (t) + a4 Tv(j) (tc ) .

    Since our algorithm should be adaptive to the control parameters, we pro-
posed to use hierarchical regression similar to regression tree. Parameters of the
aggregation function (4) are defined for each subdomain of definition of the con-
trol parameters. Subdomains are calculated automatically using the hierarchical
regression with the following approach:
1) domain of definition for each control parameter pi is determined: [pmin
                                                                       i   , pmax
                                                                              i   ];
              min max
2) interval [pi , pi ] is splitted into two intervals by an average value.
   Fig. 2 shows an example of the hierarchical regression construction for two
control parameters.


                                                 Level 0


                                                 Level 1


                                                 Level 2


             Fig. 2. Example of the hierarchical regression construction.


    Proposed hierarchical regression is used for representing complex nonlinear
relationships between the target travel time value and the set of independent
variables.
    In the next section we describe main modifications of the model.


4    Modifications of the model
To increase accuracy and decrease computation complexity of the proposed
model, we modify used elementary prediction algorithms and the aggregation
function. Experiments also show that the information, affected the travel time
and described in the previous section, has redundancy. We decide to use only
significant parameters in our modified model.
    First of all, we added a new transport type car to an input data of the
modified model. We decided to use the following most significant information
that affect the predictive travel time value (2):
6       Structural Optimization of the Travel Time Prediction Model
    n           o
         w
 1.   TΣ(s)  (t) is the averaged historical travel time of the edge w by the vehi-
                  s
    cles of the transport type s;
                         w
 2. {(t − tj , Tj (tc ))}j is the set of precedents containing time instants and travel
    times of the edge w before the time tc ;
 3. t − tc is the required prediction horizon;
 4. γ(t) ∈ [0, 1] is the parameter defines the driving difficulty at the time t. This
    parameter takes into account the light and the weather conditions integrally;
 5. η w (t) is the parameter defining dynamics of the traffic flow on the edge w;
 6. ρw (t) ∈ [0, 1] is the parameter defining the traffic density .

   The elementary prediction algorithms (3) based on kernel functions have the
similar accuracy and, moreover, have high correlation. To avoid these disadvan-
tages we use the best one algorithm with the exponential kernel.
   As an aggregation function we use ridge regression (regression with L2 reg-
ularization) of the elementary prediction algorithms. Ridge regression penalizes
the euclidian norm of regression coefficients.
   We also modify the method for defining the intervals [pmin  i  , pmax
                                                                     i   ] of the
control parameters for the each node in the hierarchical regression:

1) for each interval [pmin
                       i    , pmax
                               i     ] we calculate the median value pmedian
                                                                          i  using
   histogram of the control parameter pi distribution;
2) divide each interval [pmin
                          i     , pmax
                                   i      ] into two using the median value:
     min max         min median
   [pi , pi ] → [pi , pi              ], [pmedian
                                            i     , pmax
                                                     i   ].


5     Test results and analysis

Main purpose of experiments is to compare proposed hierarchical regression
based model with other complex nonlinear methods, since in [9] we compare our
model only with elementary prediction algorithms. As such nonlinear method
we choose an artificial neural network, because this method is widely used for
the travel time prediction and require not so much time for training procedure
as support vector regression. Input dataset of the ANN contains both travel
time estimations and control parameter values. As the ANN we use feed forward
network with one hidden layer.
    Models has been tested with the data of all the bus routes in Samara, Russia.
The road network consists of 3387 edges. The number of the vehicles equipped
with GPS sensors is more than 1500; coordinates of the vehicles positions are
obtained every 30 seconds.
    Tests were performed on data of all the public vehicles positions in Samara
for one week. A validation set size is about 140 million records; a control set size
is about 20 million records.
    We compare the modified adaptive model with an elementary prediction
algorithm based on the exponential kernel, the averaged historical travel time
(statistics on the figures) and the neural network model. Number of hidden
units were selected by the criterion of the mean absolute error minimization of
                        Structural Optimization of the Travel Time Prediction Model                    7

the ANN on the training set. We used ANN with 3 hidden units in a hidden
layer. The exponential kernel width was also selected experimentally.
    The prediction accuracy is evaluated by computing the mean absolute error
(MAE) and the mean absolute percentage error (MAPE) depending on the pre-
diction horizon. Fig. 3 shows the experimental results on the training set, fig. 4
shows the experimental results on the control set.


           300                                                     0.25

                                                                   0.20
           200
MAE, sec


                                                                   0.15


                                                            MAPE
                                                                   0.10
           100
                                                                   0.05

            0                                                      0.00
                 0       10           20            30                    0     10            20      30
                      Prediction horizon, m                                   Prediction horizon, m


                       Exponential kernel      Statistics   Neural network    Adaptive model


       Fig. 3. MAE and MAPE depending on the prediction horizon for the training set.


           300                                                     0.25

                                                                   0.20
           200
MAE, sec


                                                                   0.15
                                                            MAPE


                                                                   0.10
           100
                                                                   0.05

             0                                                     0.00
                 0        10           20           30                    0     10            20      30
                       Prediction horizon, m                                  Prediction horizon, m

                       Exponential kernel      Statistics   Neural network     Adaptive model

           Fig. 4. MAE and MAPE depending on the prediction horizon for the control set.


   In terms of considered accuracy measure the proposed adaptive model based
on the hierarchical regression and the neural network model show similar results.
   Since the prediction model should be run on-line, the computation time of
the model is a critical issue. Next experiment shows the computation time of the
8         Structural Optimization of the Travel Time Prediction Model

adaptive and the neural network models for a different simultaneously processing
number of the vehicles, using laptop computer (Intel Core i5-3740 3.20 GHz, 8
GB RAM) (fig. 5).


                                     150
              Computation time, ms


                                     100


                                     50


                                      0
                                           0               20                    40          60
                                                                Number of vehicles

                                                         Neural network     Adaptive model


                                               Fig. 5. Computation time of the models.


   The processing time for one vehicle takes about 7 ms on average, mainly
depends on the required number of travel time predictions and is determined
by the route and the position of the vehicle. Computation time of the neural
network model is a little bit bigger than computation time of the adaptive model.
Nonetheless both models can be used to predict the arrival time of all public
vehicles in a large city in real time.


6      Conclusion and future work

In this paper we modify travel time prediction model based on adaptive combina-
tion of the elementary prediction algorithms. Our modifications concern the set
of elementary prediction algorithms and the aggregation function. The modified
model has the same advantages as the base model:

    – allows to include prediction algorithms of any type in the adaptive combi-
      nation;
    – has adaptability to the traffic situation changes;
    – has adaptability to the factors that has direct or indirect impact on the
      transport movement and / or prediction result.

  In experiments performed on data of all bus routes in Samara (Russia) the
modified adaptive model provides almost the same results as the neural network.
                Structural Optimization of the Travel Time Prediction Model          9

The proposed model doesn’t require so much time and memory resources for the
training as the neural network. This is the main advantage of our model.
    Comparison between the models of different classes, including time-series
models and nonlinear regression models, will be conducted as our future work.


Acknowledgments. The proposed base model and the modified model based
on hierarchical regression (sections 3 and 4) were developed with support from
the Russian Foundation for Basic Research grant 13-07-12103-ofi-m, grant 15-
07-01164-a, grant 16-37-00055-mol-a. The experimental results (section 5) were
obtained with support from the Russian Science Foundation grant 14-31-00014
Establishment of a Laboratory of Advanced Technology for Earth Remote Sens-
ing.


References
1. Moreira-Matias, L., Mendes-Moreira, J., De Sousa, J.F., Gama, J.: Improving Mass
   Transit Operations by Using AVL-Based Systems: A Survey. In: IEEE Transactions
   on Intelligent Transportation Systems, vol. 16 (4), pp. 1636-1653 (2015)
2. Hoogendoorn, S.P.: State-of-the-art of vehicular traffic flow modeling. In: Proceed-
   ings of the Institution of Mechanical Engineers. Part I: Journal of Systems and
   Control Engineering, vol. 215(4), pp. 283-303 (2001)
3. Cryer, J., Chan, K.: Time Series Analysis With Applications in R. Springer-Verlag,
   New York (2008)
4. Sun, H., Liu, H.X., Xiao, H., He, R.R., Ran, B.: Use of Local Linear Regression
   Model for Short-term Traffic Forecasting. In: Transportation Research Record, vol.
   1836, pp. 143150 (2003)
5. Chen, M., Liu, X., Xia, J., Chien, S.I.: A dynamic bus-arrival time prediction model
   based on APC data. In: Computer-Aided Civil and Infrastructure Engineering, vol.
   19(5), pp.364-376 (2004)
6. Jeong, R., Rilett, L.R.: Bus arrival time prediction using artificial neural network
   model. In: IEEE Conference on Intelligent Transportation Systems, Proceedings,
   ITSC, pp. 988-983 (2004)
7. Bin, Y., Zhongzhen, Y., Baozhen, Y.: Bus arrival time prediction using support vec-
   tor machines. In: Journal of Intelligent Transportation Systems: Technology, Plan-
   ning, and Operations, vol. 10 (4), pp. 151-158 (2007)
8. Wu, C.-H., Ho, J.-M., Lee, D.T.: Travel-time prediction with support vector regres-
   sion. In: IEEE Transactions on Intelligent Transportation Systems, vol. 5(4), pp.
   276-281 (2004)
9. Agafonov, A., Myasnikov, V.: An Adaptive Algorithm for Public Transport Arrival
   Time Prediction Based on Hierarhical Regression. In: IEEE Conference on Intelli-
   gent Transportation Systems, Proceedings, ITSC, pp. 2776-2781 (2015)