Application of machine learning for improving contemporary ETA forecasting

Gabrielė Kasputytė 1,3,†, Arnas Matusevičius 1,3,† and Tomas Krilavičius 2,3,*

1 Vytautas Magnus University, Faculty of Informatics, Department of Mathematics and Statistics, Vileikos street 8, LT-44404 Kaunas, Lithuania
2 Vytautas Magnus University, Faculty of Informatics, Department of Applied Informatics, Vileikos street 8, LT-44404 Kaunas, Lithuania
3 Centre for Applied Research and Development, Lithuania

Abstract
Road transport keeps growing in popularity and is currently the most popular mode for transporting goods [1]. Usually, transportation increases the prime cost of the goods or services; thus, to increase a company's profit, effective vehicle routing becomes essential. In this research, we have analysed the possibility of improving the forecast of the estimated time of arrival by ranking the drivers based on their behaviour data and estimating deviations from the planned arrival time using different machine learning methods. The ranking was performed with the TOPSIS and VIKOR methods, while the forecasting was performed using five machine learning algorithms: decision tree, random forest, XGBoost, Support Vector Machine and k-Nearest Neighbours. The performance of the forecasting models was evaluated using the adjusted coefficient of determination, root mean square error and mean absolute error metrics. It was concluded that the VIKOR method should be used to rank the drivers. Moreover, the research results revealed that the best forecasting performance was achieved using an ensemble model based on the random forest and Support Vector Machine models.

Keywords
Estimated time of arrival (ETA), freight transport, driver's score, ranking, machine learning, TOPSIS, VIKOR, Decision Tree, Random Forest, XGBoost, Support Vector Machine, k-Nearest Neighbours

IVUS 2022: 27th International Conference on Information Technology, May 12, 2022, Kaunas, Lithuania
* Corresponding author.
† These authors contributed equally.
gabriele.kasputyte@vdu.lt (G. Kasputytė); arnas.matusevicius@vdu.lt (A. Matusevičius); tomas.krilavicius@vdu.lt (T. Krilavičius)
ORCID: 0000-0001-8509-420X (T. Krilavičius)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

To improve the transparency of a supply chain, its participants use transport management systems as well as tracking and monitoring systems. Vehicles are getting equipped with stronger microprocessors, larger memory capacity and real-time operating systems. The newly installed technological platforms can use more advanced applications of the operating system, including model-based process control functions, artificial intelligence, and comprehensive computation.

Therefore, an increasing trend of implementing the latest developments in information and internet technologies in the transport sector is observed, as well as a tendency to develop new research-based systems that can quickly adapt to an ever-changing environment. The goal of this study is to identify the methods best suited for solving the estimated time of arrival (ETA) forecasting problem.

The rest of the paper is organized as follows. Related work in this area is presented in Section 2. Section 3 introduces the datasets used in the current study. Section 4 presents the selected ranking and forecasting techniques. Experimental results are provided in Section 5. Finally, concluding remarks and future plans are discussed in Section 6.

2. Related work

To better understand which attributes and methods can be used for ranking drivers, research [2] was studied. The aim of this research was to categorize drivers according to their risk-proneness by analysing a GPS-based device's urban traffic data. The Hierarchical Clustering Algorithm (HCA) and Principal Component Analysis (PCA) were used for the statistical analysis of the following driving parameters: Speed over 60 km/h, Speed, Acceleration, Positive acceleration, Braking and Mechanical work. The authors of this research conclude that while it is possible to classify the drivers according to these parameters, a lot of external factors, such as the driving environment or the condition of the assessed driver, are not taken into account.

A comparative analysis of two multi-criteria decision-making (MCDM) methods, TOPSIS and VIKOR, is presented in [3]. It was found that these two methods use different kinds of normalization (TOPSIS uses vector normalization, VIKOR – linear) to eliminate the units of the criterion functions, and use different aggregating functions for ranking. Nonetheless, both methods were found suitable for solving ranking problems.




The use of machine learning (ML) techniques for the ETA problem is discussed in [4]. The authors applied artificial neural networks (ANNs) and support vector regression (SVR) to predict the time of arrival of container ships. Distance to the destination, the timestamp, geo-coordinates, and weather information were chosen as features. It was shown that SVR performed better than ANNs and that the weather data did not have a significant impact on estimating the time of vessel arrivals.

In the [5] study, the k-Nearest neighbours (KNN), SVR and random forest algorithms were evaluated as methods for predicting the arrival time of open-pit trucks. A site-based approach was used, as the position was only measured at a few discrete nodes of the route network. It was concluded that the random forest algorithm provides the best prediction results.

Ma et al. [6] proposed a tree-based integration method to predict traffic accidents by using different data variables. Predictions of the gradient boosting decision tree algorithm outperformed back propagation neural networks, support vector machines, and random forest. However, in this study, the nonlinear relationship between the influence characteristics and the predicted value was not analysed.

To improve travel time predictions, the author of the [7] study applied a combination of random forest and gradient boosting regression tree (GBRT) models. The aim was to study how reducing a large volume of raw GPS data into a set of feature data affects high-quality travel time predictions. Only travel time observations from the previous departure time intervals were found to be beneficial features, and the author recommended using them as inputs when no other types of real-time information (e.g. traffic flow, speed) are available. It is also noted that trees in GBRT models were found to be consistently much shorter than those of random forest models, leading to shorter computation times.

To sum up, the attributes characterizing drivers in research are usually derivative – data obtained from vehicle monitoring devices represent their driving behaviour. Since MCDM methods were found popular for conducting ranking procedures, two easily comparable MCDM methods will be used: TOPSIS and VIKOR. As for the ETA problem, the accuracy of the ML models tested in the reviewed research was inconsistent. Therefore, a wide variety of ML methods suitable for the problem and the available data will be evaluated. Namely, the decision tree, random forest, XGBoost, Support Vector Machine (SVM) and KNN methods, as well as an ensemble of models, will be tested.

3. Research data

3.1. Data for ranking drivers

The available readings from a vehicle monitoring system that can be used to evaluate a driver's behaviour were extracted. The readings in this research cover the period from August 21, 2020, to January 1, 2022, and up to 398 observations representing different vehicles.

A dataset was constructed containing values for 7 attributes, namely Free-rolling distance, Engine overloaded distance, Highest gear distance, Excess idling, Overspeeding time, Extreme braking events and Harsh braking events.

3.2. Data for forecasting model

In this research, logistics transportation data was reviewed and a dataset was created for the ETA forecasting models. The initial dataset includes 1758 observations and 13 variables, covering the period from August 21, 2020 to January 24, 2022. A set of explanatory variables X, with vectors x_1, x_2, ..., x_13, is obtained. The description of these variables is presented in Table 1.

Table 1
Set of explanatory variables

 Variable   Description                     Variable type
 x1         Driver's score                  Ordinal
 x2         Tour beginning country          Categorical
 x3         Tour ending country             Categorical
 x4         Number of intermediate stops    Discrete
 x5         Furthest country                Categorical
 x6         Tour beginning month            Categorical
 x7         Tour beginning day              Categorical
 x8         Vehicle height                  Continuous
 x9         Vehicle width                   Continuous
 x10        Vehicle length                  Continuous
 x11        Vehicle weight                  Continuous
 x12        Hours of service breaks         Continuous
 x13        Planned distance                Continuous

Let T_i be the factual time when the i-th cargo is delivered, and t_i the planned time of delivery for the i-th cargo. Then, the deviation from the planned time of delivery for the i-th cargo, Δt_i, is defined as the difference between the factual and planned time of delivery:

    Δt_i = T_i − t_i,    (1)

where i = 1, 2, ..., n. In that case, the response variable is denoted as:

    y_i = Δt_i.    (2)

This variable is the target of the forecasting problem.
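For concreteness, the target defined in (1)–(2) can be computed directly from the planned and factual delivery timestamps. Below is a minimal pandas sketch; the column names planned_delivery and factual_delivery are hypothetical, and expressing the deviation in minutes is an assumption, as the unit is not stated here.

```python
import pandas as pd

# Hypothetical tour records; in the study these come from the logistics dataset.
tours = pd.DataFrame({
    "planned_delivery": pd.to_datetime(["2021-03-01 10:00", "2021-03-02 18:30"]),
    "factual_delivery": pd.to_datetime(["2021-03-01 11:25", "2021-03-02 17:50"]),
})

# Equation (1): deviation = factual time T_i minus planned time t_i.
delta_t = tours["factual_delivery"] - tours["planned_delivery"]

# Equation (2): the response variable y_i, here in minutes (assumed unit).
tours["y"] = delta_t.dt.total_seconds() / 60.0
print(tours["y"].tolist())  # [85.0, -40.0]
```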
4. Methodology

4.1. VIKOR method

The VIKOR method was introduced as one applicable technique within MCDM. It focuses on ranking and selecting from a set of alternatives in the presence of conflicting criteria, and on proposing a compromise solution (one or more). The compromise ranking algorithm VIKOR has the following steps [8]; a code sketch of the procedure is given after the list:

    1. Determine the best f_i* and the worst f_i⁻ values of all criterion functions, i = 1, 2, ..., n. If the i-th function represents a benefit, then:

           f_i* = max_j f_ij,    f_i⁻ = min_j f_ij.    (3)

    2. Compute the values S_j and R_j, j = 1, 2, ..., J, by the relations

           S_j = Σ_{i=1}^{n} w_i (f_i* − f_ij) / (f_i* − f_i⁻),    (4)

           R_j = max_i [ w_i (f_i* − f_ij) / (f_i* − f_i⁻) ],    (5)

       where w_i are the weights of the criteria, expressing their relative importance.

    3. Compute the values Q_j, j = 1, 2, ..., J, by the relation

           Q_j = v (S_j − S*) / (S⁻ − S*) + (1 − v) (R_j − R*) / (R⁻ − R*),    (6)

       where

           S* = min_j S_j,   S⁻ = max_j S_j,   R* = min_j R_j,   R⁻ = max_j R_j,    (7)

       and v is introduced as the weight of the strategy of "the majority of criteria" (or "the maximum group utility"); here v = 0.5.

    4. Rank the alternatives, sorting by the values S, R and Q, in decreasing order. The results are three ranking lists.

    5. Propose as a compromise solution the alternative (a′) which is ranked best by the measure Q (minimum) if the following two conditions are satisfied:

       C1. "Acceptable advantage":

               Q(a″) − Q(a′) ≥ DQ,    (8)

           where a″ is the alternative in the second position of the ranking list by Q, DQ = 1/(J − 1), and J is the number of alternatives.

       C2. "Acceptable stability in decision-making": alternative a′ must also be the best ranked by S or/and R. Here, v is the weight of the decision-making strategy "the majority of criteria".
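As referenced above, a minimal NumPy sketch of steps 1–3 follows. It assumes a decision matrix f with one row per criterion and one column per alternative, and a boolean mask marking benefit criteria; both names are hypothetical, and this sketches the published procedure rather than the authors' implementation.

```python
import numpy as np

def vikor_scores(f, weights, benefit, v=0.5):
    """VIKOR S, R and Q measures, equations (3)-(7).

    f       -- (n, J) decision matrix: n criteria, J alternatives
    weights -- (n,) criterion weights
    benefit -- (n,) boolean mask, True where a criterion is a benefit
    """
    # Step 1: best f* and worst f- per criterion (max/min swap for cost criteria).
    f_star = np.where(benefit, f.max(axis=1), f.min(axis=1))
    f_minus = np.where(benefit, f.min(axis=1), f.max(axis=1))

    # Step 2: weighted normalized distances to the best value of each criterion.
    d = weights[:, None] * (f_star[:, None] - f) / (f_star - f_minus)[:, None]
    S, R = d.sum(axis=0), d.max(axis=0)

    # Step 3: compromise measure Q with strategy weight v = 0.5.
    Q = (v * (S - S.min()) / (S.max() - S.min())
         + (1 - v) * (R - R.min()) / (R.max() - R.min()))
    return S, R, Q
```

Sorting the alternatives by ascending Q then corresponds to step 4 (the smallest Q is ranked best), after which conditions C1 and C2 are checked.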
4.2. TOPSIS method

The basic principle of the TOPSIS method is that the chosen alternative should have the shortest distance from the ideal solution and the farthest distance from the negative-ideal solution [9]. The TOPSIS procedure consists of the following steps; a code sketch is given after the list:

    1. Calculate the normalized decision matrix. The normalized value r_ij is calculated as

           r_ij = f_ij / sqrt( Σ_{j=1}^{J} f_ij² ),    (9)

       where j = 1, ..., J; i = 1, ..., n.

    2. Calculate the weighted normalized decision matrix. The weighted normalized value v_ij is calculated as

           v_ij = w_i · r_ij,    (10)

       where j = 1, ..., J; i = 1, ..., n; w_i is the weight of the i-th attribute or criterion, and Σ_{i=1}^{n} w_i = 1.

    3. Determine the ideal and negative-ideal solutions:

           A* = {(max_j v_ij | i ∈ I′), (min_j v_ij | i ∈ I″)},    (11)

           A⁻ = {(min_j v_ij | i ∈ I′), (max_j v_ij | i ∈ I″)},    (12)

       where I′ is associated with benefit criteria and I″ is associated with cost criteria.

    4. Calculate the separation measures, using the n-dimensional Euclidean distance. The separation of each alternative from the ideal solution is given as

           D_j* = sqrt( Σ_{i=1}^{n} (v_ij − v_i*)² ),    (13)

       where j = 1, ..., J. Similarly, the separation from the negative-ideal solution is given as

           D_j⁻ = sqrt( Σ_{i=1}^{n} (v_ij − v_i⁻)² ),    (14)

       where j = 1, ..., J.

    5. Calculate the relative closeness to the ideal solution. The relative closeness of the alternative a_j with respect to A* is defined as

           C_j* = D_j⁻ / (D_j* + D_j⁻),    (15)

       where j = 1, ..., J.

    6. Rank the preference order. The values C_j* are ordered in descending order; the highest number indicates the best solution.
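The corresponding sketch for TOPSIS, under the same hypothetical decision-matrix layout as in the VIKOR example:

```python
import numpy as np

def topsis_scores(f, weights, benefit):
    """TOPSIS relative closeness C*, equations (9)-(15)."""
    # Step 1: vector normalization of the decision matrix.
    r = f / np.sqrt((f ** 2).sum(axis=1, keepdims=True))

    # Step 2: weighted normalized decision matrix.
    v = weights[:, None] * r

    # Step 3: ideal (A*) and negative-ideal (A-) solutions per criterion.
    a_star = np.where(benefit, v.max(axis=1), v.min(axis=1))
    a_minus = np.where(benefit, v.min(axis=1), v.max(axis=1))

    # Step 4: Euclidean separations from A* and A-.
    d_star = np.sqrt(((v - a_star[:, None]) ** 2).sum(axis=0))
    d_minus = np.sqrt(((v - a_minus[:, None]) ** 2).sum(axis=0))

    # Step 5: relative closeness; step 6 ranks it in descending order.
    return d_minus / (d_star + d_minus)
```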
4.3. Decision Tree

A decision tree is a type of supervised learning algorithm that can be used in both regression and classification problems [10]. Each node is related to an attribute, whereas the leaves of the tree represent the final solution, obtained by combining the values of the attributes.

The splitting process is stopped once a particular stopping criterion is met. For example, reaching a given threshold for the minimum number of observations left in a node, or no variable being able to improve the impurity measure by more than a given minimum amount, can serve as a stopping criterion [11].

Let L be the initial dataset, made of training samples with known dependent variable values. At first, the tree consists of only a root node t_1, which represents the full set of variables. The objective is to split the nodes into two decision nodes until a terminal node is reached; for example, L is split into t_L and t_R, and then t_L and t_R are split into further sub-nodes until a stopping criterion is met [12].
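The stopping criteria described above map directly onto the hyperparameters of common regression-tree implementations. A minimal scikit-learn sketch on stand-in data, using the node-size thresholds later reported in Section 5.2:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))   # stand-in features
y_train = rng.normal(size=200)        # stand-in deviation target

# Stopping criteria: a node is split only if it holds at least 20 observations,
# and every resulting node keeps at least 7 observations (Section 5.2 values).
tree = DecisionTreeRegressor(min_samples_split=20, min_samples_leaf=7)
tree.fit(X_train, y_train)
print(tree.get_depth(), tree.get_n_leaves())
```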
4.4. Random Forest

Random forest is an ML algorithm that constructs a multitude of decision trees at training time. The main principle of constructing a random forest is that it is formed by combining solutions from binary decision trees built using diverse subsets of the original dataset and subsets containing randomly selected features from the feature set.

Constructing small decision trees that only have a few features takes up only a little processing time; hence, the majority of such trees' solutions can be combined into a single strong classifier.

The steps for constructing a random forest, as presented in [10], are as follows (a sketch of the procedure is given after the list):

    1. First, assume that the number of cases in the training set is K. Then, take a random sample of these K cases, and use this sample as the training set for constructing the tree.
    2. If there are p input variables, specify a number m < p such that at each node, m random variables out of the p can be selected. The best split on these m is used to split the node.
    3. Each tree is subsequently grown to the largest extent possible and no pruning is needed.
    4. Aggregate the predictions of the target trees to predict new data.
    5. Finally, a decision is made by the majority rule.
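A condensed sketch of steps 1–4, bootstrapping the K training cases and letting each tree consider m random variables per split; for regression, step 5's majority vote becomes averaging. The sizes shown are illustrative, not the study's settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_random_forest(X, y, n_trees=500, m=3, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample of the K training cases.
        idx = rng.integers(0, len(X), size=len(X))
        # Steps 2-3: m random candidate variables per split, trees fully grown.
        forest.append(DecisionTreeRegressor(max_features=m).fit(X[idx], y[idx]))
    return forest

def forest_predict(forest, X):
    # Step 4: aggregate the trees' predictions by averaging (regression).
    return np.mean([tree.predict(X) for tree in forest], axis=0)
```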
4.5. XGBoost

XGBoost is an ML algorithm that implements frameworks based on Gradient Boosted Decision Trees [13]. XGBoost surpasses other ML algorithms by solving many data science problems faster and more accurately than its counterparts. This algorithm also has additional protection from overfitting.

The objective function to be optimized is given by

    obj(θ) = Σ_{i=1}^{n} l(y_i, ŷ_i) + Σ_{k=1}^{K} Ω(f_k),    (16)

where n is the number of observations, l(y_i, ŷ_i) is the training loss function, ŷ_i = Σ_{k=1}^{K} f_k(x_i) is the predicted value, K is the number of trees, Ω is the regularization term, f_k ∈ F, and F is the set of possible classification and regression trees.

Writing the prediction value at step t as ŷ_i^(t) gives

    ŷ_i^(t) = Σ_{k=1}^{t} f_k(x_i) = ŷ_i^(t−1) + f_t(x_i).    (17)

Next, a tree which optimizes our objective is chosen:

    obj^(t) = Σ_{i=1}^{n} l(y_i, ŷ_i^(t)) + Σ_{i=1}^{t} Ω(f_i)
            = Σ_{i=1}^{n} l(y_i, ŷ_i^(t−1) + f_t(x_i)) + Ω(f_t) + b,    (18)

where b is a constant.

To minimize the probability of overfitting, the complexity of the tree Ω(f) is defined as

    Ω(f) = γT + (1/2) λ Σ_{j=1}^{T} w_j²,    (19)

where

    f_t(x) = w_{q(x)},   w ∈ R^T,   q : R^d → {1, 2, ..., T}.    (20)

Here w is the vector of scores on the leaves, q is a function assigning each data point to the corresponding leaf, and T is the number of leaves.
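A minimal sketch using the xgboost Python package [13], with the regularization quantities of equation (19) exposed as gamma (the per-leaf penalty γ) and reg_lambda (the leaf-score penalty λ). The 150 rounds and 75% subsampling echo Section 5.2; the data is a stand-in.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)

model = xgb.XGBRegressor(
    n_estimators=150,  # maximum number of boosting rounds (Section 5.2)
    subsample=0.75,    # each tree sees a random 75% of the training cases
    gamma=0.0,         # per-leaf penalty, the gamma*T term of equation (19)
    reg_lambda=1.0,    # L2 penalty on leaf scores w_j, the lambda term of (19)
)
model.fit(X, y)
```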
4.6. Support Vector Machine

The goal of SVM is to find the maximum separating hyperplane, i.e. the one with the maximum distance from the nearest training data objects [14]. A separating hyperplane can be written as:

    WX + b = 0,    (21)

where W is a weight vector, namely W = {w_1, w_2, ..., w_n}; X is a set of training data made of p objects, n attributes and an associated class label y_i; and b is a scalar constant.

The distance between the marginal hyperplanes, denoted as 2/||W||, has to be maximal. Consequently, this means that ||W|| (the Euclidean norm of the vector W) has to be minimized. To simplify calculations, the Euclidean norm ||W|| can be swapped for ||W||²/2. Thus, the objective function for this optimization problem is defined as:

    min_{W,b} (1/2) ||W||²,    (22)

    y_i (W X_i^T + b) ≥ 1,   i = 1, ..., p,    (23)

where the constraint (23) ensures that all objects from the training dataset will be positioned on the correct side of the appropriate marginal hyperplane.

The Lagrange multiplier strategy allows combining these two conditions into one:

    min_{W,b} max_{α≥0} { (1/2) ||W||² − Σ_{i=1}^{p} α_i [ y_i (W X_i^T + b) − 1 ] }.    (24)

Kernel functions are used when the training dataset needs to be transformed into a higher-dimensional space due to the data being linearly inseparable:

    K(X_i, X_j) = φ(X_i) φ(X_j)^T,   ∀ X_i, X_j ∈ X.    (25)

In this study, the Gaussian radial basis function kernel was used:

    K(X_i, X_j) = e^(−γ ||X_i − X_j||²),

where the γ value is derived from the following equation:

    1/γ ≈ MED_{i,j=1,...,p} (||X_i − X_j||).    (26)

Here MED is the median. Usually the parameter γ is found through trial and error.

4.7. k-Nearest Neighbours

The KNN algorithm is a method based on the likeness of objects [15]. In other words, the principle is to find the predefined number (k) of training samples closest to the new point. In the case of regression, the relationship between the explanatory variables and the continuous dependent variable is approximated by estimating the average of the observations which together form the so-called neighbourhood. Its size is determined using cross-validation while minimizing the root mean square error.

The Euclidean distance was used to calculate the distance between objects.
4.8. Model evaluation metrics

The significance of regression in a model is usually calculated using the coefficient of determination [16]:

    R_{y x_1 x_2 ... x_k} = sqrt( 1 − σ²_res / σ²_y ),    (27)

where σ²_res is the residual dispersion from a forecast model and σ²_y is the dispersion of y.

However, the adjusted coefficient of determination R²_adj is better suited for comparing regression models, as it avoids the inaccuracy caused by numerous factors in the coefficient of determination [16]:

    R²_adj = 1 − (1 − R²) · (n − 1) / (n − k − 1),    (28)

where n is the number of observations available for analysis and k is the number of variables.

Moreover, statisticians usually measure accuracy by computing the mean square error (MSE) or its square root, conventionally abbreviated as RMSE (root mean square error). The latter is in the same units as the measured variable and so is a better descriptive statistic; it is also the most popular evaluation metric used in regression problems. RMSE follows an assumption that errors are unbiased and follow a normal distribution. The RMSE metric is given by [17]:

    RMSE = sqrt( (1/n) Σ_{i=1}^{n} (S_i − O_i)² ),    (29)

where O_i are the observations and S_i are the predicted values of a variable.

Moreover, the average magnitude of the forecast errors can be measured by the mean absolute error:

    MAE = (1/n) Σ_{i=1}^{n} |S_i − O_i|.    (30)

In this case, the direction of the errors is not taken into account [17].

There are various ways to improve models, depending on the technique involved. The most popular way is to construct ensemble models. Once there are multiple models that produce a score for a particular outcome, they can be combined to produce ensemble scores. For example, a new score can be calculated as the average of two models and then assessed as a further model. Usually, the area under the curve improves for these ensemble models.
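Equations (28)–(30) above translate into a few lines of NumPy; a sketch with observed values O, predicted values S, and k predictors:

```python
import numpy as np

def adjusted_r2(O, S, k):
    """Adjusted coefficient of determination, equations (27)-(28)."""
    r2 = 1.0 - np.sum((O - S) ** 2) / np.sum((O - O.mean()) ** 2)
    n = len(O)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def rmse(O, S):
    """Root mean square error, equation (29)."""
    return np.sqrt(np.mean((S - O) ** 2))

def mae(O, S):
    """Mean absolute error, equation (30)."""
    return np.mean(np.abs(S - O))
```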
5. Results

5.1. Driver ranking

Selection of attribute weights. To begin the ranking procedure, the attribute weights first had to be established. Since the drivers must be ranked in compliance with attribute priorities dictated by a company, the importance of the attributes was evaluated by an expert on a scale from 1 to 10, as presented in Table 2. Thus, we get the first set of weights:

    W_1 = w_1 + 3w_2 + w_3 + w_4 + w_5 = 1,

where w_1 = 0.09 is the weight of Free rolling distance; w_2 = 0.19 is the weight of Engine overloaded distance, Excess idling and Overspeeding time; w_3 = 0.13 is the weight of Highest gear distance; w_4 = 0.15 is the weight of Extreme braking events; and w_5 = 0.07 is the weight of Harsh braking events.

Table 2
The first set of weights

 Attribute                    Score   Weight   min/max
 Free rolling distance            5   0.09     max
 Engine overloaded distance      10   0.19     min
 Highest gear distance            7   0.13     max
 Excess idling                   10   0.19     min
 Overspeeding time               10   0.19     min
 Extreme braking events           8   0.15     min
 Harsh braking events             4   0.07     min

However, since the importance of the attributes can be biased, a baseline weight model was also tested:

    W_2 = 7w_1 = 1,

where w_1 = 1/7 is the weight of each attribute. A short sketch below shows how the expert scores of Table 2 translate into the first weight set.
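The Table 2 weights are consistent with simply normalizing the expert scores so that they sum to one; a small sketch reproducing the first weight set:

```python
# Expert importance scores from Table 2 (scale 1-10).
scores = {
    "Free rolling distance": 5,
    "Engine overloaded distance": 10,
    "Highest gear distance": 7,
    "Excess idling": 10,
    "Overspeeding time": 10,
    "Extreme braking events": 8,
    "Harsh braking events": 4,
}

total = sum(scores.values())  # 54
weights = {name: round(s / total, 2) for name, s in scores.items()}
print(weights)  # e.g. Engine overloaded distance -> 0.19, Harsh braking -> 0.07
```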
Results of ranking methods. The ranking of the drivers was performed using the generated sets of weights. Criterion values computed by the TOPSIS (C_i) and VIKOR (Q) methods were used to rank the drivers. However, in many instances the difference between two values of the same criterion had been minute, hence the values were grouped using a ten-point system.

Considering the results of the different ranking methods presented in Table 3, the VIKOR method was confirmed to produce the most logical ranking of the drivers. In addition, the first weight set should be used when creating a dataset for the forecasting models, since it complies with the attribute priorities from the company and no significant difference was observed between the two tested sets of weights.

Table 3
Score distribution with different weight sets

 Score   TOPSIS W1   TOPSIS W2   VIKOR W1   VIKOR W2
    10         325         316        156        148
     9          62          71        107        120
     8           6           7         67         63
     7           4           3         28         31
     6           0           0         16         12
     5           0           0          6          8
     4           1           1          9          6
     3           0           0          3          5
     2           0           0          5          4
     1           0           0          1          1
5.2. Forecasting models

For improving the ETA forecast, it was enough to forecast the deviation from the planned duration, because the planned duration had already been computed by a routing service. In that case, the goal was to forecast the deviation from the planned tour duration. Overall, five ML methods were tested: decision tree, random forest, XGBoost, support vector machine (SVM) and k-Nearest neighbours (KNN).

Quantitative variables were normalized using min-max normalization, while the categorical variables were transformed and added to the models by replacing them with binary dummy variables.

When applying the random selection and assignment of the set indices to the test and training sets, 75% of the dataset was assigned to the training sample and 25% to the test sample. Cross-validation was used for the selection of optimal parameters in all five models; during this procedure, the sample data in the regression models was divided into 10 groups.

Further, the optimal parameters of all models were determined:

    1. The decision tree model. The minimum number of observations that can be in a node was set to seven. Furthermore, if a node is to be split, the minimum number of observations per node has to be 20. A total of 11 splits were made. The hours of service breaks, the tour beginning day, the beginning, ending and furthest countries, as well as the planned tour distance impacted the creation of the model.
    2. The random forest model. The optimal number of randomly selected variables in each division of the random forest was set to 59 (this value had been determined based on a precision measure), whereas the number of trees was kept at the basic size of 500.
    3. The XGBoost model. An optimal model is determined by the lowest value obtained for the RMSE error. The maximum tree depth value of 0.3 was obtained; the higher this value is, the more complicated the model becomes. Also, the ratio of partial sample training cases is 0.75. In other words, the XGBoost method randomly selects 75% of the training dataset prior to growing the trees,
       which in turn protects from an overload of data. Such partial selection of a sample occurs once per iteration. The maximum number of repetitions was determined to be 150.
    4. The k-Nearest neighbours model. The k parameter for the KNN model was established to be equal to 4. This value was selected by changing the parameter value from 1 to 10 and determining which value yields the smallest RMSE. The Euclidean distance measure was used to estimate the distance between the points.
    5. The SVM model. The number of support vectors was established to be 1008, and the Gaussian radial basis function kernel was used.

The accuracy of all five constructed models was evaluated by predicting the values of the deviation from the planned tour time for unseen test set data. The adjusted coefficient of determination R²_adj, RMSE and MAE were calculated to determine and compare the suitability of the prediction models. The results are presented in Table 4. It can be seen that for all three accuracy measures, the best results for predicting the test data were obtained using the random forest model, where the mean absolute difference between the predicted and actual values had been almost 684, the square root of the average squared differences between the predicted and actual values had been 1120.15, and the adjusted coefficient of determination had been 77.57%. The XGBoost model yielded quite similar results, where R²_adj had been lower by only 3.3%, the MAE error had been higher by 93.24 units and the RMSE had been higher by 90.83. The worst prediction results were obtained using the KNN method, for which R²_adj had been less than 25%.

Table 4
The accuracy of regression models

 Model           R²_adj   RMSE      MAE
 Decision tree   0.6666   1391.18   792.56
 Random forest   0.7757   1120.15   683.94
 XGBoost         0.7427   1210.98   777.18
 SVM             0.6726   1408.97   867.82
 KNN             0.2465   2105.53   1208.22

A possibility to improve the models by forming an ensemble model was observed, hence a decision was made to try a combination of predictions from two models: the random forest and the SVM. Several combinations were made. The first method estimated the average of the forecasts of both models:

    ŷ_i = (ŷ_i^rf + ŷ_i^svm) / 2,    (31)

where ŷ_i is the predicted value of the i-th observation for the model ensemble, ŷ_i^rf is the predicted value of the i-th observation of the random forest model, and ŷ_i^svm is that of the SVM model. The adjusted coefficient of determination of this ensemble model resulted in 0.7672.

Another way to form an ensemble of models is by using the weighted sum method. In this type of ensemble, the prediction of the better model (in this case the random forest model) is given a weighting coefficient c_1 that is less than 1, but not less than 0.5. However, the total sum of the weights must be equal to one; therefore, the weighting coefficient of the other model (the SVM model), c_2, shall be greater than zero, but less than 0.5. Then, the new predicted value can be obtained as follows:

    ŷ_i = c_1 · ŷ_i^rf + c_2 · ŷ_i^svm.    (32)

The equation

    c_1 = 1 − c_2    (33)

must be met, thus (32) can be written as:

    ŷ_i = (1 − c_2) · ŷ_i^rf + c_2 · ŷ_i^svm.    (34)

In order to find the weight with which the adjusted coefficient of determination of the ensemble model obtains the highest value, the value of c_2 was changed from 0.01, 0.02, 0.03, and so on, up to 0.49 (a sketch of this scan is given after equation (35)). The experiment resulted in a maximum value of R²_adj (0.7795) when c_2 was equal to 0.2. The second way of constructing an ensemble model resulted in a higher R²_adj than the first method; hence, the second was more suitable. Therefore, the expression of the final ensemble model was as follows:

    ŷ_i = 0.8 · ŷ_i^rf + 0.2 · ŷ_i^svm.    (35)
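The weight scan behind (34)–(35) is a one-dimensional grid search; a sketch on stand-in predictions (the helper adjusted_r2 repeats the Section 4.8 formula, and k = 13 matches the number of explanatory variables):

```python
import numpy as np

rng = np.random.default_rng(0)
y_test = rng.normal(size=100)                   # stand-in test targets
pred_rf = y_test + rng.normal(0.0, 0.5, 100)    # stand-in random forest output
pred_svm = y_test + rng.normal(0.0, 0.8, 100)   # stand-in SVM output

def adjusted_r2(O, S, k):
    r2 = 1.0 - np.sum((O - S) ** 2) / np.sum((O - O.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (len(O) - 1) / (len(O) - k - 1)

# Scan c2 = 0.01, 0.02, ..., 0.49 and keep the weight maximizing the
# adjusted R^2 of the weighted ensemble, equation (34).
grid = np.arange(0.01, 0.50, 0.01)
scores = [adjusted_r2(y_test, (1 - c2) * pred_rf + c2 * pred_svm, k=13)
          for c2 in grid]
best_c2 = grid[int(np.argmax(scores))]
# The paper reports the maximum at c2 = 0.2, with adjusted R^2 = 0.7795.
```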
The forecast graph of the created ensemble model is presented in Fig. 1. Some outliers remained poorly predicted, but the overall prediction is accurate.

Figure 1: Predicted values of the model ensemble

Metrics evaluating the created ensemble model are presented in Table 5. In comparison to the results of the individual models (Table 4), higher accuracy could be observed in all three metrics of the ensemble model.
Nonetheless, the improvement in accuracy was not significant: the adjusted coefficient of determination was higher than that of the best individual model by only 0.38%, the RMSE was lower by 1.87, and the MAE was lower by 0.47.

Table 5
The accuracy of the ensemble model

 Model            R²_adj   RMSE      MAE
 Ensemble model   0.7795   1118.28   683.47

6. Conclusions

In this research, in order to improve contemporary ETA forecasting, the possibility of ranking drivers based on their behaviour data and predicting deviations from the planned arrival time using different ML methods was analysed. For this purpose, a dataset consisting of vehicle monitoring data was used for ranking the drivers with the TOPSIS and VIKOR methods. It was found that the VIKOR method with the company's attribute importance weight set produced the most suitable drivers' scores. These scores were then used to supplement a new dataset constructed for the ML methods. Five methods – decision tree, random forest, XGBoost, SVM and KNN – were used to create the forecasting model for the deviation from the planned tour duration. Finally, the ensemble model based on the random forest and SVM produced the most accurate results (R²_adj = 77.95%).

In the future, it is planned to continue the construction of the improved ETA prediction model by including real-world parameters that a vehicle takes into account while driving a certain route. For example, the need to stop for mandatory driving breaks or to fill up would be considered.

Acknowledgments

We thank Tadas Motiejūnas, Darius Ežerskis, Viktorija Naudužaitė, Donatas Šumyla and UAB Ruptela (https://www.ruptela.com/) for cooperation and useful insights.
References

[1] Z. Wang, K. Fu, J. Ye, Learning to estimate the travel time, 2018, pp. 858–866. doi:10.1145/3219819.3219900.
[2] Z. Constantinescu, C. Marinoiu, M. Vladoiu, Driving style analysis using data mining techniques, International Journal of Computers, Communications & Control (IJCCC) V (2010) 654–663. doi:10.15837/ijccc.2010.5.2221.
[3] S. Opricovic, G.-H. Tzeng, Compromise solution by MCDM methods: A comparative analysis of VIKOR and TOPSIS, European Journal of Operational Research 156 (2004) 445–455. URL: https://www.sciencedirect.com/science/article/pii/S0377221703000201. doi:10.1016/S0377-2217(03)00020-1.
[4] I. Parolas, ETA prediction for containerships at the Port of Rotterdam using machine learning techniques, 2016.
[5] X. Sun, H. Zhang, F. Tian, L. Yang, The use of a machine learning method to predict the real-time link travel time of open-pit trucks, Mathematical Problems in Engineering 2018 (2018) 1–14. doi:10.1155/2018/4368045.
[6] X. Ma, C. Ding, L. Sen, Y. Wang, Y. Wang, Prioritizing influential factors for freeway incident clearance time prediction using the gradient boosting decision trees method, IEEE Transactions on Intelligent Transportation Systems 18 (2017) 2303–2310. doi:10.1109/TITS.2016.2635719.
[7] X. Li, R. Bai, P. O. Siebers, C. Wagner, Travel time prediction in transport and logistics: Towards more efficient vehicle GPS data management using tree ensemble methods, VINE Journal of Information and Knowledge Management Systems 49 (2019) 277–306. doi:10.1108/VJIKMS-11-2018-0102.
[8] D. Servaitė, R. Juozaitienė, T. Krilavičius, Logistics service provider selection using TOPSIS and VIKOR methods, 2020. URL: https://www.vdu.lt/cris/bitstream/20.500.12259/110824/2/ISSN1613-0073_2020_V_2698.PG_86-91.pdf.
[9] S. Opricovic, G.-H. Tzeng, Compromise solution by MCDM methods: A comparative analysis of VIKOR and TOPSIS, European Journal of Operational Research 156 (2004) 445–455. URL: https://www.sciencedirect.com/science/article/pii/S0377221703000201. doi:10.1016/S0377-2217(03)00020-1.
[10] J. Le, Decision trees in R, DataCamp, 2018. URL: https://www.datacamp.com/community/tutorials/decision-trees-R.
[11] C. Strobl, J. Malley, G. Tutz, An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests, Psychological Methods 14 (2009) 323–348. doi:10.1037/a0016973.
[12] G. Norkevičius, G. Raškinis, Garsų trukmės modeliavimas klasifikavimo ir regresijos medžiais naudojant didelės apimties garsyną [Modelling of phone durations with classification and regression trees using a large corpus], in: Informacinės technologijos 2007, 2006, pp. 52–66.
[13] xgboost developers, xgboost Release 1.2.0-SNAPSHOT, 2020. URL: https://buildmedia.readthedocs.org/media/pdf/xgboost/latest/xgboost.pdf.
[14] J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems), 2nd ed., Morgan Kaufmann, 2006.
[15] Murni, R. Kosasih, A. F. Oeoen, T. Handhika, I. Sari, D. Lestari, Travel time estimation for destination in Bali using KNN-regression method with TensorFlow, IOP Conference Series: Materials Science and Engineering 854 (2020) 012061. doi:10.1088/1757-899X/854/1/012061.
[16] H. Altland, Regression analysis: Statistical modeling of a response variable, Technometrics 41 (1999) 367–368. doi:10.1080/00401706.1999.10485936.
[17] C. Chatfield, Time-series forecasting, Significance 2 (2005) 131–133. URL: https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1740-9713.2005.00117.x. doi:10.1111/j.1740-9713.2005.00117.x.