Application of machine learning for improving contemporary ETA forecasting

Gabrielė Kasputytė 1,3,†, Arnas Matusevičius 1,3,† and Tomas Krilavičius 2,3,*

1 Vytautas Magnus University, Faculty of Informatics, Department of Mathematics and Statistics, Vileikos street 8, LT-44404 Kaunas, Lithuania
2 Vytautas Magnus University, Faculty of Informatics, Department of Applied Informatics, Vileikos street 8, LT-44404 Kaunas, Lithuania
3 Centre for Applied Research and Development, Lithuania

IVUS 2022: 27th International Conference on Information Technology, May 12, 2022, Kaunas, Lithuania
* Corresponding author.
† These authors contributed equally.
gabriele.kasputyte@vdu.lt (G. Kasputytė); arnas.matusevicius@vdu.lt (A. Matusevičius); tomas.krilavicius@vdu.lt (T. Krilavičius)
ORCID: 0000-0001-8509-420X (T. Krilavičius)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Abstract

The popularity of road transport keeps growing, and it is currently the most popular mode for transporting goods [1]. Transportation usually increases the prime cost of goods or services; thus, to increase a company's profit, effective vehicle routing becomes essential. In this research, we analysed the possibility of improving the estimated time of arrival forecast by ranking drivers based on their behaviour data and estimating deviations from the planned arrival time using different machine learning methods. The ranking was performed with the TOPSIS and VIKOR methods, while the forecasting was performed using five machine learning algorithms: decision tree, random forest, XGBoost, Support Vector Machine and k-Nearest Neighbours. The performance of the forecasting models was evaluated using the adjusted coefficient of determination, root mean square error and mean absolute error metrics. It was concluded that the VIKOR method should be used to rank the drivers. Moreover, the research results revealed that the best forecasting performance was achieved using an ensemble model based on the random forest and Support Vector Machine models.

Keywords

Estimated time of arrival (ETA), freight transport, driver's score, ranking, machine learning, TOPSIS, VIKOR, Decision Tree, Random Forest, XGBoost, Support Vector Machine, k-Nearest Neighbours

1. Introduction

To improve the transparency of a supply chain, its participants use transport management systems as well as tracking and monitoring systems. Vehicles are being equipped with stronger microprocessors, larger memory capacity and real-time operating systems. The newly installed technological platforms can run more advanced applications, including model-based process control functions, artificial intelligence and comprehensive computation.

Therefore, an increasing trend of implementing the latest developments in information and internet technologies in the transport sector is observed, as well as a tendency to develop new research-based systems that can quickly adapt to an ever-changing environment. The goal of this study is to identify the methods best suited for solving the estimated time of arrival (ETA) forecasting problem.

The rest of the paper is organized as follows. Related work in this area is presented in Section 2. Section 3 introduces the datasets used in the current study. Section 4 presents the selected ranking and forecasting techniques. Experimental results are provided in Section 5. Finally, concluding remarks and future plans are discussed in Section 6.
2. Related work

To better understand which attributes and methods can be used for ranking drivers, research [2] was studied. The aim of that research was to categorize drivers according to their risk-proneness by analysing urban traffic data from a GPS-based device. A Hierarchical Clustering Algorithm (HCA) and Principal Component Analysis (PCA) were used for the statistical analysis of the following driving parameters: Speed over 60 km/h, Speed, Acceleration, Positive acceleration, Braking and Mechanical work. The authors conclude that while it is possible to classify drivers according to these parameters, many external factors, such as the driving environment or the condition of the assessed driver, are not taken into account.

A comparative analysis of two multi-criteria decision-making (MCDM) methods, TOPSIS and VIKOR, is presented in [3]. It was found that the two methods use different kinds of normalization to eliminate the units of the criterion functions (TOPSIS uses vector normalization, VIKOR uses linear normalization) and different aggregating functions for ranking. Nonetheless, both methods were found suitable for solving ranking problems.

The use of machine learning (ML) techniques for the ETA problem is discussed in [4]. The authors applied artificial neural networks (ANNs) and support vector regression (SVR) to predict the time of arrival of container ships. Distance to the destination, the timestamp, geo-coordinates and weather information were chosen as features. It was shown that SVR performed better than ANNs and that the weather data did not have a significant impact on estimating the time of vessel arrivals.
In the [5] study, the k-Nearest Neighbours (KNN), SVR and random forest algorithms were evaluated as methods for predicting the arrival time of open-pit trucks. A site-based approach was used, as the position was only measured at a few discrete nodes of the route network. It was concluded that the random forest algorithm provides the best prediction results.

Ma et al. [6] proposed a tree-based integration method to predict traffic accidents using different data variables. Predictions of the gradient boosting decision tree algorithm outperformed back-propagation neural networks, support vector machines and random forest. However, the nonlinear relationship between the influence characteristics and the predicted value was not analysed in that study.

To improve travel time predictions, the author of the [7] study applied a combination of random forest and gradient boosting regression tree (GBRT) models. The aim was to study how reducing a large volume of raw GPS data into a set of feature data affects high-quality travel time predictions. Only travel time observations from the previous departure time intervals were found to be beneficial features, and the author recommended using them as inputs when no other types of real-time information (e.g. traffic flow, speed) are available. It was also noted that the trees in GBRT models were consistently much shorter than those of random forest models, leading to shorter computation times.

To sum up, the attributes characterizing drivers in research are usually derivative: data obtained from vehicle monitoring devices represent their driving behaviour. Since MCDM methods were found to be popular for conducting ranking procedures, two easily comparable MCDM methods will be used: TOPSIS and VIKOR. As for the ETA problem, the accuracy of the ML models tested in the reviewed research was inconsistent. Therefore, a wide variety of ML methods suitable for the problem and the available data will be evaluated. Namely, decision tree, random forest, XGBoost, Support Vector Machine (SVM) and KNN methods, as well as an ensemble of models, will be tested.

3. Research data

3.1. Data for ranking drivers

The available readings from a vehicle monitoring system that can be used to evaluate a driver's behaviour were extracted. The readings in this research cover the period from August 21, 2020, to January 1, 2022, with up to 398 observations representing different vehicles.

A dataset was constructed containing values for 7 attributes, namely Free-rolling distance, Engine overloaded distance, Highest gear distance, Excess idling, Overspeeding time, Extreme braking events and Harsh braking events.

3.2. Data for forecasting model

In this research, logistics transportation data was reviewed and a dataset was created for the ETA forecasting models. The initial dataset includes 1758 observations and 13 variables. The obtained information covers the period from August 21, 2020 to January 24, 2022. A set of explanatory variables X, with vectors x_1, x_2, ..., x_13, is obtained. The description of these variables is presented in Table 1.

Table 1
Set of explanatory variables

Variable | Description | Variable type
x_1 | Driver's score | Ordinal
x_2 | Tour beginning country | Categorical
x_3 | Tour ending country | Categorical
x_4 | Number of intermediate stops | Discrete
x_5 | Furthest country | Categorical
x_6 | Tour beginning month | Categorical
x_7 | Tour beginning day | Categorical
x_8 | Vehicle height | Continuous
x_9 | Vehicle width | Continuous
x_10 | Vehicle length | Continuous
x_11 | Vehicle weight | Continuous
x_12 | Hours of service breaks | Continuous
x_13 | Planned distance | Continuous

Let T_i be the factual time when the i-th cargo is delivered, and t_i the planned time of delivery for the i-th cargo. Then the deviation from the planned time of delivery for the i-th cargo, Δt_i, is defined as the difference between the factual and the planned time of delivery:

\Delta t_i = T_i - t_i, \qquad (1)

where i = 1, 2, ..., n. In that case, the response variable is denoted as

y_i = \Delta t_i. \qquad (2)

This variable is the target of the forecasting problem.
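To make the construction of the target concrete, Eqs. (1)–(2) amount to a one-line computation over the tour records. The snippet below is a minimal sketch, not the authors' preprocessing code; the column names planned_arrival and actual_arrival and the choice of minutes as the time unit are assumptions made for the example.

```python
import pandas as pd

# Hypothetical tour records: one row per cargo, as in Section 3.2.
tours = pd.DataFrame({
    "planned_arrival": pd.to_datetime(["2021-03-01 10:00", "2021-03-02 18:30"]),
    "actual_arrival":  pd.to_datetime(["2021-03-01 11:15", "2021-03-02 17:50"]),
})

# Eqs. (1)-(2): the target y_i is the deviation of the factual arrival
# time T_i from the planned arrival time t_i, expressed here in minutes.
tours["y"] = (tours["actual_arrival"]
              - tours["planned_arrival"]).dt.total_seconds() / 60
print(tours["y"].tolist())  # [75.0, -40.0]: late by 75 min, early by 40 min
```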
4. Methodology

4.1. VIKOR method

The VIKOR method was introduced as one applicable technique within MCDM. It focuses on ranking and selecting from a set of alternatives in the presence of conflicting criteria, and on proposing a compromise solution (one or more). The compromise ranking algorithm VIKOR consists of the following steps [8]:

1. Determine the best f_i^* and the worst f_i^- values of all criterion functions, i = 1, 2, ..., n. If the i-th function represents a benefit, then

f_i^* = \max_j f_{ij}, \quad f_i^- = \min_j f_{ij}. \qquad (3)

2. Compute the values S_j and R_j, j = 1, 2, ..., J, by the relations

S_j = \sum_{i=1}^{n} w_i \frac{f_i^* - f_{ij}}{f_i^* - f_i^-}, \qquad (4)

R_j = \max_i \left[ w_i \frac{f_i^* - f_{ij}}{f_i^* - f_i^-} \right], \qquad (5)

where w_i are the weights of the criteria, expressing their relative importance.

3. Compute the values Q_j, j = 1, 2, ..., J, by the relation

Q_j = v \frac{S_j - S^*}{S^- - S^*} + (1 - v) \frac{R_j - R^*}{R^- - R^*}, \qquad (6)

where

S^* = \min_j S_j, \quad S^- = \max_j S_j, \quad R^* = \min_j R_j, \quad R^- = \max_j R_j, \qquad (7)

and v is introduced as the weight of the strategy of "the majority of criteria" (or "the maximum group utility"); here v = 0.5.

4. Rank the alternatives, sorting by the values S, R and Q in decreasing order. The results are three ranking lists.

5. Propose as a compromise solution the alternative a' which is ranked best by the measure Q (minimum) if the following two conditions are satisfied:

C1. "Acceptable advantage":

Q(a'') - Q(a') \geq DQ, \qquad (8)

where a'' is the alternative in the second position of the ranking list by Q, DQ = 1/(J - 1), and J is the number of alternatives.

C2. "Acceptable stability in decision-making": alternative a' must also be the best ranked by S and/or R.
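A compact implementation of steps 1–3 makes the procedure concrete. The following numpy sketch is illustrative rather than the authors' code; it assumes a criteria matrix F with one row per criterion and one column per alternative (driver), a weight vector w summing to one, and a boolean mask benefit marking the criteria to be maximized.

```python
import numpy as np

def vikor(F, w, benefit, v=0.5):
    """VIKOR measures for a criteria matrix F of shape (n criteria, J alternatives)."""
    # Step 1, Eq. (3): best f* and worst f- per criterion; for cost
    # criteria the roles of max and min are swapped.
    f_star = np.where(benefit, F.max(axis=1), F.min(axis=1))
    f_minus = np.where(benefit, F.min(axis=1), F.max(axis=1))
    # Step 2, Eqs. (4)-(5): weighted, linearly normalized distances.
    d = (f_star[:, None] - F) / (f_star - f_minus)[:, None]
    S = (w[:, None] * d).sum(axis=0)
    R = (w[:, None] * d).max(axis=0)
    # Step 3, Eqs. (6)-(7): compromise measure Q; v weighs "majority of criteria".
    Q = (v * (S - S.min()) / (S.max() - S.min())
         + (1 - v) * (R - R.min()) / (R.max() - R.min()))
    return S, R, Q  # lower Q means a better alternative

# Toy example: 3 criteria (two benefit, one cost) and 4 alternatives.
F = np.array([[7., 9., 9., 8.],
              [9., 7., 6., 7.],
              [667., 496., 512., 601.]])
S, R, Q = vikor(F, w=np.array([0.4, 0.3, 0.3]),
                benefit=np.array([True, True, False]))
print(Q.argsort())  # alternatives from best to worst by Q
```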
4.2. TOPSIS method

The basic principle of the TOPSIS method is that the chosen alternative should have the shortest distance from the ideal solution and the farthest distance from the negative-ideal solution [9]. The TOPSIS procedure consists of the following steps:

1. Calculate the normalized decision matrix. The normalized value r_ij is calculated as

r_{ij} = \frac{f_{ij}}{\sqrt{\sum_{j=1}^{J} f_{ij}^2}}, \qquad (9)

where j = 1, ..., J; i = 1, ..., n.

2. Calculate the weighted normalized decision matrix. The weighted normalized value v_ij is calculated as

v_{ij} = w_i \cdot r_{ij}, \qquad (10)

where j = 1, ..., J; i = 1, ..., n; w_i is the weight of the i-th attribute or criterion, and \sum_{i=1}^{n} w_i = 1.

3. Determine the ideal and negative-ideal solutions:

A^* = \{ (\max_j v_{ij} \mid i \in I'), (\min_j v_{ij} \mid i \in I'') \}, \qquad (11)

A^- = \{ (\min_j v_{ij} \mid i \in I'), (\max_j v_{ij} \mid i \in I'') \}, \qquad (12)

where I' is associated with benefit criteria and I'' is associated with cost criteria.

4. Calculate the separation measures, using the n-dimensional Euclidean distance. The separation of each alternative from the ideal solution is given as

D_j^* = \sqrt{\sum_{i=1}^{n} (v_{ij} - v_i^*)^2}, \qquad (13)

where j = 1, ..., J. Similarly, the separation from the negative-ideal solution is given as

D_j^- = \sqrt{\sum_{i=1}^{n} (v_{ij} - v_i^-)^2}, \qquad (14)

where j = 1, ..., J.

5. Calculate the relative closeness to the ideal solution. The relative closeness of the alternative a_j with respect to A^* is defined as

C_j^* = \frac{D_j^-}{D_j^* + D_j^-}, \qquad (15)

where j = 1, ..., J.

6. Rank the preference order. The values C_j^* are ordered in descending order; the highest value indicates the best solution.
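The six TOPSIS steps fit into an equally short function. Again, this is a minimal sketch under the same assumed data layout as the VIKOR snippet above (criteria in rows, alternatives in columns), not the authors' implementation.

```python
import numpy as np

def topsis(F, w, benefit):
    """Relative closeness C* for a criteria matrix F of shape (n, J)."""
    # Steps 1-2, Eqs. (9)-(10): vector normalization, then weighting.
    V = w[:, None] * F / np.sqrt((F ** 2).sum(axis=1, keepdims=True))
    # Step 3, Eqs. (11)-(12): ideal A* and negative-ideal A- solutions.
    v_star = np.where(benefit, V.max(axis=1), V.min(axis=1))
    v_minus = np.where(benefit, V.min(axis=1), V.max(axis=1))
    # Step 4, Eqs. (13)-(14): Euclidean separations from A* and A-.
    D_star = np.sqrt(((V - v_star[:, None]) ** 2).sum(axis=0))
    D_minus = np.sqrt(((V - v_minus[:, None]) ** 2).sum(axis=0))
    # Steps 5-6, Eq. (15): higher closeness means a better alternative.
    return D_minus / (D_star + D_minus)

F = np.array([[7., 9., 9., 8.],
              [9., 7., 6., 7.],
              [667., 496., 512., 601.]])
C = topsis(F, w=np.array([0.4, 0.3, 0.3]),
           benefit=np.array([True, True, False]))
print(C.argsort()[::-1])  # alternatives from best to worst by C*
```

The difference highlighted in [3] is visible directly in the two sketches: TOPSIS divides each criterion by its column norm (vector normalization), whereas VIKOR divides by the criterion range (linear normalization).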
4.3. Decision Tree

Decision tree is a type of supervised learning algorithm that can be used for both regression and classification problems [10]. Each node is related to an attribute, whereas the leaves of the tree represent the final solution obtained by combining the values of the attributes.

The splitting process is stopped once a particular stopping criterion is met. For example, a given threshold for the minimum number of observations left in a node being reached, or no variable achieving a given threshold for the minimum change in the impurity measure any more, can serve as a stopping criterion [11].

Let L be the initial dataset made of training samples with known dependent variable values. At first, the tree consists of only a root node t_1, which represents the full set of variables. The objective is to split the nodes into two decision nodes until a terminal node is reached, for example, splitting L into t_L and t_R, then splitting t_L and t_R into further sub-nodes until a stopping criterion is met [12].

4.4. Random Forest

Random forest is an ML algorithm that constructs a multitude of decision trees at training time. The main principle of constructing a random forest is that it is formed by combining the solutions of binary decision trees built on diverse subsets of the original dataset and on subsets of randomly selected features from the feature set.

Constructing small decision trees that use only a few features takes little processing time, hence the solutions of a majority of such trees can be combined into a single strong learner. The steps for constructing a random forest, as presented in [10], are as follows:

1. First, assume that the number of cases in the training set is K. Take a random sample of these K cases and use this sample as the training set for constructing the tree.
2. If there are p input variables, specify a number m < p such that at each node, m variables are selected at random out of the p. The best split on these m is used to split the node.
3. Each tree is subsequently grown to the largest extent possible; no pruning is needed.
4. Aggregate the predictions of the individual trees to predict new data.
5. Finally, a decision is made by the majority rule (in regression, by averaging the trees' predictions).

4.5. XGBoost

XGBoost is an ML algorithm that implements frameworks based on Gradient Boosted Decision Trees [13]. XGBoost surpasses other ML algorithms by solving many data science problems faster and more accurately than its counterparts; the algorithm also has additional protection against overfitting.

The objective function to be optimized is given by

\mathrm{obj}(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad (16)

where n is the number of observations, l(y_i, \hat{y}_i) is the training loss function, \hat{y}_i = \sum_{k=1}^{K} f_k(x_i) is the prediction, K is the number of trees, \Omega is the regularization term, f_k \in \mathcal{F}, and \mathcal{F} is the set of possible classification and regression trees.

Writing the prediction value at step t as \hat{y}_i^{(t)} gives

\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i). \qquad (17)

Next, a tree which optimizes the objective is chosen:

\mathrm{obj}^{(t)} = \sum_{i=1}^{n} l(y_i, \hat{y}_i^{(t)}) + \sum_{i=1}^{t} \Omega(f_i) = \sum_{i=1}^{n} l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t) + b, \qquad (18)

where b is a constant.

To minimize the probability of overfitting, the complexity of a tree \Omega(f) is defined as

\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2, \qquad (19)

where

f_t(x) = w_{q(x)}, \quad w \in \mathbb{R}^T, \quad q: \mathbb{R}^d \to \{1, 2, \ldots, T\}. \qquad (20)

Here w is the vector of scores on the leaves, q is a function assigning each data point to the corresponding leaf, and T is the number of leaves.
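The additive update of Eq. (17) is easy to see in code. Below is a toy gradient-boosting sketch for the squared-error loss, in which each new tree is fitted to the residuals of the current prediction. It illustrates the boosting principle only; actual XGBoost additionally uses second-order gradient information and the regularization of Eqs. (19)–(20), and the data, tree depth and learning rate here are invented for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

pred = np.full_like(y, y.mean())   # y_hat^(0): constant initial prediction
eta = 0.3                          # shrinkage applied to each new tree
for t in range(150):               # boosting rounds
    residuals = y - pred           # negative gradient of the squared-error loss
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    pred += eta * tree.predict(X)  # Eq. (17): y_hat^(t) = y_hat^(t-1) + f_t(x)

print(float(np.mean((y - pred) ** 2)))  # training MSE shrinks with each round
```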
4.6. Support Vector Machine

The goal of SVM is to find the maximum-margin separating hyperplane, that is, the one with the maximum distance to the nearest training data objects [14]. A separating hyperplane can be written as

\mathbf{W}\mathbf{X} + b = 0, \qquad (21)

where \mathbf{W} is a weight vector, namely \mathbf{W} = \{w_1, w_2, \ldots, w_n\}; \mathbf{X} is a set of training data made of p objects, n attributes and an associated class label y_i; and b is a scalar constant.

The distance between the marginal hyperplanes, denoted as 2/\|\mathbf{W}\|, has to be maximal. Consequently, this means that \|\mathbf{W}\| (the Euclidean norm of the vector \mathbf{W}) has to be minimized. To simplify calculations, \|\mathbf{W}\| can be swapped for \|\mathbf{W}\|^2/2. Thus, the objective function for this optimization problem is defined as

\min_{\mathbf{W}, b} \frac{1}{2}\|\mathbf{W}\|^2, \qquad (22)

y_i(\mathbf{W}\mathbf{X}_i^T + b) \geq 1, \quad i = 1, \ldots, p, \qquad (23)

where the constraint (23) ensures that all objects from the training dataset are positioned on the correct side of the appropriate marginal hyperplane.

The Lagrange multiplier strategy allows combining these two conditions into one:

\min_{\mathbf{W}, b} \max_{\alpha \geq 0} \left\{ \frac{1}{2}\|\mathbf{W}\|^2 - \sum_{i=1}^{p} \alpha_i [y_i(\mathbf{W}\mathbf{X}_i^T - b) - 1] \right\}. \qquad (24)

Kernel functions are used when the training dataset needs to be transformed into a higher-dimensional space because the data is linearly inseparable:

K(\mathbf{X}_i, \mathbf{X}_j) = \varphi(\mathbf{X}_i)\varphi(\mathbf{X}_j)^T, \quad \forall \mathbf{X}_i, \mathbf{X}_j \in \mathbf{X}. \qquad (25)

In this study, the Gaussian radial basis function kernel was used:

K(\mathbf{X}_i, \mathbf{X}_j) = e^{-\gamma \|\mathbf{X}_i - \mathbf{X}_j\|^2},

where the \gamma value is derived from the relation

\frac{1}{\gamma} \approx \mathrm{MED}_{i,j=1,\ldots,p}(\|\mathbf{X}_i - \mathbf{X}_j\|). \qquad (26)

Here MED is the median. Usually the parameter \gamma is found through trial and error.

4.7. k-Nearest Neighbours

The KNN algorithm is a method based on the likeness of objects [15]. In other words, the principle is to find a predefined number (k) of training samples closest to the new point. In the case of regression, the relationship between the explanatory variables and the continuous dependent variable is approximated by the average of the observations which together form the so-called neighbourhood. Its size is determined using cross-validation while minimizing the root mean square error. The Euclidean distance was used to calculate the distance between objects.

4.8. Model evaluation metrics

The significance of a regression model is usually assessed using the coefficient of determination [16]:

R_{y x_1 x_2 \ldots x_k} = \sqrt{1 - \frac{\sigma^2_{res}}{\sigma^2_y}}, \qquad (27)

where \sigma^2_{res} is the residual dispersion of the forecast model and \sigma^2_y is the dispersion of y.

However, the adjusted coefficient of determination R^2_{adj} is better suited for comparing regression models, as it avoids the inaccuracy caused by the number of factors entering the coefficient of determination [16]:

R^2_{adj} = 1 - (1 - R^2) \cdot \frac{n - 1}{n - k - 1}, \qquad (28)

where n is the number of observations available for analysis and k is the number of variables.

Moreover, statisticians commonly measure accuracy by computing the mean square error (MSE) or its square root, conventionally abbreviated as RMSE (root mean square error). The latter is in the same units as the measured variable and is therefore a better descriptive statistic; it is also the most popular evaluation metric used in regression problems. RMSE rests on the assumption that the errors are unbiased and follow a normal distribution. The RMSE metric is given by [17]:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (S_i - O_i)^2}, \qquad (29)

where O_i are the observed and S_i the predicted values of a variable.

Moreover, the average magnitude of the forecast errors can be measured by the mean absolute error:

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |S_i - O_i|. \qquad (30)

In this case, the direction of the errors is not considered [17].

There are various ways to improve models depending on the technique involved. The most popular way is to construct ensemble models. Once there are multiple models that produce a score for a particular outcome, they can be combined to produce ensemble scores. For example, a new score can be calculated as the average of two models' scores and then assessed as a further model. Usually, the area under the curve improves for these ensemble models.
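Since these three metrics drive the model comparison in Section 5, a direct transcription of Eqs. (28)–(30) may be useful. The sketch below is a plain reading of the formulas, not code from the study; the sample arrays are invented.

```python
import numpy as np

def rmse(obs, pred):
    # Eq. (29): root mean square error, in the units of the target.
    return np.sqrt(np.mean((pred - obs) ** 2))

def mae(obs, pred):
    # Eq. (30): mean absolute error; the direction of errors is ignored.
    return np.mean(np.abs(pred - obs))

def r2_adjusted(obs, pred, k):
    # Eq. (28): adjusted R^2, penalizing the number of predictors k.
    n = len(obs)
    r2 = 1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

obs = np.array([120., -40., 75., 10., 95., -20.])
pred = np.array([100., -35., 90., 0., 80., -5.])
print(rmse(obs, pred), mae(obs, pred), r2_adjusted(obs, pred, k=2))
```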
5. Results

5.1. Driver ranking

Selection of attribute weights. To begin the ranking procedure, the attribute weights first had to be established. Since the drivers must be ranked in compliance with the attribute priorities dictated by a company, the importance of the attributes was evaluated by an expert on a scale from 1 to 10; the result is presented in Table 2. Thus, we get the first set of weights W_1, for which

w_1 + 3w_2 + w_3 + w_4 + w_5 = 1,

where w_1 = 0.09 is the weight of Free rolling distance; w_2 = 0.19 is the weight of Engine overloaded distance, Excess idling and Overspeeding time; w_3 = 0.13 is the weight of Highest gear distance; w_4 = 0.15 is the weight of Extreme braking events; and w_5 = 0.07 is the weight of Harsh braking events.

Table 2
The first set of weights

Attribute | Score | Weight | min/max
Free rolling distance | 5 | 0.09 | max
Engine overloaded distance | 10 | 0.19 | min
Highest gear distance | 7 | 0.13 | max
Excess idling | 10 | 0.19 | min
Overspeeding time | 10 | 0.19 | min
Extreme braking events | 8 | 0.15 | min
Harsh braking events | 4 | 0.07 | min

However, since the expert assessment of attribute importance can be biased, a baseline weight set W_2 was also tested, for which

7w_1 = 1,

where w_1 = 1/7 is the weight of each attribute.

Results of ranking methods. The ranking of the drivers was performed using the generated sets of weights. The criterion values computed by the TOPSIS (C_i) and VIKOR (Q) methods were used to rank the drivers. However, in many instances the difference between two values of the same criterion was minute, hence the values were grouped using a ten-point system.

Table 3
Score distribution with different weight sets

Score | TOPSIS, W_1 | TOPSIS, W_2 | VIKOR, W_1 | VIKOR, W_2
10 | 325 | 316 | 156 | 148
9 | 62 | 71 | 107 | 120
8 | 6 | 7 | 67 | 63
7 | 4 | 3 | 28 | 31
6 | 0 | 0 | 16 | 12
5 | 0 | 0 | 6 | 8
4 | 1 | 1 | 9 | 6
3 | 0 | 0 | 3 | 5
2 | 0 | 0 | 5 | 4
1 | 0 | 0 | 1 | 1

Considering the results of the different ranking methods presented in Table 3, the method with the most logical ranking of the drivers was confirmed to be the VIKOR method. In addition, the first weight set should be used when creating the dataset for the forecasting models, since it complies with the attribute priorities of the company and no significant difference was observed between the two tested sets of weights.
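The weights in Table 2 are consistent with simply normalizing the expert scores so that they sum to one (5/54 ≈ 0.09, 10/54 ≈ 0.19, and so on); the paper does not state this formula explicitly, so the derivation below is an assumption. The resulting weight vector and benefit mask are in the form expected by the VIKOR and TOPSIS sketches in Section 4.

```python
import numpy as np

# Expert scores from Table 2, in attribute order; True marks max-criteria:
# Free rolling, Engine overloaded, Highest gear, Excess idling,
# Overspeeding, Extreme braking, Harsh braking.
scores = np.array([5, 10, 7, 10, 10, 8, 4])
benefit = np.array([True, False, True, False, False, False, False])

w1 = scores / scores.sum()   # assumed derivation of weight set W_1
w2 = np.full(7, 1 / 7)       # baseline weight set W_2: equal weights
print(np.round(w1, 2))       # [0.09 0.19 0.13 0.19 0.19 0.15 0.07], as in Table 2
```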
5.2. Forecasting models

To improve the ETA forecast, it was sufficient to forecast the deviation from the planned duration, because the planned duration itself had already been computed by the routing service. The goal, therefore, was to forecast the deviation from the planned tour duration. Overall, five ML methods were tested: decision tree, random forest, XGBoost, support vector machine (SVM) and k-Nearest Neighbours (KNN). Quantitative variables were normalized using min-max normalization, while the categorical variables were transformed and added to the models by replacing them with binary dummy variables.

After the random selection and assignment of the set indices, 75% of the dataset was assigned to the training sample and 25% to the test sample. Cross-validation was used for the selection of optimal parameters in all five models; during this procedure, the sample data was divided into 10 groups.

The optimal parameters of the models were determined as follows:

1. The decision tree model. The minimum number of observations that can be in a node was set to seven. Furthermore, for a node to be split, it has to contain at least 20 observations. A total of 11 splits were made. The hours of service breaks, the tour beginning day, the beginning, ending and furthest countries, as well as the planned tour distance impacted the creation of the model.

2. The random forest model. The optimal number of randomly selected variables in each split of the random forest was set to 59 (this value was determined based on a precision measure), while the number of trees was the default of 500.

3. The XGBoost model. The optimal model was determined by the lowest obtained RMSE value. A maximum tree depth value of 0.3 was obtained; the higher this value is, the more complicated the model becomes. Also, the subsampling ratio of training cases is 0.75; in other words, the XGBoost method randomly selects 75% of the training dataset prior to growing the trees, which in turn protects against overfitting. Such partial selection of a sample occurs once per iteration. The maximum number of iterations was determined to be 150.

4. The k-Nearest Neighbours model. The k parameter of the KNN model was established to be equal to 4. This value was selected by varying the parameter from 1 to 10 and determining which value yields the smallest RMSE. The Euclidean distance measure was used to estimate the distance between the points.

5. The SVM model. The number of support vectors was established to be 1008, and the Gaussian radial basis function kernel was used.

The accuracy of all five constructed models was evaluated by predicting the deviation from the planned tour time for unseen test set data. The adjusted coefficient of determination R^2_{adj}, RMSE and MAE were calculated to determine and compare the suitability of the prediction models. The results are presented in Table 4. It can be seen that for all three accuracy measures, the best results for predicting test data were obtained using the random forest model, where the mean absolute difference between the predicted and actual values was almost 684, the square root of the average squared differences between the predicted and actual values was 1120.15, and the adjusted coefficient of determination was 77.57%. The XGBoost model yielded quite similar results: its R^2_{adj} was lower by only 3.3%, its MAE was higher by 93.24 units and its RMSE was higher by 90.83. The worst prediction results were obtained using the KNN method, for which R^2_{adj} was less than 25%.

Table 4
The accuracy of regression models

Model | R^2_{adj} | RMSE | MAE
Decision tree | 0.6666 | 1391.18 | 792.56
Random forest | 0.7757 | 1120.15 | 683.94
XGBoost | 0.7427 | 1210.98 | 777.18
SVM | 0.6726 | 1408.97 | 867.82
KNN | 0.2465 | 2105.53 | 1208.22

A possibility to improve the models by forming an ensemble model was observed, hence a decision was made to try a combination of the predictions of two models: the random forest and the SVM. Several combinations were made. The first method estimated the average of the forecasts of both models:

\hat{y}_i = \frac{\hat{y}_i^{rf} + \hat{y}_i^{svm}}{2}, \qquad (31)

where \hat{y}_i is the predicted value of the i-th observation for the model ensemble, \hat{y}_i^{rf} is the predicted value of the i-th observation of the random forest model, and \hat{y}_i^{svm} that of the SVM model. The adjusted coefficient of determination of this ensemble model was 0.7672.

Another way to form an ensemble of models is the weighted sum method. In this type of ensemble, the prediction of the better model (in this case the random forest model) is given a weighting coefficient c_1 that is less than 1 but not less than 0.5. However, the weights must sum to one; therefore, the weighting coefficient of the other model (the SVM model), c_2, must be greater than zero but less than 0.5. The new predicted value is then obtained as follows:

\hat{y}_i = c_1 \cdot \hat{y}_i^{rf} + c_2 \cdot \hat{y}_i^{svm}. \qquad (32)

Since the equation

c_1 = 1 - c_2 \qquad (33)

must hold, (32) can be written as

\hat{y}_i = (1 - c_2) \cdot \hat{y}_i^{rf} + c_2 \cdot \hat{y}_i^{svm}. \qquad (34)

To find the weight at which the adjusted coefficient of determination of the ensemble model attains its highest value, c_2 was varied from 0.01 in steps of 0.01 up to 0.49. The experiment resulted in a maximum R^2_{adj} value of 0.7795 when c_2 was equal to 0.2. The second way of constructing an ensemble model resulted in a higher R^2_{adj} than the first method, hence the second was more suitable. Therefore, the expression of the final ensemble model is:

\hat{y}_i = 0.8 \cdot \hat{y}_i^{rf} + 0.2 \cdot \hat{y}_i^{svm}. \qquad (35)
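The search over c_2 behind Eq. (35) is a one-loop grid search. A minimal sketch, assuming the test-set observations and the two models' predictions (obs, pred_rf, pred_svm) are available as arrays; r2_adjusted repeats Eq. (28) from Section 4.8.

```python
import numpy as np

def r2_adjusted(obs, pred, k):
    # Eq. (28), as in Section 4.8.
    n = len(obs)
    r2 = 1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def best_ensemble_weight(obs, pred_rf, pred_svm, k):
    """Grid-search c2 in 0.01..0.49, maximizing the adjusted R^2 of Eq. (34)."""
    grid = np.arange(0.01, 0.50, 0.01)
    r2s = [r2_adjusted(obs, (1 - c2) * pred_rf + c2 * pred_svm, k)
           for c2 in grid]
    best = int(np.argmax(r2s))
    return grid[best], r2s[best]

# In the study the search settled on c2 = 0.2, i.e. the final model of
# Eq. (35): y_hat = 0.8 * y_hat_rf + 0.2 * y_hat_svm.
```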
The forecast graph of the created ensemble model is presented in Fig. 1. Some outliers remained poorly predicted, but the overall prediction is accurate.

Figure 1: Predicted values of the model ensemble

The metrics evaluating the created ensemble model are presented in Table 5. In comparison to the results of the individual models (Table 4), higher accuracy can be observed in all three metrics of the ensemble model. Nonetheless, the improvement in accuracy was not significant: the adjusted coefficient of determination was higher than that of the best individual model by only 0.38%, the RMSE was lower by 1.87, and the MAE was lower by 0.47.

Table 5
The accuracy of the ensemble model

Model | R^2_{adj} | RMSE | MAE
Ensemble model | 0.7795 | 1118.28 | 683.47

6. Conclusions

In this research, in order to improve contemporary ETA forecasts, the possibility of ranking drivers based on their behaviour data and predicting deviations from the planned arrival time using different ML methods was analysed. For this purpose, a dataset consisting of vehicle monitoring data was used for ranking the drivers with the TOPSIS and VIKOR methods. It was found that the results of the VIKOR method with the company's attribute importance weight set produced the most suitable drivers' scores. These scores were then used to supplement a new dataset constructed for the ML methods. Five methods (decision tree, random forest, XGBoost, SVM and KNN) were used to create the model forecasting the deviation from the planned tour duration. Finally, the ensemble model based on the random forest and SVM produced the most accurate results (R^2_{adj} = 77.95%).

In the future, it is planned to continue the construction of the improved ETA prediction model by including real-world parameters that a vehicle takes into account while driving a certain route; for example, the need to stop for mandatory driving breaks or for refuelling would be considered.

Acknowledgments

We thank Tadas Motiejūnas, Darius Ežerskis, Viktorija Naudužaitė, Donatas Šumyla and UAB Ruptela (https://www.ruptela.com/) for cooperation and useful insights.
References

[1] Z. Wang, K. Fu, J. Ye, Learning to estimate the travel time, 2018, pp. 858–866. doi:10.1145/3219819.3219900.

[2] Z. Constantinescu, C. Marinoiu, M. Vladoiu, Driving style analysis using data mining techniques, International Journal of Computers, Communications & Control (IJCCC) V (2010) 654–663. doi:10.15837/ijccc.2010.5.2221.

[3] S. Opricovic, G.-H. Tzeng, Compromise solution by MCDM methods: A comparative analysis of VIKOR and TOPSIS, European Journal of Operational Research 156 (2004) 445–455. URL: https://www.sciencedirect.com/science/article/pii/S0377221703000201. doi:10.1016/S0377-2217(03)00020-1.

[4] I. Parolas, ETA prediction for containerships at the port of Rotterdam using machine learning techniques, 2016.

[5] X. Sun, H. Zhang, F. Tian, L. Yang, The use of a machine learning method to predict the real-time link travel time of open-pit trucks, Mathematical Problems in Engineering 2018 (2018) 1–14. doi:10.1155/2018/4368045.

[6] X. Ma, C. Ding, L. Sen, Y. Wang, Y. Wang, Prioritizing influential factors for freeway incident clearance time prediction using the gradient boosting decision trees method, IEEE Transactions on Intelligent Transportation Systems 18 (2017) 2303–2310. doi:10.1109/TITS.2016.2635719.

[7] X. Li, R. Bai, P. O. Siebers, C. Wagner, Travel time prediction in transport and logistics: Towards more efficient vehicle GPS data management using tree ensemble methods, VINE Journal of Information and Knowledge Management Systems 49 (2019) 277–306. doi:10.1108/VJIKMS-11-2018-0102.

[8] D. Servaitė, R. Juozaitienė, T. Krilavičius, Logistics service provider selection using TOPSIS and VIKOR methods, 2020. URL: https://www.vdu.lt/cris/bitstream/20.500.12259/110824/2/ISSN1613-0073_2020_V_2698.PG_86-91.pdf.

[9] S. Opricovic, G.-H. Tzeng, Compromise solution by MCDM methods: A comparative analysis of VIKOR and TOPSIS, European Journal of Operational Research 156 (2004) 445–455. URL: https://www.sciencedirect.com/science/article/pii/S0377221703000201. doi:10.1016/S0377-2217(03)00020-1.

[10] J. Le, Decision trees in R, DataCamp, 2018. URL: https://www.datacamp.com/community/tutorials/decision-trees-R.

[11] C. Strobl, J. Malley, G. Tutz, An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests, Psychological Methods 14 (2009) 323–348. doi:10.1037/a0016973.

[12] G. Norkevičius, G. Raškinis, Garsų trukmės modeliavimas klasifikavimo ir regresijos medžiais naudojant didelės apimties garsyną [Modelling sound durations with classification and regression trees using a large-scale speech corpus], in: Informacinės technologijos 2007, 2006, pp. 52–66.

[13] XGBoost developers, xgboost Release 1.2.0-SNAPSHOT, 2020. URL: https://buildmedia.readthedocs.org/media/pdf/xgboost/latest/xgboost.pdf.

[14] J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems), 2nd ed., Morgan Kaufmann, 2006.

[15] Murni, R. Kosasih, A. F. Oeoen, T. Handhika, I. Sari, D. Lestari, Travel time estimation for destinations in Bali using the KNN-regression method with TensorFlow, IOP Conference Series: Materials Science and Engineering 854 (2020) 012061. doi:10.1088/1757-899X/854/1/012061.

[16] H. Altland, Regression analysis: Statistical modeling of a response variable, Technometrics 41 (1999) 367–368. doi:10.1080/00401706.1999.10485936.

[17] C. Chatfield, Time-series forecasting, Significance 2 (2005) 131–133. URL: https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1740-9713.2005.00117.x. doi:10.1111/j.1740-9713.2005.00117.x.