=Paper=
{{Paper
|id=Vol-2542/MOD-KI2
|storemode=property
|title=A Methodology for Assisting Customers Service Selection in Freight Transportation by Data Analytics
|pdfUrl=https://ceur-ws.org/Vol-2542/MOD-KI2.pdf
|volume=Vol-2542
|authors=Aysel Biyik,Bruno Albert Neumann-Saavedra,Dirk Christian Mattfeld
|dblpUrl=https://dblp.org/rec/conf/modellierung/BiyikNM20
}}
==A Methodology for Assisting Customers Service Selection in Freight Transportation by Data Analytics==
Joint Proceedings of Modellierung 2020 Short, Workshop and Tools & Demo Papers, Workshop on Models in AI

Aysel Biyik¹, Bruno Albert Neumann-Saavedra¹, Dirk Christian Mattfeld¹

===Abstract===
Over the past several decades, logistics activities have been outsourced to freight logistics providers (FLPs). However, with the development of digital technologies, digital freight platforms (DFPs) have started to replace FLPs. Due to the complexity of the freight transportation domain, it is challenging to provide customer-oriented delivery solutions. In such a complex process, which involves many actors, it is necessary to develop a methodology to assist customer service selection. This methodology should be able to represent the multi-modal network adequately with respect to customer preferences, learn from customers' past choices via data, and generate accurate and quick solutions to satisfy them. Therefore, this paper proposes a methodology that uses data analytics, specifically predictive analytics, to estimate the cost in multi-modal freight transportation with respect to customer preferences. Given some customer preferences, the proposed methodology is able to match a variety of services as closely as possible.

Keywords: digital transformation; digital freight platforms; multi-modal freight transportation; customer expectations; data analytics; predictive analytics

===1 Introduction===
In recent years, digital platform companies have emerged as key players in the logistics market to fulfil customer expectations more efficiently. In the field of freight transportation, digital freight platforms (DFPs) act as intermediaries between senders, carriers, and customers. DFPs aim to replace human interactions with an automated process for offering convenient multi-modal transport services to stakeholders.
Indeed, the acceptance of such DFPs is growing so rapidly that they have started to replace traditional freight logistics providers (FLPs). For instance, digital platforms such as Cargomatic [1] enable cost-efficient, real-time, and on-demand arrangements of transports that cut into the domain of logistics services. Reports indicate that FLPs focusing on standardized services such as transportation and warehousing are likely to lose significant market share to new transport technologies and customized service solutions; see, for example, [2]. For a more detailed discussion of the impact of digital platforms on FLPs, the reader is referred to [3].

¹ Technische Universität Braunschweig, Decision Support Group, Mühlenpfordtstraße 23, 38106 Braunschweig, a.biyik@tu-braunschweig.de. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

One major challenge in adopting DFPs is to meet customer expectations of service through automatic procedures rather than human expertise in such a complex process, which involves many actors with different perspectives on service evaluation. Traditional service providers have the advantage of long-standing experience with the industry and its customers. However, because the service is complex, providing suitable service to customers with a quick response is challenging when relying on human experience alone. One reason is that a customer may consider several criteria when selecting transport services. These criteria include cost, CO2 emissions, transport mode, flexibility, and time reliability [2]. Every customer weights these criteria differently according to his or her preferences.
To address this challenge, the field of operations research offers a rich literature on multi-objective optimization models for multi-modal transport that reach optimal solutions with respect to customer preferences [4]; [5]; [6]. Nevertheless, finding optimal solutions for real-world-sized instances of such models typically involves long computational times, and the result is not unbiased, as the weighting of these criteria differs for each customer. Thus, it is challenging for a DFP to provide customers with a quick response to their requests by using optimization models. Providing a quick response is essential for the success of DFPs, as customers do not want to spend much time obtaining suitable results from digital platforms. This situation encourages researchers to draw on other methods, such as data analytics, in this complex domain. Therefore, we propose a methodology that learns from customers' past choices to generate quick solutions of acceptable accuracy. Embedding the proposed methodology in a DFP thus provides a reliable estimate of the quality of the offered service.

This study proposes a methodology that uses data analytics to predict the solution that best satisfies customer preferences. The aim is to quickly generate solutions together with an accurate estimate of their actual quality. We rely on a mathematical model to generate instances. Given the individual preferences of a customer, we can generate the optimal solution for an instance. Then, we build a predictive model by machine learning that assesses the relationships among criteria. Thus, from a pool of solutions of such an instance, the predictive model determines routing options with respect to customer preferences without the need to conduct an optimization process. We test our methodology on a case study based on portions of the Shanghai-Singapore transportation network.
We consider six criteria that a customer may weigh when selecting a service: cost, delivery time, distance, CO2 emissions, mode of transportation, and number of transshipments. Statistical techniques are applied to reveal patterns, correlations, and other insights among the parameters. The tendency observed in the relationships among the parameters led us to implement Multiple Linear Regression (MLR). Cross-validation is applied to prevent overfitting. The model makes accurate predictions on unseen data: we observe that it generalizes from the training set to the test set without human intervention. In addition, computational experiments show that our methodology avoids the long computational times incurred when a large set of criteria must be considered in routing decisions. The proposed methodology makes it possible to match a variety of services as closely as possible to customer preferences.

The rest of this paper is organized as follows. Section 2 outlines the proposed approach and methodology. Section 3 presents a case study that evaluates the proposed approach on portions of the Shanghai-Singapore transportation network.

===2 Proposed Methodology===
The overall approach combines predictive analytics with a mathematical model. Predictive analytics covers a variety of statistical techniques from data mining, predictive modelling, and machine learning that analyse current and historical data to make predictions about the future. Predictive analytics not only assists in creating practically useful models but also plays an important role alongside explanatory modelling in theory building and theory testing [9]. The dataset required to build the predictive model is generated with the mathematical model. The proposed approach is shown in Figure 1. It consists of the following steps.

Fig. 1: The proposed methodology.

A.
Identification of Customer Preference Elements: This step is the foundation of both the mathematical model and the predictive model. Customer expectations or preferences should be selected and represented accurately. In this study, they are defined based on the previous literature.

B. Identification of Physical Transportation Network Elements: This step comprises the use of data derived from GPS (global positioning system) and from sensor detection of traffic status and incidents, among others.

C. Mathematical Model (Service Graph): In this step, we design the optimization model that determines the routing solution which best satisfies customer expectations. We use the collected data to generate real-world-based instances of the mathematical model as well as alternative route solutions according to customer requests.

D. Data Collection and Preparation: The dataset is constructed by sampling from the routing solution space of the model. The initial number of variables is usually large in order to capture new sources of information and new relationships. The rationale for each variable is based on a combination of theory, domain knowledge, and exploratory analysis.

E. Data Analysis: In this step, the data are aggregated to reduce their dimension and to handle outliers. Because the number of predictors is often large, reducing the dimension can help reduce sampling variance (even at the cost of increasing bias) and in turn increase predictive accuracy [7]. Statistical techniques are used to analyse how the dependent variable changes when any one of the independent variables varies while the others are held fixed. Correlation analysis is used to measure the strength and direction of the association among variables. In addition, visualization provides a means of learning about different measurements of quality as well as associations derived from predictive modelling.

F.
Predictive Model: We design a predictive model to estimate the quality of the solution that potentially best meets customer preferences. Machine-learning algorithms based on supervised learning are applied for the predictive model. Given data in the form of examples with labels, we feed these example-label pairs to a learning algorithm one by one. The algorithm predicts the label for each example and receives feedback as to whether it predicted the right answer. Over time, the algorithm learns to approximate the relationship between examples and their labels. When fully trained, the supervised learning algorithm is able to observe a new, never-before-seen example and predict a good label for it.

G. Evaluation, Validation and Model Selection: This step applies methods for evaluating the predictive power of a model, such as the mean squared error (MSE) and R² (a measure of the goodness of fit or accuracy of the model). Predictive power refers to an empirical model's ability to predict new observations accurately [7]. For evaluation, methods such as partitioning the dataset or cross-validation can be applied. Over-fitting is a major concern in predictive analytics [8]; [9]. Over-fitting is assessed by comparing the performance on the training and holdout sets. Then, the model selection step aims at finding the right level of model complexity that balances bias and variance in order to achieve high predictive accuracy.

===3 Case Study and Results===
The proposed approach is evaluated in a case study. The transportation network presented in [10] is considered. The network includes four cities (Shanghai, Wuxi, Singapore, Malaysia), 16 ports and 4 transportation modes. A product may originate from different cities and have different destinations. Each city has four ports: an airport, a railway station, a seaport and a warehouse.
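As a minimal sketch of how such a network could be represented, and of how breadth-first search can enumerate alternative routes between two ports, the following Python fragment models each port as a (city, facility) pair. The four facility types per city follow the 16-port count given in the text. The mode names, the handful of direct routes, and the function `enumerate_routes` are hypothetical placeholders chosen for illustration; they are not the data or the implementation of [10].

```python
from collections import deque
from itertools import product

CITIES = ["Shanghai", "Wuxi", "Singapore", "Malaysia"]
FACILITIES = ["airport", "railway station", "seaport", "warehouse"]

# One node per (city, facility) pair: 4 cities x 4 facilities = 16 ports.
PORTS = list(product(CITIES, FACILITIES))

# Hypothetical direct routes (NOT the 50 routes of the case study):
# (origin port, destination port, transportation mode).
DIRECT_ROUTES = [
    (("Shanghai", "warehouse"), ("Shanghai", "seaport"), "truck"),
    (("Shanghai", "warehouse"), ("Shanghai", "airport"), "truck"),
    (("Shanghai", "seaport"), ("Singapore", "seaport"), "ship"),
    (("Shanghai", "airport"), ("Singapore", "airport"), "air"),
    (("Singapore", "seaport"), ("Singapore", "warehouse"), "truck"),
    (("Singapore", "airport"), ("Singapore", "warehouse"), "truck"),
]


def enumerate_routes(origin, destination, routes):
    """Breadth-first enumeration of all simple paths from origin to destination.

    Each path is a list of (port, mode used to reach that port) pairs;
    the origin carries mode None.
    """
    adjacency = {}
    for src, dst, mode in routes:
        adjacency.setdefault(src, []).append((dst, mode))

    found = []
    queue = deque([[(origin, None)]])
    while queue:
        path = queue.popleft()
        node = path[-1][0]
        if node == destination:
            found.append(path)
            continue
        visited = {port for port, _ in path}  # avoid revisiting ports
        for nxt, mode in adjacency.get(node, []):
            if nxt not in visited:
                queue.append(path + [(nxt, mode)])
    return found
```

Calling `enumerate_routes(("Shanghai", "warehouse"), ("Singapore", "warehouse"), DIRECT_ROUTES)` on this toy edge list yields two alternatives, one by sea and one by air. Attaching cost, delivery time, distance and emission values to each edge would turn every enumerated path into one sample of the kind used for the data analysis.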
In total, there are 50 direct routes connecting different ports. Each route has a specific transportation mode, transportation cost, delivery time and distance. Each transportation mode has a specific capacity, speed and CO2 emission level. A breadth-first search algorithm was used to generate routes. Then, 50 different alternative route samples were obtained in response to the customer's request. In the data analysis step, it was found that costs and CO2 emissions are directly proportional to the origin-destination distance. It was also observed that CO2 emissions do not increase with the number of transshipments. If environmental efficiency is of utmost importance for a customer, he or she will usually prefer the ship option, even if it involves long delivery times.

The predictive model is constructed using MLR analysis. MLR is a suitable and efficient method for our continuous dataset because of the tendency observed in the relationships among the parameters. According to the ANOVA test, the number of transshipments does not contribute significantly to the model, so it can be removed to decrease model complexity. A common measure is the coefficient of determination R², which measures the fraction of the total variation in the dependent variable that is captured by the model [11]. The higher the value, the better the model fits the data. The full model yields an R² of 90.90% and an adjusted R² of 90.09%. After omitting the statistically insignificant parameter, R² and adjusted R² are 90.65% and 90.04%, respectively. Moreover, a normality test showed that the residuals of the model are normally distributed and that the variance is homogeneous. This means that the test results are usually reliable when the sample is large enough.

Fig.
2: The performance of MLR; 10 observations for actual and predicted cost value, and comparison of the actual versus predicted cost in 50 observations.

To prevent overfitting, cross-validation is applied: the data are randomly partitioned into 10 mutually exclusive subsets, and the algorithm is run 10 times, each run using a different set of 9 subsets joined as a training set and testing on the remaining subset. The 10 runs thus produce 10 different parameter sets for the algorithm, and the prediction performances of these runs can be compared with each other. After cross-validation, the model achieves an R² of 91.6% and an MSE of 0.008. Figure 2 shows the actual and predicted cost for 10 samples, together with a graphical comparison of the actual versus the predicted cost.

The predictive model is constructed to estimate transportation cost in a multi-modal freight transportation network without long computational times. Minitab, Python, and RapidMiner were used to apply these techniques in the case study [12]; [13].

The case study results motivate further research. As a next step in modelling, other well-known machine-learning algorithms such as neural networks and support vector machines will be implemented to compare their performance and choose the best. Then, the number of parameters in the dataset will be extended to reflect different customer preferences, such as service quality attributes.

===Bibliography===
[1] Cargomatic Homepage, https://www.cargomatic.com/lncs. Last accessed 4 Oct 2019

[2] Handfield, R., Straube, F., Pfohl, H., Wieland, A.: Embracing global logistics complexity to drive market advantage. DVV Media Group GmbH, BVL International (2013)

[3] Hofmann, E., Osterwalder, F.: Third-party logistics providers in the digital age: towards a new competitive arena?
Logistics 1(2), 9 (2017)

[4] Bektas, T., Crainic, T.G.: A brief overview of intermodal transportation. In: Taylor, G.D. (ed.) Logistics Engineering Handbook. Taylor and Francis Group, Boca Raton, FL, USA (2008)

[5] Caris, A., Macharis, C., Janssens, G.K.: Planning problems in intermodal freight transport: accomplishments and prospects. Transportation Planning and Technology 31(3), 277–302 (2008)

[6] Crainic, T.G., Kim, K.: Intermodal transportation. Handbooks in Operations Research and Management Science 14, 467–537 (2007)

[7] Shmueli, G., Koppius, O.R.: Predictive analytics in information systems research. MIS Quarterly, 553–572 (2011)

[8] Stone, M.: Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society: Series B (Methodological) 36(2), 111–133 (1974)

[9] Hastie, T., Tibshirani, R., Friedman, J., Franklin, J.: The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer 27(2), 83–85 (2005)

[10] Multimodal Transportation Network, https://github.com/hzjken/multimodal-transportation-optimization. Last accessed 1 Feb 2019

[11] VanderPlas, J.: Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media, Inc. (2016)

[12] Minitab, https://www.minitab.com/en-us/. Last accessed 1 Dec 2019

[13] RapidMiner, https://rapidminer.com/. Last accessed 1 Dec 2019