Optimization of urban freight transportation based on evolutionary modelling E A Gladchenko1, O N Saprykin1 and A N Tikhonov1 1 Samara National Research University, Moskovskoe Shosse 34А, Samara, Russia, 443086 e-mail: liz-ok_033@mail.ru, saprykinon@ssau.ru Abstract. Logistics problems require special attention, because every year they become more complicated and multivariable. On the one hand, a supply chain management includes incessant monitoring of such issues as requests elaboration, paths determination, routing of shipments, multimodal choice, set up of transhipments, fleet choice and maintenance, warehousing, packaging and others. On the other hand, dozens of people are involved in the logistics process. All these moments complicate the decision-making that is why data driven decisions are required nowadays. As well as shipment problems are NP-hard, the heuristic methods should be applied to resolve them. In this article we propose a genetic algorithm to solve the complex problem that consists of the Travelling Salesman Problem combined with the Knapsack Problem. We have developed an urban freight transportation model which is focused on the minimization of the underway time as well as on the maximization of the truck’s loading. A significant contribution in our method is the census of traffic frequency by using traffic zoning. The developed approach has been implemented using the Python programming language in the Zeppelin environment. The first version of the system has been approved in the city of Samara (Russia) with test demand dataset. 1. Introduction Nowadays, the competition between logistics companies is extremely high. To stay alive each of them competes for clients, strives to reduce costs, improve quality and efficiency of work, in other words they try to optimize the process. In the 21st century, such technological advances as Big Data or Internet of Things [1], are available to be used for the above listed purposes. The potential of Big Data techniques with state-of-the-art software can significantly enhance the data-driven decision-making process. Logistics optimization is a complex and dynamic problem that requires a systematic approach. On the one hand, this is determined by the large membership involved in the process: shippers, carriers, recipients, third party logistic managers and others. Their decisions and actions should be well coordinated, that is why it is necessary to have a good communication. On the other hand, the supply chain consists of several stages. The first is a distribution that includes tracking the demand level and types of commodities, booking and consolidation of requests. Transportation operations take place at the second stage which includes layout of the route, fleet choice, schedule preparing and adjustment, organization of the multimodal transshipment. This is the most sensitive and vast part of the logistics process, because of the wide variety of stochastic issues that need to be predicted. Warehousing, with V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) Data Science E A Gladchenko, O N Saprykin and A N Tikhonov the best depots locations, as well as using the interior automation equipment and mechanic means, is the third stage. The last one is packaging, which is especially considerable in perishable, fragile and out of gauge goods shipments. Outlined stages are not independent and affect each other. Thus, routing is NP-hard optimization problem that can be formally solved only in limited cases and has low performance. To avoid these limitations, heuristic methods are commonly applied. This paper is focused on the genetic analysis in terms of integrated transportation optimization. The rest of the paper is organized as follows. Section 2 gives a brief background to the recent research, both theoretic and empirical; and problem statement. Section 3 contains basic model’s features, presents methodological description, including genetic analysis. In Section 4 the implementation is displayed. Section 5 concludes and summarizes the paper. 2. Background 2.1. Literature Review There is a huge amount of research related to supply chain optimization. For example, M. Grazia Speranza [1] outlines the main tendencies in the logistics management, one of which is the complex data handling. The author points out three directions of the extension: systemic direction (the most applicable for this research), collaborative direction and dynamic direction. The paper presents some feasible ways of using Big Data in terms of transportation process optimization. Big Data technologies make possible to build predictive models of transportation processes too [2]. Generally, articles which describe different methodologies, rested on the model formulization. For instance, Shui Lam and Burkhard Englert [3] represent a basic mathematical model, which takes into account only two shipment offers. The proposed method compares shipment schedules, transportation cost and terminal charges that helps to find out optimal deliveries. Nevertheless, heuristic analysis allows estimating multiple decisions applying Genetic Algorithms [4]. Zbigniew Michalewicz [5] describes the processes of evolving with the parents’ selection and the producing of offspring. This study contains several modes of application Mutation and Crossover operators, as well as the representation in matrix encoded data. For example, genetic analysis, with the unique combination of operators, was used by Gintaras Vaira [6] to solve the Rich Vehicle Routing Problem (RVRP). The general VRP is different from the typical VPR, because it restricts extra constraints for a task, a vehicle, a cargo and so on. The genetic algorithm allows resolving VRP with pick-up and deliveries as well as VRP with time windows. As we have already noted the supply chain optimization is a complex subject that is why solving only the routing problem apart from real life constraints is not efficient. The next research tries to take this fact into account. The aim of the next model is not to create new optimal routes, but to find the best combination of existing routes and services in maritime transportation. Anantaram Balakrishnan and Christian Vad Karsten [7] report the LP-based heuristic procedure for containerized cargo that may require the transshipment during the delivery. The approach is based on a selection from the limited number of possible transshipments using the augmented multi-commodity flow network. Another complicated problem for freight operators is to control the booking rate to provide the maximum profit and execute all shipments when the product demand and resource capacities are not determined. The stochastic resource allocation problem which is formulated by Xinchang Wang [8] as a stochastic programming (STOC) model that incorporates three important features: random resource capacities, stochastic product demands, and network effects. The algorithm embeds a nonlinear optimization solver that uses an iterative-search type algorithm. Furthermore, there is a strong relationship between the shipment size and fleet size. Lots of papers propose various models to find the optimal mapping function. One of them is based on the discrete- continuous econometric model where the shipment size is a continuous variable and the size of the vehicle fleet is a discrete variable [9]. Results that are represented in the paper prove a hypothesis: with the increase of the distance and total demand it is become more efficient to exploit heavier, except for old vehicles. V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 96 Data Science E A Gladchenko, O N Saprykin and A N Tikhonov Latent Class Analysis (LCA) is also applicable to shipment size optimization. Raphael Piendl, Gernot Liedtke and Tilman Matteis [10] combine LCA and minimization of the total logistics costs. The proposed model divides transport cases into classes according to similar response patterns. Four homogeneous segments were identified which were integrated as an additional component to the systematic framework of the total logistics cost formulation. Concepción Román, Ana Isabel Arencibia, María Feo-Valero [11] represent fascinating research, that outlines the quantification of attribute cut-offs combined with the estimation of taste heterogeneity through the Latent Class Model (LCM). The analysis is based on the empirical data about decision-making in the area of logistics management and based on the specification of the penalized utility. Another application of the heuristic algorithm was introduced by Wu-Yang Liu and Li-Ning Xing [12] to reduce costs in the processing of express logistics system as a type of capacitated arc routing problem (CARP). The first step in the model is performing architecture optimization. The second step is assuming extended random path-scanning (drawing between 5 Rules). The paper illustrates the dependence between the number of depots, vehicles, total cost and different characteristics, as vehicles load, speed, acquisition cost, depot construction post, maximum service time, etc. Nevertheless, when the optimal number of deports is identified it can be not conformed to the best way between stores and final destinations, their location should be taken into account in this case. Xiang Hua, Xiao Hu and Wuwei Yuan [13] present a mean to solve this problem by applying the Adaptive particle swarm optimization (APSO). The APSO is a supplement of nonlinear inertia weight as well as time-varying acceleration coefficients as opposite to the Particle swarm optimization (PSO), which is a common method of decision-making. This fact allows a quick and accurate finding of the global optimum. As a result, the best locations may be determined using the knowledge of customer’s locations and demand. However, logistics managers more often deal with existing depots, so the real problem is to create optimal paths under existing circumstances. To achieve the maximum effect, the logistic problem should be solved as a complex task, involving different aspects of the transportation process. One of the papers that consider this conception is a review written by Pieter Vansteenwegen, Wouter Souffriau and Dirk Van Oudheusden [14]. The authors outline different combinations of logics problems, which are put together into the Orienteering Problem (OP). They investigate in more detail three of them (Team OP, OP with time windows, Team OP with time windows), give several examples of their mathematical formalisations and ways of solving. Another example of solving the complex logistic task is the study of Mohamed El Yafrani and Belaïd Ahiod [15]. They look into the Travelling Thief Problem (TTP) which includes two interdependence combinatorial optimization subproblems: the Travelling Salesman Problem (TSP) and the Knapsack Problem (KP). The authors compare iterative neighborhood algorithms to other methods, present deterministic procedures for each problem combination reinforce their assumptions by experimental results. However, recent research in the considered area does not take into account city traffic flows. This constraint is very vital for modern cities with high motorization level. Our study extends the described above conception, and focused on the routing of the city freight trucking in urban traffic conditions. To solve this problem the transportation planning techniques were involved in a model and combined with an evolutionary approach. 2.2. Problem statement The study addresses the following problem: identify the optimal route from the distribution center to destination points with the fleet’s capacity constraint and taking into account urban traffic. The problem differs from a typical TSP, because the route is based on the road traffic. This supplement promises to minimize the delivery time and consequently the total logistics costs. In addition, the described problem requires combining the TSP with KP. To solve the described above problem we propose the evolutionary approach that is based on a modified genetic algorithm. This approach allows reflecting the latent organizing process of a complex technical system that is close to nature system [16]. The input data for the algorithm are city transport infrastructure and transportation demand. The idea is that we have a city where vehicle density is varying in different time-windows and districts. It means that the underway time of one route during V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 97 Data Science E A Gladchenko, O N Saprykin and A N Tikhonov various times of the day will be different. This information can be provided by modelling system that is based on the origin-destination matrices. Therefore, the algorithm takes into account origin and destination locations, schedule time, transportation demand, fleet capabilities. Moreover, to find the optimal loading the KP is solving on every iteration of TSP algorithm. The KP algorithm is based on the genetic approach too. The overall solution represents the combination of two genetic algorithms: the KP genetic algorithm is nested into the TSP genetic algorithm. 3. Methodology 3.1. Mathematical Model The model describes the city freighting planning for helping logistics managers in decision-making, i.e. to optimize the whole route (TSP), with maximum trucks loading (KP). A solution consists of determining the final routes of delivery with the sequence of destination points which start and end in the depots. The goal is to minimize the underway time and to maximize the truck’s loading. To get an accurate and realistic solution the impact of the urban vehicle density should be taken into account. However, traffic datasets for most of the cities are unobtainable. For this reason, we use Transport Zoning and macroscopic transport simulation to make our model more flexible. The process of transportation simulation of the city is based on the gravity law. Marc Barthelemy [17] presents detailed information about creating origin-destination matrices and processing mobility networks. Simulation allows estimating the flow value between different areas in the city by using population and infrastructure datasets. Figure 1 shows a visual representation of the model that takes into account traffic zones. The micromodel that is based on this traffic zones allocation shows different transportation velocity on different branches of the road network. We propose the space-time model because the velocity depends on time-windows too. Therefore, we can predict the average velocity of trucks during transportation. Figure 1. Example of the delivery route that starts and ends in the depot (1-5-7-6-8-4-1). In our paper we are using the following designation indicators: • 𝑁 is the number of destination points. • 𝑖 is the origin point, (i=1…N). • 𝑗 is the destination point, (j=1…N). • 𝑑𝑗 is the quantity demand in the point 𝑗, (j=2…N). • 𝑙𝑖𝑗 is the distance between i and j, (i=1…N), (j=1…N). • 𝑙𝑖𝑗 = 0 if 𝑖 = 𝑗, which means there is no distance between the same origin-destination points. • 𝑡𝑠 is the average service time that is needed for unloading one package, is a set value, e.g. 𝑡𝑠 = 2 min. • 𝑡𝑒 is the average extra time in the destination point, for instance drawing up of documents etc., is a set value, e.g. 𝑡𝑒 = 7 min. V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 98 Data Science E A Gladchenko, O N Saprykin and A N Tikhonov • 𝑐 is the capacity of the truck. • 𝑧 is the transport zone of the city (z=1…Z). 𝑉𝑧𝚤 , ���� ���� 𝚥 • 𝑉𝑧 is the average speed in z-th zone. • 𝑀𝑘 is the matrix chromosome with the whole route, (k=1…K). • 𝑥𝑖𝑗 is the variable for the matrix. • 𝑃𝑘 is the vector form of the matrix’s chromosome (k=1…K). • 𝑝𝑛𝑃 is the value (number) of the destination point, where n is an atomic number of the gene in the P-th chromosome. The fitness function (FF) for the TSP is 𝑇𝑘 shows the total underway time of the route, where the sequence of m is a reverse sequence of loading the kth truck in the depot. FF1 for TSP looks like: 𝑝𝑛−2 𝑍 2 × 𝑙𝑗,𝑗+1 2 × 𝑙𝑝𝑛−1 ,𝑝𝑛 𝑇𝑘 = ( � � � 𝚥 + 𝑑𝑗+1 × 𝑡𝑠 + 𝑡𝑒� + 𝑝 → 𝑚𝑖𝑛 (1) ��� 𝑉 + ������ 𝑉 𝚥+1 ������� 𝑉 𝑛−1 + ����� 𝑉 𝑝𝑛 𝑗=𝑝1 𝑧=1 𝑧 𝑧 𝑧 𝑧 The constraint is: 𝑝𝑛−1 � 𝑑𝑗 ≤ 𝑐 (2) 𝑗=𝑝2 It means that the total quantity of freight during one route cannot exceed the capacity of a truck. To relate the points to transport zones (z), we assign a zone number to each origin and destination points. The fitness function for the Knapsack Problem (FF2): 𝑝𝑛−1 � 𝑑𝑗 → 𝑚𝑎𝑥 (3) 𝑗=𝑝2 (3) can be treated as the more loading leads to a better economic effect for the final logistics costs. 3.2. Genetic Algorithm To solve the designated in Section II B problem we adopted genetic analysis which is based on the evolutionary process. Genetic modelling is one among many heuristic ways in decision-making that uses encoded information, where variables have discrete values 0 and 1. Basic stages of the program are chromosomes generation and selection. Genotypes consist of genes with straight positions. When the first population is built, these potential solutions are tested by the fitness function, where the best ones are selected for the following iterations using mutation and/or crossover operators. We randomly create the first K “parents” populations that are shaped into Binary Matrixes Chromosomes. Each matrix represents the encoded route, includes rows i and columns j, if 𝑥𝑖𝑗 =1 there is a sub-path between i and j, otherwise 𝑥𝑖𝑗 =0. Matrices are completed with some special conditions, such as: 𝑥𝑖𝑗 ∈ {0,1}; 𝑖 = 1 … 𝑁, 𝑗 = 1 … 𝑁, (4) 𝑥1𝑗 = 𝑥𝑖1 = 1; 𝑖 = 1 … 𝑁, 𝑗 = 1 … 𝑁, (5) 𝑥𝑖𝑗 = 0; ∀(𝑖 = 𝑗); 𝑖 = 1 … 𝑁, 𝑗 = 1 … 𝑁, (6) 𝑁 � 𝑥𝑖𝑗 ≤ 1; ∀𝑗 = 1 … 𝑁, (7) 𝑖=1 𝑁 � 𝑥𝑖𝑗 ≤ 1; ∀𝑖 = 1 … 𝑁. (8) 𝑗=1 For example, figure 2 displays the matrix for 𝑀1 . V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 99 Data Science E A Gladchenko, O N Saprykin and A N Tikhonov The route from figure 1 can be treated as vector 𝑃1 : (1-5-7-6-8-4-1), where 𝑝1𝑃 = 𝑝𝑛𝑃 = 1 in any way. It is because the delivery is always started and ended in the depot, which number is 𝑖 = 𝑗 = 1. j 1 2 3 4 5 6 7 8 … N-1 N i 1 0 0 0 0 1 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 4 1 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 1 0 0 0 6 0 0 0 0 0 0 0 1 0 0 7 0 0 0 0 0 1 0 0 0 0 8 0 0 0 1 0 0 0 0 0 0 … N-1 0 0 0 0 0 0 0 0 0 0 N 0 0 0 0 0 0 0 0 0 0 Figure 2 Example of the Binary Matrix (𝑀1). After that, the FF1 (1) is calculated for all K individuals, and the algorithm checks the constraint (2). Some of the structures from the first population will be removed due to inconsistency with the capacity limitation. After these steps, the standard algorithm may proceed to the next iterations, but our approach contains additional KP step. The rest of K vectors are estimated by FF2 (3). Figure 3 shows an example of one of the possible solutions of the KP task. Figure 3. Example of the truck loading for the route (1-5-7-6-8-4-1): a) the third layer of loading; b) the second layer of loading; c) the first layer of loading. Subsequently the best representatives 𝑃𝑘 are selected for further simulations with 𝑀𝑘 using the crossover operator. The outcome of this procedure is the new offspring, which is used for the next loop iteration. Figure 4 represents the schema of the proposed genetic algorithm. 4. Implementation The solution is implemented as Python notebook that allows working with the model iteratively and improving it step by step. As a host environment, we are using Zeppelin, because it provides capabilities to build scalable interactive notebooks with a rich user interface. The developed software is implemented as a Docker container, which contains all configured modules and packages and can be deployed on most common operating systems. The developed scripts and notebooks are stored in the Bitbucket git repository. Figure 5 presents the architecture of the developed solution. The Python programming language is a flexible tool for systems prototyping. There are a lot of packages for solving different applied and scientific problems. To work with spatial data in our project the following Python packages were used: V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 100 Data Science E A Gladchenko, O N Saprykin and A N Tikhonov • Shapely (low level processing of geometries). • GeoPandas (processing of data frames with geometry column). • NeworkX (working with spatial graphs). • OSMnx (loading network for some region from OpenStreetMap resource). • Matplotlib (power tool for plotting with supporting of geographic maps). Figure 4. The schema of the heuristic algorithm. V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 101 Data Science E A Gladchenko, O N Saprykin and A N Tikhonov Figure 5. Schema of the solution environment. A genetic algorithm was implemented using Deap package that contains genetic operators and means to construct any evolutional model. Fitness function (1) was implemented as a custom function and registered in Deap framework as “evaluate” parameter. Inside the main evolution loop that solves TSP task, we have added the nested evolution loop that solves KP task for each “individual” that is selected as a population for the next generation. Therefore, the developed Python script fully implements the method proposed in Section III. The solution was debugged using the map of the city of Samara (Russia). The information about dividing the city into traffic zones and building Origin-Destination matrices was taken from the previous study [18]. For other task parameters (origin and destination points, transportation demand, freight fleet and its capacity) we have used the test data. We have used SUMO as a simulation engine in our research (figure 6). Figure 6. Simulation of transportation flows in Samara. 5. Conclusion The genetic analysis is used as a tool for freight transportation decision making support. The urban freight transportation model allows us to determinate the optimal origin-destination route of the delivery by minimizing underway time and maximizing truck’s loading. The main advantage of the proposed approach is applying of transport zoning, which promises to bring solutions closer to the real-life situation. The further research may take into account time windows of deliveries and fleet capability differences. Moreover, the genetic algorithm requires tuning to get optimal execution performance. Additionally, the simulation stage should be used to test the model performance within the urban traffic circumstances. V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 102 Data Science E A Gladchenko, O N Saprykin and A N Tikhonov 6. References [1] Speranza M G 2016 Trends in transportation and logistics European Journal of Operational Research 264 830-836 [2] Agafonov A A, Yumaganov A S and Myasnikov V V 2018 Big Data analysis in a geoinformatic problem of short-term traffic flow forecasting based on a k nearest neighbors method Computer Optics 42(6) 1101-1111 DOI: 10.18287/2412-6179-2018-42-6-1101-1111 [3] Lam S and Englert B 2009 Cost-Based Optimization of Shipment Scheduling IFAC Proceedings Volumes 42(15) 143-148 [4] Mokshin A V, Mokshin V V and Sharnin L M 2019 Adaptive genetic algorithms used to analyze behavior of complex system Communications in Nonlinear Science and Numerical Simulation 71 174-186 [5] Michalewicz Z 1996 Genetic Algorithms + Dara Srtuctures = Evolution Programs (Spring- Verlag Berlin Heidelberg, New York) [6] Vaira G 2014 Genetic algorithm for vehicle routing problem Doctoral Dissertation, Technological Sciences, Informatics Engineering (Vilnius) [7] Balakrishnan An and Karsten C V 2017 Container shipping service selection and cargo routing with transshipment limits European Journal of Operational Research 263 652-663 [8] Wang X 2016 Stochastic resource allocation for containerized cargo transportation networks when capacities are uncertain Transportation Research Part E 93 334-357 [9] Abate M and Jong G 2013 The optimal shipment size and truck size choice – The allocation of trucks across hauls Transportation Research Part A 59 262-277 [10] Piendl R, Liedtke G and Matteis T 2016 A logit model for shipment size choice with latent classes – Empirical findings for Germany Transportation Research Part A 102 188-201 [11] Román C, Arencibia An Is and Feo-Valero M 2016 A latent class model with attribute cut-offs to analyse modal choice for freight transport Transportation Research Part A 102 212-227 [12] Liu W-Y and Xing L-N 2013 The double layer optimization problem to express logistics systems and its heuristic algorithm Expert Systems with Applications 41 237-245 [13] Hua X, Hub X and Yuan W 2016 Research optimization on logistics distribution centerlocation based on adaptive particle swarm algorithm Optik 127 8443-8450 [14] Vansteenwegen P, Souffriau W and Oudheusden D V 2010 The orienteering problem: A survey European Journal of Operational Research 209 1-10 [15] Yafrani E M and Ahiod B 2016 A local search based approach for solving the Travelling Thief Problem: The pros and cons Applied Soft Computing 52 795-804 [16] Saprykina O and Saprykin O 2017 Transport Infrastructure Optimization Method Based on a Memetic Algorithm Proceedings of the IEEE 20th International Conference on Intelligent Transportation Systems (Yokohama, Japan) 1636-1641 [17] Barthelemy M 2010 Spatial Networks Physics Reports 499 1-101 [18] Saprykina O and Saprykin O 2017 Validation of Transport Infrastructure Changes via Microscopic Simulation: A Case Study for the City of Samara, Russia Proceedings of the 5th IEEE International Conference on Models and Technologies for Intelligent Transportation Systems (Naples, Italy) 788-793 V International Conference on "Information Technology and Nanotechnology" (ITNT-2019) 103