Review of Neural Networks Application in UAV Routing Problems

Maksym Ogurtsov

V.M. Glushkov Institute of Cybernetics of the National Academy of Sciences of Ukraine, Akademika Glushkova Avenue, 40, Kyiv, 03187, Ukraine

Abstract
The paper reviews approaches and methods for using neural networks (NN) to solve the combinatorial optimization (CO) problems that arise when operating unmanned aerial vehicles (UAVs) and reflect their specifics. It is shown that neural networks (with supervised learning, reinforcement learning and deep learning) are applicable to many types of CO routing problems (the traveling salesman problem, the VRP in its different versions, etc.) and to other UAV CO problems. NNs with reinforcement learning, as well as recurrent NNs with supervised learning, may be successfully used either to solve the combinatorial optimization problems mentioned above directly or to tune the parameters of combinatorial optimization algorithms.

Keywords
Neural networks, combinatorial optimization, VRP, unmanned aerial vehicles, deep learning

1. Introduction
Today, decision support tools are used more and more often all over the world, for example, in transportation, military affairs, supply chains and logistics, energy, finance, and operations planning [1, 2]. But the use of neural networks (NN) in these tools is still quite limited [3, 4]. Research and development of decision support tools, also called prescriptive analytics, began during World War II as an initiative to use mathematics and computer science to help military planners make decisions [5].

Problem statement: to analyze the possible ways of using machine learning (ML) algorithms in NNs for solving combinatorial optimization (CO) problems [2] that arise when operating UAVs (systems consisting of the UAVs themselves, their operators, etc.) and for which the distribution of the input data is unknown in advance [3], and to establish whether ML can be used for solving such problems.

Typically, NNs are well suited to processing signals of natural origin, which are difficult to formalize and to state mathematically [3], because in this case the distribution of the input data cannot be determined analytically: for example, image, text, voice, or molecule analysis. The same applies to UAV CO problems, especially UAV routing problems. Significant progress has recently been made in machine learning through deep learning [6]. Deep learning gives better results than classical algorithms in multidimensional spaces with large amounts of input data.

Routing problems that arise when using UAVs have given rise to numerous studies over recent decades. There have been many attempts (including successful ones) to solve CO problems with ML; they have been covered by researchers and publications all over the world [3, 4, 7, 8, 9, 10, 11, 12]. World experience in UAV development shows that within the next 10–15 years UAVs will be able to perform most of the tasks currently carried out by manned aerial vehicles. In Ukraine, given the difficult political and economic situation, a deeply rational decision is to use CO methods to increase the efficiency of such UAV tasks. Moreover, the rapid expansion of UAV applications in recent years has created new classes of problems, such as route construction for swarms of UAVs performing the same task as a team.
Information Technology and Implementation (IT&I-2021), December 01–03, 2021, Kyiv, Ukraine
EMAIL: ogurtsov.maksym@incyb.kiev.ua (A. 1); ORCID: 0000-0002-6167-5111 (A. 1)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

2. Analysis of recent research and publications
As mentioned earlier, significant progress has recently been made in ML through deep learning [3]. The possible use of ML to solve CO problems has been considered by both foreign [3, 4, 7, 8, 9, 10, 11] and domestic researchers [12, 13, 14, 15]. We will focus on discrete optimization problems stated as integer optimization with constraints, i.e., optimization problems with integer/binary variables, the form applicable to the routing optimization problems relevant to UAVs. Although not all such problems are hard to solve (for example, the shortest path problem), we will focus on the NP-hard CO problems specific to UAVs. For these problems, the existence of an algorithm whose running time is polynomial in the size of the input data is considered unlikely. However, in practice, CO algorithms can find solutions to problems containing up to millions of variables, parameters, and constraints.

How can NP-hard problems be solved in reasonable time? Let us consider a classical example of an NP-hard problem applicable to UAV routing: the traveling salesman problem (TSP). The TSP may be defined on a graph where we look for a minimum-length tour that visits each node once and only once [16]. For UAVs, the TSP applies to tasks where a single UAV must visit some set of coordinates (to deliver something, for surveillance, or to destroy targets if we are talking about military applications).

Let us focus on the Euclidean traveling salesman problem [17]. In this version, each node is assigned coordinates in two-dimensional Euclidean space (or, more generally, in a vector space of arbitrary dimension), and the objective function for the segment connecting two nodes is the Euclidean distance between them. In most cases two-dimensional Euclidean space is adequate for UAVs: although they change altitude during flight, this mostly happens during takeoff and landing, while on route most UAVs hold a roughly constant altitude. So two-dimensional Euclidean space is entirely applicable to building UAV routes. Although the Euclidean version of the problem is theoretically as complex as the general TSP, an approximate solution can be found efficiently in Euclidean space using the graph structure [18]. Sometimes, however, three-dimensional Euclidean space should be used instead of two-dimensional to obtain valid solutions, for example, when routing in the mountains or for delivery tasks in a city with skyscrapers (in such cases the UAV may have to change its altitude between two route points by more than the distance between those points in two-dimensional space). But for most UAV-specific tasks two-dimensional space is enough.

There is a large body of literature on heuristic algorithms, i.e., algorithms designed to compute solutions of CO problems that are "fairly good in practice" without guaranteeing optimality.
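As a concrete illustration of such a heuristic, below is a minimal sketch (not from the paper; the function names are illustrative) of the nearest-neighbor construction for the Euclidean TSP. Passing 2D or 3D coordinates switches between the planar and the altitude-aware variants discussed above.

```python
import numpy as np

def nearest_neighbor_tour(points: np.ndarray, start: int = 0) -> list[int]:
    """Greedy nearest-neighbor heuristic for the Euclidean TSP.

    points: (n, d) array of waypoint coordinates; d = 2 for a constant-altitude
    UAV route, d = 3 when altitude changes must be taken into account.
    Returns a permutation of node indices (a tour, not guaranteed optimal).
    """
    n = len(points)
    unvisited = set(range(n)) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # pick the closest not-yet-visited waypoint
        nxt = min(unvisited, key=lambda j: np.linalg.norm(points[j] - last))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(points: np.ndarray, tour: list[int]) -> float:
    # closed route: the UAV returns to its starting point
    return sum(np.linalg.norm(points[tour[i]] - points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))
```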
These heuristic methods play a central role in CO for UAVs, and some of them are considered in this paper; the most popular ones are surveyed in [19, 20]. We should note two main motivations for using ML: approximation and the creation of new policies. According to current research, the following types of NN and learning paradigms are applicable to solving UAV CO problems:
• Hopfield networks
• Supervised learning
• Unsupervised learning
• Reinforcement learning
• Deep learning
There are two popular ways of using ML to solve CO problems (not only UAV-specific ones): with and without expert knowledge. In this setting, the decision-making function is called a policy: given all available information (the state of the environment, if that information is sufficient to fully characterize the environment at this step), it produces the output (possibly stochastically) to be executed at the next step of the algorithm. The policy is the function that we want to learn with the help of ML. Below we consider how these two motivations determine the learning setup.

3. Selection of previously unsolved parts of the overall problem
The purpose of this work is to determine the applicability of NNs to the routing problems that arise when using UAVs, i.e., to highlight promising areas of research on the use of NNs for solving CO problems. NNs can be successfully used for the CO tasks that arise when using UAVs, primarily when planning routes for UAVs and especially for UAV groups. This is because the task of constructing routes and allocating the points to be visited between vehicles is a classic CO task: the Vehicle Routing Problem (VRP) [21]. Such a task constantly arises when planning operations with UAVs.

4. Novelty
Although, as mentioned above, the use of NNs for solving CO problems is actively studied all over the world, their use for UAV-specific tasks has not been studied sufficiently to date.

5. General scientific significance
The use of NNs to solve UAV-specific CO problems can improve the quality of the resulting solutions, which will increase the efficiency of UAV use in general and can potentially expand the scope of UAV applications.

6. NN types, applicable for UAV-specific CO problems
NNs can help improve UAV-specific CO algorithms (and CO algorithms in general) in two ways. First, the researcher can use expert knowledge of the CO algorithm to replace resource-intensive calculations with a fast approximation, and NNs can be used to compute such approximations. The challenge is to explore the space of possible solutions and thus be able to choose the policies that have led to the best of these solutions. Second, from the point of view of using NNs to solve CO problems, a NN can decompose the problem into smaller, simpler problems. For example, if a group of UAVs must visit a set of points on the map (a VRP problem), a NN may be used to divide the set of target points between the UAVs; we then only have to solve a set of TSP problems (one per UAV), which is a much simpler task. A sketch of this decomposition is given below.
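A minimal sketch of the decomposition just described. Here a plain k-means-style clustering stands in for the NN that would assign targets to UAVs (an assumption made only to keep the sketch self-contained); the per-UAV routes can then be built, e.g., with the nearest_neighbor_tour() helper above.

```python
import numpy as np

def assign_targets(points: np.ndarray, n_uavs: int, iters: int = 50) -> np.ndarray:
    """Split target points between UAVs (k-means-style clustering).

    In the approach described above this assignment would be produced by a
    trained NN; plain clustering is used here only to keep the sketch
    self-contained. Returns an array of cluster labels, one per point.
    """
    rng = np.random.default_rng(0)
    centers = points[rng.choice(len(points), n_uavs, replace=False)].astype(float)
    for _ in range(iters):
        # label each target with the index of its nearest center
        labels = np.argmin(
            np.linalg.norm(points[:, None] - centers[None], axis=-1), axis=1)
        for k in range(n_uavs):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels

# One TSP instance per UAV, e.g. solved with nearest_neighbor_tour() above:
# labels = assign_targets(points, n_uavs=3)
# routes = [nearest_neighbor_tour(points[labels == k]) for k in range(3)]
```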
Let us now consider the features specific to UAV CO problems. Most such tasks are TSP or VRP problems; some other types may also be considered (for example, choosing the optimal set of UAVs for a given task), but they are outside the scope of this work. For UAV-specific TSP/VRP problems, the following features should therefore be considered:
• UAVs have a limited operating time (a limited maximum route length). Whereas a ground vehicle may wait without losing fuel (for example, in the VRP with time windows, when the vehicle arrives at a point before the time window opens), for UAVs this is usually impossible: a UAV cannot simply land anywhere, so it has to spend battery capacity even while holding the same location;
• For some tasks the third dimension (altitude) must be taken into account, as described earlier;
• Weather conditions affect UAVs more than ground vehicles or larger aircraft, for example, wind (especially for small, light UAVs), cold (which may greatly reduce UAV battery capacity), etc.;
• UAV swarms are used more and more often, so their specifics, which differ substantially from the usual vehicle paradigm, should be considered.
Now let us consider the main types of NN that can be used to solve UAV-specific CO problems.

6.1. Hopfield Network
The first attempt to formalize the description of nerve cell functioning for use in computational algorithms was the model proposed in 1943 by McCulloch and Pitts. The McCulloch-Pitts model became the starting point for building a simple feedforward neural network called the perceptron, proposed and studied by Rosenblatt in the late 1950s and early 1960s. In 1960, Adaline (Adaptive Linear Neuron) systems were proposed. Later, the Hopfield neural network was suggested. The Hopfield neural network is a fully connected NN with a symmetric matrix of connections [22]. If the objective function of this NN is modified so that it is equivalent to, for example, the objective function of the UAV traveling salesman problem, and Lagrange multipliers are used to set penalties for violating the constraints of the problem, then such a network can be used to solve not only the TSP but many other CO problems. Each neuron \( y_i \) in the system can take one of two states (like the output of a neuron with a threshold activation function):

\( y_i \in \{-1, +1\} \). (1)

A limitation of Hopfield networks is that they are sensitive to hyperparameters and parameter initialization [23]. Although such neural networks have several promising properties, their practical application has remained limited, in most cases to research work. For the vast majority of UAV-specific CO tasks they are currently not applicable.

6.2. Supervised learning
In supervised learning, the input is a set of input (features) / target pairs, and the task is to find a function that converts inputs into outputs so that for each input the output satisfies the given constraints and has an objective value closest to the target. Finding such a function is called learning and is accomplished by solving an optimization problem over a family of candidate functions. The loss function, i.e., the measure of discrepancy between the output and target values, can be selected according to the task (regression, classification, etc.) and the optimization method. Mathematically speaking, let X and Y, having a joint probability distribution P, be random variables representing the input data and the target. Let \( \ell \) be a loss function to minimize and let \( \{f_\theta \mid \theta \in \mathbb{R}^p\} \) be a family of machine learning models (in this case, parametric). The supervised learning problem is defined as:

\( \min_{\theta \in \mathbb{R}^p} \mathbb{E}_{X,Y \sim P} \, \ell(Y, f_\theta(X)) \). (2)
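A minimal sketch of objective (2) as an empirical-risk training loop: the expectation over P is replaced by an average over samples. The two-layer network, the regression loss and the random data are placeholders, not a model from the paper.

```python
import torch
import torch.nn as nn

# Empirical version of objective (2): minimize the average loss l(Y, f_theta(X))
# over samples drawn from P. All shapes and data below are illustrative.
f_theta = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()                      # the loss l (regression here)
opt = torch.optim.SGD(f_theta.parameters(), lr=1e-2)

X = torch.randn(256, 10)                    # stand-in for inputs drawn from P
Y = torch.randn(256, 1)                     # stand-in for targets

for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(f_theta(X), Y)           # estimate of E[ l(Y, f_theta(X)) ]
    loss.backward()
    opt.step()                              # gradient step on theta
```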
Supervised learning is fully applicable to UAV-specific CO tasks, first of all to various versions of the VRP, such as the VRP with time windows, the VRP with dynamic depots, and so on. Its main limitation for UAV-specific problems is a general flaw of supervised learning: the pupil can rarely surpass the teacher. If the target values for the problem are non-optimal, the NN after training will also produce non-optimal results. So, for complicated CO problems for which optimal solutions are not available, supervised learning may not be the best choice.

6.3. Unsupervised learning
In unsupervised learning [24, 25] the NN does not receive targets for the problem it solves. Instead, it tries to capture some characteristics of the joint distribution of the random variables observed while solving the problem. Because unsupervised learning has so far received little attention in the CO context (let alone in its application to UAV-specific tasks) and its immediate use seems difficult, this method is not considered here in detail; its research and application remain mostly theoretical.

6.4. Reinforcement learning
As noted above, supervised learning is not applicable to most UAV-specific CO problems because the optimal targets are not available. However, one can compare the quality of a set of solutions on a validation data set and use this to define a reward for the learning algorithm. It is therefore possible to follow the paradigm of reinforcement learning (RL) to solve UAV-specific CO problems. Moreover, even when optimal solutions are used as labeled data for supervised learning, the quality of the results is quite poor compared to RL, which receives a corresponding reward for good solutions [4, 26, 27]. Defining the reward function is not always easy, and UAV-specific tasks are no exception. Because RL is built on a dynamic process, it naturally anticipates the states and transitions that lead to future rewards. However, such reward shaping is difficult and not always effective, as no learning happens until the agent (randomly or through advanced approaches) first solves the problem; therefore, at the first step of the algorithm we have to obtain some initial solution that is then improved. In addition, when an approximate policy is used (for example, a linear function), learning is not guaranteed to find the optimal solution and may easily get stuck in local optima, which should be considered as well.
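To make the RL setting concrete, here is a minimal policy-gradient (REINFORCE [32]) sketch for route construction, with the tour length acting as a negative reward. The single-layer scorer is a deliberate placeholder for a real routing network (e.g., a pointer or attention model), so its solutions will be weak; only the training mechanics are the point.

```python
import torch
import torch.nn as nn

# REINFORCE [32] for route construction: the policy samples the next unvisited
# node; the "reward" is the negative tour length, so minimizing
# length * sum(log-probs) pushes the policy toward shorter tours.
n, d = 20, 2
scorer = nn.Linear(d, 1)      # placeholder policy: context-free node scores
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)

for step in range(1000):
    pts = torch.rand(n, d)                         # one random TSP instance
    visited = torch.zeros(n, dtype=torch.bool)
    visited[0] = True
    tour, logps = [0], []
    for _ in range(n - 1):
        scores = scorer(pts).squeeze(-1).masked_fill(visited, -1e9)
        probs = torch.softmax(scores, dim=0)
        nxt = torch.multinomial(probs, 1).item()   # sample the next node
        logps.append(torch.log(probs[nxt]))
        visited[nxt] = True
        tour.append(nxt)
    route = pts[tour + [tour[0]]]                  # close the tour
    length = (route[1:] - route[:-1]).norm(dim=1).sum()
    loss = length.detach() * torch.stack(logps).sum()  # policy-gradient loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```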
6.5. Deep learning
Deep learning is a method of constructing NNs consisting of a large number of layers. In the simplest NN architecture, a feedforward NN, also called a multilayer perceptron (MLP), the input data is passed sequentially through several layers. Each layer applies an affine transformation to its input vector, followed by a nonlinear scalar function (called the activation function). The output of a layer, called its activation, is passed to the next layer. All affine transformations are independent and are represented in practice by different matrices of coefficients. They are trainable, i.e., optimized using stochastic gradient descent (SGD), the optimization algorithm used to minimize the selected loss function; the stochasticity comes from the limited number of data points used to estimate the loss before each gradient update.

Deep neural networks (DNNs) may be difficult to optimize, especially for UAV-specific VRP tasks, so several methods and approaches have been developed to improve optimization, often by changing architectural aspects of the network. Because DNNs have a very large number of parameters, i.e., they can fit essentially any data set, they are prone to overfitting. DNNs are therefore heavily regularized; training with SGD also regularizes them through gradient noise, which makes DNNs fairly robust to overfitting in practice. In addition, DNNs have many hyperparameters, and various combinations of them are evaluated (an approach known as hyperparameter optimization). DNNs differ from more traditional machine learning methods in that they learn from all available raw input data, such as all the pixels of an image, whereas traditional ML usually requires a limited number of hand-crafted features specific to the task. Deep learning researchers have developed various methods for handling structured input of variable size, such as variable-length sequences, which is directly applicable to UAV-specific VRP tasks. Consider the following modern techniques.

6.5.1. Recurrent neural networks
The first architecture applicable to UAV-specific CO tasks is the recurrent neural network (RNN). RNNs can process sequential data by sharing parameters across the steps of the sequence. More precisely, the same NN block is applied at each step of the sequence, i.e., with the same architecture and parameter values at every time step. The specificity of such a network is the presence of recurrent layers: layers that take as input both the activation vector of the previous layer and their own activation vector from the previous step of the sequence (called the hidden state vector), as illustrated in Figure 1. On the left of the figure, a black square indicates a one-step delay; on the right, the same RNN is shown unrolled; three sets of parameters U, V and W are reused at each step.

Figure 1: Recurrent neural network architecture
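A minimal sketch of the recurrent update of Figure 1, with the shared parameter matrices U, W and V applied identically at every step; the shapes are assumptions for illustration.

```python
import torch

def rnn_step(x_t, h_prev, U, W, V):
    """One step of a vanilla RNN with the shared parameters U, V, W of Figure 1.

    x_t: input vector at step t; h_prev: hidden state from step t-1.
    The same U, W, V are reused at every step of the sequence.
    """
    h_t = torch.tanh(U @ x_t + W @ h_prev)  # new hidden state
    y_t = V @ h_t                           # output at step t
    return y_t, h_t
```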
6.5.2. Attention mechanism
Another important technique that makes the model invariant to the input size of the UAV-specific problem is the attention mechanism [29, 30]. It can be used to process data where each point of the output corresponds to a set of inputs. In this context, parameter sharing is used to account for the fact that different input sets need not have the same dimension. The attention mechanism is used to query information about all the elements of the set and to combine this information for further processing in the NN, as shown in Figure 2. In the classical attention mechanism, a query q is computed against a set of values (vi)i. An affinity function f is applied to pairs of the query and input set values; if it includes parameters, the attention mechanism is trainable. The affinity function takes as input a query (any kind of contextual information that informs where to focus) and the representation of an element of the input set (both are activation vectors) and outputs a scalar. This process is repeated over all elements of the set for the same query. The obtained scalar scores are normalized (for example, using the softmax function) and used to form a weighted sum of the representations of the elements of the set. This weighted sum, in turn, can be used in the NN.

Figure 2: NN attention mechanism

The attention mechanism can be used to construct graph neural networks, i.e., NNs capable of processing graph-structured input data [31], for building UAV routes. In this architecture, each node attends to the set of its neighbors, and the process is repeated several times to propagate information about the nodes further. Deep learning and RNNs can be used for supervised, unsupervised or reinforcement learning [30].
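A minimal sketch of the classical attention step described above, using the dot product as an assumed affinity function f: the query is scored against every element, the scores are softmax-normalized, and a weighted sum is returned regardless of the set size.

```python
import torch

def attention(q, keys, values):
    """Classical attention: a query q is scored against each element of a set.

    q: (d,) query vector; keys, values: (n, d) set representations. The dot
    product plays the role of the affinity function f; softmax normalizes the
    scores, and the output is the weighted sum of the values, so the result
    does not depend on the set size n.
    """
    scores = keys @ q                       # affinity f(q, v_i), one per element
    weights = torch.softmax(scores, dim=0)  # normalized attention weights
    return weights @ values                 # weighted sum fed to the next layer
```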
7. Approaches of using NN for solving UAV-specific CO problems
There are several approaches to using NN for solving UAV-specific CO problems. Let us start with the RNN.

7.1. Usage of RNN
Neural CO can solve UAV-specific CO problems using RL. Let us consider two approaches based on policy gradients [32]. The first approach, called pretraining with reinforcement (PTR), uses a training set to optimize a RNN that parameterizes a stochastic policy over UAV routes, using the expected reward as the objective. At test time the policy is fixed, and solution quality is assessed by greedy sampling. The second approach, called active search, involves no prior RNN training: the search begins with a random policy and iteratively optimizes the RNN parameters on a single UAV-route test instance, again using the expected reward as the objective, while keeping track of the best solution sampled during the search. The combination of PTR and active search works best in practice [4].

7.2. Approximation and creation of new policies
There are different ways to use ML to solve UAV-specific CO problems, with two main motivations: approximation and the creation of new policies. There are also various ways to combine ML and traditional algorithmic elements. As mentioned earlier, there are two main ways of using ML: with and without expert knowledge. Both motivations are identified within the MDP states and actions described in the reinforcement learning section. The decision-making function is called a policy that, given all available information (the state of the environment, if this information is sufficient to fully characterize the environment in the MDP at this step), produces the output (possibly stochastically) to be executed at the next step of the algorithm. The policy is the function we want to learn with the help of ML. When ML is used to obtain approximate solutions, for example for the UAV VRP problem, the policy is often taught by imitation learning, through demonstrations of expert behavior shown to the ML model. This process is illustrated in Figure 3. With this approach, the NN is trained not to optimize performance but to blindly follow the actions of the expert.

Figure 3: The process of the NN imitation training

With imitation training (not only for UAV-specific tasks, but in general), the policy learns to reproduce the actions of the expert policy by minimizing some discrepancy in the action space. When the goal is to create new policies, i.e., to optimize the solution-finding function from scratch, as is often needed for complicated UAV-specific problems, the policy can be learned through reinforcement learning, as shown in Figure 4. Even if we state the learning problem as an MDP, this does not limit us to the basic RL algorithms (approximate dynamic programming and policy gradients) for maximizing the expected sum of rewards: alternative optimization methods, such as greedy algorithms, genetic algorithms, and direct/local search, can also be used to solve RL problems.

Figure 4: Experience-based RL

In the RL case a reward signal is used and no expert participates in the training process; only the maximization of the expected sum of future rewards (the return) matters. It is important to understand that with imitation training the NN learns from supervised targets set by the expert for each action (and without a reward), whereas with experience-based learning the NN learns from a (possibly delayed) reward using reinforcement (without an expert). With imitation training the NN is told what to do, while with reinforcement training it is encouraged to accumulate rewards quickly. The distinction between these two settings is a much more complex and broader issue than the account given here, and it should be considered in general, not only for the UAV-specific tasks mentioned. The comparison of approaches to using NN for solving UAV-specific CO problems is presented in Table 1.

Table 1
The results of approaches to use NN for solving UAV-specific CO problems comparison

| ML usage approach | Pros | Cons | Where to use |
| Simulation (imitation) training | Can avoid local minima; may be trained faster than with RL | Demands expert knowledge for the whole training process (not applicable to complicated instances of UAV VRP problems) | Well-known CO problems with a known optimal distribution of source data |
| Reinforcement learning | Can solve problems with an unknown optimal data distribution; no need for expert knowledge | Usually demands more training time than simulation training | CO problems with an unknown optimal distribution of source data (complicated instances of UAV VRP problems) |

7.3. The branch-and-bound method
NNs can be used to improve the quality of solutions (including for the TSP and the various VRP problems specific to UAVs) by improving the current local solution with cutting planes (bounds) [33]. The idea is to estimate the improvement associated with considering specific submatrices, which also determines the quality of the cuts that can be separated from those submatrices. In this context, supervised (imitation) offline learning is used to approximate the optimal solution of the CO problem of submatrix selection, and the trained NN can then be applied quickly to select the most promising submatrices without the computational burden of solving NP-hard subproblems. The most promising submatrices correspond to the most promising cutting planes and, according to [33], a NN can be trained specifically so that only the most promising cutting planes are added.

Among the many heuristics, a well-performing approach is strong branching [34]. Before making any branching decision, the strong branching algorithm performs a one-step lookahead: it tentatively considers branching on many candidate variables, estimates the potential improvement of the current solution for each of them, and selects the most promising one; used together with branch and bound, strong branching gives the greatest improvement of the output. However, even if one does not examine all the variables but only approximates the resulting solution values, this remains a computationally expensive strategy.
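A hedged sketch of how a trained scorer can replace such expensive lookahead-style selection: candidate cuts (or branching variables) are described by feature vectors, a small network scores them, and only the top-ranked ones are kept. The feature size and architecture are illustrative assumptions, not the model of [33].

```python
import torch
import torch.nn as nn

# Sketch of learned cut/variable selection: a small network scores candidates
# (represented by fixed-size feature vectors) so only the most promising ones
# are kept, avoiding an expensive exact lookahead for every candidate.
class CutScorer(nn.Module):
    def __init__(self, n_features: int = 16):   # feature size is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, cut_features: torch.Tensor) -> torch.Tensor:
        return self.net(cut_features).squeeze(-1)  # one score per candidate

def select_cuts(scorer: CutScorer, cut_features: torch.Tensor, k: int):
    """Return the indices of the k candidates the scorer ranks highest."""
    with torch.no_grad():
        return torch.topk(scorer(cut_features), k).indices
```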
7.4. NN practical application for UAV-specific CO problems
Considering the TSP for a UAV route on the graph described above, it is easy to build a greedy heuristic that constructs a route by repeatedly selecting a node among those not yet visited, thereby building a permutation. Choosing the nearest node is an intuitive criterion, but in most cases it is far from optimal. Therefore, a greedy heuristic framework is built in which the node selection policy is learned by a graph neural network, a type of NN capable of processing input graphs of any bounded size via a message-passing mechanism [9]. Thus, to select each node, the NN receives the graph representation of the problem, augmented with features indicating which of the nodes have already been visited, and outputs the next node to select. The selected node is used to train the network through RL, with the partial route length used as the reward. Of course, there are far better algorithms that yield better solutions; however, given the above, NNs can be used to discover new, potentially more effective policies for UAV-specific CO problems [35]. On the two-dimensional Euclidean graphs described above with up to 100 vertices to be visited by a UAV, RL for the TSP significantly outperforms the supervised learning approach [28] and yields results close to optimal, provided enough training time is used. These results give insight into how NNs can be used as a tool for solving UAV-specific CO problems, especially those for which heuristics are difficult to develop. On the knapsack problem (another UAV-relevant CO problem, applicable to large delivery UAVs), which is NP-hard, a pretrained NN with RL and active search was able to solve the KNAP50, KNAP100 and KNAP200 problems, all instances to optimality [4].
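A minimal sketch of the active search loop [4] referred to above. It is written against a hypothetical sample_solution() closure (an assumption, not an API from [4]) that draws one stochastic solution of a single instance from the current policy and returns its cost as a float, the summed log-probability as a tensor, and the solution itself.

```python
import torch

def active_search(sample_solution, policy_params, steps=200, lr=1e-3):
    """Active search: re-optimize the current policy on one problem instance.

    No pretraining is required; the policy is updated with REINFORCE on the
    single instance while the best solution found so far is kept.
    `sample_solution` is assumed to return (cost: float, logp: tensor, solution).
    """
    opt = torch.optim.Adam(policy_params, lr=lr)
    best_cost, best = float("inf"), None
    baseline = None
    for _ in range(steps):
        cost, logp, solution = sample_solution()
        if cost < best_cost:
            best_cost, best = cost, solution     # keep the best solution found
        # exponential moving-average baseline reduces gradient variance
        baseline = cost if baseline is None else 0.9 * baseline + 0.1 * cost
        loss = (cost - baseline) * logp          # REINFORCE on this instance
        opt.zero_grad()
        loss.backward()
        opt.step()
    return best, best_cost
```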
8. Conclusions
Thus, the paper has reviewed approaches to the use of NNs in UAV-specific CO problems. It has been determined that the use of NNs (including deep learning NNs) is possible in UAV CO routing problems, so they can be used to solve problems that arise when planning UAV operations. At the same time, recurrent NNs with nonparametric normalized exponential (softmax) outputs, trained with supervised learning, are successfully used to solve CO problems. It has been determined that the most effective approach to date for such problems is training with RL, and this method is recommended. Prospects for the use of the research results: preparing a mathematical model of a specific UAV CO problem, developing an algorithm for solving this problem using the proposed NN architectures, and comparing the results with those of classical methods for solving such CO problems.

9. References
[1] A. Naseem, S. T. H. Shah, S. A. Khan, A. W. Malik 2017 Decision support system for optimum decision making process in threat evaluation and weapon assignment: Current status, challenges and future directions Annual Reviews in Control 43, pp. 169–187 https://doi.org/10.1016/j.arcontrol.2017.03.003
[2] L. F. Hulianytskyi, I. I. Riasna 2017 Formalization and classification of combinatorial optimization problems Optimization Methods and Applications (eds. Butenko S., Pardalos P. M., Shylo V.), Cham: Springer International Publishing AG, pp. 239–250 https://doi.org/10.1007/978-3-319-68640-0_11
[3] Y. Bengio, A. Lodi, A. Prouvost 2020 Machine learning for combinatorial optimization: a methodological tour d'horizon European Journal of Operational Research 290(2), pp. 405–421 https://doi.org/10.1016/j.ejor.2020.07.063
[4] I. Bello et al. 2016 Neural combinatorial optimization with reinforcement learning arXiv preprint arXiv:1611.09940.
[5] M. Fortun, S. S. Schweber 1993 Scientists and the legacy of World War II: The case of operations research (OR) Social Studies of Science 23(4), pp. 595–642 https://doi.org/10.1177/030631293023004001
[6] Y. LeCun, Y. Bengio, G. Hinton 2015 Deep learning Nature 521(7553), pp. 436–444 https://doi.org/10.1038/nature14539
[7] L. Wang, S. Li, F. Tian, F. Xiuju 2004 A noisy chaotic neural network for solving combinatorial optimization problems: Stochastic chaotic simulated annealing IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 34(5), pp. 2119–2125 https://doi.org/10.1109/tsmcb.2004.829778
[8] E. Khalil et al. 2017 Learning combinatorial optimization algorithms over graphs Advances in Neural Information Processing Systems, pp. 6348–6358 https://doi.org/10.5555/3295222.3295382
[9] Z. Li, Q. Chen, V. Koltun 2018 Combinatorial optimization with graph convolutional networks and guided tree search Advances in Neural Information Processing Systems, pp. 539–548.
[10] K. Lei, P. Guo, Y. Wang, X. Wu, W. Zhao 2021 Solve routing problems with a residual edge-graph attention neural network arXiv preprint arXiv:2105.02730.
[11] K. Braekers, K. Ramaekers, I. Van Nieuwenhuyse 2016 The vehicle routing problem: State of the art classification and review Computers & Industrial Engineering 99, pp. 300–313 https://doi.org/10.1016/j.cie.2015.12.007
[12] V. P. Horbulin, L. F. Hulianytskyi, I. V. Sergienko 2020 Optimization of UAV team routes in the presence of alternative and dynamic depots Cybernetics and Systems Analysis 56(2), pp. 195–203 https://doi.org/10.1007/s10559-020-00235-8
[13] O. Turchyn 2007 Comparative analysis of metaheuristics solving combinatorial optimization problems 9th International Conference - The Experience of Designing and Applications of CAD Systems in Microelectronics, IEEE https://doi.org/10.1109/cadsm.2007.4297548
[14] A. Oliinyk, I. Fedorchenko, A. Stepanenko, M. Rud, D. Goncharenko 2019 Combinatorial optimization problems solving based on evolutionary approach 2019 IEEE 15th International Conference on the Experience of Designing and Application of CAD Systems (CADSM), pp. 41–45 https://doi.org/10.1109/cadsm.2019.8779290
[15] I. V. Sergienko, L. F. Hulianytskyi, S. I. Sirenko 2009 Classification of applied methods of combinatorial optimization Cybernetics and Systems Analysis 45(5), pp. 732–741 https://doi.org/10.1007/s10559-009-9134-0
[16] S. Lin 1965 Computer solutions of the traveling salesman problem Bell System Technical Journal 44(10), pp. 2245–2269 https://doi.org/10.1002/j.1538-7305.1965.tb04146.x
[17] A. Hamacher, C. Moll 1996 The Euclidian traveling salesman selection problem Operations Research Proceedings 1995, Springer, Berlin, Heidelberg, pp. 54–59 https://doi.org/10.1007/978-3-642-80117-4_10
[18] R. C. Larson, A. R. Odoni 1981 Urban Operations Research Prentice-Hall.
[19] M. Fischetti, A. Lodi 2011 Heuristics in mixed integer programming Wiley Encyclopedia of Operations Research and Management Science 3, pp. 2199–2204 https://doi.org/10.1002/9780470400531.eorms0376
[20] M. Gendreau, J. Y. Potvin (Eds.)
2010 Handbook of Metaheuristics, volume 2, Springer https://doi.org/10.1007/978-3-319-91086-4
[21] D. Naddef, G. Rinaldi 2002 Branch-and-cut algorithms for the capacitated VRP The Vehicle Routing Problem, Society for Industrial and Applied Mathematics, pp. 53–84 https://doi.org/10.1137/1.9780898718515.ch3
[22] J. J. Hopfield, D. W. Tank 1985 "Neural" computation of decisions in optimization problems Biological Cybernetics 52(3), pp. 141–152 https://doi.org/10.1007/BF00339943
[23] G. V. Wilson, G. S. Pawley 1988 On the stability of the travelling salesman problem algorithm of Hopfield and Tank Biological Cybernetics 58(1), pp. 63–70 https://doi.org/10.1007/bf00363956
[24] C. M. Bishop 2006 Pattern Recognition and Machine Learning Springer.
[25] K. P. Murphy 2012 Machine Learning: A Probabilistic Perspective MIT Press https://doi.org/10.5555/2380985
[26] O. Alagoz, H. Hsu, A. J. Schaefer, M. S. Roberts 2010 Markov decision processes: a tool for sequential decision making under uncertainty Medical Decision Making 30(4), pp. 474–483 https://doi.org/10.1177/0272989x09353194
[27] R. S. Sutton, A. G. Barto 2018 Reinforcement Learning: An Introduction MIT Press, Cambridge, second edition https://doi.org/10.5555/3312046
[28] O. Vinyals, M. Fortunato, N. Jaitly 2015 Pointer networks Advances in Neural Information Processing Systems, pp. 2692–2700.
[29] A. Vaswani et al. 2017 Attention is all you need Advances in Neural Information Processing Systems 30, pp. 5998–6008.
[30] I. Goodfellow, Y. Bengio, A. Courville 2016 Deep Learning Cambridge: MIT Press https://doi.org/10.1007/s10710-017-9314-z
[31] P. Veličković et al. 2018 Graph attention networks International Conference on Learning Representations https://doi.org/10.17863/CAM.48429
[32] R. J. Williams 1992 Simple statistical gradient-following algorithms for connectionist reinforcement learning Machine Learning 8(3-4), pp. 229–256 https://doi.org/10.1007/BF00992696
[33] R. Baltean-Lugojan, R. Misener, P. Bonami, A. Tramontani 2018 Strong sparse cut selection via trained neural nets for quadratic semidefinite outer-approximations Technical report, Imperial College, London.
[34] D. Applegate, R. Bixby, V. Chvátal, W. Cook 2007 The Traveling Salesman Problem: A Computational Study Princeton University Press https://doi.org/10.1515/9781400841103
[35] V. Korolyov, M. Ogurtsov, A. Khodzinsky 2021 Statement of the problem of complete set of UAV group on the basis of models of granular calculations and fuzzy logic Cybernetics and Computer Technologies 2, pp. 25–38 https://doi.org/10.34229/2707-451X.21.2.3