Review of Neural Networks Application in UAV Routing Problems

Maksym Ogurtsov

V.M. Glushkov Institute of Cybernetics of the National Academy of Sciences of Ukraine, Akademika Glushkova Avenue, 40, Kyiv, 03187, Ukraine

Abstract
The paper reviews approaches and methods for using neural networks (NN) to solve the combinatorial optimization (CO) problems that arise when operating unmanned aerial vehicles (UAVs) and reflect their specifics. It is shown that neural networks (with supervised learning, reinforcement learning and deep learning) are applicable to many types of CO routing problems (the traveling salesman problem, the VRP in its different versions, etc.) and to other UAV CO problems. NNs with reinforcement learning, as well as recurrent NNs with supervised learning, may be successfully used either to solve the combinatorial optimization problems mentioned above directly or to tune the parameters of combinatorial optimization algorithms.

Keywords
Neural networks, combinatorial optimization, VRP, unmanned aerial vehicles, deep learning

1. Introduction
Today, decision support tools are used more and more often all over the world, for example, in transportation, military affairs, supply chains and logistics, energy, finance, and operations planning [1, 2]. But the use of neural networks (NN) in these tools is still quite limited [3, 4]. Research and development of decision support tools, also called prescriptive analytics, began during World War II as an initiative to use mathematics and computer science to help military planners make decisions [5].

Problem statement: to analyze the possible ways of using machine learning (ML) algorithms in NNs for solving combinatorial optimization (CO) problems [2] that arise when operating UAVs (systems consisting of the UAVs themselves, their operators, etc.) and for which the distribution of the input data is unknown in advance [3], and to establish whether ML can be used for solving such problems.

Typically, NNs are well suited to processing signals of natural origin, which are difficult to formalize and to state mathematically [3], because in this case the distribution of the input data cannot be determined analytically: for example, image, text, voice, or molecule analysis. The same applies to UAV CO problems, especially UAV routing problems. Significant progress has recently been made in machine learning through deep learning [6]. Deep learning gives better results than classical algorithms in multidimensional spaces with large amounts of input data.

Routing problems that arise when using UAVs have given rise to numerous studies over recent decades. There have been many attempts (including successful ones) to solve CO problems with ML; they have been covered by researchers and publications all over the world [3, 4, 7, 8, 9, 10, 11, 12]. World experience in UAV development shows that within the next 10–15 years UAVs will be able to perform most of the tasks currently carried out by manned aerial vehicles. In Ukraine, given the difficult political and economic situation, a deeply rational decision is to use CO methods to increase the efficiency of such UAV tasks. Moreover, the rapid expansion of UAV applications in recent years has created new classes of problems, such as route construction for swarms of UAVs performing the same task as a team.
Information Technology and Implementation (IT&I-2021), December 01–03, 2021, Kyiv, Ukraine
EMAIL: ogurtsov.maksym@incyb.kiev.ua (A. 1); ORCID: 0000-0002-6167-5111 (A. 1)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

2. Analysis of recent research and publications
As mentioned earlier, significant progress has recently been made in ML through deep learning [3]. The possible use of ML to solve CO problems has been considered by both foreign [3, 4, 7, 8, 9, 10, 11] and domestic researchers [12, 13, 14, 15]. We will focus on discrete optimization problems stated as integer optimization with constraints, i.e., optimization problems with integer/binary variables, the form applicable to the routing optimization problems relevant to UAVs. Although not all such problems are hard to solve (for example, the shortest path problem), we will focus on the NP-hard CO problems specific to UAVs. For these problems, the existence of an algorithm whose running time is polynomial in the size of the input data is considered unlikely. However, in practice, CO algorithms can find solutions to problems containing up to millions of variables, parameters, and constraints.

How can NP-hard problems be solved in reasonable time? Let us consider a classical example of an NP-hard problem applicable to UAV routing: the traveling salesman problem (TSP). The TSP may be defined on a graph where we look for a minimum-length tour that visits each node once and only once [16]. For UAVs, the TSP applies to tasks where a single UAV must visit some set of coordinates (to deliver something, for surveillance, or to destroy targets if we are talking about military applications).

Let us focus on the Euclidean traveling salesman problem [17]. In this version, each node is assigned coordinates in two-dimensional Euclidean space (or, more generally, in a vector space of arbitrary dimension), and the objective function for the segment connecting two nodes is the Euclidean distance between them. In most cases two-dimensional Euclidean space is adequate for UAVs: although they change altitude during flight, this mostly happens during takeoff and landing, while on route most UAVs hold a roughly constant altitude. So two-dimensional Euclidean space is entirely applicable to building UAV routes. Although the Euclidean version of the problem is theoretically as complex as the general TSP, an approximate solution can be found efficiently in Euclidean space using the graph structure [18]. Sometimes, however, three-dimensional Euclidean space should be used instead of two-dimensional to obtain valid solutions, for example, when routing in the mountains or for delivery tasks in a city with skyscrapers (in such cases the UAV may have to change its altitude between two route points by more than the distance between those points in two-dimensional space). But for most UAV-specific tasks two-dimensional space is enough.

There is a large body of literature on heuristic algorithms, i.e., algorithms designed to compute solutions of CO problems that are "fairly good in practice" without guaranteeing optimality.
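As a concrete illustration of such a heuristic, below is a minimal sketch (not from the paper; the function names are illustrative) of the nearest-neighbor construction for the Euclidean TSP. Passing 2D or 3D coordinates switches between the planar and the altitude-aware variants discussed above.

```python
import numpy as np

def nearest_neighbor_tour(points: np.ndarray, start: int = 0) -> list[int]:
    """Greedy nearest-neighbor heuristic for the Euclidean TSP.

    points: (n, d) array of waypoint coordinates; d = 2 for a constant-altitude
    UAV route, d = 3 when altitude changes must be taken into account.
    Returns a permutation of node indices (a tour, not guaranteed optimal).
    """
    n = len(points)
    unvisited = set(range(n)) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # pick the closest not-yet-visited waypoint
        nxt = min(unvisited, key=lambda j: np.linalg.norm(points[j] - last))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(points: np.ndarray, tour: list[int]) -> float:
    # closed route: the UAV returns to its starting point
    return sum(np.linalg.norm(points[tour[i]] - points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))
```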
These heuristic methods play a central role in CO for UAVs, and some of them are considered in this paper; the most popular ones are surveyed in [19, 20]. We should note two main motivations for using ML: approximation and the creation of new policies. According to current research, the following types of NN and learning paradigms are applicable to solving UAV CO problems:
• Hopfield networks
• Supervised learning
• Unsupervised learning
• Reinforcement learning
• Deep learning
There are two popular ways of using ML to solve CO problems (not only UAV-specific ones): with and without expert knowledge. In this setting, the decision-making function is called a policy: given all available information (the state of the environment, if that information is sufficient to fully characterize the environment at this step), it produces the output (possibly stochastically) to be executed at the next step of the algorithm. The policy is the function that we want to learn with the help of ML. Below we consider how these two motivations determine the learning setup.

3. Selection of previously unsolved parts of the overall problem
The purpose of this work is to determine the applicability of NNs to the routing problems that arise when using UAVs, i.e., to highlight promising areas of research on the use of NNs for solving CO problems. NNs can be successfully used for the CO tasks that arise when using UAVs, primarily when planning routes for UAVs and especially for UAV groups. This is because the task of constructing routes and allocating the points to be visited between vehicles is a classic CO task: the Vehicle Routing Problem (VRP) [21]. Such a task constantly arises when planning operations with UAVs.

4. Novelty
Although, as mentioned above, the use of NNs for solving CO problems is actively studied all over the world, their use for UAV-specific tasks has not been studied sufficiently to date.

5. General scientific significance
The use of NNs to solve UAV-specific CO problems can improve the quality of the resulting solutions, which will increase the efficiency of UAV use in general and can potentially expand the scope of UAV applications.

6. NN types, applicable for UAV-specific CO problems
NNs can help improve UAV-specific CO algorithms (and CO algorithms in general) in two ways. First, the researcher can use expert knowledge of the CO algorithm to replace resource-intensive calculations with a fast approximation, and NNs can be used to compute such approximations. The challenge is to explore the space of possible solutions and thus be able to choose the policies that have led to the best of these solutions. Second, from the point of view of using NNs to solve CO problems, a NN can decompose the problem into smaller, simpler problems. For example, if a group of UAVs must visit a set of points on the map (a VRP problem), a NN may be used to divide the set of target points between the UAVs; we then only have to solve a set of TSP problems (one per UAV), which is a much simpler task. A sketch of this decomposition is given below.
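A minimal sketch of the decomposition just described. Here a plain k-means-style clustering stands in for the NN that would assign targets to UAVs (an assumption made only to keep the sketch self-contained); the per-UAV routes can then be built, e.g., with the nearest_neighbor_tour() helper above.

```python
import numpy as np

def assign_targets(points: np.ndarray, n_uavs: int, iters: int = 50) -> np.ndarray:
    """Split target points between UAVs (k-means-style clustering).

    In the approach described above this assignment would be produced by a
    trained NN; plain clustering is used here only to keep the sketch
    self-contained. Returns an array of cluster labels, one per point.
    """
    rng = np.random.default_rng(0)
    centers = points[rng.choice(len(points), n_uavs, replace=False)].astype(float)
    for _ in range(iters):
        # label each target with the index of its nearest center
        labels = np.argmin(
            np.linalg.norm(points[:, None] - centers[None], axis=-1), axis=1)
        for k in range(n_uavs):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels

# One TSP instance per UAV, e.g. solved with nearest_neighbor_tour() above:
# labels = assign_targets(points, n_uavs=3)
# routes = [nearest_neighbor_tour(points[labels == k]) for k in range(3)]
```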
Let us now consider the features specific to UAV CO problems. Most such tasks are TSP or VRP problems; some other types may also be considered (for example, choosing the optimal set of UAVs for a given task), but they are outside the scope of this work. For UAV-specific TSP/VRP problems, the following features should therefore be considered:
• UAVs have a limited operating time (a limited maximum route length). Whereas a ground vehicle may wait without losing fuel (for example, in the VRP with time windows, when the vehicle arrives at a point before the time window opens), for UAVs this is usually impossible: a UAV cannot simply land anywhere, so it has to spend battery capacity even while holding the same location;
• For some tasks the third dimension (altitude) must be taken into account, as described earlier;
• Weather conditions affect UAVs more than ground vehicles or larger aircraft, for example, wind (especially for small, light UAVs), cold (which may greatly reduce UAV battery capacity), etc.;
• UAV swarms are used more and more often, so their specifics, which differ substantially from the usual vehicle paradigm, should be considered.
Now let us consider the main types of NN that can be used to solve UAV-specific CO problems.

6.1. Hopfield Network
The first attempt to formalize the description of nerve cell functioning for use in computational algorithms was the model proposed in 1943 by McCulloch and Pitts. The McCulloch-Pitts model became the starting point for building a simple feedforward neural network called the perceptron, proposed and studied by Rosenblatt in the late 1950s and early 1960s. In 1960, Adaline (Adaptive Linear Neuron) systems were proposed. Later, the Hopfield neural network was suggested. The Hopfield neural network is a fully connected NN with a symmetric matrix of connections [22]. If the objective function of this NN is modified so that it is equivalent to, for example, the objective function of the UAV traveling salesman problem, and Lagrange multipliers are used to set penalties for violating the constraints of the problem, then such a network can be used to solve not only the TSP but many other CO problems. Each neuron \( y_i \) in the system can take one of two states (like the output of a neuron with a threshold activation function):

\( y_i \in \{-1, +1\} \). (1)

A limitation of Hopfield networks is that they are sensitive to hyperparameters and parameter initialization [23]. Although such neural networks have several promising properties, their practical application has remained limited, in most cases to research work. For the vast majority of UAV-specific CO tasks they are currently not applicable.

6.2. Supervised learning
In supervised learning, the input is a set of input (features) / target pairs, and the task is to find a function that converts inputs into outputs so that for each input the output satisfies the given constraints and has an objective value closest to the target. Finding such a function is called learning and is accomplished by solving an optimization problem over a family of candidate functions. The loss function, i.e., the measure of discrepancy between the output and target values, can be selected according to the task (regression, classification, etc.) and the optimization method. Mathematically speaking, let X and Y, having a joint probability distribution P, be random variables representing the input data and the target. Let \( \ell \) be a loss function to minimize and let \( \{f_\theta \mid \theta \in \mathbb{R}^p\} \) be a family of machine learning models (in this case, parametric). The supervised learning problem is defined as:

\( \min_{\theta \in \mathbb{R}^p} \mathbb{E}_{X,Y \sim P} \, \ell(Y, f_\theta(X)) \). (2)
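A minimal sketch of objective (2) as an empirical-risk training loop: the expectation over P is replaced by an average over samples. The two-layer network, the regression loss and the random data are placeholders, not a model from the paper.

```python
import torch
import torch.nn as nn

# Empirical version of objective (2): minimize the average loss l(Y, f_theta(X))
# over samples drawn from P. All shapes and data below are illustrative.
f_theta = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()                      # the loss l (regression here)
opt = torch.optim.SGD(f_theta.parameters(), lr=1e-2)

X = torch.randn(256, 10)                    # stand-in for inputs drawn from P
Y = torch.randn(256, 1)                     # stand-in for targets

for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(f_theta(X), Y)           # estimate of E[ l(Y, f_theta(X)) ]
    loss.backward()
    opt.step()                              # gradient step on theta
```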
Supervised learning is fully applicable to UAV-specific CO tasks, first of all to various versions of the VRP, such as the VRP with time windows, the VRP with dynamic depots, and so on. Its main limitation for UAV-specific problems is a general flaw of supervised learning: the pupil can rarely surpass the teacher. If the target values for the problem are non-optimal, the NN after training will also produce non-optimal results. So, for complicated CO problems for which optimal solutions are not available, supervised learning may not be the best choice.

6.3. Unsupervised learning
In unsupervised learning [24, 25] the NN does not receive targets for the problem it solves. Instead, it tries to capture some characteristics of the joint distribution of the random variables observed while solving the problem. Because unsupervised learning has so far received little attention in the CO context (let alone in its application to UAV-specific tasks) and its immediate use seems difficult, this method is not considered here in detail; its research and application remain mostly theoretical.

6.4. Reinforcement learning
As noted above, supervised learning is not applicable to most UAV-specific CO problems because the optimal targets are not available. However, one can compare the quality of a set of solutions on a validation data set and use this to define a reward for the learning algorithm. It is therefore possible to follow the paradigm of reinforcement learning (RL) to solve UAV-specific CO problems. Moreover, even when optimal solutions are used as labeled data for supervised learning, the quality of the results is quite poor compared to RL, which receives a corresponding reward for good solutions [4, 26, 27]. Defining the reward function is not always easy, and UAV-specific tasks are no exception. Because RL is built on a dynamic process, it naturally anticipates the states and transitions that lead to future rewards. However, such reward shaping is difficult and not always effective, as no learning happens until the agent (randomly or through advanced approaches) first solves the problem; therefore, at the first step of the algorithm we have to obtain some initial solution that is then improved. In addition, when an approximate policy is used (for example, a linear function), learning is not guaranteed to find the optimal solution and may easily get stuck in local optima, which should be considered as well.
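To make the RL setting concrete, here is a minimal policy-gradient (REINFORCE [32]) sketch for route construction, with the tour length acting as a negative reward. The single-layer scorer is a deliberate placeholder for a real routing network (e.g., a pointer or attention model), so its solutions will be weak; only the training mechanics are the point.

```python
import torch
import torch.nn as nn

# REINFORCE [32] for route construction: the policy samples the next unvisited
# node; the "reward" is the negative tour length, so minimizing
# length * sum(log-probs) pushes the policy toward shorter tours.
n, d = 20, 2
scorer = nn.Linear(d, 1)      # placeholder policy: context-free node scores
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)

for step in range(1000):
    pts = torch.rand(n, d)                         # one random TSP instance
    visited = torch.zeros(n, dtype=torch.bool)
    visited[0] = True
    tour, logps = [0], []
    for _ in range(n - 1):
        scores = scorer(pts).squeeze(-1).masked_fill(visited, -1e9)
        probs = torch.softmax(scores, dim=0)
        nxt = torch.multinomial(probs, 1).item()   # sample the next node
        logps.append(torch.log(probs[nxt]))
        visited[nxt] = True
        tour.append(nxt)
    route = pts[tour + [tour[0]]]                  # close the tour
    length = (route[1:] - route[:-1]).norm(dim=1).sum()
    loss = length.detach() * torch.stack(logps).sum()  # policy-gradient loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```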
6.5. Deep learning
Deep learning is a method of constructing NNs consisting of a large number of layers. In the simplest NN architecture, a feedforward NN, also called a multilayer perceptron (MLP), the input data is passed sequentially through several layers. Each layer applies an affine transformation to its input vector, followed by a nonlinear scalar function (called the activation function). The output of a layer, called its activation, is passed to the next layer. All affine transformations are independent and are represented in practice by different matrices of coefficients. They are trainable, i.e., optimized using stochastic gradient descent (SGD), the optimization algorithm used to minimize the selected loss function; the stochasticity comes from the limited number of data points used to estimate the loss before each gradient update.

Deep neural networks (DNNs) may be difficult to optimize, especially for UAV-specific VRP tasks, so several methods and approaches have been developed to improve optimization, often by changing architectural aspects of the network. Because DNNs have a very large number of parameters, i.e., they can fit essentially any data set, they are prone to overfitting. DNNs are therefore heavily regularized; training with SGD also regularizes them through gradient noise, which makes DNNs fairly robust to overfitting in practice. In addition, DNNs have many hyperparameters, and various combinations of them are evaluated (an approach known as hyperparameter optimization). DNNs differ from more traditional machine learning methods in that they learn from all available raw input data, such as all the pixels of an image, whereas traditional ML usually requires a limited number of hand-crafted features specific to the task. Deep learning researchers have developed various methods for handling structured input of variable size, such as variable-length sequences, which is directly applicable to UAV-specific VRP tasks. Consider the following modern techniques.

6.5.1. Recurrent neural networks
The first architecture applicable to UAV-specific CO tasks is the recurrent neural network (RNN). RNNs can process sequential data by sharing parameters across the steps of the sequence. More precisely, the same NN block is applied at each step of the sequence, i.e., with the same architecture and parameter values at every time step. The specificity of such a network is the presence of recurrent layers: layers that take as input both the activation vector of the previous layer and their own activation vector from the previous step of the sequence (called the hidden state vector), as illustrated in Figure 1. On the left of the figure, a black square indicates a one-step delay; on the right, the same RNN is shown unrolled; three sets of parameters U, V and W are reused at each step.

Figure 1: Recurrent neural network architecture
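A minimal sketch of the recurrent update of Figure 1, with the shared parameter matrices U, W and V applied identically at every step; the shapes are assumptions for illustration.

```python
import torch

def rnn_step(x_t, h_prev, U, W, V):
    """One step of a vanilla RNN with the shared parameters U, V, W of Figure 1.

    x_t: input vector at step t; h_prev: hidden state from step t-1.
    The same U, W, V are reused at every step of the sequence.
    """
    h_t = torch.tanh(U @ x_t + W @ h_prev)  # new hidden state
    y_t = V @ h_t                           # output at step t
    return y_t, h_t
```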
6.5.2. Attention mechanism
Another important technique that makes the model invariant to the input size of the UAV-specific problem is the attention mechanism [29, 30]. It can be used to process data where each point of the output corresponds to a set of inputs. In this context, parameter sharing is used to account for the fact that different input sets need not have the same dimension. The attention mechanism is used to query information about all the elements of the set and to combine this information for further processing in the NN, as shown in Figure 2. In the classical attention mechanism, a query q is computed against a set of values (vi)i. An affinity function f is applied to pairs of the query and input set values; if it includes parameters, the attention mechanism is trainable. The affinity function takes as input a query (any kind of contextual information that informs where to focus) and the representation of an element of the input set (both are activation vectors) and outputs a scalar. This process is repeated over all elements of the set for the same query. The obtained scalar scores are normalized (for example, using the softmax function) and used to form a weighted sum of the representations of the elements of the set. This weighted sum, in turn, can be used in the NN.

Figure 2: NN attention mechanism

The attention mechanism can be used to construct graph neural networks, i.e., NNs capable of processing graph-structured input data [31], for building UAV routes. In this architecture, each node attends to the set of its neighbors, and the process is repeated several times to propagate information about the nodes further. Deep learning and RNNs can be used for supervised, unsupervised or reinforcement learning [30].
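A minimal sketch of the classical attention step described above, using the dot product as an assumed affinity function f: the query is scored against every element, the scores are softmax-normalized, and a weighted sum is returned regardless of the set size.

```python
import torch

def attention(q, keys, values):
    """Classical attention: a query q is scored against each element of a set.

    q: (d,) query vector; keys, values: (n, d) set representations. The dot
    product plays the role of the affinity function f; softmax normalizes the
    scores, and the output is the weighted sum of the values, so the result
    does not depend on the set size n.
    """
    scores = keys @ q                       # affinity f(q, v_i), one per element
    weights = torch.softmax(scores, dim=0)  # normalized attention weights
    return weights @ values                 # weighted sum fed to the next layer
```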
7. Approaches of using NN for solving UAV-specific CO problems
There are several approaches to using NN for solving UAV-specific CO problems. Let us start with the RNN.

7.1. Usage of RNN
Neural CO can solve UAV-specific CO problems using RL. Let us consider two approaches based on policy gradients [32]. The first approach, called pretraining with reinforcement (PTR), uses a training set to optimize a RNN that parameterizes a stochastic policy over UAV routes, using the expected reward as the objective. At test time the policy is fixed, and solution quality is assessed by greedy sampling. The second approach, called active search, involves no prior RNN training: the search begins with a random policy and iteratively optimizes the RNN parameters on a single UAV-route test instance, again using the expected reward as the objective, while keeping track of the best solution sampled during the search. The combination of PTR and active search works best in practice [4].

7.2. Approximation and creation of new policies
There are different ways to use ML to solve UAV-specific CO problems, with two main motivations: approximation and the creation of new policies. There are also various ways to combine ML and traditional algorithmic elements. As mentioned earlier, there are two main ways of using ML: with and without expert knowledge. Both motivations are identified within the MDP states and actions described in the reinforcement learning section. The decision-making function is called a policy that, given all available information (the state of the environment, if this information is sufficient to fully characterize the environment in the MDP at this step), produces the output (possibly stochastically) to be executed at the next step of the algorithm. The policy is the function we want to learn with the help of ML. When ML is used to obtain approximate solutions, for example for the UAV VRP problem, the policy is often taught by imitation learning, through demonstrations of expert behavior shown to the ML model. This process is illustrated in Figure 3. With this approach, the NN is trained not to optimize performance but to blindly follow the actions of the expert.

Figure 3: The process of the NN imitation training

With imitation training (not only for UAV-specific tasks, but in general), the policy learns to reproduce the actions of the expert policy by minimizing some discrepancy in the action space. When the goal is to create new policies, i.e., to optimize the solution-finding function from scratch, as is often needed for complicated UAV-specific problems, the policy can be learned through reinforcement learning, as shown in Figure 4. Even if we state the learning problem as an MDP, this does not limit us to the basic RL algorithms (approximate dynamic programming and policy gradients) for maximizing the expected sum of rewards: alternative optimization methods, such as greedy algorithms, genetic algorithms, and direct/local search, can also be used to solve RL problems.

Figure 4: Experience-based RL

In the RL case a reward signal is used and no expert participates in the training process; only the maximization of the expected sum of future rewards (the return) matters. It is important to understand that with imitation training the NN learns from supervised targets set by the expert for each action (and without a reward), whereas with experience-based learning the NN learns from a (possibly delayed) reward using reinforcement (without an expert). With imitation training the NN is told what to do, while with reinforcement training it is encouraged to accumulate rewards quickly. The distinction between these two settings is a much more complex and broader issue than the account given here, and it should be considered in general, not only for the UAV-specific tasks mentioned. The comparison of approaches to using NN for solving UAV-specific CO problems is presented in Table 1.

Table 1
The results of approaches to use NN for solving UAV-specific CO problems comparison

| ML usage approach | Pros | Cons | Where to use |
| Simulation (imitation) training | Can avoid local minima; may be trained faster than with RL | Demands expert knowledge for the whole training process (not applicable to complicated instances of UAV VRP problems) | Well-known CO problems with a known optimal distribution of source data |
| Reinforcement learning | Can solve problems with an unknown optimal data distribution; no need for expert knowledge | Usually demands more training time than simulation training | CO problems with an unknown optimal distribution of source data (complicated instances of UAV VRP problems) |

7.3. The branch-and-bound method
NNs can be used to improve the quality of solutions (including for the TSP and the various VRP problems specific to UAVs) by improving the current local solution with cutting planes (bounds) [33]. The idea is to estimate the improvement associated with considering specific submatrices, which also determines the quality of the cuts that can be separated from those submatrices. In this context, supervised (imitation) offline learning is used to approximate the optimal solution of the CO problem of submatrix selection, and the trained NN can then be applied quickly to select the most promising submatrices without the computational burden of solving NP-hard subproblems. The most promising submatrices correspond to the most promising cutting planes and, according to [33], a NN can be trained specifically so that only the most promising cutting planes are added.

Among the many heuristics, a well-performing approach is strong branching [34]. Before making any branching decision, the strong branching algorithm performs a one-step lookahead: it tentatively considers branching on many candidate variables, estimates the potential improvement of the current solution for each of them, and selects the most promising one; used together with branch and bound, strong branching gives the greatest improvement of the output. However, even if one does not examine all the variables but only approximates the resulting solution values, this remains a computationally expensive strategy.
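A hedged sketch of how a trained scorer can replace such expensive lookahead-style selection: candidate cuts (or branching variables) are described by feature vectors, a small network scores them, and only the top-ranked ones are kept. The feature size and architecture are illustrative assumptions, not the model of [33].

```python
import torch
import torch.nn as nn

# Sketch of learned cut/variable selection: a small network scores candidates
# (represented by fixed-size feature vectors) so only the most promising ones
# are kept, avoiding an expensive exact lookahead for every candidate.
class CutScorer(nn.Module):
    def __init__(self, n_features: int = 16):   # feature size is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, cut_features: torch.Tensor) -> torch.Tensor:
        return self.net(cut_features).squeeze(-1)  # one score per candidate

def select_cuts(scorer: CutScorer, cut_features: torch.Tensor, k: int):
    """Return the indices of the k candidates the scorer ranks highest."""
    with torch.no_grad():
        return torch.topk(scorer(cut_features), k).indices
```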
7.4. NN practical application for UAV-specific CO problems
Considering the TSP for a UAV route on the graph described above, it is easy to build a greedy heuristic that constructs a route by repeatedly selecting a node among those not yet visited, thereby building a permutation. Choosing the nearest node is an intuitive criterion, but in most cases it is far from optimal. Therefore, a greedy heuristic framework is built in which the node selection policy is learned by a graph neural network, a type of NN capable of processing input graphs of any bounded size via a message-passing mechanism [9]. Thus, to select each node, the NN receives the graph representation of the problem, augmented with features indicating which of the nodes have already been visited, and outputs the next node to select. The selected node is used to train the network through RL, with the partial route length used as the reward. Of course, there are far better algorithms that yield better solutions; however, given the above, NNs can be used to discover new, potentially more effective policies for UAV-specific CO problems [35]. On the two-dimensional Euclidean graphs described above with up to 100 vertices to be visited by a UAV, RL for the TSP significantly outperforms the supervised learning approach [28] and yields results close to optimal, provided enough training time is used. These results give insight into how NNs can be used as a tool for solving UAV-specific CO problems, especially those for which heuristics are difficult to develop. On the knapsack problem (another UAV-relevant CO problem, applicable to large delivery UAVs), which is NP-hard, a pretrained NN with RL and active search was able to solve the KNAP50, KNAP100 and KNAP200 problems, all instances to optimality [4].
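A minimal sketch of the active search loop [4] referred to above. It is written against a hypothetical sample_solution() closure (an assumption, not an API from [4]) that draws one stochastic solution of a single instance from the current policy and returns its cost as a float, the summed log-probability as a tensor, and the solution itself.

```python
import torch

def active_search(sample_solution, policy_params, steps=200, lr=1e-3):
    """Active search: re-optimize the current policy on one problem instance.

    No pretraining is required; the policy is updated with REINFORCE on the
    single instance while the best solution found so far is kept.
    `sample_solution` is assumed to return (cost: float, logp: tensor, solution).
    """
    opt = torch.optim.Adam(policy_params, lr=lr)
    best_cost, best = float("inf"), None
    baseline = None
    for _ in range(steps):
        cost, logp, solution = sample_solution()
        if cost < best_cost:
            best_cost, best = cost, solution     # keep the best solution found
        # exponential moving-average baseline reduces gradient variance
        baseline = cost if baseline is None else 0.9 * baseline + 0.1 * cost
        loss = (cost - baseline) * logp          # REINFORCE on this instance
        opt.zero_grad()
        loss.backward()
        opt.step()
    return best, best_cost
```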
8. Conclusions
Thus, the paper has reviewed approaches to the use of NNs in UAV-specific CO problems. It has been determined that the use of NNs (including deep learning NNs) is possible in UAV CO routing problems, so they can be used to solve problems that arise when planning UAV operations. At the same time, recurrent NNs with nonparametric normalized exponential (softmax) outputs, trained with supervised learning, are successfully used to solve CO problems. It has been determined that the most effective approach to date for such problems is training with RL, and this method is recommended. Prospects for the use of the research results: preparing a mathematical model of a specific UAV CO problem, developing an algorithm for solving this problem using the proposed NN architectures, and comparing the results with those of classical methods for solving such CO problems.

9. References
[1] A. Naseem, S. T. H. Shah, S. A. Khan, A. W. Malik 2017 Decision support system for optimum decision making process in threat evaluation and weapon assignment: Current status, challenges and future directions Annual Reviews in Control 43, pp. 169–187 https://doi.org/10.1016/j.arcontrol.2017.03.003
[2] L. F. Hulianytskyi, I. I. Riasna 2017 Formalization and classification of combinatorial optimization problems Optimization Methods and Applications (eds. Butenko S., Pardalos P. M., Shylo V.), Cham: Springer International Publishing AG, pp. 239–250 https://doi.org/10.1007/978-3-319-68640-0_11
[3] Y. Bengio, A. Lodi, A. Prouvost 2020 Machine learning for combinatorial optimization: a methodological tour d'horizon European Journal of Operational Research 290(2), pp. 405–421 https://doi.org/10.1016/j.ejor.2020.07.063
[4] I. Bello et al. 2016 Neural combinatorial optimization with reinforcement learning arXiv preprint arXiv:1611.09940.
[5] M. Fortun, S. S. Schweber 1993 Scientists and the legacy of World War II: The case of operations research (OR) Social Studies of Science 23(4), pp. 595–642 https://doi.org/10.1177/030631293023004001
[6] Y. LeCun, Y. Bengio, G. Hinton 2015 Deep learning Nature 521(7553), pp. 436–444 https://doi.org/10.1038/nature14539
[7] L. Wang, S. Li, F. Tian, F. Xiuju 2004 A noisy chaotic neural network for solving combinatorial optimization problems: Stochastic chaotic simulated annealing IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 34(5), pp. 2119–2125 https://doi.org/10.1109/tsmcb.2004.829778
[8] E. Khalil et al. 2017 Learning combinatorial optimization algorithms over graphs Advances in Neural Information Processing Systems, pp. 6348–6358 https://doi.org/10.5555/3295222.3295382
[9] Z. Li, Q. Chen, V. Koltun 2018 Combinatorial optimization with graph convolutional networks and guided tree search Advances in Neural Information Processing Systems, pp. 539–548.
[10] K. Lei, P. Guo, Y. Wang, X. Wu, W. Zhao 2021 Solve routing problems with a residual edge-graph attention neural network arXiv preprint arXiv:2105.02730.
[11] K. Braekers, K. Ramaekers, I. Van Nieuwenhuyse 2016 The vehicle routing problem: State of the art classification and review Computers & Industrial Engineering 99, pp. 300–313 https://doi.org/10.1016/j.cie.2015.12.007
[12] V. P. Horbulin, L. F. Hulianytskyi, I. V. Sergienko 2020 Optimization of UAV team routes in the presence of alternative and dynamic depots Cybernetics and Systems Analysis 56(2), pp. 195–203 https://doi.org/10.1007/s10559-020-00235-8
[13] O. Turchyn 2007 Comparative analysis of metaheuristics solving combinatorial optimization problems 9th International Conference - The Experience of Designing and Applications of CAD Systems in Microelectronics, IEEE https://doi.org/10.1109/cadsm.2007.4297548
[14] A. Oliinyk, I. Fedorchenko, A. Stepanenko, M. Rud, D. Goncharenko 2019 Combinatorial optimization problems solving based on evolutionary approach 2019 IEEE 15th International Conference on the Experience of Designing and Application of CAD Systems (CADSM), pp. 41–45 https://doi.org/10.1109/cadsm.2019.8779290
[15] I. V. Sergienko, L. F. Hulianytskyi, S. I. Sirenko 2009 Classification of applied methods of combinatorial optimization Cybernetics and Systems Analysis 45(5), pp. 732–741 https://doi.org/10.1007/s10559-009-9134-0
[16] S. Lin 1965 Computer solutions of the traveling salesman problem Bell System Technical Journal 44(10), pp. 2245–2269 https://doi.org/10.1002/j.1538-7305.1965.tb04146.x
[17] A. Hamacher, C. Moll 1996 The Euclidian traveling salesman selection problem Operations Research Proceedings 1995, Springer, Berlin, Heidelberg, pp. 54–59 https://doi.org/10.1007/978-3-642-80117-4_10
[18] R. C. Larson, A. R. Odoni 1981 Urban Operations Research Prentice-Hall.
[19] M. Fischetti, A. Lodi 2011 Heuristics in mixed integer programming Wiley Encyclopedia of Operations Research and Management Science 3, pp. 2199–2204 https://doi.org/10.1002/9780470400531.eorms0376
[20] M. Gendreau, J. Y. Potvin (Eds.)
2010 Handbook of Metaheuristics, volume 2, Springer https://doi.org/10.1007/978-3-319-91086-4
[21] D. Naddef, G. Rinaldi 2002 Branch-and-cut algorithms for the capacitated VRP The Vehicle Routing Problem, Society for Industrial and Applied Mathematics, pp. 53–84 https://doi.org/10.1137/1.9780898718515.ch3
[22] J. J. Hopfield, D. W. Tank 1985 "Neural" computation of decisions in optimization problems Biological Cybernetics 52(3), pp. 141–152 https://doi.org/10.1007/BF00339943
[23] G. V. Wilson, G. S. Pawley 1988 On the stability of the travelling salesman problem algorithm of Hopfield and Tank Biological Cybernetics 58(1), pp. 63–70 https://doi.org/10.1007/bf00363956
[24] C. M. Bishop 2006 Pattern Recognition and Machine Learning Springer.
[25] K. P. Murphy 2012 Machine Learning: A Probabilistic Perspective MIT Press https://doi.org/10.5555/2380985
[26] O. Alagoz, H. Hsu, A. J. Schaefer, M. S. Roberts 2010 Markov decision processes: a tool for sequential decision making under uncertainty Medical Decision Making 30(4), pp. 474–483 https://doi.org/10.1177/0272989x09353194
[27] R. S. Sutton, A. G. Barto 2018 Reinforcement Learning: An Introduction MIT Press, Cambridge, second edition https://doi.org/10.5555/3312046
[28] O. Vinyals, M. Fortunato, N. Jaitly 2015 Pointer networks Advances in Neural Information Processing Systems, pp. 2692–2700.
[29] A. Vaswani et al. 2017 Attention is all you need Advances in Neural Information Processing Systems 30, pp. 5998–6008.
[30] I. Goodfellow, Y. Bengio, A. Courville 2016 Deep Learning Cambridge: MIT Press https://doi.org/10.1007/s10710-017-9314-z
[31] P. Veličković et al. 2018 Graph attention networks International Conference on Learning Representations https://doi.org/10.17863/CAM.48429
[32] R. J. Williams 1992 Simple statistical gradient-following algorithms for connectionist reinforcement learning Machine Learning 8(3-4), pp. 229–256 https://doi.org/10.1007/BF00992696
[33] R. Baltean-Lugojan, R. Misener, P. Bonami, A. Tramontani 2018 Strong sparse cut selection via trained neural nets for quadratic semidefinite outer-approximations Technical report, Imperial College, London.
[34] D. Applegate, R. Bixby, V. Chvátal, W. Cook 2007 The Traveling Salesman Problem: A Computational Study Princeton University Press https://doi.org/10.1515/9781400841103
[35] V. Korolyov, M. Ogurtsov, A. Khodzinsky 2021 Statement of the problem of complete set of UAV group on the basis of models of granular calculations and fuzzy logic Cybernetics and Computer Technologies 2, pp. 25–38 https://doi.org/10.34229/2707-451X.21.2.3