<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Methods for Solving the Traveling Salesman Problem Based on Reinforcement Learning and Metaheuristics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eugene Fedorov</string-name>
          <email>fedorovee75@ukr.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Nechyporenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cherkasy State Technological University</institution>
          ,
          <addr-line>Shevchenko blvd., 460, Cherkasy, 18006</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>To date, the search for a solution to the traveling salesman problem is relevant for general and special-purpose intelligent computer systems. Currently, there is a problem of insufficient efficiency of methods for finding a solution to the traveling salesman problem. The aim of the work is to increase the efficiency of finding a solution to the traveling salesman problem through reinforcement learning based on Q-learning and SARSA, and a metaheuristic method based on the ant colony algorithm. To achieve this goal, a method based on Q-learning for the traveling salesman problem, a method based on SARSA for the traveling salesman problem, and an Ant-Q based method for the traveling salesman problem were created in the work. The advantages of the proposed methods include the following. Firstly, the modification of the Q-learning and SARSA methods through dynamic parameters makes it possible to increase the learning rate while maintaining the mean squared error of the method. Secondly, in the Ant-Q method with dynamic parameters, the ε-greedy Q-learning approach is used to select a new vertex, which is close to random search at initial iterations, and close to directed search at final iterations. This is ensured by the use of dynamic Q-learning parameters and makes it possible to increase the learning rate while maintaining the mean squared error of the method. Thirdly, in the Ant-Q method with dynamic parameters for calculating the change in the global pheromone level at initial iterations, the pheromone increment (current reward table for Q-learning parameters) plays the main role, which ensures the breadth of the search. In the final iterations, the previous pheromone level (global reward table for Qlearning parameters) plays the main role, which ensures the convergence of the method. This is ensured by the use of dynamic Q-learning parameters and makes it possible to increase the learning rate while maintaining the mean squared error of the method. The numerical study made it possible to evaluate the proposed methods (for the first and second methods, the number of iterations was 300; for the third method, the number of iterations was 10; for all three methods, the mean squared error was 0.05). The proposed methods make it possible to expand the scope of reinforcement learning and metaheuristics, which is confirmed by their adaptation for the specified optimization problem, and contributes to an increase in the efficiency of general and special-purpose intelligent computer systems. The prospect of further research is the study of the proposed methods for a wide class of artificial intelligence problems.</p>
      </abstract>
      <kwd-group>
        <kwd>1 reinforcement learning</kwd>
        <kwd>metaheuristics</kwd>
        <kwd>ant colony algorithm</kwd>
        <kwd>dynamic parameters</kwd>
        <kwd>traveling salesman problem</kwd>
        <kwd>pheromone level</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>To date, the development of methods aimed at solving the traveling salesman problem, which are
used in general and special-purpose intelligent computer systems, is an urgent task.</p>
      <p>Optimization methods that find the exact solution have a high computational complexity.
Optimization methods that find an approximate solution through directed search have a high
probability of hitting a local extremum. Random search methods do not guarantee convergence. Due
to this, there is a problem of insufficient efficiency of optimization methods, which needs to be
addressed.</p>
      <p>Metaheuristics (or modern heuristics) are used to speed up finding a solution to the traveling
salesman problem and reduce the probability of hitting a local extremum [1-3]. Metaheuristics
expands the possibilities of heuristics by combining heuristic methods based on a high-level strategy
[4-6]. The most promising metaheuristics are agent-based metaheuristics, which show the best results
when searching for a solution to the traveling salesman problem [7-8].</p>
      <p>Another popular approach is reinforcement learning [9].</p>
      <p>Currently, there is a trend towards the joint use of reinforcement learning and metaheuristics [10].</p>
      <p>The aim of the work is to increase the efficiency of finding a solution to the traveling salesman
problem through reinforcement learning and the metaheuristic method.</p>
      <p>To achieve this goal, it is necessary to solve the following tasks:
1. Create a Q-learning method for the traveling salesman problem.
2. Create a SARSA-based method for the traveling salesman problem.
3. Create an Ant-Q based method for the traveling salesman problem.</p>
      <p>4. Conduct a numerical study of the proposed optimization methods.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Formulation of the problem</title>
      <p>The problem of increasing the efficiency of solving the traveling salesman problem based on
reinforcement learning methods (Q-learning and SARSA) and the ant colony algorithm is presented as
the problem of finding such an ordered set of operators { A1 , A2 }, the iterative application of which
ensures finding a solution x* , such that F (x* ) → min , where x – is a vertex vector (route), F (⋅) – is
a target function (route length), A1 – is an operator that selects a vertex based on the ε- greedy
approach, A2 – is an operator that updates the reward table.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Literature review</title>
      <p>Existing metaheuristics and reinforcement learning methods have one or more of the following
advantages:
• computational complexity is lower than in traditional methods of exhaustive search (branch
and bound, dynamic programming, etc.) [10];
• the probability of hitting a local extremum is lower than in gradient methods [10].</p>
      <p>Existing metaheuristics and reinforcement learning methods have one or more of the following
disadvantages:
• there is only an abstract description of the method or the description of the method is focused
on solving only a specific problem [11];
• method convergence is not guaranteed [12];
• the influence of the iteration number on the process of finding a solution is not taken into
account [13];
• there is no possibility to solve problems of conditional optimization [14];
• insufficient method accuracy [15].
• the procedure for determining parameter values is not automated [16].</p>
      <p>This raises the problem of constructing efficient optimization methods.</p>
      <p>One of the most common and accurate metaheuristics is the ant colony algorithm, which was
proposed by Dorigo [18-20], further developed in [21-23] and implemented in software in [24].</p>
      <p>One of the most common and accurate reinforcement learning methods are Q-learning and SARSA
[9], which are based on the Bellman equation.</p>
      <p>Currently, hybrid metaheuristics are often used to control search [25-26].
4. Q-learning method with dynamic parameters for the traveling salesman
problem
0 &lt; ρ min &lt; ρ max &lt; 1 , parameters ε min ,ε max</p>
      <p>for the ε-greedy approach, 0 &lt; ε min &lt; ε max &lt; 1 ,
parameters θ min ,θ max (determine the importance of the future reward), 0 &lt;θ min &lt;θ max &lt; 1, are
specified.
1.2. The optimal vertices vector x* is defined by randomly ordering the set A .
1.3. The reward table is initialized as
2. Iteration number is n=1.
3. Parameters are calculated:</p>
      <p>
        Q = [Q(i, j)] , Q(i, j) = 0 , i, j ∈1, M .
6. An action (vertex) a is selected, to which it is necessary to move from vertex s, using an
εgreedy approach (if U (
        <xref ref-type="bibr" rid="ref1 ref2 ref3">0,1</xref>
        ) &lt; ε (n) , then choose an action (vertex) a randomly from the set of
allowed actions (vertices) A / Atabu , otherwise choose an action (vertex) a as the nearest neighbor
of vertex s from the set of allowed actions (vertices)
b ∈ A / Atabu ). The selected action (vertex) a is included in the set of forbidden actions (vertices)
Atabu , i.e. Atabu = Atabu  {a} , and becomes the new vertex of the vertices vector, i. e. x|Atabu | = a .
7. If there are no allowed actions (vertices) left, i.e. A / Atabu = ∅ , then go to step 12.
8. The current reward table element R(s, a) is calculated as the negative weight of the edge (s, a) ,
i.e. R(s, a) = −dsa .
9. A new state (vertex) e = a is observed.
10. An element of the reward table Q(s, a) is calculated as
      </p>
      <p>Q(s, a) = (1 − ρ (n))Q(s, a) + ρ (n)(R(s, a) +θ (n) max Q(e,b)), b ∈ A / Atabu .</p>
      <p>b
11. Set the current state (vertex) as s = a . Go to step 6.
12. If the best value of the target function at the current iteration is less than the best value of the
target function for all previous iterations, i.e. F (x) &lt; F (x* ) , then replace the best vertices vector,
i.e. x* = x .</p>
      <p>13. If not the last iteration, i.e. n &lt; N , then go to step 3.
5. SARSA-based method with dynamic parameters for the traveling salesman
problem
0 &lt; ρ min &lt; ρ max &lt; 1 , parameters ε min ,ε max</p>
      <p>for the ε-greedy approach, 0 &lt; ε min &lt; ε max &lt; 1 ,
parameters θ min ,θ max (determine the importance of the future reward), 0 &lt;θ min &lt;θ max &lt; 1, are
specified.
1.2. The optimal vertices vector x* is defined by randomly ordering the set A .
1.3. The reward table is initialized as</p>
      <p>
        Q = [Q(i, j)] , Q(i, j) = 0 , i, j ∈1, M .
2. Iteration number is n=1.
3. Parameters are calculated:
5. The set of prohibited actions (vertices) Atabu = {1}is initialized.
6. An action (vertex) a is selected, to which it is necessary to move from vertex s, using an
εgreedy approach (if U (
        <xref ref-type="bibr" rid="ref1 ref2 ref3">0,1</xref>
        ) &lt; ε (n) , then choose an action (vertex) a randomly from the set of
allowed actions (vertices) A / Atabu , otherwise choose an action (vertex) a as the nearest neighbor
of vertex s from the set of allowed actions (vertices)
A / Atabu , i.e. a = arg max Q(s, b) ,
b
Atabu , i.e. Atabu = Atabu  {a} , and becomes the new vertex of the vertices vector, i. e. x|Atabu | = a .
7. If there are no allowed actions (vertices) left, i.e. A / Atabu = ∅ , then go to step 13.
8. The current reward table element R(s, a) is calculated as the negative weight of the edge (s, a) ,
i.e. R(s, a) = −dsa .
9. A new state (vertex) e = a is observed.
10. An action (vertex) c is selected, to which it is necessary to move from vertex e, using an
εgreedy approach (if U (
        <xref ref-type="bibr" rid="ref1 ref2 ref3">0,1</xref>
        ) &lt; ε (n) , then choose an action (vertex) c randomly from the set of
allowed actions (vertices) A / Atabu , otherwise choose an action (vertex) c as the nearest neighbor
of vertex e from the set of allowed actions (vertices)
A / Atabu , i.e. c = arg max Q(e, b) ,
b
b ∈ A / Atabu ). The selected action (vertex) c is included in the set of forbidden actions (vertices)
Atabu , i.e. Atabu = Atabu  {c}, and becomes the new vertex of the vertices vector, i. e. x|Atabu | = c .
11. An element of the reward table Q(s, a) is calculated as
      </p>
      <p>Q(s, a) = (1 − ρ (n))Q(s, a) + ρ (n)(R(s, a) +θ (n)Q(e, c)) .
12. Set the current state (vertex) as s = a . Set the current state (vertex) as a = c . Go to step 7.
13. If the best value of the target function at the current iteration is less than the best value of the
target function for all previous iterations, i.e. F (x) &lt; F (x* ) , then replace the best vertex vector, i.e.
x* = x .</p>
      <p>14. If not the last iteration, i.e. n &lt; N , then go to step 3.
6. Ant-Q
problem
method</p>
      <p>
        with dynamic parameters for the traveling salesman
x – vertices vector.
(
        <xref ref-type="bibr" rid="ref5">3</xref>
        )
The method consists of the following steps:
1. Initialization.
1.1. The maximum number of iterations N , the length of the vertices vector M , population size
K , the discrete set of states (vertices) S = {1,..., M } , the discrete set of actions (vertices)
A = {1,..., M }, the matrix of edge weights [dij ] , i, j ∈1, M , parameters ρ min , ρ max (control the
learning rate), 0 &lt; ρ min &lt; ρ max &lt; 1 , parameters ε min ,ε max for the modified ε-greedy approach,
0 &lt; ε min &lt; ε max &lt; 1 , parameters θ min ,θ max (determine the importance of the future reward),
0 &lt;θ min &lt;θ max &lt; 1, are specified.
1.2. The optimal vertices vector x* is defined by randomly ordering the set A .
1.3. The reward table is initialized (pheromone level) as
      </p>
      <p>Q = [Q(i, j)] , Q(i, j) = 0 , i, j ∈1, M .
2. Iteration number is n=1.
3. Parameters are calculated:
ρ (n) = ρ max − (ρ max − ρ min ) n −1 ,</p>
      <p>N −1
θ (n) =θ min + (θ max −θ min ) n −1 .</p>
      <p>N −1
4. The initial state (vertex) sk = 1 is observed, which becomes a new vertex of the vertices vector,
i.e. xk1 = 1 , k ∈1, K .
5. The set of prohibited actions (vertices) Aktabu = {1} , k ∈1, K is initialized.
6. Ant number is k = 1 .
7. Calculate the transition probabilities of the k th ant from vertex sk to other vertices
psk b =  l∈A∑(\1Akt−a(b1uρ−(ρn)()nQ))(Qsk(,sbk), +b)ρ+(ρn)((n1)/(d1s/k db)sk b ) , b ∈ A \ Aktabu , b ∈1, M .</p>
      <p>
        0, b ∉ A \ Aktabu
8. An action (vertex) ak is selected, to which it is necessary to move from vertex sk , using the
modified ε-greedy approach (if U (
        <xref ref-type="bibr" rid="ref1 ref2 ref3">0,1</xref>
        ) &lt; ε , then choose an action ak , satisfying the inequality
ak −1 ak
∑b=1 psk b &lt; U (
        <xref ref-type="bibr" rid="ref1 ref2 ref3">0,1</xref>
        ) ≤ ∑b=1 psk b , from the set of allowed actions (vertices) A / Atabu , otherwise choose
an action (vertex) a as the nearest neighbor of vertex s from the set of allowed actions (vertices)
A / Atabu , i.e. aл = arg mbax{(1 − ρ (n))Q(sk , b) + ρ (n)(1/ dsk b )} , b ∈ A \ Aktabu ). The selected
action (vertex) is included in the set of forbidden actions (vertices), i.e. Atabu = Atabu  {ak } , and
k k
becomes the new vertex of the vertices vector, i. e. xk,|Aktabu | = ak .
9. A new state (vertex) ek = ak is observed.
10. An element of the reward table (pheromone level) Q(sk , ak ) is calculated as
      </p>
      <p>Q(sk , ak ) = (1 − ρ (n))Q(sk , ak ) + ρ (n)θ (n) max Q(ek , b) , b ∈ A / Aktabu .</p>
      <p>b
11. Set the current state (vertex) as sk = ak .
12. If the current ant is not the last one, i.e. k &lt; K , then increase the ant number, i.e. k = k + 1 , go
to step 7.
13. If there are still allowed actions (vertices) left, i.e. A / AKtabu ≠ ∅ , then go to step 6.
14. If the best value of the target function at the current iteration is less than the best value of the
target function for all previous iterations, i.e. min F (xk ) &lt; F (x*) , then replace the best vertices
k∈1,K
vector, i.e. x* = arg min F (xk ) .</p>
      <p>k∈1,K
15. The table of current rewards R(s, a) is calculated as</p>
      <p>K χ
R(s, a) = ∑ Z sa (xk ) , s ∈1, M −1 , a ∈ i + 1, M ,
k =1 F (xk )
χ Z sa (xk ) = 1, xk ∈ Zsa ,</p>
      <p>0, xk ∉ Zsa
where Zsa – the set of vertices vectors that contain edges (s, a) or (a, s) .</p>
      <p>16. The reward table (pheromone level) is calculated as</p>
      <p>Q(s, a) = (1 − ρ (n))Q(s, a) + ρ (n)R(s, a) , s ∈1, M −1 , a ∈ i + 1, M .
17. If the current iteration is not the last one, i.e. n &lt; N , then increase the iteration number, i.e.
n = n + 1, go to step 3, otherwise stop.</p>
      <p>
        Comment. U (
        <xref ref-type="bibr" rid="ref1 ref2 ref3">0,1</xref>
        ) – is a function that returns a uniformly distributed random number in the range
[0,1] .
      </p>
    </sec>
    <sec id="sec-4">
      <title>7. Experiments and results</title>
      <p>The numerical study of the proposed optimization methods was carried out using the Python
package.</p>
      <p>For Q-learning methods, SARSA, Ant-Q with dynamic parameters, parameters values
ρ min = 0.1, ρ max = 0.9 (control learning rate, evaporation rate coefficients), parameters values
ε min = 0.1,ε max = 0.9 for ε-greedy approach, parameters values θ min = 0.1,θ max = 0.9 (determine
importance of the future reward).</p>
      <p>For the Ant-Q method with dynamic parameters, the ant population size is K = 20 .</p>
      <p>For the problem "about the shortest path in the world of tiles", the search for a solution was carried
out on the standard database https://digitalcommons.du.edu/gridmaps2D, which is described in [25]
(traditionally used to test methods for solving problems of finding routes in the world of tiles).</p>
      <p>1,1</p>
      <p>1
0,9
0,8
0,7
)0,6
n
(
a
t
te0,5
0,4
0,3
0,2
0,1
0</p>
      <p>The dependence (Figure 1) of parameter θ (n) on the iteration number n shows that its share
increases with the iteration number.</p>
      <p>The dependence of parameters ρ (n) and ε (n) is determined as
ρ (n) = ρ max − (ρ max − ρ min ) n −1 and ε (n) = ε max − (ε max −ε min ) n −1 and is shown in Figure 2.</p>
      <p>N −1 N −1</p>
      <p>The dependence (Figure 2) of parameters ρ (n) and ε (n) on the iteration number n shows that
their share decreases with an increase in the iteration number.
0,9
0,8
0,7
)
(sn0,6
p
e
,)
(n0,5
o
r
0,4
0,3
0,2
0,1
0</p>
      <p>The results of comparison of the proposed methods with the methods based on the ant colony
algorithm without simulated annealing and random pheromone level, described in [16–22], are
presented in Table 1.</p>
      <p>The results of comparison of the proposed Q-learning method with dynamic parameters and with
the traditional Q-learning method based on the mean squared error criterion and the number of
iterations for solving the traveling salesman problem are presented in Table 1. Similar results were
obtained for the proposed SARSA method with dynamic parameters and with the traditional SARSA
method.</p>
      <p>The results of comparing the proposed Ant-Q method with dynamic parameters and with the
traditional Ant-Q method based on the mean squared error criterion and the number of iterations for
solving the traveling salesman problem are presented in Table 2.</p>
    </sec>
    <sec id="sec-5">
      <title>8. Discussion</title>
      <p>Advantages of the proposed methods:
1. Modification of Q-learning and SARSA methods using dynamic parameters allows to increase
the learning rate while maintaining the mean squared error of the method. The learning rate of the
modified Q-learning and SARSA methods using dynamic parameters increased by about 7 times
compared to the known methods.
2. In the Ant-Q method with dynamic parameters, the ε-greedy Q-learning approach is used to
select a new vertex, which is close to random search at initial iterations, and close to directed
search at final iterations. This is achieved by using dynamic Q-learning parameters and makes it
possible to increase the learning rate while maintaining the mean squared error of the method
(Table 2). The learning rate of the modified Ant-Q methods using dynamic parameters increased
by 10 times, compared to the known methods.
3. In the Ant-Q method with dynamic parameters to calculate the change in the global pheromone
level in the initial iterations, the pheromone increment (current reward table for Q-learning
parameters) plays the main role, which ensures the breadth of the search, and in the final iterations
the previous pheromone level plays the main role (global reward table for Q-learning parameters),
which ensures the convergence of the method. This is achieved by using dynamic Q-learning
parameters and makes it possible to increase the learning rate while maintaining the mean squared
error of the method (Table 2).
9. Conclusions
1. The paper proposes a modification of the Q-learning and SARSA methods by using dynamic
parameters in the reward table update rule, which makes it possible to increase the learning rate.
2. The paper proposes a modification of the Ant-Q method by using dynamic parameters in the
reward table update rule and the ε-greedy vertex selection approach, which makes it possible to
increase the learning rate.
3. The proposed optimization methods, due to the study of the entire search space at the initial
iterations and the search directionality at the final iterations, make it possible to ensure high
accuracy in solving the "traveling salesman" problem.</p>
      <p>Practical value. The proposed methods allow expanding the scope of reinforcement learning and
metaheuristics, which is confirmed by their adaptation for the specified optimization problem, and
improves the efficiency of general and special-purpose intelligent computer systems.
The prospect of further research is to investigate the proposed methods for a wide class of artificial
intelligence problems.
10.References
[7] S. Subbotin, A. Oliinyk, V. Levashenko, E. Zaitseva, Diagnostic rule mining based on artificial
immune system for a case of uneven distribution of classes in sample, Communications, volume
3 (2016) 3-11.
[8] X.-S. Yang, Optimization Techniques and Applications with Examples, Hoboken, New Jersey:</p>
      <p>Wiley &amp; Sons, 2018. doi: 10.1002/9781119490616.
[9] A. L. C. Ottoni, E. G. Nepomuceno, M. S. de Oliveira, D. C. R. de Oliveira, Reinforcement
learning for the traveling salesman problem with refueling, in: Complex &amp; Intelligent Systems,
vol. 8, 2021, pp. 1-15. doi.org/10.1007/s40747-021-00444-4.
[10] M. Dorigo, L. Gambardella, Ant-Q: A reinforcement learning approach to the traveling salesman
problem, in: Proceedings of ML-95, Twelfth Internet Conference on Machine Learning, 1995,
pp. 252-260.
[11] C. Blum, G. R. Raidl, Hybrid Metaheuristics. Powerful Tools for Optimization, Charm: Springer,
2016. doi: 10.1007/978-3-319-30883-8.
[12] R. Martí, P. M. Pardalos, M. G. C. Resende, Handbook of Heuristics, Charm: Springer, 2018.</p>
      <p>doi: 10.1007/978-3-319-07124-4.
[13] O. Bozorg Haddad, M. Solgi, H. Loaiciga, Meta-heuristic and Evolutionary Algorithms for
Engineering Optimization, Hoboken, New Jersey: Wiley &amp; Sons, 2017. doi:
10.1002/9781119387053.
[14] M. Gendreau, J.-Y. Potvin, Gendreau M. Handbook of Metaheuristics, New York: Springer,
2010. doi: 10.1007/978-1-4419-1665-5.
[15] B. Chopard, M. Tomassini, An Introduction to Metaheuristics for Optimization, New York:</p>
      <p>Springer, 2018. doi: 10.1007/978-3-319-93073-2.
[16] J. Radosavljević, Metaheuristic Optimization in Power Engineering, New York: Institution of</p>
      <p>Engineering and Technology, 2018. doi: 10.1049/PBPO131E.
[17] K. F. Doerner, M. Gendreau, P. Greistorfer, W. Gutjahr, R. F. Hartl, M. Reimann,
Metaheuristics. Progress in Complex Systems Optimization, New York: Springer, 2007. doi:
10.1007/978-0-387-71921-4.
[18] M. Dorigo, V. Maniezzo, A. Colorni, Ant system: optimization by a colony of cooperating
agents, in: IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 26,
issue 1, 1996, pp. 29–41. doi: 10.1109/3477.484436
[19] A. Colorni, M. Dorigo, V. Maniezzo, M. Trubian, Ant system for job-shop scheduling, in:
Belgian Journal of Operations Research, Statistics and Computer Science, vol. 34, issue 1, 1994,
pp. 39–53.
[20] L. M. Gambardella, É. D. Taillard, M. Dorigo, Ant colonies for the quadratic assignment
problem, in: Journal of the Operational Research Society, vol. 50, issue 2, 1999, pp. 167–176.
doi: 10.1057/palgrave.jors.2600676
[21] H. Min, P. Dazhi, Y. Song, An improved hybrid ant colony algorithm and its application in
solving TSP, in: Proceedings of the 7th IEEE Joint International Information Technology and
Artificial Intelligence Conference (ITAIC '14), 2014, pp. 423–427.
[22] K.-L. Du, M. N. S. Swamy, Search and Optimization by Metaheuristics. Techniques and</p>
      <p>Algorithms Inspired by Nature, Charm: Springer, 2016. doi: 10.1007/978-3-319-41192-7.
[23] E. Alba, A. Nakib, P. Siarry, Metaheuristics for Dynamic Optimization, Berlin: Springer-Verlag,
2013. doi: 10.1007/978-3-642-30665-5
[24] J. Brownlee, Clever algorithms: nature-inspired programming recipes, Melbourne: Brownlee,
2011.
[25] O. O. Grygor, E. E. Fedorov, T. Yu. Utkina, A. G. Lukashenko, K. S. Rudakov, D. A. Harder, V.</p>
      <p>M. Lukashenko, Optimization method based on the synthesis of clonal selection and annealing
simulation algorithms, Radio Electronics, Computer Science, Control (2019) 90-99. doi:
10.15588/1607-3274-2019-2-10.
[26] E. Fedorov, V. Lukashenko, T. Utkina, A. Lukashenko, K. Rudakov, Method for parametric
identification of Gaussian mixture model based on clonal selection algorithm, in: CEUR
Workshop Proceedings, vol. 2353, 2019. pp. 41-55.
[27] N. R. Sturtevant, Benchmarks for Grid-Based Pathfinding, in: IEEE Transactions on
Computational Intelligence and AI in Games, vol. 4, issue 2, 2012, pp. 144–148.
doi:10.1109/TCIAIG.2012.2197681.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 n</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>1 3 5 7 9 11 31 51 71 91 12 32 52 72 92 13 33 53 73 93 14 34 54 74 94 15 35 55 75 95 16 36 56 76 96 17 37 57 77 97 18 38 58 78 98 19 39 59 79 99 n</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gen</surname>
          </string-name>
          , Introduction to evolutionary algorithms, London: Springer-Verlag,
          <year>2010</year>
          . doi:
          <volume>10</volume>
          .1007/978-1-
          <fpage>84996</fpage>
          -129-5.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Engelbrecht</surname>
          </string-name>
          , Computational Intelligence: an introduction, Chichester, West Sussex, Wiley &amp; Sons,
          <year>2007</year>
          . doi:
          <volume>10</volume>
          .1002/9780470512517.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [3]
          <string-name>
            <surname>El-G. Talbi</surname>
          </string-name>
          ,
          <article-title>Metaheuristics: from design to implementation</article-title>
          , Hoboken, New Jersey: Wiley &amp; Sons,
          <year>2009</year>
          . doi:
          <volume>10</volume>
          .1002/9780470496916.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.-S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Nature-inspired Algorithms</article-title>
          and Applied Optimization, Charm: Springer,
          <year>2018</year>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>642</fpage>
          -29694-9.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nakib</surname>
          </string-name>
          ,
          <string-name>
            <surname>El-G. Talbi</surname>
          </string-name>
          ,
          <source>Metaheuristics for Medicine and Biology</source>
          , Berlin: Springer-Verlag,
          <year>2017</year>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>662</fpage>
          -54428-0.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Glover</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Kochenberger</surname>
          </string-name>
          , Handbook of metaheuristics, Dordrecht: Kluwer Academic Publishers,
          <year>2003</year>
          . doi:
          <volume>10</volume>
          .1007/B101874.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>