jSO and GWO Algorithms Optimize Together

Radka Poláková¹, Daniel Valenta¹

¹ Silesian University in Opava, Faculty of Philosophy and Science in Opava, The Institute of Computer Science, Opava, Czech Republic

Abstract
This paper deals with global optimization. There are many algorithms for global optimization in the literature. In this text, we focus on two effective optimizers. The first one is an adaptive version of the Differential evolution algorithm, called the jSO algorithm, which was the most successful version of the algorithm at the CEC 2014 congress. The second one is the Grey wolf optimizer, which was introduced in 2014. We propose a new algorithm, cooperation, in which both of these algorithms are used together to obtain better results when optimization problems are solved. In our approach, the two algorithms take turns carrying out the optimization process. We tested the proposed algorithm on four multi-modal functions at two levels of dimension. The results are quite promising.

Keywords
Optimisation, GWO, jSO, Cooperation, Optimization algorithms

1. Global Optimization

In human life, there are many different problems which lead to the optimization of a mathematical function. To optimize a function f over a search space means to find a point of the search space in which the function f has its global optimum, i.e. its global minimum or global maximum, depending on the task. It is well known that searching for the global maximum of a function f is the same as finding the global minimum of the function g = −f, so we can talk only about minimization in the following. The minimization problem is defined as follows. Let us have a function f,

    f : S → ℝ, S ⊂ ℝᴰ,    (1)

where f is the minimized function, D is the dimension of the problem, and S is the search space, here continuous,

    S = Π_{j=1..D} [a_j, b_j]; a_j < b_j, j = 1, 2, ..., D.    (2)

The Π notation (product notation, a Cartesian product in this case) is used here in the standard way. A point x⃗* is a global minimum point of the function f in the search space S if the following condition holds:

    ∀ x⃗ ∈ S: f(x⃗*) ≤ f(x⃗).    (3)

It is possible to find the minimum of a function analytically, but for some functions such a process is difficult, long, or even impossible because of the features of the function. Then stochastic algorithms can help us. The most difficult functions for optimization algorithms are those that are not easily differentiable at some points or have many local extremes. The result of a stochastic optimizer may not correspond exactly to the real optimum; however, it can be sufficient with regard to the complexity of the optimized function. In other words, the result of a stochastic optimizer is a sufficiently good and inexpensive estimate of the real solution of the problem.

According to the No Free Lunch theorem [1], there is no optimization algorithm that solves all functions best. Therefore, new methods are still being developed. We describe two already established methods and one new method in the following sections.

2. Differential Evolution and jSO

There are many different algorithms to optimize (minimize) a mathematical function. In this paper, we work with jSO, which is an adaptive version of the Differential evolution algorithm, and with the Grey wolf optimizer. We briefly describe all three algorithms in this section.

2.1. Differential Evolution

The Differential evolution (DE) algorithm was introduced in [2]. It is an efficient, population-based optimizer in which a population evolves during the run in order to have better and better members, better in the sense of a lower function value at the member's point. The best member of the population at the end of the run is the result: the point of the global optimum found by the algorithm.
ITAT'22: Information Technologies – Applications and Theory, September 23–27, 2022, Zuberec, Slovakia
radka.polakova@fpf.slu.cz (R. Poláková); daniel.valenta@fpf.slu.cz (D. Valenta)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

The algorithm works with a population of points which are, at the beginning of the run, generated randomly (with uniform distribution) in the search space S. Then, each population member is moved around the search space and evolved towards better values of the optimized function.

In a generation (iteration) of the algorithm run, a trial point y⃗i is computed for each population member x⃗i, i = 1, 2, ..., NP, where NP is the size of the population. The point y⃗i is produced by two operations; the first one is mutation. A mutant point u⃗i is computed based on a chosen kind of mutation. Then the mutant u⃗i enters the crossover operation together with the original point x⃗i. The result of the crossover is the trial point y⃗i. If the trial point y⃗i is better than the original point x⃗i, the new point y⃗i enters the next generation instead of x⃗i. If not, the original point x⃗i enters the next generation of the population.

Algorithm 1 DE
1: Generate the initial generation P0 of the population P; P0 = (x⃗1, x⃗2, ..., x⃗NP)
2: Compute the value of the optimized function f at all points of the generation P0;
3: Set the counter of generations to g = 0;
4: while termination criterion is not met do
5:   Qg = Pg;
6:   for i = 1 to NP do
7:     Create a trial point y⃗i to point x⃗i;
8:     Compute the value of the function f at point y⃗i;
9:     if f(y⃗i) ≤ f(x⃗i) then
10:      Insert point y⃗i into Qg instead of point x⃗i;
11:  Pg+1 = Qg; g = g + 1;
12: The result is the best point in Pg.

There are several types of mutation, e.g. rand/1, randrl/1, current-to-rand/1, rand/2, current-to-pbest/1, etc. The most used one is the rand/1 mutation, eq. (4). By this mutation, each mutant u⃗i is computed from three randomly chosen points r⃗1i, r⃗2i, and r⃗3i of the current population which are different from the original point x⃗i:

    u⃗i = r⃗1i + F (r⃗2i − r⃗3i).    (4)

F is the mutation parameter of DE. The crossover can also be performed in one of several ways in DE; the two most frequently used types of crossover are the binomial and the exponential one. The algorithm uses the parameter CR as a crossover parameter. It influences the number of coordinates which are inherited by a trial point from the mutant point u⃗i. In the binomial crossover, the inherited coordinates are distributed uniformly; in the exponential crossover, the trial point takes a consecutive series of coordinates from the mutant.

Thus, the DE algorithm has several parameters: the mutation parameter F, the crossover parameter CR, the type of mutation, the type of crossover, and also the population size NP. There are many adaptive versions of the algorithm in the literature.
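As an aside, the rand/1 mutation (4) combined with binomial crossover can be sketched in a few lines. This is our illustrative sketch, not code from the paper; the function name and the values F = 0.5 and CR = 0.8 are our own choices:

```python
import numpy as np

def de_rand1_bin_trial(pop, i, F=0.5, CR=0.8, rng=None):
    """Trial point for member i: rand/1 mutation (eq. 4) + binomial crossover."""
    rng = np.random.default_rng() if rng is None else rng
    NP, D = pop.shape
    # three mutually distinct members, all different from the original point i
    r1, r2, r3 = rng.choice([j for j in range(NP) if j != i],
                            size=3, replace=False)
    mutant = pop[r1] + F * (pop[r2] - pop[r3])   # eq. (4)
    # binomial crossover: a coordinate comes from the mutant with
    # probability CR; one randomly chosen coordinate always does
    take = rng.random(D) < CR
    take[rng.integers(D)] = True
    return np.where(take, mutant, pop[i])
```

In the full DE loop, this trial point would replace x⃗i in the next generation only if its function value is not worse.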
Several of them were successful at CEC congresses, e.g. SHADE [3], LSHADE [4], and jSO [5].

The principles of the algorithm are shown in the pseudo-code of Algorithm 1. Note that it is a simplification which does not describe how the trial point is created. The trial point y⃗i to point x⃗i is created using the two operations, mutation and crossover. The specific method of creating the trial point is discussed in the pseudo-code presented in the following section for the jSO algorithm (Algorithm 2).

2.2. jSO

jSO [5] is a very sophisticated version of DE. The algorithm evolved from its predecessors. The first one, of course except DE itself, is JADE [6]. The authors of this algorithm developed a new mutation, current-to-pbest/1; this mutation, in a slightly modified version, is used in jSO. They also suggested employing an archive of old members of the population. The next algorithm is SHADE [3], which is an improved version of JADE. SHADE employs historical circle memories to adapt the F and CR parameters. Then the LSHADE algorithm [4] was proposed; it is SHADE with a linear population size reduction mechanism. The next algorithm is iLSHADE [7]. And finally, the last one is jSO. The jSO algorithm uses the linear population size reduction mechanism, the archive, and other tools. It has several features different from iLSHADE, e.g. the size of the population is set to NP = 25·√D·log D at the beginning of the search process instead of NP = 18·D, and the parameter p, which is the parameter of the current-to-pbest/1 (and also of the current-to-pbest-w/1) mutation, is handled in a different way in this algorithm compared to its predecessors. The size of the historical circle memories H is set to 5 here. For a detailed description of the algorithm, see [5] and Algorithm 2.

2.3. Grey Wolf Optimizer

The Grey wolf optimizer (GWO) is a nature-based and already well-established meta-heuristic method inspired by the social dynamics found in a pack of grey wolves [8]. In other words, the algorithm simulates the behavior of wolves that live and hunt together in packs. The algorithm was introduced in 2014.

Let us focus on the principles observed in a pack of wolves. There are strict rules that the wolves must follow, and each wolf has a clearly defined role. Based on this, we can classify them into four categories: alpha, beta, delta, and omega. The leader of the pack is the alpha pair of wolves. They are dominant in the group and the other wolves follow their lead. They can be substituted by beta wolves if necessary. Beta wolves are second in command; they are important because they help and support the alpha pair in its decisions. The mid-ranking wolves in the hierarchy are the delta ones. They follow the instructions of the alpha and beta wolves and ensure the routine activities of the pack. Each delta wolf has a specific focus, based on which we can divide them into the following sub-categories: scouts, sentinels, and caretakers. The omega wolves are in the lowest position in the hierarchy, and other wolves often pick on them. This is important to filter aggression and prevent frustration in the pack: losing the omega wolves can cause fighting between the other wolves and damage to discipline or hierarchy.

In nature, the wolves' primary goal is to find and hunt down prey. This process consists of three main steps: searching for prey, encircling prey, and attacking prey. During the search, wolves try to find the most abundant (but easily catchable) prey to hunt. Once they find such prey, they attempt to push it into a situation where it is alone and cannot escape while they encircle it. Finally, when the prey is surrounded and can no longer escape, they attack it. Wolves attack the weak spots of the prey, like the legs, snout, or belly, until it stops resisting; afterwards, they bring it down and crush its windpipe.

Algorithm 2 jSO
1: Generate the initial generation P0 of the population P randomly; P0 = (x⃗1, x⃗2, ..., x⃗NP)
2: Initialize the archive A = ∅;
3: Compute the value of the optimized function f at all points of the generation P0;
4: Set all values in M_F and M_CR to 0.5; Note: M_F and M_CR are circle memories, storing the position parameters for the Cauchy (M_F) and normal (M_CR) distributions; the size of these memories is H; H = 5;
5: g = 0 (current iteration – generation);
6: evals = NP (current number of used function evaluations);
7: k = 1 (index counter for circle memories);
8: while termination criterion is not met do
9:   S_F = ∅ and S_CR = ∅;
10:  for i = 1 to NP do
11:    Select ri randomly from {1, 2, ..., H};
12:    if ri = H then
13:      M_F,ri = 0.9; M_CR,ri = 0.9;
14:    if M_CR,ri < 0 then
15:      CRi,g = 0;
16:    else Generate CR using the normal distribution: CRi,g = Ni(M_CR,ri, 0.1);
17:    if evals < (1/4) maxevals then (maxevals is the maximum number of allowed function evaluations)
18:      CRi,g = max(CRi,g, 0.7);
19:    else if evals < (1/2) maxevals then
20:      CRi,g = max(CRi,g, 0.6);
21:    Generate F using the Cauchy distribution: Fi,g = Ci(M_F,ri, 0.1);
22:    if evals < (1/6) maxevals and Fi,g > 0.7 then
23:      Fi,g = 0.7;
24:    A trial point y⃗i,g is created using the DE/current-to-pbest-w/1/bin strategy (see [5]);
25:  Evaluate the optimized (objective) function f at all NP created trial points y⃗i,g, i = 1, 2, ..., NP;
26:  evals = evals + NP;
27:  for i = 1 to NP do
28:    if f(y⃗i,g) ≤ f(x⃗i,g) then
29:      Update the point for the next generation: x⃗i,g+1 = y⃗i,g
30:    else x⃗i,g+1 = x⃗i,g
31:    if f(y⃗i,g) < f(x⃗i,g) then
32:      Insert x⃗i,g into the archive A;
33:      Insert CRi,g into S_CR and Fi,g into S_F;
34:  If necessary, shrink the archive A;
35:  Calculate the new value of the first parameter of both distributions from S_F and S_CR, and store them in M_F,k and M_CR,k; k = k + 1;
36:  if k > H then k = 1;
37:  Apply the linear population size reduction mechanism, see [5] (update NP and the population P);
38:  Update the parameter p for the current-to-pbest-w/1 mutation strategy (see [5]);
39:  g = g + 1;
40: The result is the best point in Pg.

The Grey wolf optimizer is inspired by the processes described above: the creation of a social hierarchy and the hunting technique. Because it is an agent-based algorithm, each agent represents one of the wolves in the pack. The agents are randomly placed into the environment (the search space S). The better the value (in the sense of minimization) at the position of the current wolf, the closer that position is to the prey (the solution of the solved optimization problem).

GWO is an iterative algorithm. In each iteration, each wolf is assigned its place in the pack hierarchy according to the value of the fitness function f at its position. The wolf with the best value is ranked as alpha, the second best as beta, the third best as delta, and all the others as omega. The alpha, beta, and delta wolves have the same meaning as in nature and store the three best solutions found in the iteration. The positions of the wolves in the environment are updated in each iteration. The new position of an agent is based upon the estimated location of the prey, which is probably somewhere between the alpha, beta, and delta wolves. We assume this, but the optimum may be located elsewhere, so a mechanism is needed to thoroughly scan the entire environment. An agent approaching the prey hunts it, while an agent moving away from it tries to find even more abundant prey elsewhere. For this purpose, two vectors A⃗ and C⃗ are used, thanks to which the algorithm passes smoothly through two phases: scouting and hunting the prey. Both vectors have random components, so they help to prevent convergence to a local optimum (not very abundant prey) instead of the global one.

Vector A⃗ has components rand(−1, 1)·a, where rand(−1, 1) generates a random number with a uniform distribution between −1 and 1, a = 2 − (2·it/itmax), it is the current iteration of the algorithm, and itmax is the maximum number of iterations. Each component of the vector influences the movement of the agent in a specific dimension of the environment (the search space). As can be seen, the interval in which the components of vector A⃗ lie narrows, we can say linearly, from [−2, 2] to [0, 0] as the number of iterations increases, because the parameter a decreases during the whole run from 2 to 0. The closer the value of a component of A⃗ is to 0, the higher is the probability that the agent chooses the hunting phase instead of the scouting one.

Another vector supporting the divergence between the scouting and hunting phases is C⃗. Vector C⃗ is similar to vector A⃗, but the values of its components do not linearly decrease as the number of iterations grows. This vector has components set to rand(0, 2), a random number with the uniform distribution between 0 and 2. The closer the value is to 0, the higher is the probability that the agent chooses the hunting phase. This vector helps the wolves to behave more naturally: in nature, there are various obstacles (e.g. bushes, stones, or trees) on hunting trails, and wolves change direction to avoid them, so they do not move directly towards their prey. Vector C⃗ simulates this part of their behaviour.

GWO has only two parameters: the size of the wolf pack N and the length of time it can search for the optimum. In the original proposal, it works with the maximum number of iterations it can make. Here, we use the maximum number of function evaluations instead, in order to make a fair comparison of the algorithms.

Algorithm 3 GWO
1: Randomly generate an initial population of wolves-agents X⃗1, X⃗2, ..., X⃗N in the environment;
2: while termination criterion is not met do
3:   Calculate the fitness value of each agent X⃗j;
4:   Determine the social hierarchy and find the positions of the alpha, beta, and delta wolves;
5:   Generate vectors A⃗ and C⃗ for all three best wolves;
6:   Calculate the new position of each agent X⃗j;
7: The result is the position of the best wolf after the last iteration of the algorithm.

Because the wolves move closer toward the prey from various directions as the number of iterations increases, they encircle it.
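The update of a and one draw of the coefficient vectors A⃗ and C⃗ can be sketched as follows. This is our illustration; the helper name is hypothetical:

```python
import numpy as np

def gwo_coefficients(it, it_max, dim, rng=None):
    """One draw of the GWO coefficient vectors A and C for a single leader.
    a falls linearly from 2 to 0, so the components of A lie in [-a, a];
    the components of C stay uniform on [0, 2] for the whole run."""
    rng = np.random.default_rng() if rng is None else rng
    a = 2.0 - 2.0 * it / it_max            # a = 2 - 2*it/itmax
    A = a * rng.uniform(-1.0, 1.0, dim)    # small |A| favours the hunting phase
    C = rng.uniform(0.0, 2.0, dim)         # obstacle-like random perturbation
    return A, C
```

In a full GWO step, this is drawn separately for each of the alpha, beta, and delta wolves, giving A⃗1, A⃗2, A⃗3 and C⃗1, C⃗2, C⃗3.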
Each wolf-agent is at a position X⃗j in the search space S, where j is the index of the wolf, j = 1, 2, ..., N, and N is the size of the wolf pack. The positional vectors of the agents X⃗j are updated according to the formula

    X⃗j(it + 1) = (X⃗1 + X⃗2 + X⃗3) / 3,

where X⃗1, X⃗2, and X⃗3 represent potential new positions of the prey to move to (positions close to the optimum) and are computed in the following way:

    X⃗1 = X⃗α(it) − A⃗1 * D⃗α,    (5)
    X⃗2 = X⃗β(it) − A⃗2 * D⃗β,    (6)
    X⃗3 = X⃗δ(it) − A⃗3 * D⃗δ,    (7)

where X⃗α(it), X⃗β(it), and X⃗δ(it) are the current positions of the alpha, beta, and delta wolf, respectively. They represent the current three best preys found. The vector A⃗ is already defined above and is generated separately for each of the X⃗α(it), X⃗β(it), and X⃗δ(it) wolves, so we get A⃗1, A⃗2, and A⃗3. The vectors D⃗α, D⃗β, D⃗δ represent the distance of the wolf X⃗j from the prey. They are computed in the following way:

    D⃗α = |C⃗1 * X⃗α(it) − X⃗j(it)|,    (8)
    D⃗β = |C⃗2 * X⃗β(it) − X⃗j(it)|,    (9)
    D⃗δ = |C⃗3 * X⃗δ(it) − X⃗j(it)|,    (10)

where |X⃗| is the vector whose components are the absolute values of the components of X⃗. The vector C⃗ was already defined above and is generated separately for each of the X⃗α(it), X⃗β(it), and X⃗δ(it) wolves, similarly to A⃗, so we get C⃗1, C⃗2, and C⃗3.

3. Cooperation of Algorithms

We propose to use both algorithms, jSO and GWO, together in the optimization process of a function. The idea of the common use of both algorithms is very simple. We want one of the two mentioned algorithms to carry out a part of the optimization process and then give its results to the other algorithm, which carries out another part of the process; after finishing its part, it gives its results back to the first algorithm, and so on. Each algorithm needs some time to optimize a function, just as a person needs some time to do the work assigned to him: it is not a good idea to let somebody start doing something and stop him immediately afterwards. These ideas led us to divide the number of allowed function evaluations into several (not many) parts; here we divide the total amount into k (k = 10) portions. The length of a portion is l. jSO then performs 5 of these parts and GWO performs the rest (also 5 parts).

jSO starts the optimization process, and after spending an amount of function evaluations equal to l, it gives its population to GWO. GWO takes the best N (N = 6, the size of the pack of wolves) points of the received population and spends the next l function evaluations. After each iteration, GWO inside the cooperation tests whether its current optimum is better than the optimum of the population originally received from jSO; if so, it rewrites randomly chosen points in the "temporary" population which is being prepared for the next work of jSO. In the last run of GWO in the cooperation, it rewrites the whole received population by the wolves when the best point found in the iteration is better than the interim result. After spending its amount of function evaluations, GWO gives the prepared population (with the best found point) back to jSO, and the process is repeated 5 times. Thus, the last algorithm which works in the search process of our cooperation algorithm is GWO.

We wanted to use all the advantages of both original algorithms, so we put them together in a way that keeps all the principles of the original algorithms. Therefore, all parameters of both algorithms are set to the values from their original proposals, see [5], [8].

Table 1
Search spaces, optimal values, and positions of the optimal values for the used test functions

function     search space S     f(x⃗*)   x⃗*
Ackley       [−30, 30]ᴰ         0       (0, 0, 0, ..., 0)
Rastrigin    [−5.12, 5.12]ᴰ     0       (0, 0, 0, ..., 0)
Rosenbrock   [−10, 10]ᴰ         0       (1, 1, 1, ..., 1)
Happycat     [−50, 50]ᴰ         0       (−1, −1, −1, ..., −1)

The presented cooperative algorithm is one of the first attempts to use the GWO and jSO algorithms together.
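The turn-taking scheme can be sketched as a scheduling skeleton. This is our simplification with stand-in optimizers; the real cooperation rewrites randomly chosen points of the jSO population, protects its three best points, and copies the pack back only when GWO has improved on the best received point:

```python
def cooperate(run_jso, run_gwo, population, max_evals, k=10, n_wolves=6):
    """Alternate jSO and GWO over k portions of l = max_evals/k evaluations.
    run_jso / run_gwo are stand-ins: each takes (points, budget) and
    returns the points improved and sorted best-first."""
    l = max_evals // k                     # length of one portion
    for c in range(1, k + 1):
        if c % 2 == 1:                     # odd portion: jSO works
            population = run_jso(population, l)
        else:                              # even portion: GWO works
            pack = run_gwo(population[:n_wolves], l)  # best N points seed the pack
            if c < k:                      # hand a merged population back to jSO
                population = pack + population[n_wolves:]
            else:                          # last portion: the pack is the result
                population = pack
    return population[0]                   # best point found
```

With k = 10, jSO and GWO each get 5 portions and GWO is the last algorithm to work, as described above.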
In order to achieve even better results and to find the optimal number of repetitions of the sequence of both algorithms, it would be necessary to perform experiments on a larger number of optimization functions with a variety of features.

When jSO optimizes, the size of its population is gradually linearly decreased, and the parameter p decreases as proposed in [5]. When the GWO algorithm works inside the cooperation, the parameter a also decreases (as proposed in [8]) during the whole run of the cooperation. The size of the pack of wolves is equal to N during the whole run of the cooperation algorithm.

Algorithm 4 Cooperation algorithm
1: Generate the initial generation P0 of the population P; P0 = (x⃗1, x⃗2, ..., x⃗NP); NP is set as in the jSO algorithm;
2: l = maxevals / k; k is the total number of runs of both algorithms in the cooperation; k = 10; maxevals is the total number of allowed function evaluations;
3: Set the iteration counter c to 1;
4: r = 0;
5: while c ≤ k do
6:   if c mod 2 = 1 then
7:     Read population Pr as input;
8:     Run the jSO algorithm for l evaluations;
9:     r = r + 1;
10:    Note: the output of this run of jSO is Pr;
11:  else if c mod 2 = 0 then
12:    Read the N best points of Pr into X⃗1, X⃗2, ..., X⃗N; N = 6 in this case;
13:    Run the GWO algorithm for l evaluations;
14:    if the condition about the currently found best point and the best point of Pr holds (see text) then
15:      if c < k then
16:        Rewrite N points in Pr by the current positions of X⃗1, X⃗2, ..., X⃗N, but do not affect the best three points of Pr;
17:      else
18:        Rewrite the whole Pr by the pack X⃗1, X⃗2, ..., X⃗N;
19:  c = c + 1;
20: The result is the best point in Pr.

4. Experiments and Results

We computed optima with the three algorithms: jSO, GWO, and the cooperation proposed in this paper. Four multi-modal functions were used for the experimental tests: the Ackley function, eq. (11), the Rastrigin function, eq. (12), the Rosenbrock function, eq. (13), and the Happycat function, eq. (14).

    f(x⃗) = −20 exp(−0.02 √((1/D) Σ_{j=1..D} x_j²)) − exp((1/D) Σ_{j=1..D} cos 2πx_j) + 20 + exp(1)    (11)

    f(x⃗) = 10 D + Σ_{j=1..D} [x_j² − 10 cos(2πx_j)]    (12)

    f(x⃗) = Σ_{j=1..D−1} [100 (x_j² − x_{j+1})² + (1 − x_j)²]    (13)

    f(x⃗) = |Σ_{j=1..D} x_j² − D|^{1/4} + (1/D) (0.5 Σ_{j=1..D} x_j² + Σ_{j=1..D} x_j) + 0.5    (14)

The used search space S of each test function is displayed in Table 1, together with the global optimum and the point of the optimum. It is not clearly visible from the formulas, but each of these functions has many local extremes.

The algorithms were tested on two levels of dimension, D = 10 and D = 30. In each dimension, we set the total amount of allowed function evaluations to two different values: for D = 10, the two values were 3000 and 30000; for D = 30, they were 10000 and 100000. For each combination of algorithm, function, dimension, and value of allowed function evaluations, we made 15 runs. The total amount of runs in our experiments was 720.
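The four benchmark functions (11)–(14) translate directly into code. The following sketch is ours (the paper's experiments used GNU Octave); it follows the formulas above literally, including the 0.02 constant in (11):

```python
import numpy as np

def ackley(x):        # eq. (11); f(0, ..., 0) = 0
    D = len(x)
    return (-20.0 * np.exp(-0.02 * np.sqrt(np.sum(x**2) / D))
            - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / D) + 20.0 + np.e)

def rastrigin(x):     # eq. (12); f(0, ..., 0) = 0
    return 10.0 * len(x) + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

def rosenbrock(x):    # eq. (13); f(1, ..., 1) = 0
    return np.sum(100.0 * (x[:-1]**2 - x[1:])**2 + (1.0 - x[:-1])**2)

def happycat(x):      # eq. (14); f(-1, ..., -1) = 0
    D = len(x)
    s2, s1 = np.sum(x**2), np.sum(x)
    return np.abs(s2 - D)**0.25 + (0.5 * s2 + s1) / D + 0.5
```

Note that, by eq. (14), the Happycat function reaches its optimum 0 at the point (−1, ..., −1).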
All tested algorithms were implemented in GNU Octave, version 7.1.0, and all computations were carried out on a standard PC with Windows 10 Home, an Intel(R) Core(TM) i7-7500U CPU at 2.70/2.90 GHz, and 8 GB RAM.

A summarisation of the experimental test results is given in Table 2. We highlight the best mean result in each row with an asterisk (*).

Table 2
Results of the three tested algorithms on the Ackley, Rastrigin, Rosenbrock, and Happycat functions in dimensions D = 10 and D = 30 with two different amounts of allowed function evaluations (3000 and 30000 for D = 10; 10000 and 100000 for D = 30). Each cell shows mean / std over 15 runs.

func  D   maxevals  jSO (mean / std)     GWO (mean / std)     coop (mean / std)
ack   10  3×10³     1.2005 / 0.1801      0.0019* / 0.0037     0.0973 / 0.1164
ack   10  3×10⁴     2.1E−06 / 5.0E−06    4.4E−16* / 0         4.4E−16* / 0
ack   30  1×10⁴     1.4459 / 0.0869      1.6E−11* / 6.2E−11   2.8E−06 / 6.3E−06
ack   30  1×10⁵     0.0027 / 0.0016      4.4E−16* / 0         4.4E−16* / 0
ras   10  3×10³     24.443 / 5.0200      0.0098* / 0.0263     1.0742 / 1.8183
ras   10  3×10⁴     4.8E−08 / 9.5E−08    0* / 0               0* / 0
ras   30  1×10⁴     125.44 / 11.986      0* / 0               0.0664 / 0.2570
ras   30  1×10⁵     0.6580 / 0.8321      0* / 0               0* / 0
ros   10  3×10³     12.188 / 3.3652      9.0068 / 0.0327      8.6566* / 0.3620
ros   10  3×10⁴     1.9E−06* / 5.7E−06   8.9895 / 0.0131      0.5470 / 0.3062
ros   30  1×10⁴     57.495 / 12.379      28.993 / 0.0040      28.421* / 0.3609
ros   30  1×10⁵     12.716* / 0.8105     28.974 / 0.0554      19.871 / 0.7234
hac   10  3×10³     0.7336 / 0.2225      1.7077 / 0.3172      0.6613* / 0.1212
hac   10  3×10⁴     0.1094* / 0.0232     1.8145 / 0.3412      0.1652 / 0.0316
hac   30  1×10⁴     0.7271* / 0.1310     1.8126 / 0.2919      0.9969 / 0.1742
hac   30  1×10⁵     0.2262* / 0.0362     2.0396 / 0.3823      0.2878 / 0.0283

For the Ackley function and dimension D = 10, in the longer runs GWO reaches very good values, very near the optimum; the results of jSO are only a little worse here. The cooperation algorithm adopts the results of GWO in this case. In the shorter runs, GWO reaches better results than jSO, and when both algorithms optimize together in the cooperation algorithm, the results are only a little worse than the results of GWO, but better than the results of jSO. For dimension D = 30 in the longer runs, GWO again reaches very small values (the values reached by jSO are worse) and the cooperation adopts them again. In the shorter runs, the situation is similar to or a little better than for the shorter runs in dimension D = 10.

For the optimization of the Rastrigin function, the situation is very similar to the previous case, in both tested dimensions and for both lengths of runs.

For the Rosenbrock function, in both tested dimensions and the shorter runs, GWO is a better optimizer than jSO and the cooperation is a little better than GWO. But when the algorithms have much more time (a larger amount of function evaluations), jSO is better than GWO, and the cooperation reaches better results than GWO but not better than those of jSO.

For the Happycat function, the results of the comparison of the algorithms are very similar in both dimensions and for both lengths of runs. Moreover, it seems that the results of the cooperation are a little better than the results of the jSO algorithm (which is here the better one of GWO and jSO) for the shorter runs and the lower dimension. For the longer runs in both dimensions, jSO is better than GWO, and the cooperation reaches better results than GWO but a little worse results than jSO.

5. Conclusion

A new algorithm for global optimization, called cooperation, was proposed in this paper. It is based on two very effective optimizers: the Grey wolf optimizer and one of the many adaptive versions of the Differential evolution algorithm, the jSO algorithm.

We proposed to use both algorithms together for optimization. The idea of the cooperation is very simple: the algorithms take turns performing the optimization process. Four multi-modal functions and two levels of dimension were selected for our experimental comparison. The results of the experimental comparison are promising.
The results are also displayed in the two figures: for each of the four displayed combinations of function and dimension, there is a boxplot for the shorter runs on the left side and a boxplot for the longer runs on the right side.

Figure 1: Results of all three tested algorithms on the Ackley and Rastrigin functions (panels: Ackley D=10, Ackley D=30, Rastrigin D=10, Rastrigin D=30).

Figure 2: Results of all three tested algorithms on the Rosenbrock and Happycat functions (panels: Rosenbrock D=10, Rosenbrock D=30, Happycat D=10, Happycat D=30).

In this paper, we have used only a basic scheme in which each algorithm is used repeatedly, 5 times, within the cooperation algorithm. In future work, we plan to develop a more sophisticated scheme for switching control of the optimization process between these two algorithms, probably based on the stagnation of the search process of the currently used algorithm. The goal is to find a scheme that works best for most of the tested functions, ideally better than both original algorithms in most cases.

Acknowledgments

This work was supported by the project no. CZ.02.2.69/0.0/0.0/18_054/0014696, "Development of R&D capacities of the Silesian University in Opava", co-funded by the European Union.

References

[1] Wolpert, D. H., Macready, W. G.: No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation 1 (1997) 67–82
[2] Storn, R., Price, K.: Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. Journal of Global Optimization 11 (1997) 341–359
[3] Tanabe, R., Fukunaga, A.: Success-History Based Parameter Adaptation for Differential Evolution. In: IEEE Congress on Evolutionary Computation 2013 (2013) 71–78
[4] Tanabe, R., Fukunaga, A.: Improving the Search Performance of SHADE Using Linear Population Size Reduction. In: IEEE Congress on Evolutionary Computation 2014 (2014) 1658–1665
[5] Brest, J., Maučec, M. S., Bošković, B.: Single Objective Real-Parameter Optimization: Algorithm jSO. In: IEEE Congress on Evolutionary Computation 2017 (2017) 1311–1318
[6] Zhang, J., Sanderson, A. C.: JADE: Adaptive Differential Evolution With Optional External Archive. IEEE Transactions on Evolutionary Computation 13 (2009) 945–958
[7] Brest, J., Maučec, M. S., Bošković, B.: iL-SHADE: Improved L-SHADE Algorithm for Single Objective Real-Parameter Optimization. In: IEEE Congress on Evolutionary Computation 2016 (2016) 1188–1195
[8] Mirjalili, S., Mirjalili, S. M., Lewis, A.: Grey Wolf Optimizer. Advances in Engineering Software 69 (2014) 46–61