jSO and GWO Algorithms Optimize Together

Radka Poláková¹, Daniel Valenta¹

¹ Silesian University in Opava, Faculty of Philosophy and Science in Opava, The Institute of Computer Science, Opava, Czech Republic

Abstract
This paper deals with global optimization. There are many algorithms for global optimization in the literature. In this text, we focus on two effective optimizers. The first one is an adaptive version of the Differential evolution algorithm, called the jSO algorithm, which was the most successful version of the algorithm at the CEC 2014 congress. The second one is the Grey wolf optimizer, which was introduced in 2014. We propose a new algorithm, cooperation, in which both of these algorithms are used together to obtain better results when optimization problems are solved. In our approach, the two algorithms take turns carrying out the optimization process. We tested the proposed algorithm on four multi-modal functions at two levels of dimension. The results are quite promising.

Keywords
Optimisation, GWO, jSO, Cooperation, Optimization algorithms

1. Global Optimization

In human life, there are many different problems which lead to the optimization of a mathematical function. To optimize a function f over a search space means to find a point of the search space in which the function f has its global optimum, i.e. its global minimum or global maximum, depending on the task. It is well known that searching for the global maximum of a function f is the same as finding the global minimum of the function g = −f, so we can talk only about minimization in the following. The minimization problem is defined as follows. Let us have a function f,

    f : S → ℝ, S ⊂ ℝᴰ,    (1)

where f is the minimized function, D is the dimension of the problem, and S is the search space, here continuous,

    S = Π_{j=1..D} [a_j, b_j]; a_j < b_j, j = 1, 2, ..., D.    (2)

The Π notation (product notation, a Cartesian product in this case) is used here in the standard way. A point x⃗* is a global minimum point of the function f in the search space S if the following condition holds:

    ∀ x⃗ ∈ S: f(x⃗*) ≤ f(x⃗).    (3)

It is possible to find the minimum of a function analytically, but for some functions such a process is difficult, long, or even impossible because of the features of the function. Then stochastic algorithms can help us. The most difficult functions for optimization algorithms are those that are not easily differentiable at some points or have many local extremes. The result of a stochastic optimizer may not correspond exactly to the real optimum; however, it can be sufficient with regard to the complexity of the optimized function. In other words, the result of a stochastic optimizer is a sufficiently good and inexpensive estimate of the real solution of the problem.

According to the No Free Lunch theorem [1], there is no optimization algorithm that solves all functions best. Therefore, new methods are still being developed. We describe two already established methods and one new method in the following sections.

2. Differential Evolution and jSO

There are many different algorithms to optimize (minimize) a mathematical function. In this paper, we work with jSO, which is an adaptive version of the Differential evolution algorithm, and with the Grey wolf optimizer. We briefly describe all three algorithms in this section.

2.1. Differential Evolution

The Differential evolution (DE) algorithm was introduced in [2]. It is an efficient, population-based optimizer in which a population evolves during the run in order to have better and better members, better in the sense of a lower function value at the member's point. The best member of the population at the end of the run is the result: the point of the global optimum found by the algorithm.
ITAT'22: Information Technologies – Applications and Theory, September 23–27, 2022, Zuberec, Slovakia
radka.polakova@fpf.slu.cz (R. Poláková); daniel.valenta@fpf.slu.cz (D. Valenta)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

The algorithm works with a population of points which are, at the beginning of the run, generated randomly (with uniform distribution) in the search space S. Then, each population member is moved around the search space and evolved towards better values of the optimized function.

In a generation (iteration) of the algorithm run, a trial point y⃗i is computed for each population member x⃗i, i = 1, 2, ..., NP, where NP is the size of the population. The point y⃗i is produced by two operations; the first one is mutation. A mutant point u⃗i is computed based on a chosen kind of mutation. Then the mutant u⃗i enters the crossover operation together with the original point x⃗i. The result of the crossover is the trial point y⃗i. If the trial point y⃗i is better than the original point x⃗i, the new point y⃗i enters the next generation instead of x⃗i. If not, the original point x⃗i enters the next generation of the population.

Algorithm 1 DE
1: Generate the initial generation P0 of the population P; P0 = (x⃗1, x⃗2, ..., x⃗NP)
2: Compute the value of the optimized function f at all points of the generation P0;
3: Set the counter of generations to g = 0;
4: while termination criterion is not met do
5:   Qg = Pg;
6:   for i = 1 to NP do
7:     Create a trial point y⃗i to point x⃗i;
8:     Compute the value of the function f at point y⃗i;
9:     if f(y⃗i) ≤ f(x⃗i) then
10:      Insert point y⃗i into Qg instead of point x⃗i;
11:  Pg+1 = Qg; g = g + 1;
12: The result is the best point in Pg.

There are several types of mutation, e.g. rand/1, randrl/1, current-to-rand/1, rand/2, current-to-pbest/1, etc. The most used one is the rand/1 mutation, eq. (4). By this mutation, each mutant u⃗i is computed from three randomly chosen points r⃗1i, r⃗2i, and r⃗3i of the current population which are different from the original point x⃗i:

    u⃗i = r⃗1i + F (r⃗2i − r⃗3i).    (4)

F is the mutation parameter of DE. The crossover can also be performed in one of several ways in DE; the two most frequently used types of crossover are the binomial and the exponential one. The algorithm uses the parameter CR as a crossover parameter. It influences the number of coordinates which are inherited by a trial point from the mutant point u⃗i. In the binomial crossover, the inherited coordinates are distributed uniformly; in the exponential crossover, the trial point takes a consecutive series of coordinates from the mutant.

Thus, the DE algorithm has several parameters: the mutation parameter F, the crossover parameter CR, the type of mutation, the type of crossover, and also the population size NP. There are many adaptive versions of the algorithm in the literature.
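As an aside, the rand/1 mutation (4) combined with binomial crossover can be sketched in a few lines. This is our illustrative sketch, not code from the paper; the function name and the values F = 0.5 and CR = 0.8 are our own choices:

```python
import numpy as np

def de_rand1_bin_trial(pop, i, F=0.5, CR=0.8, rng=None):
    """Trial point for member i: rand/1 mutation (eq. 4) + binomial crossover."""
    rng = np.random.default_rng() if rng is None else rng
    NP, D = pop.shape
    # three mutually distinct members, all different from the original point i
    r1, r2, r3 = rng.choice([j for j in range(NP) if j != i],
                            size=3, replace=False)
    mutant = pop[r1] + F * (pop[r2] - pop[r3])   # eq. (4)
    # binomial crossover: a coordinate comes from the mutant with
    # probability CR; one randomly chosen coordinate always does
    take = rng.random(D) < CR
    take[rng.integers(D)] = True
    return np.where(take, mutant, pop[i])
```

In the full DE loop, this trial point would replace x⃗i in the next generation only if its function value is not worse.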
Several of them were successful at CEC congresses, e.g. SHADE [3], LSHADE [4], and jSO [5].

The principles of the algorithm are shown in the pseudo-code of Algorithm 1. Note that it is a simplification which does not describe how the trial point is created. The trial point y⃗i to point x⃗i is created using the two operations, mutation and crossover. The specific method of creating the trial point is discussed in the pseudo-code presented in the following section for the jSO algorithm (Algorithm 2).

2.2. jSO

jSO [5] is a very sophisticated version of DE. The algorithm evolved from its predecessors. The first one, of course except DE itself, is JADE [6]. The authors of this algorithm developed a new mutation, current-to-pbest/1; this mutation, in a slightly modified version, is used in jSO. They also suggested employing an archive of old members of the population. The next algorithm is SHADE [3], which is an improved version of JADE. SHADE employs historical circle memories to adapt the F and CR parameters. Then the LSHADE algorithm [4] was proposed; it is SHADE with a linear population size reduction mechanism. The next algorithm is iLSHADE [7]. And finally, the last one is jSO. The jSO algorithm uses the linear population size reduction mechanism, the archive, and other tools. It has several features different from iLSHADE, e.g. the size of the population is set to NP = 25·√D·log D at the beginning of the search process instead of NP = 18·D, and the parameter p, which is the parameter of the current-to-pbest/1 (and also of the current-to-pbest-w/1) mutation, is handled in a different way in this algorithm compared to its predecessors. The size of the historical circle memories H is set to 5 here. For a detailed description of the algorithm, see [5] and Algorithm 2.

2.3. Grey Wolf Optimizer

The Grey wolf optimizer (GWO) is a nature-based and already well-established meta-heuristic method inspired by the social dynamics found in a pack of grey wolves [8]. In other words, the algorithm simulates the behavior of wolves that live and hunt together in packs. The algorithm was introduced in 2014.

Let us focus on the principles observed in a pack of wolves. There are strict rules that the wolves must follow, and each wolf has a clearly defined role. Based on this, we can classify them into four categories: alpha, beta, delta, and omega. The leader of the pack is the alpha pair of wolves. They are dominant in the group and the other wolves follow their lead. They can be substituted by beta wolves if necessary. Beta wolves are second in command; they are important because they help and support the alpha pair in its decisions. The mid-ranking wolves in the hierarchy are the delta ones. They follow the instructions of the alpha and beta wolves and ensure the routine activities of the pack. Each delta wolf has a specific focus, based on which we can divide them into the following sub-categories: scouts, sentinels, and caretakers. The omega wolves are in the lowest position in the hierarchy, and other wolves often pick on them. This is important to filter aggression and prevent frustration in the pack: losing the omega wolves can cause fighting between the other wolves and damage to discipline or hierarchy.

In nature, the wolves' primary goal is to find and hunt down prey. This process consists of three main steps: searching for prey, encircling prey, and attacking prey. During the search, wolves try to find the most abundant (but easily catchable) prey to hunt. Once they find such prey, they attempt to push it into a situation where it is alone and cannot escape while they encircle it. Finally, when the prey is surrounded and can no longer escape, they attack it. Wolves attack the weak spots of the prey, like the legs, snout, or belly, until it stops resisting; afterwards, they bring it down and crush its windpipe.

Algorithm 2 jSO
1: Generate the initial generation P0 of the population P randomly; P0 = (x⃗1, x⃗2, ..., x⃗NP)
2: Initialize the archive A = ∅;
3: Compute the value of the optimized function f at all points of the generation P0;
4: Set all values in M_F and M_CR to 0.5; Note: M_F and M_CR are circle memories, storing the position parameters for the Cauchy (M_F) and normal (M_CR) distributions; the size of these memories is H; H = 5;
5: g = 0 (current iteration – generation);
6: evals = NP (current number of used function evaluations);
7: k = 1 (index counter for circle memories);
8: while termination criterion is not met do
9:   S_F = ∅ and S_CR = ∅;
10:  for i = 1 to NP do
11:    Select ri randomly from {1, 2, ..., H};
12:    if ri = H then
13:      M_F,ri = 0.9; M_CR,ri = 0.9;
14:    if M_CR,ri < 0 then
15:      CRi,g = 0;
16:    else Generate CR using the normal distribution: CRi,g = Ni(M_CR,ri, 0.1);
17:    if evals < (1/4) maxevals then (maxevals is the maximum number of allowed function evaluations)
18:      CRi,g = max(CRi,g, 0.7);
19:    else if evals < (1/2) maxevals then
20:      CRi,g = max(CRi,g, 0.6);
21:    Generate F using the Cauchy distribution: Fi,g = Ci(M_F,ri, 0.1);
22:    if evals < (1/6) maxevals and Fi,g > 0.7 then
23:      Fi,g = 0.7;
24:    A trial point y⃗i,g is created using the DE/current-to-pbest-w/1/bin strategy (see [5]);
25:  Evaluate the optimized (objective) function f at all NP created trial points y⃗i,g, i = 1, 2, ..., NP;
26:  evals = evals + NP;
27:  for i = 1 to NP do
28:    if f(y⃗i,g) ≤ f(x⃗i,g) then
29:      Update the point for the next generation: x⃗i,g+1 = y⃗i,g
30:    else x⃗i,g+1 = x⃗i,g
31:    if f(y⃗i,g) < f(x⃗i,g) then
32:      Insert x⃗i,g into the archive A;
33:      Insert CRi,g into S_CR and Fi,g into S_F;
34:  If necessary, shrink the archive A;
35:  Calculate the new value of the first parameter of both distributions from S_F and S_CR, and store them in M_F,k and M_CR,k; k = k + 1;
36:  if k > H then k = 1;
37:  Apply the linear population size reduction mechanism, see [5] (update NP and the population P);
38:  Update the parameter p for the current-to-pbest-w/1 mutation strategy (see [5]);
39:  g = g + 1;
40: The result is the best point in Pg.

The Grey wolf optimizer is inspired by the processes described above: the creation of a social hierarchy and the hunting technique. Because it is an agent-based algorithm, each agent represents one of the wolves in the pack. The agents are randomly placed into the environment (the search space S). The better the value (in the sense of minimization) at the position of the current wolf, the closer that position is to the prey (the solution of the solved optimization problem).

GWO is an iterative algorithm. In each iteration, each wolf is assigned its place in the pack hierarchy according to the value of the fitness function f at its position. The wolf with the best value is ranked as alpha, the second best as beta, the third best as delta, and all the others as omega. The alpha, beta, and delta wolves have the same meaning as in nature and store the three best solutions found in the iteration. The positions of the wolves in the environment are updated in each iteration. The new position of an agent is based upon the estimated location of the prey, which is probably somewhere between the alpha, beta, and delta wolves. We assume this, but the optimum may be located elsewhere, so a mechanism is needed to thoroughly scan the entire environment. An agent approaching the prey hunts it, while an agent moving away from it tries to find even more abundant prey elsewhere. For this purpose, two vectors A⃗ and C⃗ are used, thanks to which the algorithm passes smoothly through two phases: scouting and hunting the prey. Both vectors have random components, so they help to prevent convergence to a local optimum (not very abundant prey) instead of the global one.

Vector A⃗ has components rand(−1, 1)·a, where rand(−1, 1) generates a random number with a uniform distribution between −1 and 1, a = 2 − (2·it/itmax), it is the current iteration of the algorithm, and itmax is the maximum number of iterations. Each component of the vector influences the movement of the agent in a specific dimension of the environment (the search space). As can be seen, the interval in which the components of vector A⃗ lie narrows, we can say linearly, from [−2, 2] to [0, 0] as the number of iterations increases, because the parameter a decreases during the whole run from 2 to 0. The closer the value of a component of A⃗ is to 0, the higher is the probability that the agent chooses the hunting phase instead of the scouting one.

Another vector supporting the divergence between the scouting and hunting phases is C⃗. Vector C⃗ is similar to vector A⃗, but the values of its components do not linearly decrease as the number of iterations grows. This vector has components set to rand(0, 2), a random number with the uniform distribution between 0 and 2. The closer the value is to 0, the higher is the probability that the agent chooses the hunting phase. This vector helps the wolves to behave more naturally: in nature, there are various obstacles (e.g. bushes, stones, or trees) on hunting trails, and wolves change direction to avoid them, so they do not move directly towards their prey. Vector C⃗ simulates this part of their behaviour.

GWO has only two parameters: the size of the wolf pack N and the length of time it can search for the optimum. In the original proposal, it works with the maximum number of iterations it can make. Here, we use the maximum number of function evaluations instead, in order to make a fair comparison of the algorithms.

Algorithm 3 GWO
1: Randomly generate an initial population of wolves-agents X⃗1, X⃗2, ..., X⃗N in the environment;
2: while termination criterion is not met do
3:   Calculate the fitness value of each agent X⃗j;
4:   Determine the social hierarchy and find the positions of the alpha, beta, and delta wolves;
5:   Generate vectors A⃗ and C⃗ for all three best wolves;
6:   Calculate the new position of each agent X⃗j;
7: The result is the position of the best wolf after the last iteration of the algorithm.

Because the wolves move closer toward the prey from various directions as the number of iterations increases, they encircle it.
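The update of a and one draw of the coefficient vectors A⃗ and C⃗ can be sketched as follows. This is our illustration; the helper name is hypothetical:

```python
import numpy as np

def gwo_coefficients(it, it_max, dim, rng=None):
    """One draw of the GWO coefficient vectors A and C for a single leader.
    a falls linearly from 2 to 0, so the components of A lie in [-a, a];
    the components of C stay uniform on [0, 2] for the whole run."""
    rng = np.random.default_rng() if rng is None else rng
    a = 2.0 - 2.0 * it / it_max            # a = 2 - 2*it/itmax
    A = a * rng.uniform(-1.0, 1.0, dim)    # small |A| favours the hunting phase
    C = rng.uniform(0.0, 2.0, dim)         # obstacle-like random perturbation
    return A, C
```

In a full GWO step, this is drawn separately for each of the alpha, beta, and delta wolves, giving A⃗1, A⃗2, A⃗3 and C⃗1, C⃗2, C⃗3.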
Each wolf-agent is at a position X⃗j in the search space S, where j is the index of the wolf, j = 1, 2, ..., N, and N is the size of the wolf pack. The positional vectors of the agents X⃗j are updated according to the formula

    X⃗j(it + 1) = (X⃗1 + X⃗2 + X⃗3) / 3,

where X⃗1, X⃗2, and X⃗3 represent potential new positions of the prey to move to (positions close to the optimum) and are computed in the following way:

    X⃗1 = X⃗α(it) − A⃗1 * D⃗α,    (5)
    X⃗2 = X⃗β(it) − A⃗2 * D⃗β,    (6)
    X⃗3 = X⃗δ(it) − A⃗3 * D⃗δ,    (7)

where X⃗α(it), X⃗β(it), and X⃗δ(it) are the current positions of the alpha, beta, and delta wolf, respectively. They represent the current three best preys found. The vector A⃗ is already defined above and is generated separately for each of the X⃗α(it), X⃗β(it), and X⃗δ(it) wolves, so we get A⃗1, A⃗2, and A⃗3. The vectors D⃗α, D⃗β, D⃗δ represent the distance of the wolf X⃗j from the prey. They are computed in the following way:

    D⃗α = |C⃗1 * X⃗α(it) − X⃗j(it)|,    (8)
    D⃗β = |C⃗2 * X⃗β(it) − X⃗j(it)|,    (9)
    D⃗δ = |C⃗3 * X⃗δ(it) − X⃗j(it)|,    (10)

where |X⃗| is the vector whose components are the absolute values of the components of X⃗. The vector C⃗ was already defined above and is generated separately for each of the X⃗α(it), X⃗β(it), and X⃗δ(it) wolves, similarly to A⃗, so we get C⃗1, C⃗2, and C⃗3.

3. Cooperation of Algorithms

We propose to use both algorithms, jSO and GWO, together in the optimization process of a function. The idea of the common use of both algorithms is very simple. We want one of the two mentioned algorithms to carry out a part of the optimization process and then give its results to the other algorithm, which carries out another part of the process; after finishing its part, it gives its results back to the first algorithm, and so on. Each algorithm needs some time to optimize a function, just as a person needs some time to do the work assigned to him: it is not a good idea to let somebody start doing something and stop him immediately afterwards. These ideas led us to divide the number of allowed function evaluations into several (not many) parts; here we divide the total amount into k (k = 10) portions. The length of a portion is l. jSO then performs 5 of these parts and GWO performs the rest (also 5 parts).

jSO starts the optimization process, and after spending an amount of function evaluations equal to l, it gives its population to GWO. GWO takes the best N (N = 6, the size of the pack of wolves) points of the received population and spends the next l function evaluations. After each iteration, GWO inside the cooperation tests whether its current optimum is better than the optimum of the population originally received from jSO; if so, it rewrites randomly chosen points in the "temporary" population which is being prepared for the next work of jSO. In the last run of GWO in the cooperation, it rewrites the whole received population by the wolves when the best point found in the iteration is better than the interim result. After spending its amount of function evaluations, GWO gives the prepared population (with the best found point) back to jSO, and the process is repeated 5 times. Thus, the last algorithm which works in the search process of our cooperation algorithm is GWO.

We wanted to use all the advantages of both original algorithms, so we put them together in a way that keeps all the principles of the original algorithms. Therefore, all parameters of both algorithms are set to the values from their original proposals, see [5], [8].

Table 1
Search spaces, optimal values, and positions of the optimal values for the used test functions

function     search space S     f(x⃗*)   x⃗*
Ackley       [−30, 30]ᴰ         0       (0, 0, 0, ..., 0)
Rastrigin    [−5.12, 5.12]ᴰ     0       (0, 0, 0, ..., 0)
Rosenbrock   [−10, 10]ᴰ         0       (1, 1, 1, ..., 1)
Happycat     [−50, 50]ᴰ         0       (−1, −1, −1, ..., −1)

The presented cooperative algorithm is one of the first attempts to use the GWO and jSO algorithms together.
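The turn-taking scheme can be sketched as a scheduling skeleton. This is our simplification with stand-in optimizers; the real cooperation rewrites randomly chosen points of the jSO population, protects its three best points, and copies the pack back only when GWO has improved on the best received point:

```python
def cooperate(run_jso, run_gwo, population, max_evals, k=10, n_wolves=6):
    """Alternate jSO and GWO over k portions of l = max_evals/k evaluations.
    run_jso / run_gwo are stand-ins: each takes (points, budget) and
    returns the points improved and sorted best-first."""
    l = max_evals // k                     # length of one portion
    for c in range(1, k + 1):
        if c % 2 == 1:                     # odd portion: jSO works
            population = run_jso(population, l)
        else:                              # even portion: GWO works
            pack = run_gwo(population[:n_wolves], l)  # best N points seed the pack
            if c < k:                      # hand a merged population back to jSO
                population = pack + population[n_wolves:]
            else:                          # last portion: the pack is the result
                population = pack
    return population[0]                   # best point found
```

With k = 10, jSO and GWO each get 5 portions and GWO is the last algorithm to work, as described above.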
In order to achieve even better results and to find the optimal number of repetitions of the sequence of both algorithms, it would be necessary to perform experiments on a larger number of optimization functions with a variety of features.

When jSO optimizes, the size of its population is gradually linearly decreased, and the parameter p decreases as proposed in [5]. When the GWO algorithm works inside the cooperation, the parameter a also decreases (as proposed in [8]) during the whole run of the cooperation. The size of the pack of wolves is equal to N during the whole run of the cooperation algorithm.

Algorithm 4 Cooperation algorithm
1: Generate the initial generation P0 of the population P; P0 = (x⃗1, x⃗2, ..., x⃗NP); NP is set as in the jSO algorithm;
2: l = maxevals / k; k is the total number of runs of both algorithms in the cooperation; k = 10; maxevals is the total number of allowed function evaluations;
3: Set the iteration counter c to 1;
4: r = 0;
5: while c ≤ k do
6:   if c mod 2 = 1 then
7:     Read population Pr as input;
8:     Run the jSO algorithm for l evaluations;
9:     r = r + 1;
10:    Note: the output of this run of jSO is Pr;
11:  else if c mod 2 = 0 then
12:    Read the N best points of Pr into X⃗1, X⃗2, ..., X⃗N; N = 6 in this case;
13:    Run the GWO algorithm for l evaluations;
14:    if the condition about the currently found best point and the best point of Pr holds (see text) then
15:      if c < k then
16:        Rewrite N points in Pr by the current positions of X⃗1, X⃗2, ..., X⃗N, but do not affect the best three points of Pr;
17:      else
18:        Rewrite the whole Pr by the pack X⃗1, X⃗2, ..., X⃗N;
19:  c = c + 1;
20: The result is the best point in Pr.

4. Experiments and Results

We computed optima with the three algorithms: jSO, GWO, and the cooperation proposed in this paper. Four multi-modal functions were used for the experimental tests: the Ackley function, eq. (11), the Rastrigin function, eq. (12), the Rosenbrock function, eq. (13), and the Happycat function, eq. (14).

    f(x⃗) = −20 exp(−0.02 √((1/D) Σ_{j=1..D} x_j²)) − exp((1/D) Σ_{j=1..D} cos 2πx_j) + 20 + exp(1)    (11)

    f(x⃗) = 10 D + Σ_{j=1..D} [x_j² − 10 cos(2πx_j)]    (12)

    f(x⃗) = Σ_{j=1..D−1} [100 (x_j² − x_{j+1})² + (1 − x_j)²]    (13)

    f(x⃗) = |Σ_{j=1..D} x_j² − D|^{1/4} + (1/D) (0.5 Σ_{j=1..D} x_j² + Σ_{j=1..D} x_j) + 0.5    (14)

The used search space S of each test function is displayed in Table 1, together with the global optimum and the point of the optimum. It is not clearly visible from the formulas, but each of these functions has many local extremes.

The algorithms were tested on two levels of dimension, D = 10 and D = 30. In each dimension, we set the total amount of allowed function evaluations to two different values: for D = 10, the two values were 3000 and 30000; for D = 30, they were 10000 and 100000. For each combination of algorithm, function, dimension, and value of allowed function evaluations, we made 15 runs. The total amount of runs in our experiments was 720.
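The four benchmark functions (11)–(14) translate directly into code. The following sketch is ours (the paper's experiments used GNU Octave); it follows the formulas above literally, including the 0.02 constant in (11):

```python
import numpy as np

def ackley(x):        # eq. (11); f(0, ..., 0) = 0
    D = len(x)
    return (-20.0 * np.exp(-0.02 * np.sqrt(np.sum(x**2) / D))
            - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / D) + 20.0 + np.e)

def rastrigin(x):     # eq. (12); f(0, ..., 0) = 0
    return 10.0 * len(x) + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

def rosenbrock(x):    # eq. (13); f(1, ..., 1) = 0
    return np.sum(100.0 * (x[:-1]**2 - x[1:])**2 + (1.0 - x[:-1])**2)

def happycat(x):      # eq. (14); f(-1, ..., -1) = 0
    D = len(x)
    s2, s1 = np.sum(x**2), np.sum(x)
    return np.abs(s2 - D)**0.25 + (0.5 * s2 + s1) / D + 0.5
```

Note that, by eq. (14), the Happycat function reaches its optimum 0 at the point (−1, ..., −1).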
All tested algorithms were implemented in GNU Octave, version 7.1.0, and all computations were carried out on a standard PC with Windows 10 Home, an Intel(R) Core(TM) i7-7500U CPU at 2.70/2.90 GHz, and 8 GB RAM.

A summarisation of the experimental test results is given in Table 2. We highlight the best mean result in each row with an asterisk (*).

Table 2
Results of the three tested algorithms on the Ackley, Rastrigin, Rosenbrock, and Happycat functions in dimensions D = 10 and D = 30 with two different amounts of allowed function evaluations (3000 and 30000 for D = 10; 10000 and 100000 for D = 30). Each cell shows mean / std over 15 runs.

func  D   maxevals  jSO (mean / std)     GWO (mean / std)     coop (mean / std)
ack   10  3×10³     1.2005 / 0.1801      0.0019* / 0.0037     0.0973 / 0.1164
ack   10  3×10⁴     2.1E−06 / 5.0E−06    4.4E−16* / 0         4.4E−16* / 0
ack   30  1×10⁴     1.4459 / 0.0869      1.6E−11* / 6.2E−11   2.8E−06 / 6.3E−06
ack   30  1×10⁵     0.0027 / 0.0016      4.4E−16* / 0         4.4E−16* / 0
ras   10  3×10³     24.443 / 5.0200      0.0098* / 0.0263     1.0742 / 1.8183
ras   10  3×10⁴     4.8E−08 / 9.5E−08    0* / 0               0* / 0
ras   30  1×10⁴     125.44 / 11.986      0* / 0               0.0664 / 0.2570
ras   30  1×10⁵     0.6580 / 0.8321      0* / 0               0* / 0
ros   10  3×10³     12.188 / 3.3652      9.0068 / 0.0327      8.6566* / 0.3620
ros   10  3×10⁴     1.9E−06* / 5.7E−06   8.9895 / 0.0131      0.5470 / 0.3062
ros   30  1×10⁴     57.495 / 12.379      28.993 / 0.0040      28.421* / 0.3609
ros   30  1×10⁵     12.716* / 0.8105     28.974 / 0.0554      19.871 / 0.7234
hac   10  3×10³     0.7336 / 0.2225      1.7077 / 0.3172      0.6613* / 0.1212
hac   10  3×10⁴     0.1094* / 0.0232     1.8145 / 0.3412      0.1652 / 0.0316
hac   30  1×10⁴     0.7271* / 0.1310     1.8126 / 0.2919      0.9969 / 0.1742
hac   30  1×10⁵     0.2262* / 0.0362     2.0396 / 0.3823      0.2878 / 0.0283

For the Ackley function and dimension D = 10, in the longer runs GWO reaches very good values, very near the optimum; the results of jSO are only a little worse here. The cooperation algorithm adopts the results of GWO in this case. In the shorter runs, GWO reaches better results than jSO, and when both algorithms optimize together in the cooperation algorithm, the results are only a little worse than the results of GWO, but better than the results of jSO. For dimension D = 30 in the longer runs, GWO again reaches very small values (the values reached by jSO are worse) and the cooperation adopts them again. In the shorter runs, the situation is similar to or a little better than for the shorter runs in dimension D = 10.

For the optimization of the Rastrigin function, the situation is very similar to the previous case, in both tested dimensions and for both lengths of runs.

For the Rosenbrock function, in both tested dimensions and the shorter runs, GWO is a better optimizer than jSO and the cooperation is a little better than GWO. But when the algorithms have much more time (a larger amount of function evaluations), jSO is better than GWO, and the cooperation reaches better results than GWO but not better than those of jSO.

For the Happycat function, the results of the comparison of the algorithms are very similar in both dimensions and for both lengths of runs. Moreover, it seems that the results of the cooperation are a little better than the results of the jSO algorithm (which is here the better one of GWO and jSO) for the shorter runs and the lower dimension. For the longer runs in both dimensions, jSO is better than GWO, and the cooperation reaches better results than GWO but a little worse results than jSO.

5. Conclusion

A new algorithm for global optimization, called cooperation, was proposed in this paper. It is based on two very effective optimizers: the Grey wolf optimizer and one of the many adaptive versions of the Differential evolution algorithm, the jSO algorithm.

We proposed to use both algorithms together for optimization. The idea of the cooperation is very simple: the algorithms take turns performing the optimization process. Four multi-modal functions and two levels of dimension were selected for our experimental comparison. The results of the experimental comparison are promising.
The results are also displayed in the two figures: for each of the four displayed combinations of function and dimension, there is a boxplot for the shorter runs on the left side and a boxplot for the longer runs on the right side.

Figure 1: Results of all three tested algorithms on the Ackley and Rastrigin functions (panels: Ackley D=10, Ackley D=30, Rastrigin D=10, Rastrigin D=30).

Figure 2: Results of all three tested algorithms on the Rosenbrock and Happycat functions (panels: Rosenbrock D=10, Rosenbrock D=30, Happycat D=10, Happycat D=30).

In this paper, we have used only a basic scheme in which each algorithm is used repeatedly, 5 times, within the cooperation algorithm. In future work, we plan to develop a more sophisticated scheme for switching control of the optimization process between these two algorithms, probably based on the stagnation of the search process of the currently used algorithm. The goal is to find a scheme that works best for most of the tested functions, ideally better than both original algorithms in most cases.

Acknowledgments

This work was supported by the project no. CZ.02.2.69/0.0/0.0/18_054/0014696, "Development of R&D capacities of the Silesian University in Opava", co-funded by the European Union.

References

[1] Wolpert, D. H., Macready, W. G.: No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation 1 (1997) 67–82
[2] Storn, R., Price, K.: Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. Journal of Global Optimization 11 (1997) 341–359
[3] Tanabe, R., Fukunaga, A.: Success-History Based Parameter Adaptation for Differential Evolution. In: IEEE Congress on Evolutionary Computation 2013 (2013) 71–78
[4] Tanabe, R., Fukunaga, A.: Improving the Search Performance of SHADE Using Linear Population Size Reduction. In: IEEE Congress on Evolutionary Computation 2014 (2014) 1658–1665
[5] Brest, J., Maučec, M. S., Bošković, B.: Single Objective Real-Parameter Optimization: Algorithm jSO. In: IEEE Congress on Evolutionary Computation 2017 (2017) 1311–1318
[6] Zhang, J., Sanderson, A. C.: JADE: Adaptive Differential Evolution With Optional External Archive. IEEE Transactions on Evolutionary Computation 13 (2009) 945–958
[7] Brest, J., Maučec, M. S., Bošković, B.: iL-SHADE: Improved L-SHADE Algorithm for Single Objective Real-Parameter Optimization. In: IEEE Congress on Evolutionary Computation 2016 (2016) 1188–1195
[8] Mirjalili, S., Mirjalili, S. M., Lewis, A.: Grey Wolf Optimizer. Advances in Engineering Software 69 (2014) 46–61