<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>jSO and GWO Algorithms Optimize Together</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Radka Poláková</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Valenta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Silesian University in Opava, Faculty of Philosophy and Science in Opava, The Institute of Computer Science</institution>
          ,
          <addr-line>Opava</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper deals with global optimization. There are many algorithms for global optimization in the literature. In this text, we focus on two effective optimizers. The first one is the jSO algorithm, an adaptive version of the Differential evolution algorithm which was the most successful version of the algorithm at the CEC 2014 congress. The second one is the Grey wolf optimizer, which was introduced in 2014. We propose a new algorithm, called cooperation, in which both of these algorithms are used together to get better results when optimization problems are being solved. In our approach, both algorithms take turns carrying out the optimization process. We tested the proposed algorithm on four multi-modal functions at two levels of dimension. The results are quite promising.</p>
      </abstract>
      <kwd-group>
        <kwd>Optimisation</kwd>
        <kwd>GWO</kwd>
        <kwd>jSO</kwd>
        <kwd>Cooperation</kwd>
        <kwd>Optimization algorithms</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Global Optimization</title>
      <p>The task of global optimization is to find the minimum of a function</p>
      <p>f: S → ℛ, S ⊂ ℛ^D, (1)</p>
      <p>where f is the minimized function and D is the dimension of the problem. S
is the search space, here continuous,</p>
      <p>S = Π_{j=1}^{D} [a_j, b_j]; a_j &lt; b_j, j = 1, 2, . . . , D. (2)</p>
      <p>The Π notation (also called Product notation, Cartesian in
this case) is used here in the standard way and indicates
repeated multiplication.</p>
      <p>A point x⃗* is a global minimum point of the function f
in the search space S if the following condition holds:</p>
      <p>∀x⃗ ∈ S; f(x⃗*) ≤ f(x⃗). (3)</p>
      <p>It is possible to find the minimum of a function in an
analytical way, but there are functions for which such a process is
difficult, long, or impossible because of function features.</p>
      <p>Then stochastic algorithms could help us.</p>
      <p>2. Differential Evolution and jSO</p>
      <p>There are many different algorithms to
optimize (minimize) a mathematical function. In this paper, we work
with jSO, which is an adaptive version of the Differential
evolution algorithm, and also with the Grey wolf optimizer. We
describe all three algorithms briefly in this section.</p>
      <p>2.1. Differential Evolution</p>
      <p>The Differential evolution (DE) algorithm was introduced
in [2]. It is an efficient optimizer. It is a population-based
algorithm in which a population evolves during the run in
order to have better and better members, better in the sense
of a lower function value at the point. The best member of
the population is the result at the end of the run. It is the
point of the global optimum found by the algorithm.</p>
      <p>The algorithm works with a population of points
which are generated randomly (with uniform distribution) in the
search space S at the beginning of the run of the algorithm.
Then, each population member is moved through the search
space during the run, as described below.</p>
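      <p>As an illustration, the random initialisation of a population in a box-shaped search space S can be sketched in Python as follows (the bounds, dimension, and population size here are placeholders, not settings taken from the paper):</p>
      <preformat>
import numpy as np

D = 10                                    # dimension of the problem
NP = 30                                   # population size
lower = np.full(D, -100.0)                # a_j, lower bounds of S
upper = np.full(D, 100.0)                 # b_j, upper bounds of S

# S is the Cartesian product of the intervals [a_j, b_j]; the initial
# population is sampled uniformly inside it.
population = lower + np.random.rand(NP, D) * (upper - lower)
      </preformat>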
      <p>In a generation (iteration) of the algorithm run, a trial
point y⃗_i is computed for each population member x⃗_i,
i = 1, 2, . . . , NP, where NP is the size of the population. The
point y⃗_i is produced by two operations; the first one is
mutation. A mutant point u⃗_i is computed based on a kind
of mutation. Then the mutant u⃗_i enters into the crossover
operation together with the original point x⃗_i. The result
of the crossover is the trial point y⃗_i. If the trial point
y⃗_i is better than the original point x⃗_i, the new point y⃗_i
enters into the next generation instead of x⃗_i. If not, the
original point x⃗_i enters into the next generation of the
population.</p>
      <p>There are several types of mutation, e.g. rand/1,
randrl/1, current-to-rand/1, rand/2, current-to-pbest/1, etc.
The most used one is the rand/1 mutation, eq. (4). By
this mutation, each mutant u⃗_i is computed from three
randomly chosen points x⃗_{r1}, x⃗_{r2}, and x⃗_{r3} from the current
population which are different from the original point x⃗_i.</p>
      <p>u⃗_i = x⃗_{r1} + F (x⃗_{r2} − x⃗_{r3}) (4)</p>
      <p>F is the mutation parameter of DE. Also the crossover
can be made in one of several ways in DE. The two most
frequently used types of crossover are the binomial and the
exponential ones. The algorithm uses the parameter CR
as a crossover parameter. It influences the count of
coordinates which are inherited by a trial point from the
mutant point u⃗_i. In the binomial crossover, the inherited
coordinates are distributed uniformly. The trial point
takes a consecutive series of coordinates from the mutant
in the exponential crossover.</p>
      <p>Thus, the DE algorithm has several parameters. They are
the mutation parameter F, the crossover parameter CR, the
mutation type, the type of crossover, and also the population size NP.
There are many adaptive versions of the algorithm in
the literature. Several of them were successful at CEC
congresses, e.g. SHADE [3], LSHADE [4], jSO [5].</p>
      <p>The principles of the algorithm are shown in the
pseudo-code below. Note that this is a simplification,
which does not describe how the trial point is created.
The trial point y⃗_i to point x⃗_i is created using two
operations, mutation and crossover. We will discuss the
specific method of creating the trial point in the pseudo-code
presented in the following section for the jSO algorithm
(Algorithm 2).</p>
      <p>Algorithm 1 DE
1: Generate the initial generation P_0 of the population P randomly; P_0 = (x⃗_1, x⃗_2, ..., x⃗_NP);
2: Compute the value of the optimized function f at all points of the generation P_0;
3: Set the counter of generations to G = 0;
4: while termination criterion is not met do
5:   P_{G+1} = P_G;
6:   for i = 1 to NP do
7:     Create a trial point y⃗_i to point x⃗_i;
8:     Compute the value of the function f at point y⃗_i;
9:     if f(y⃗_i) ≤ f(x⃗_i) then insert y⃗_i into P_{G+1} instead of x⃗_i;
10:  G = G + 1;
11: The result is the best point in P.</p>
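      <p>As an illustration of Algorithm 1, the following Python sketch performs one DE generation with rand/1 mutation, eq. (4), and binomial crossover. The parameter values and the test function are illustrative only, not settings used in the paper:</p>
      <preformat>
import numpy as np

def de_generation(pop, f_vals, f, F=0.8, CR=0.9, lower=-100.0, upper=100.0):
    """One generation of DE/rand/1/bin (a simplified sketch of Algorithm 1)."""
    NP, d = pop.shape
    for i in range(NP):
        # rand/1 mutation, eq. (4): three mutually different points, all != i
        r1, r2, r3 = np.random.choice([j for j in range(NP) if j != i], 3, replace=False)
        v = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lower, upper)
        # binomial crossover: inherited coordinates are distributed uniformly
        mask = np.random.rand(d) &lt; CR
        mask[np.random.randint(d)] = True        # at least one coordinate from the mutant
        y = np.where(mask, v, pop[i])
        # selection: the trial point replaces x_i only if it is not worse
        fy = f(y)
        if fy &lt;= f_vals[i]:
            pop[i], f_vals[i] = y, fy
    return pop, f_vals

# usage: minimize a simple test function in dimension 10
f = lambda x: float(np.sum(x ** 2))
pop = np.random.uniform(-100, 100, size=(30, 10))
f_vals = np.array([f(x) for x in pop])
for _ in range(100):
    pop, f_vals = de_generation(pop, f_vals, f)
print(f_vals.min())
      </preformat>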
      <p>2.2. jSO</p>
      <p>jSO [5] is a very sophisticated version of DE. The
algorithm evolved from its predecessors. The first one, of
course except DE, is JADE [6]. The authors of this algorithm
developed a new mutation, current-to-pbest/1. This
mutation, in a slightly modified version, is used in jSO. They
also suggested employing an archive of old members of
the population. The next algorithm is SHADE [3], which
is an improved version of JADE. SHADE employs historical
circle memories to adapt the F and CR parameters. Then
the LSHADE algorithm [4] was proposed; it is SHADE
with a linear reduction of population size mechanism.
The next algorithm is iLSHADE [7]. And finally, the last
one is jSO. The jSO algorithm uses the linear reduction
of population size mechanism, the archive, and other tools.
It has several features different from iLSHADE, e.g. the size
of the population is set to NP = 25 × √D × log D at the
beginning of the search process instead of NP = 18 × D, and the
parameter p, which is the parameter of the current-to-pbest/1
(and also of the current-to-pbest-w/1) mutation, is handled
in a different way in this algorithm compared to its
predecessors. The size of the historical circle memories, H, is set
to 5 here. For a detailed description of the algorithm, see
[5] and Algorithm 2.</p>
      <p>Algorithm 2 jSO
1: Generate the initial generation P_0 of the population P randomly; P_0 = (x⃗_1, x⃗_2, ..., x⃗_NP);
2: Initialize archive A = ∅;
3: Compute the value of the optimized function f at all points of the generation P_0;
4: Set all values in M_F and M_CR to 0.5;
   Note: M_F and M_CR are circle memories, storing the position parameters for the Cauchy (M_F) and normal (M_CR) distributions; the size of these memories is H; H = 5;
5: G = 0 (current iteration - generation);
6: FES = NP (current number of used function evaluations);
7: k = 1 (index counter for circle memories);
8: while termination criterion is not met do
9:   S_F = ∅ and S_CR = ∅;
10:  for i = 1 to NP do
11:    Select r randomly from {1, 2, ..., H};
12:    if r = H then
13:      M_{F,r} = 0.9; M_{CR,r} = 0.9;
14:    if M_{CR,r} &lt; 0 then
15:      CR_{i,G} = 0;
16:    else Generate CR_{i,G} using the normal distribution: CR_{i,G} = N(M_{CR,r}, 0.1);
17:    if FES &lt; (1/4) maxFES (maxFES is the maximum number of allowed function evaluations) then
18:      CR_{i,G} = max(CR_{i,G}, 0.7);
19:    else if FES &lt; (1/2) maxFES then
20:      CR_{i,G} = max(CR_{i,G}, 0.6);
21:    Generate F_{i,G} using the Cauchy distribution: F_{i,G} = C(M_{F,r}, 0.1);
22:    if FES &lt; 0.6 maxFES and F_{i,G} &gt; 0.7 then
23:      F_{i,G} = 0.7;
24:    A trial point y⃗_{i,G} is created using the DE/current-to-pbest-w/1/bin strategy (see [5]);
25:  Evaluate the optimized (objective) function f at all NP made trial points y⃗_{i,G}, i = 1, 2, . . . , NP;
26:  FES = FES + NP;
27:  for i = 1 to NP do
28:    if f(y⃗_{i,G}) ≤ f(x⃗_{i,G}) then
29:      Update the point for the next generation: x⃗_{i,G+1} = y⃗_{i,G};
30:    else x⃗_{i,G+1} = x⃗_{i,G};
31:    if f(y⃗_{i,G}) &lt; f(x⃗_{i,G}) then
32:      Insert x⃗_{i,G} into archive A;
33:      Insert CR_{i,G} into S_CR and F_{i,G} into S_F;
34:  If necessary, shrink archive A;
35:  Calculate the new value of the first parameter for both distributions C and N, and store them in M_{F,k} and M_{CR,k}; k = k + 1;
36:  if k &gt; H then k = 1;
37:  Apply the linear population size reduction mechanism, see [5] (update NP and population P);
38:  Update parameter p for the current-to-pbest-w/1 mutation strategy (see [5]);
39:  G = G + 1;
40: The result is the best point in P.</p>
      <p>2.3. Grey Wolf Optimizer</p>
      <p>The Grey wolf optimizer (GWO) is a nature-based and already
well-established meta-heuristic method inspired by the social
dynamics found in a pack of grey wolves [8]. In other
words, the algorithm simulates the behavior of wolves
that live and hunt together in packs. The algorithm was
introduced in 2014.</p>
      <p>Let us focus on the principles observed in a pack of wolves.
There are strict rules that they must follow, and each wolf
has a clearly defined role. Based on this, we can classify
them into four categories: alpha, beta, delta, and omega.
The leader of the pack is the alpha pair of wolves. They
are dominant in the group and other wolves follow their
lead. They can be substituted by beta wolves if it is
necessary. Beta wolves are second in command. They are
important because they help and support the alpha pair
during its decisions. The mid-ranking wolves in the hierarchy
are the delta ones. They follow the instructions of the alpha and
beta wolves and ensure the routine activities of the pack.
Each delta wolf has a specific focus, based on which we can
divide them into the following sub-categories: scouts,
sentinels, and caretakers. The omega wolves are in the lowest
position in the hierarchy and other wolves often pick on them.
This is important to filter aggression and prevent frustration
in the pack. Losing omega wolves can cause fighting between
other wolves and damage to discipline or hierarchy.</p>
      <p>In nature, the wolves' primary goal is to find and hunt
down prey. This process consists of these main steps:
searching for prey, encircling prey, and attacking prey.
During searching for the prey, wolves are trying to find
the most abundant (but easily catchable) prey to hunt.
Once they find such prey, they attempt to push it into a
situation when it is alone and cannot escape while they
encircle it. Finally, when the prey is surrounded and can
no longer escape, they attack it. Wolves attack the weak
spots of the prey like legs, snout, or belly, until it stops
resisting, and afterwards, they bring it down and crush
its windpipe.</p>
      <p>The Grey wolf optimizer is inspired by the processes
described above: the creation of a social hierarchy and the
hunting technique. Because it is an agent-based
algorithm, each agent represents one of the wolves in the
pack. Agents are randomly placed into the environment
(search space S). The better the value (in the sense of
minimization) at the position of the current wolf, the closer
the position to the prey (the solution of the solved
optimization problem).</p>
      <p>GWO is an iterative algorithm. In each iteration, each
wolf is assigned a place in the pack hierarchy according to the
value of its fitness function f at its position. The wolf
with the best value is ranked as alpha, the second best as
beta, the third best as delta, and all the others as omega.
The alpha, beta, and delta wolves have the same meaning as
in nature and save the three best solutions found at the
iteration. Positions of wolves in the environment are updated
in each iteration. The new position of the agent is based upon
the estimated location of the prey, which is probably
somewhere between the alpha, beta, and delta wolves. We
assume this, but the optimum may be located elsewhere,
so a mechanism is needed to thoroughly scan the entire
environment. The agent approaching the prey hunts it,
while the agent moving away from it tries to find even
more abundant prey elsewhere. For this purpose, two
vectors A⃗ and C⃗ are used, thanks to which the algorithm
passes smoothly through two phases: scouting and
hunting the prey. Both vectors have random components, so
they help to prevent convergence to a local optimum (not
very abundant prey) instead of the global one.</p>
      <p>Vector A⃗ has components rand(−1, 1) * a, where
rand(−1, 1) generates a random number with a uniform
distribution between −1 and 1 and a = 2 − t(2/T),
where t is the current iteration of the algorithm and T is
the maximum number of iterations. Each component of
the vector influences the movement of the agent in a specific
dimension of the environment (search space). As can
be seen, the interval in which the components of vector A⃗
lie narrows, we can say linearly, from [−2, 2] to [0, 0]
as the number of iterations increases. This is because the
parameter a decreases during the whole run from 2 to 0.
The closer the value of a component of A⃗ is to 0, the higher
is the probability that the agent chooses the hunting
phase instead of the scouting one.</p>
      <p>Another vector supporting the divergence between the
scouting and hunting phases is C⃗. Vector C⃗ is similar to vector
A⃗, but the values of the components of this vector do not
linearly decrease as the number of iterations grows. This
vector has components set to rand(0, 2), which is a
random number with the uniform distribution between 0
and 2. The closer the value is to 0, the higher is the
probability that the agent chooses the hunting phase. This
vector helps wolves to behave more naturally. In nature,
there are various obstacles (e.g. bushes, stones, or trees)
on hunting trails. Wolves change direction to avoid them,
so they do not move directly towards their prey. Vector
C⃗ simulates this part of their behaviour.</p>
      <p>Each wolf-agent is at position X⃗_i in the search space S, i
is the index of the wolf, i = 1, 2, . . . , N_w, where N_w is the size of
the wolf pack. The positional vectors of the agents X⃗_i are updated
according to the formula X⃗_i(t + 1) = (X⃗_1 + X⃗_2 + X⃗_3)/3,
where X⃗_1, X⃗_2, and X⃗_3 represent potential new positions
of the prey to move to (positions close to the optimum) and are
computed in the following way:</p>
      <p>X⃗_1 = X⃗_α(t) − A⃗_1 * D⃗_α, (5)</p>
      <p>X⃗_2 = X⃗_β(t) − A⃗_2 * D⃗_β, (6)</p>
      <p>X⃗_3 = X⃗_δ(t) − A⃗_3 * D⃗_δ, (7)</p>
      <p>where X⃗_α(t), X⃗_β(t), and X⃗_δ(t) are the current positions
of the alpha, beta, and delta wolf, respectively. They
represent the current best three preys found. A⃗ is already
defined above and is generated separately for each of the
X⃗_α(t), X⃗_β(t), and X⃗_δ(t) wolves, so we get A⃗_1, A⃗_2,
and A⃗_3. Vectors D⃗_α, D⃗_β, D⃗_δ represent the distance of
the wolf X⃗_i from the prey. They are computed in the following
way:</p>
      <p>D⃗_α = |C⃗_1 * X⃗_α(t) − X⃗_i(t)|, (8)</p>
      <p>D⃗_β = |C⃗_2 * X⃗_β(t) − X⃗_i(t)|, (9)</p>
      <p>D⃗_δ = |C⃗_3 * X⃗_δ(t) − X⃗_i(t)|, (10)</p>
      <p>where |y⃗| denotes the vector whose components are the absolute
values of the components of y⃗. Vector C⃗ was already defined
above and is generated separately for each of the X⃗_α(t),
X⃗_β(t), and X⃗_δ(t) wolves, similarly as A⃗, so we get C⃗_1,
C⃗_2, and C⃗_3.</p>
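      <p>A minimal Python sketch of one GWO iteration following the formulas above; the a, A⃗, and C⃗ values are regenerated for each of the three leaders, and the test function and settings are illustrative only, not taken from the paper:</p>
      <preformat>
import numpy as np

def gwo_step(wolves, f, t, T, lower=-100.0, upper=100.0):
    """One iteration of GWO: rank the pack, then move every wolf using eqs. (5)-(10)."""
    order = np.argsort([f(w) for w in wolves])
    alpha, beta, delta = wolves[order[0]], wolves[order[1]], wolves[order[2]]
    a = 2.0 - t * (2.0 / T)                    # decreases linearly from 2 to 0
    new_wolves = np.empty_like(wolves)
    for i, x in enumerate(wolves):
        xs = []
        for leader in (alpha, beta, delta):
            A = a * np.random.uniform(-1.0, 1.0, x.shape)   # components rand(-1,1)*a
            C = np.random.uniform(0.0, 2.0, x.shape)        # components rand(0,2)
            D = np.abs(C * leader - x)                      # eqs. (8)-(10)
            xs.append(leader - A * D)                       # eqs. (5)-(7)
        new_wolves[i] = np.clip(np.mean(xs, axis=0), lower, upper)
    return new_wolves

# usage: a small pack on a simple test function
f = lambda x: float(np.sum(x ** 2))
wolves = np.random.uniform(-100, 100, size=(6, 10))
for t in range(200):
    wolves = gwo_step(wolves, f, t, 200)
print(min(f(w) for w in wolves))
      </preformat>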
      <p>GWO has only two parameters, the size of the wolf
pack N_w and the length of the time it can search for the
optimum. In the original proposal, it works with the
maximum number of iterations it can make. Here, we use the
maximum number of function evaluations, in order to make
a fair comparison of the algorithms.</p>
      <p>Algorithm 3 GWO
1: Randomly generate an initial population of wolves (agents) X⃗_1, X⃗_2, ..., X⃗_{N_w} into the environment;
2: while termination criterion is not met do
3:   Calculate the fitness value of each agent X⃗_i;
4:   Determine the social hierarchy and find the positions of the alpha, beta, and delta wolves;
5:   Generate vectors A⃗ and C⃗ for all three best wolves;
6:   Calculate the new position of each agent X⃗_i;
7: The result is the position of the best wolf after the last iteration of the algorithm.</p>
      <p>Because wolves are moving closer toward prey from
various directions with the increasing number of
iterations, they are encircling it.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Cooperation of Algorithms</title>
      <p>We proposed to use both algorithms, jSO and GWO,
together in the optimization process of a function. The
idea of the common use of both algorithms is very simple.</p>
      <p>We wanted to do a part of the optimization process by one
of the two mentioned algorithms and then give the results
of the algorithm to the other algorithm to do another
part of the optimization process; after making its
part of the process, it gives its results back to the first
algorithm, etc. Each algorithm needs some time to optimize
a function. It is like when someone needs some time to
do something he has to do. It is not the best idea to let
somebody do something and immediately after he starts
doing it to stop him. The mentioned ideas led us to divide
the number of allowed function evaluations into several
(not many) parts; we divide the amount into k (k = 10)
portions here. The length of a portion is maxFES/k function
evaluations. Then, jSO makes 5 of these parts and GWO does
the rest of the parts (also 5 parts). jSO starts the optimization
process and after spending the amount of function evaluations
that equals maxFES/k, it gives its population to GWO. GWO
takes the best N_w (N_w = 6, the size of the pack of wolves)
points of the received population and spends the next maxFES/k
function evaluations. After each iteration, GWO in the cooperation
tests whether its current optimum is better than the optimum
of the population which was originally received from
jSO. If so, it rewrites randomly chosen points in the
"temporary" population which is prepared for the next work
of jSO. When it is the last run of GWO in the cooperation, it
rewrites the whole received population by the wolves
when the best point found in the iteration is better than
the interim result. After spending its amount of function
evaluations, it gives the prepared population (with the best
found point) back to jSO. And the process is repeated 5
times. Thus, the last algorithm which works in the search
process of our cooperation algorithm is GWO.</p>
      <p>We wanted to use all the advantages of both original
algorithms, so we put them together in a way that keeps all
the principles of the original algorithms. So, all the
parameters of both algorithms are set to the values according to
their original proposals, see [5], [8]. When jSO optimizes,
the size of the population is gradually linearly decreased.
Also, parameter p decreases as proposed in [5]. When
the GWO algorithm works inside the cooperation, the
parameter a also decreases (as proposed in [8]) during the whole
run of the cooperation. The size of the pack of wolves is equal
to N_w during the whole run of the cooperation algorithm.</p>
      <p>Algorithm 4 Cooperation algorithm
1: Generate the initial generation P_0 of the population P; P_0 = (x⃗_1, x⃗_2, ..., x⃗_NP); NP is set as in the jSO algorithm;
2: k = 10; k is the total number of runs of both algorithms;
3: Set the iteration counter i to 1;
4: FES = 0;
5: while i ≤ k do
6:   if i mod 2 = 1 then
7:     Read population P as input;
8:     Run the jSO algorithm for maxFES/k evaluations;
9:     Note: the output of this run of jSO is P;
10:  else if i mod 2 = 0 then
11:    Read the N_w best points from P into X⃗_1, X⃗_2, ..., X⃗_{N_w}; N_w = 6 in this case;
12:    Run the GWO algorithm for maxFES/k evaluations;
13:    if the condition about the currently found best point and the best point of P holds (see text) then
14:      if i &lt; k then
15:        Rewrite N_w points in P by the current positions of X⃗_1, X⃗_2, ..., X⃗_{N_w}, but do not affect the best three points of P;
16:      else
17:        Rewrite the whole P by the pack X⃗_1, X⃗_2, ..., X⃗_{N_w};
18:  i = i + 1;
19: The result is the best point in P.</p>
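      <p>The alternation in Algorithm 4 can be sketched in Python as follows; run_jso and run_gwo are placeholders for the two optimizers (each consuming a fixed slice of the evaluation budget and returning its population), and the handover rules mirror the description above. This is only an assumed sketch, not the authors' implementation:</p>
      <preformat>
import numpy as np

def cooperate(f, run_jso, run_gwo, pop, max_fes, k=10, n_wolves=6):
    """Alternate jSO and GWO, handing the population back and forth (Algorithm 4 sketch)."""
    portion = max_fes // k                      # evaluation budget of one portion
    for i in range(1, k + 1):
        if i % 2 == 1:
            # odd portions: jSO continues from the current population
            pop = run_jso(f, pop, portion)
        else:
            # even portions: GWO works with the n_wolves best points of the population
            wolves = run_gwo(f, pop[:n_wolves].copy(), portion)
            best_wolf = min(wolves, key=f)
            if f(best_wolf) &lt; f(pop[0]):        # GWO improved on the interim result
                if i &lt; k:
                    # overwrite randomly chosen points, keeping the three best of pop
                    idx = 3 + np.random.permutation(len(pop) - 3)[:n_wolves]
                    pop[idx] = wolves
                else:
                    # last portion: the wolves replace the whole received population
                    pop = wolves
        pop = pop[np.argsort([f(x) for x in pop])]   # keep pop sorted by f
    return pop[0]                                # the result is the best point found
      </preformat>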
    </sec>
    <sec id="sec-3">
      <title>4. Experiments and Results</title>
      <p>We computed the optima by three algorithms: jSO, GWO,
and the cooperation proposed in this paper. Four multi-modal
functions were used for the experimental tests. The used
functions are the Ackley function - eq. (11), the Rastrigin
function - eq. (12), the Rosenbrock function - eq. (13), and the
Happycat function - eq. (14).</p>
      <p>The used search space S of each used function is displayed
in Table 1. In this table, the global optimum and the point of
the optimum are written too. It is not clearly visible, but
each of these functions has many local extremes.</p>
      <p>The algorithms were tested at two levels of dimension,
D = 10 and D = 30, in this work. In each dimension,
we set the total amount of allowed function evaluations
to two different values: for D = 10, the two values were
3000 and 30000; for D = 30, the two values were 10000
and 100000. For each combination of algorithm, function,
dimension, and value of allowed function evaluations,
we made 15 runs. The total number of runs in our
experiments was 720.</p>
      <p>f(x⃗) = −20 exp(−0.02 √((1/D) Σ_{i=1}^{D} x_i²)) − exp((1/D) Σ_{i=1}^{D} cos(2π x_i)) + 20 + exp(1) (11)</p>
      <p>f(x⃗) = 10 D + Σ_{i=1}^{D} [x_i² − 10 cos(2π x_i)] (12)</p>
      <p>f(x⃗) = Σ_{i=1}^{D−1} [100 (x_i² − x_{i+1})² + (1 − x_i)²] (13)</p>
      <p>f(x⃗) = |Σ_{i=1}^{D} x_i² − D|^{1/4} + (0.5 Σ_{i=1}^{D} x_i² + Σ_{i=1}^{D} x_i)/D + 0.5 (14)</p>
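      <p>For reference, the four benchmark functions of eqs. (11)-(14) can be written in Python as follows (the paper's experiments were implemented in GNU Octave; this is only an equivalent sketch):</p>
      <preformat>
import numpy as np

def ackley(x):
    d = len(x)
    return (-20.0 * np.exp(-0.02 * np.sqrt(np.sum(x ** 2) / d))
            - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / d) + 20.0 + np.e)

def rastrigin(x):
    return 10.0 * len(x) + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

def rosenbrock(x):
    return np.sum(100.0 * (x[:-1] ** 2 - x[1:]) ** 2 + (1.0 - x[:-1]) ** 2)

def happycat(x):
    d = len(x)
    s2, s1 = np.sum(x ** 2), np.sum(x)
    return np.abs(s2 - d) ** 0.25 + (0.5 * s2 + s1) / d + 0.5

# example: evaluate all four functions at the origin of a 10-dimensional space
x = np.zeros(10)
print(ackley(x), rastrigin(x), rosenbrock(x), happycat(x))
      </preformat>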
      <p>All tested algorithms were implemented in GNU
Octave, version 7.1.0 and all computations were carried
out on a standard PC with Windows 10 Home, Intel(R)
Core(TM) i7-7500U CPU 2.70GHz 2.90GHz, 8 GB RAM.</p>
      <p>A summary of the experimental test results is given
in Table 2. We highlight the best results in bold. The results
are also displayed in the two figures, which show, for each of
the four displayed combinations of function and dimension, a
boxplot for shorter runs on the left side and a boxplot
for longer runs on the right side.</p>
      <p>For the Ackley function and dimension D = 10, in longer
runs GWO reaches very good values; they are very near the
optimum. The results of jSO are only a little worse here. The
cooperation algorithm adopts the results of GWO in this
case. In shorter runs, GWO reaches better results than
jSO, and when both algorithms optimize together in the
cooperation algorithm, the results are only a little worse
than the results of GWO, but better than the results of
jSO. For dimension D = 30, in longer runs GWO again reaches
very small values (the values reached by jSO are worse)
and the cooperation adopts them again. In shorter runs here,
the situation is similar or a little better than for shorter
runs in dimension D = 10.</p>
      <p>When we discuss the optimization process of the Rastrigin
function with the tested algorithms, the situation is very
similar to the previous case, in both tested dimensions
for both lengths of runs.</p>
      <p>For the Rosenbrock function, for both tested dimensions,
for shorter runs, GWO is a better optimizer than jSO and
the cooperation is a little better than GWO. But when the
algorithms have much more time (a larger amount of function
evaluations), jSO is better than GWO and the cooperation
reaches better results than GWO but not better than jSO
reaches.</p>
      <p>When we think about the Happycat function, the
results of the tests are very similar in both dimensions. It holds
for both lengths of runs. We mean the results of the
comparison of the algorithms. Moreover, it seems that the results of
the cooperation are a little better than the results of the jSO
algorithm (which is here the better one of the GWO and jSO
algorithms) for shorter runs and the lower dimension. For
longer runs in both dimensions, jSO is better than GWO
and the cooperation reaches better results than GWO but a
little worse results than jSO.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>The new algorithm called cooperation for global
optimization was proposed in this paper. It is based on two very
effective optimizers, the Grey wolf optimizer and one of the many
adaptive versions of the Differential evolution algorithm, the
jSO algorithm.</p>
      <p>We proposed to use both algorithms together for
optimization. The idea of the cooperation is very simple: the
algorithms take turns performing the optimization process.</p>
      <p>Four multi-modal functions and two levels of dimension
were selected for our experimental comparison. The
results of the experimental comparison are
promising.</p>
      <p>In this paper, we have used only a basic scheme in
which each algorithm has been used repeatedly 5 times
in the cooperation algorithm. In future work, we plan to
develop a more sophisticated scheme for handing over
control of the optimization process between these two
algorithms, probably based on the stagnation of the search
process, so that the cooperation is ideally better than both
original algorithms in most cases.</p>
    </sec>
  </body>
  <back>
    <ack>
      <p>Acknowledgement: This work was supported by the project no. CZ.02.2.69/0.0/0.0/18_054/0014696, "Development … Opava", co-funded by the European Union.</p>
    </ack>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Wolpert D. H., Macready W. G.: No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation. 1 (1997) 67-82</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Storn R., Price K.: Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. J. Global Optimization. 11 (1997) 341-359</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Tanabe R., Fukunaga A.: Success-history based parameter adaptation for differential evolution. In IEEE Congress on Evolutionary Computation 2013. (2013) 71-78</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Tanabe R., Fukunaga A.: Improving the Search Performance of SHADE Using Linear Population Size Reduction. In IEEE Congress on Evolutionary Computation 2014. (2014) 1658-1665</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Brest J., Maučec M. S., Boškovič B.: Single Objective Real-Parameter Optimization: Algorithm jSO. In IEEE Congress on Evolutionary Computation 2017. (2017) 1311-1318</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Zhang J., Sanderson A. C.: JADE: Adaptive Differential Evolution With Optional External Archive. IEEE Transactions on Evolutionary Computation. 13 (2009) 945-958</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Brest J., Maučec M. S., Boškovič B.: iL-SHADE: Improved L-SHADE algorithm for single objective real-parameter optimization. In IEEE Congress on Evolutionary Computation 2016. (2016) 1188-1195</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Mirjalili S., Mirjalili S. M., Lewis A.: Grey Wolf Optimizer. Advances in Engineering Software. 69 (2014) 46-61</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>