The Role And Impact of The Baldwin Effect on Genetic Algorithms

Karolina Kęsik
Institute of Mathematics, Silesian University of Technology
Kaszubska 23, 44-100 Gliwice, Poland
Email: karola.ksk@gmail.com

Abstract—The increasing influence of optimization methods on modern technologies means that not only are new algorithms designed but old ones are improved. One of the classic optimization methods is the genetic algorithm, inspired by the phenomenon of biological evolution. One of its modifications is Baldwin's approach, known as genetic assimilation: a process that creates permanent structures as an effect of the frequent use of specific organs. This paper presents four different algorithms that have been used as a Baldwin effect in the classical genetic algorithm. All techniques have been tested using 9 test functions, and the obtained results are discussed.

Copyright held by the author(s).

I. INTRODUCTION

Describing things and processes in nature very often comes down to a mathematical description using an equation or a function, especially when there are parameters that make it impossible to determine what proportions should be maintained depending on other external factors. Moreover, even mathematical tools are not able to simply and quickly determine the solution for a more complex function or its dependencies.

Among the best existing methods of finding a global extremum of a multi-variable function are algorithms inspired by phenomena occurring in nature or by the behavior of a specific group of animals. This is visible in the research developed around the world and in the emerging publications. One of the most dynamically developing branches of optimization is heuristic algorithms, i.e., those that do not guarantee the correct solution in a finite time. An example of such a technique is the dragonfly algorithm [8], which is inspired by the static and dynamic behavior of a dragonfly swarm. Again in [12], the authors described a mathematical model of the behavior of polar bears, which are forced to move on ice floes and hunt for seals in arctic conditions.

Not only the algorithms are developed but also the applications, which involve complicated functions. One such example is the analysis of samples taken from a fluorescence microscope. In [19], optimization algorithms were used in the detection of bacterial shapes at various stages of life. Again in [17], this idea was used for charging and discharging electric vehicles using a building energy management system. Another application area is biometric security. In [10], voice samples were interpreted as two-dimensional images in which important features identifying the owner of the voice were sought. Yet another application is chat-bots, where a heuristic can be used as an engine [15]. For these types of apps, different notations are developed [3]. In [13], heuristics were used to extract brain tumors from MR images. Similarly, in the area of smart technology, optimization plays a significant role. In [2], it was used in a wireless sensor network, and in [16], a modified technique found application in the optimization of coverage in visible light communication in smart home systems.

In this paper, the idea of improving the genetic algorithm by the introduction of the Baldwin effect is presented.

II. OPTIMIZATION PROBLEM FORMULATION

It happens quite often that a problem can be described by some function, and then the smallest or largest value of that function is wanted. Such a task is called an optimization problem. Let us assume that x is a point in n dimensions and the function is defined as f(x): R^n → R. Then the minimization problem can be stated as

    Minimize f(x)
    subject to g(x) ≥ 0                                   (1)
    L_i ≤ x_i ≤ R_i,  i = 0, 1, ..., n − 1,

where g(·) is an inequality constraint and each spatial component of the point x is in the range ⟨L_i, R_i⟩.

III. GENETIC APPROACH TO OPTIMIZATION

Through many ages, people have observed the changes in natural processes, and many times they have tried to use this knowledge in science by creating computer methods based on those observations. In the 1970s, John Holland created the Genetic Algorithm, which was also the first evolutionary algorithm [4]. The inspiration for creating this technique was the desire to make a computer able to solve problems the way the natural evolution process does. The initial idea of the Genetic Algorithm was to create a population containing some individuals, each of which had a unique binary genetic code assigned to it. These codes correspond to the chromosomes existing in organisms, which makes the elements of a code equivalent to genes. Subsequently, after much research, improvements were implemented based on effects noticed in the biological process of evolution. The observed operations start from the mechanism which transforms the binary codes similarly to crossing-over, according to the process of inheritance.

Table I: Selected test functions for the described optimization problem (formula; input domain; global minimum point; minimum value).

Ackley: f1(x) = −20 exp(−0.2 sqrt((1/n) Σ_{i=1}^{n} x_i^2)) − exp((1/n) Σ_{i=1}^{n} cos(2π x_i)) + 20 + e; ⟨−32.8, 32.8⟩; (0, ..., 0); 0
Griewank: f2(x) = Σ_{i=1}^{n} x_i^2/4000 − Π_{i=1}^{n} cos(x_i/sqrt(i)) + 1; ⟨−600, 600⟩; (0, ..., 0); 0
Rastrigin: f3(x) = 10n + Σ_{i=1}^{n} (x_i^2 − 10 cos(2π x_i)); ⟨−5.12, 5.12⟩; (0, ..., 0); 0
Schwefel: f4(x) = 418.9829n − Σ_{i=1}^{n} x_i sin(sqrt(|x_i|)); ⟨−500, 500⟩; (420.9687, ..., 420.9687); 0
Rotated hyper-ellipsoid: f5(x) = Σ_{i=1}^{n} Σ_{j=1}^{i} x_j^2; ⟨−65.5, 65.5⟩; (0, ..., 0); 0
Sum squares: f6(x) = Σ_{i=1}^{n} i x_i^2; ⟨−5.12, 5.12⟩; (0, ..., 0); 0
Zakharov: f7(x) = Σ_{i=1}^{n} x_i^2 + (Σ_{i=1}^{n} 0.5 i x_i)^2 + (Σ_{i=1}^{n} 0.5 i x_i)^4; ⟨−5, 10⟩; (0, ..., 0); 0
Rosenbrock: f8(x) = Σ_{i=1}^{n−1} [100 (x_{i+1} − x_i^2)^2 + (x_i − 1)^2]; ⟨−5, 10⟩; (1, ..., 1); 0
Styblinski-Tang: f9(x) = (1/2) Σ_{i=1}^{n} (x_i^4 − 16 x_i^2 + 5 x_i); ⟨−5, 5⟩; (−2.903534, ..., −2.903534); −39.16599n
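To make the benchmarks concrete, two of the functions from Table I can be written out directly. This is a minimal Python sketch (the function names are mine), assuming the standard n-dimensional forms listed above.

```python
import math

def ackley(x):
    # f1: two exponential terms plus the constants 20 + e,
    # so that the global minimum at the origin is exactly 0.
    n = len(x)
    s1 = sum(xi ** 2 for xi in x)
    s2 = sum(math.cos(2 * math.pi * xi) for xi in x)
    return (-20 * math.exp(-0.2 * math.sqrt(s1 / n))
            - math.exp(s2 / n) + 20 + math.e)

def rastrigin(x):
    # f3: highly multimodal, global minimum 0 at the origin.
    return 10 * len(x) + sum(xi ** 2 - 10 * math.cos(2 * math.pi * xi)
                             for xi in x)
```

Both take a plain list of coordinates, so they work for any dimension n, which is how the experiments below evaluate them.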
Then come the operations based on mutation and selection. Holland discovered that by adding these logical operators and using selection, the average fitness of the population can increase. The most important part is using encoding and genetic operators to solve problems. Nowadays, in the age of new technology, it is natural to use for this purpose the potential of advanced programming on high-performance computers. In this paper, a version of the genetic algorithm operating on numbers in decimal notation was selected.

1) Genetic Algorithm (GA): The Genetic Algorithm is made of a few steps. The first one is initialization, in which the initial population is created. This step sets the size of the population, the encoding of chromosomes, and the fitness function with the appropriate restrictions. The main concern here is the size of the population, because it is the parameter that affects the accuracy of the algorithm. If the population is too small, the algorithm finishes its calculation without reaching the correct solution. On the other hand, when the number of individuals is too large, it can overload the computer and make the calculations last very long. The other elements of the initial population are set randomly.

Afterwards, a selection of individuals is made, and some of them are passed to the next stage of calculations. This process is called reproduction, and it can be done in many ways. The description of one of them requires the probability that an individual x_i is chosen for the new generation. The probability of placing this individual in the next population depends on the value of the fitness function at that point, f(x_i): the higher the value of the fitness function, the higher the probability. The idea of this step is to create a reproduction probability p_r for the new population selected from the existing generation P^t. It can be described as

    p_r(x_i^t) = f(x_i^t) / Σ_{x_j^t ∈ P^t} f(x_j^t).         (2)

After a few samplings of this variable, it is possible to introduce the cumulative distribution of the reproduction probability as

    P_r(x_i^t) = Σ_{j=1}^{i} p_r(x_j^t).                      (3)

In the next step, each individual is either selected for reproduction or not. The factor that determines this choice is a random variable α from ⟨0, 1⟩. The individual x_i is reproduced when the following condition

    P_r(x_{i−1}^t) < α ≤ P_r(x_i^t)                           (4)

is fulfilled. Genetic operators, like mutation and crossover, are responsible for creating a diverse population, but they also take control of reproduction.

The first genetic operator mentioned above, mutation, modifies the chromosomes. In this process, a random variable ξ belonging to ⟨0, 1⟩ is added to the chromosome

    x_i^{t+1,mutated} = x_i^t + ξ.                            (5)

Then let λ be a vector of random values whose length is equal to the number of individuals in the generation. If, for the probability of mutation p_m, the inequality

    λ_i < p_m                                                 (6)

is satisfied, then the chromosome is mutated.

The second genetic operator is crossover. This process exchanges genetic material between two chromosomes, called the parents, and as a result two new individuals are generated and attached to the population. Similarly to Eq. (5), taking a random real ξ from ⟨0, 1⟩ allows defining the formula for creating a new individual from the parental chromosomes x_i and x_{i+1} as

    x_{des1}^{t+1} = x_i^t + ξ (x_{i+1}^t − x_i^t),           (7)

and the second descendant x_{des2}^{t+1} is created analogously.

Then the whole new population has to be evaluated as to whether it fits the conditions of the environment. This process, called succession, replaces the old generation with the new one. Replacing all individuals in the population is only one of the ways to carry out succession; the other way is to change just a part of the old generation. In that case, a method of selecting chromosomes must be chosen. A parameter g from ⟨0, 100⟩ can describe the percentage of individuals from the old generation that remain. The decision about which individuals should be replaced can be made on many different levels:
• the ones that are similar to individuals in the next population,
• the ones that are the least adapted,
• the ones that are selected by succession,
• the ones that are chosen in a random way.

The method which lets the best chromosomes survive is elitist succession: it does not matter in which population some individuals appear; if they are among the best, they stay until the end of the algorithm. In every iteration, individuals are ordered by the value of the fitness function f(x_i), and then some of the worst are replaced with new ones. By using this kind of "back-up" generation, the best individuals are stored throughout the whole algorithm, which ensures that better individuals will not be replaced by worse ones.

Algorithm 1 Genetic algorithm
1: Start,
2: Define all parameters: T, f(·), n,
3: Generate the initial population,
4: t := 1,
5: while t ≤ T do
6:   Select the best individuals for reproduction,
7:   Apply the crossover operation using Eq. (7),
8:   Apply the mutation operation using Eq. (5),
9:   Replace the worst individuals with new ones,
10:  t++,
11: end while
12: Return x,
13: Stop.

IV. BALDWIN EFFECT

After observing the evolution process, James Mark Baldwin claimed that there exists an ability to acquire new behaviours during life that depends on the environment. Furthermore, the offspring derive those skills from their ancestors [1]. This mechanism is called the Baldwin Effect or organic selection. The driving force behind this phenomenon is the survival instinct. It is very simple to give examples of this observation: if there is a population in an environment, then those individuals which learn the most needed abilities will survive, and they will have more offspring. What is more, their offspring inherit genes with these exceptional properties. After further research on Baldwin's theory, it was noticed that not the skills themselves but only the ability to acquire skills is inherited. This can explain the language predispositions in people.

In practice, the Baldwin effect is reflected in the value of the fitness function, because it is the only significant skill needed to survive. It is visible as an additional local search.

A. Gradient descent

Analysis of the direction of gradient decrease is one of the classic optimization methods [14]. Assume that x = (x_0, ..., x_{n−1}) is a start point. To find the direction, we need to calculate the negative gradient for each spatial coordinate according to

    −∇f_{x_i} = − ∂f(x_0, ..., x_{n−1}) / ∂x_i.               (8)

Using the above calculation of −∇f, the new coordinate is calculated as

    x_i^{t+1} = x_i^t + λ(−∇f_{x_i^t}),                       (9)

where λ is a step and t denotes the current iteration. After finding the new point, the method compares it with the previous one using the fitness function f(·). If the new value has better adaptation, it replaces the old one.

Algorithm 2 Gradient descent
1: Start,
2: Define the fitness condition f(·), the step λ, and T,
3: Take a starting point x,
4: t := 1,
5: while t ≤ T do
6:   Calculate x′ by Eq. (9),
7:   if f(x′) ≤ f(x) then
8:     x = x′,
9:   end if
10:  t++,
11: end while
12: Return x,
13: Stop.

B. Iterated local search

Iterated local search is one of the classic optimization techniques called hill climbing [7]. The name is adequate to the operation of the algorithm, which for a given point x performs a perturbation (i.e., a random displacement) as

    x′ = x − ξ,                                               (10)

where ξ ∈ ⟨0, 1⟩. It allows the search to escape from a local minimum. Then a local search in the neighborhood is done; for this purpose, another method is used. If the obtained point is better in terms of the fitness function, it overwrites the initial one.

Algorithm 3 Iterated local search
1: Start,
2: Define f(·) and T,
3: Take a starting point x,
4: Local search at x,
5: t := 1,
6: while t ≤ T do
7:   Create x′ using Eq. (10),
8:   Apply Gradient Descent starting in x′ (described in Alg. 2) to find x″,
9:   if f(x″) ≤ f(x) then
10:    x = x″,
11:  end if
12:  t++,
13: end while
14: Return x,
15: Stop.

C. Variable neighborhood search

Variable neighborhood search is another hill climbing method [9]. The algorithm assumes the creation of a set of neighborhoods, then the use of another local search technique for each representative in this set and a comparison with a given point. If the new point proves to be better (using the fitness function), the original one is overwritten by the neighbor.

Algorithm 4 Variable neighborhood search
1: Start,
2: Define a set N_k, k_max, and T,
3: Take a starting point x,
4: k := 1,
5: t := 1,
6: while t ≤ T do
7:   while k ≤ k_max do
8:     Generate x′ ∈ N_k in a random way,
9:     Apply Gradient Descent starting in x′ (described in Alg. 2) to find x″,
10:    if f(x″) ≤ f(x) then
11:      x = x″,
12:      k := 1,
13:    else
14:      k++,
15:    end if
16:  end while
17:  t++,
18: end while
19: Return x,
20: Stop.

D. Simulated annealing

Simulated annealing is a probabilistic technique for optimization purposes. It was described for the first time in [6] by Kirkpatrick and others. The technique is a model of the annealing process in thermodynamics, and more precisely of how metals cool and anneal. It assumes that the temperature is high at the beginning of the process, which gives a high probability of change. As the temperature decreases, the probability of change becomes smaller. In practice, this means that the probability of adopting a worse solution is higher at a high temperature and smaller at a lower one.

The algorithm uses a modified thermodynamic equation, which can be defined as

    P(E) ≈ e^{−δ/T},                                          (11)

where δ is the difference in quality between a given point x and a new, random one x′ in relation to the fitness condition

    δ = f(x′) − f(x).                                         (12)

A new solution is adopted when the following condition is satisfied

    γ < e^{−δ/T},                                             (13)

where γ ∈ ⟨0, 1⟩. Another important aspect is the temperature reduction, calculated as

    T_{k+1} = r · T_k,                                        (14)

where T is the temperature value, k is the k-th iteration, and r is a constant.

Algorithm 5 Simulated annealing
1: Start,
2: Define the start temperature T, the fitness condition f, the number of iterations t, and the parameter r,
3: Take a starting point x,
4: k := 0,
5: while k ≤ t do
6:   i := 0,
7:   for i to k do
8:     Create a random solution x′,
9:     Calculate the difference δ according to Eq. (12),
10:    if δ < 0 then
11:      x = x′,
12:    else
13:      Generate γ,
14:      if Eq. (13) is satisfied then
15:        x = x′,
16:      end if
17:    end if
18:  end for
19:  Reduce the temperature by Eq. (14),
20:  k++,
21: end while
22: Return x,
23: Stop.

Table II: Obtained average results (plain GA, then GA with the Baldwin effect realized by each method).

Function | GA | Gradient descent | Iterated local search | Variable neighborhood search | Simulated annealing
f1 | 2.19932·10^−6 | 2.13421·10^−6 | 2.21328·10^−6 | 2.120981·10^−6 | 1.912247·10^−6
f2 | 1.18919·10^−4 | 1.23193·10^−4 | 1.01239·10^−4 | 1.129815·10^−3 | 1.01923·10^−4
f3 | −0.28311·10^−3 | −0.29172·10^−4 | 0.28872·10^−4 | 0.47623·10^−4 | −0.18926·10^−4
f4 | 0.19172·10^−3 | 0.22898·10^−3 | 0.19863·10^−3 | 0.199859·10^−3 | −0.12172·10^−3
f5 | −2.12812·10^−12 | 1.907319·10^−12 | 2.98221·10^−11 | −2.00193·10^−12 | −2.00012·10^−12
f6 | 3.95276·10^−14 | −3.07329·10^−13 | 3.55311·10^−14 | 3.624782·10^−14 | −4.21424·10^−15
f7 | 2.19272·10^−4 | 1.78139·10^−5 | 2.00712·10^−5 | 1.759836·10^−5 | 1.37127·10^−5
f8 | −1.78342·10^−3 | −1.81293·10^−3 | 2.19813·10^−3 | 3.1128·10^−3 | 1.47516·10^−3
f9 | −195.7821 | −195.7802 | −195.7652 | −195.7541 | −195.8148
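The fitness-proportional (roulette-wheel) reproduction described by Eqs. (2)-(4) can be sketched as follows; a minimal illustration, assuming positive fitness values (for a minimization task the raw objective would first be transformed into such a fitness).

```python
import random

def roulette_select(population, fitness, rng=random):
    """Pick one individual with probability proportional to its fitness.

    Implements Eqs. (2)-(4): normalized probabilities p_r, their
    cumulative sums P_r, and a uniform draw alpha from <0, 1>.
    """
    values = [fitness(x) for x in population]
    total = sum(values)
    p_r = [v / total for v in values]        # Eq. (2)
    alpha = rng.random()                     # random alpha in <0, 1>
    cumulative = 0.0
    for x, p in zip(population, p_r):
        cumulative += p                      # Eq. (3): running P_r(x_i)
        if alpha <= cumulative:              # Eq. (4)
            return x
    return population[-1]                    # guard against rounding error

# Example: the fitter individual 3 should be picked most often.
pop = [1, 2, 3]
picks = [roulette_select(pop, fitness=lambda x: x) for _ in range(2000)]
```

The cumulative loop is exactly the distribution function of Eq. (3), so the fraction of picks of each individual approaches its p_r over many draws.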
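The mutation and crossover operators of Eqs. (5)-(7) can be sketched for the real-valued encoding used here. This is a minimal illustration; the second descendant is formed symmetrically to Eq. (7), which is my assumption, since the text writes out only the first.

```python
import random

def mutate(individual, p_m, rng=random):
    # Eqs. (5)-(6), applied per coordinate here: add a random xi
    # from <0, 1> only when a draw lambda_i falls below the rate p_m.
    return [xi + rng.random() if rng.random() < p_m else xi
            for xi in individual]

def crossover(parent_a, parent_b, rng=random):
    # Eq. (7): each child coordinate lies between the parents',
    # a random fraction xi of the way from parent_a to parent_b.
    xi = rng.random()
    child1 = [a + xi * (b - a) for a, b in zip(parent_a, parent_b)]
    child2 = [b - xi * (b - a) for a, b in zip(parent_a, parent_b)]
    return child1, child2
```

With this arithmetic form, the two children of parents 0 and 1 always sum to 1 in each coordinate, so the pair preserves the parents' mean.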
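The Baldwinian evaluation described above can be sketched as follows: an individual is scored by the quality it reaches after a short "lifetime" of local search, while its genotype stays untouched. The `hill_climb` helper is a stand-in for any of the local search methods of this section; the names and step sizes are illustrative.

```python
import random

def baldwin_fitness(individual, f, local_search, steps=20):
    """Baldwin effect: evaluate an individual by the quality it can
    reach after some local 'learning', without writing the learned
    point back into the genotype."""
    improved = local_search(individual, f, steps)
    return f(improved)          # the learned quality drives selection

def hill_climb(x, f, steps, rng=random):
    # Toy local search: keep a random nudge only when it helps.
    best = list(x)
    for _ in range(steps):
        cand = [xi + rng.uniform(-0.1, 0.1) for xi in best]
        if f(cand) <= f(best):
            best = cand
    return best
```

Because only improving moves are kept, the learned fitness is never worse than the genotype's own value, which is precisely the "correction" the paper attributes to the Baldwin effect.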
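Algorithm 2 can be sketched with a finite-difference approximation standing in for the analytic derivative of Eq. (8); the step λ and the iteration budget below are illustrative, not the paper's settings.

```python
def num_grad(f, x, h=1e-6):
    # Central finite differences approximating Eq. (8).
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def gradient_descent(f, x, lam=0.1, T=200):
    # Alg. 2: move against the gradient (Eq. (9)) and keep the new
    # point only if the fitness did not get worse.
    for _ in range(T):
        g = num_grad(f, x)
        x_new = [xi - lam * gi for xi, gi in zip(x, g)]
        if f(x_new) <= f(x):
            x = x_new
    return x
```

On a smooth bowl-shaped function this contracts toward the minimum at a geometric rate, which is why it serves as the inner local search for Algorithms 3 and 4.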
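A compact sketch of Algorithm 5, using the acceptance rule of Eqs. (12)-(13) and the geometric cooling of Eq. (14); the neighbourhood width and parameter values are assumptions for illustration.

```python
import math
import random

def simulated_annealing(f, x, T0=1.0, r=0.95, iters=500, rng=random):
    # Propose a random neighbour, always accept improvements, accept
    # worse moves with probability exp(-delta/T) (Eq. (13)), and cool
    # the temperature geometrically (Eq. (14)).
    T = T0
    for _ in range(iters):
        x_new = [xi + rng.uniform(-0.5, 0.5) for xi in x]
        delta = f(x_new) - f(x)                     # Eq. (12)
        if delta < 0 or rng.random() < math.exp(-delta / T):
            x = x_new
        T *= r                                      # Eq. (14)
    return x
```

Early on, the high temperature lets the walk climb out of local minima; as T shrinks, exp(−δ/T) vanishes for any δ > 0 and the procedure degenerates into pure hill climbing.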
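Putting the pieces together, the hybrid evaluated in Tables II and III can be sketched as a GA whose fitness evaluation runs a short local search (a simple hill climb here, standing in for any of the four methods above); all parameter values are illustrative rather than the paper's exact settings.

```python
import random

def ga_with_baldwin(f, n=2, pop_size=30, generations=60,
                    bounds=(-5.12, 5.12), baldwin_steps=20, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds

    def learn(x):
        # Baldwinian 'lifetime': a short hill climb inside the bounds.
        best = list(x)
        for _ in range(baldwin_steps):
            cand = [min(hi, max(lo, xi + rng.uniform(-0.1, 0.1)))
                    for xi in best]
            if f(cand) <= f(best):
                best = cand
        return best

    pop = [[rng.uniform(lo, hi) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by learned fitness; the genotypes themselves stay unchanged.
        pop.sort(key=lambda x: f(learn(x)))
        elite = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(elite, 2)
            xi = rng.random()
            child = [ai + xi * (bi - ai) for ai, bi in zip(a, b)]  # Eq. (7)
            if rng.random() < 0.1:                                 # mutation
                child = [min(hi, max(lo, c + rng.uniform(-0.5, 0.5)))
                         for c in child]
            children.append(child)
        pop = elite + children
    return min(pop, key=f)
```

The only difference from a plain GA is the `learn` call inside the sort key, so the extra cost is exactly the additional local-search evaluations that the experiments report as increased run time.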
Table III: Obtained average errors (plain GA, then GA with the Baldwin effect realized by each method).

Function | GA | Gradient descent | Iterated local search | Variable neighborhood search | Simulated annealing
f1 | −2.19932·10^−6 | −2.13421·10^−6 | −2.21328·10^−6 | −2.12098·10^−6 | −1.91247·10^−6
f2 | −0.000118919 | −0.000123193 | −0.000101239 | −0.001129815 | −0.000101923
f3 | 0.00028311 | 0.000029172 | −0.000028872 | 0.000047623 | 0.000018926
f4 | −0.00019172 | −0.00022898 | −0.00019863 | −0.000199859 | 0.00012172
f5 | 2.12812·10^−12 | −1.90732·10^−12 | −2.98221·10^−11 | 2.00193·10^−12 | 2.00012·10^−12
f6 | −3.95276·10^−14 | 3.07329·10^−13 | −3.55311·10^−14 | −3.62478·10^−14 | 4.21424·10^−15
f7 | −0.000219272 | −1.78139·10^−5 | −2.00712·10^−5 | −1.75984·10^−5 | −1.37127·10^−5
f8 | 0.00178342 | 0.00181293 | −0.00219813 | −0.0031128 | −0.00147516
f9 | −0.04785 | −0.04975 | −0.06475 | −0.07585 | −0.01515

V. EXPERIMENTS

All techniques were implemented and tested (the Baldwin effect for 20 steps, the GA for 10000 iterations with 100 individuals) using all nine test functions. Each algorithm was run 100 times to obtain 100 solutions. The obtained points were used to calculate the average value of the fitness function as

    (1/100) Σ_{i=1}^{100} f(x_i),                             (15)

and then the error for each function was calculated as

    (1/100) Σ_{i=1}^{100} (f(x_ideal) − f(x_i)).              (16)

All obtained results are presented in Tables II and III. Almost every modification interpreted as a Baldwin effect corrected the results, which increased accuracy. Unfortunately, the improvement came at the expense of increased time caused by the additional calculations made during the local search. The best solution turned out to be the genetic algorithm with simulated annealing in all cases. Iterated local search and variable neighborhood search returned results improved over the original algorithm, but in comparison to the other methods used, they are the least favorable due to the number of calculations. The reason is a structure that uses an additional algorithm, which doubles the number of operations.

VI. CONCLUSIONS

The Baldwin effect can be interpreted as a correction of the obtained results in each iteration. The obtained results showed the superiority of simulated annealing over the other tested methods. In the case of the iterated or neighborhood techniques, the results obtained were better than with the original version, but the execution time increased quite significantly. It is worth noting that the Baldwin effect was performed for only 20 iterations, which is not a large number.

Any modification that aims to increase the precision of the algorithm for a selected point in a given place (only local space-searching was considered in this paper) is worth implementing in the case of creating hybrid methods and when accuracy is significant.

REFERENCES

[1] J. M. Baldwin. A new factor in evolution. The American Naturalist, 30(354):441–451, 1896.
[2] S. Bouarafa, R. Saadane, and M. D. Rahmani. Inspired from ants colony: Smart routing algorithm of wireless sensor network. Information, 9(1):23, 2018.
[3] S. Chodarev and J. Porubän. Development of custom notation for XML-based language: A model-driven approach. Computer Science and Information Systems (ComSIS), 14(3), 2017.
[4] J. H. Holland. Genetic algorithms and the optimal allocation of trials. SIAM Journal on Computing, 2(2):88–105, 1973.
[5] T. Kapuściński, R. K. Nowicki, and C. Napoli. Comparison of effectiveness of multi-objective genetic algorithms in optimization of invertible s-boxes. In International Conference on Artificial Intelligence and Soft Computing, pages 466–476. Springer, 2017.
[6] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
[7] H. R. Lourenço, O. C. Martin, and T. Stützle. Iterated local search. In Handbook of Metaheuristics, pages 320–353. Springer, 2003.
[8] S. Mirjalili. Dragonfly algorithm: a new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems. Neural Computing and Applications, 27(4):1053–1073, 2016.
[9] N. Mladenović and P. Hansen. Variable neighborhood search. Computers & Operations Research, 24(11):1097–1100, 1997.
[10] D. Połap. Neuro-heuristic voice recognition. In 2016 Federated Conference on Computer Science and Information Systems (FedCSIS), pages 487–490. IEEE, 2016.
[11] D. Połap and M. Woźniak. Detection of important features from images using heuristic approach. In International Conference on Information and Software Technologies, pages 432–441. Springer, 2017.
[12] D. Połap and M. Woźniak. Polar bear optimization algorithm: Meta-heuristic with fast population movement and dynamic birth and death mechanism. Symmetry, 9(10):203, 2017.
[13] V. Rajinikanth, S. C. Satapathy, S. L. Fernandes, and S. Nachiappan. Entropy based segmentation of tumor from brain MR images: a study with teaching learning based optimization. Pattern Recognition Letters, 94:87–95, 2017.
[14] S. Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.
[15] A. Shaikh, G. Phalke, P. Patil, S. Bhosale, and J. Raghatwan. A survey on chatbot conversational systems. International Journal of Engineering Science, 3117, 2016.
[16] G. Sun, Y. Liu, M. Yang, A. Wang, S. Liang, and Y. Zhang. Coverage optimization of VLC in smart homes based on improved cuckoo search algorithm. Computer Networks, 116:63–78, 2017.
[17] S. Umetani, Y. Fukushima, and H. Morita. A linear programming based heuristic algorithm for charge and discharge scheduling of electric vehicles in a building energy management system. Omega, 67:115–122, 2017.
[18] A. Venckauskas, A. Karpavicius, R. Damaševičius, R. Marcinkevičius, J. Kapočiūte-Dzikiené, and C. Napoli. Open class authorship attribution of Lithuanian internet comments using one-class classifier. In Federated Conference on Computer Science and Information Systems (FedCSIS), pages 373–382. IEEE, 2017.
[19] M. Woźniak, D. Połap, L. Kośmider, and T. Cłapa. Automated fluorescence microscopy image analysis of Pseudomonas aeruginosa bacteria in alive and dead stadium. Engineering Applications of Artificial Intelligence, 67:100–110, 2018.
[20] M. Woźniak, D. Połap, C. Napoli, and E. Tramontana. Application of bio-inspired methods in distributed gaming systems. Information Technology And Control, 46(1):150–164.
[21] M. Woźniak, D. Połap, R. K. Nowicki, C. Napoli, G. Pappalardo, and E. Tramontana. Novel approach toward medical signals classifier. In 2015 International Joint Conference on Neural Networks (IJCNN), pages 1–7. IEEE, 2015.
[22] M. Wróbel, J. T. Starczewski, and C. Napoli. Handwriting recognition with extraction of letter fragments. In International Conference on Artificial Intelligence and Soft Computing, pages 183–192. Springer, 2017.