Mutation Control in NEAT for Customizable Neural Networks

Galchenkov Oleg, Ph.D., Senior Lecturer, Odesa National Polytechnic University, Ukraine, Odesa, o.n.galchenkov@gmail.com
Nevrev Alexander, Ph.D., Senior Lecturer, Odesa National Polytechnic University, Ukraine, Odesa, a.i.nevrev@gmail.com

A modification of the NEAT algorithm is proposed, based on a mechanism for varying the intensity of mutations. Both the weight coefficients and the number of neurons in the hidden layers, together with the connections between them, are adjusted. For the task of modelling the XOR operator, a significant acceleration of neural network training is obtained.

Keywords: neural network, genetic algorithm, fitness function, weight coefficients, mutation, selection

In traditional approaches to using neural networks, the network topology is usually selected and fixed before training [1], and training itself consists of adjusting the weights to reach an extremum of the quality function. To select the topology best suited to a particular task, one trains neural networks with various topology variants and ultimately chooses the one that achieves the best value of the quality function, or the same value with less computation. Such a path is obviously extremely costly computationally and, because of the very large number of possible variants, does not guarantee the best topology. It therefore seems promising to develop and study algorithms that simultaneously configure both the weight coefficients and the topology of the neural network itself. Training algorithms that simultaneously adjust both weights and network topology began to appear in the 1990s [2-5] and continue to be developed to date [6-9].
The main issues encountered in the development of such algorithms are the following [6]:
- how to represent variants of the neural network topology most effectively, so that the most useful elements can be selected and the rest filtered out;
- how to protect a topological innovation over the number of training iterations needed for its weight coefficients to be adjusted and for its effect to manifest in the value of the quality function;
- how to keep the neural network topology minimal without using special functions that measure the complexity of the current configuration.
Among the algorithms aimed at solving these issues, one of the basic ones is the NEAT (NeuroEvolution of Augmenting Topologies) genetic algorithm [6]. Newer modifications are improvements on this algorithm, and as a rule their ideas can all be applied to it simultaneously. Therefore, in developing a new modification of the NEAT algorithm in this work, it is appropriate to use NEAT as the base algorithm as well. In NEAT, the genome is the entire configuration of the neural network, in the form of a linear representation of its neurons and the connections between them. The initial genome is a minimal network configuration, and its size grows as learning proceeds. To control the growth of the network, each element of the genome (neurons and the connections between them) is assigned an innovation number that records the moment this element appeared in the genome and, accordingly, the number of iterations of the learning algorithm during which this element has been part of the genome. Innovation numbers make it possible to distinguish genome elements (genes) according to the history of their appearance in the genome. Genes are called matching if they have the same innovation numbers.
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

If the innovation numbers differ, the genes are called disjoint if the number of one lies within the range of innovation numbers of the other parent's genome, or excess if it lies outside this range. When a new genome is formed, each matching gene is taken at random from either parent, while disjoint and excess genes are taken from the parent with the larger value of the quality function (fitness function). Thus, the NEAT algorithm performs crossover without any function assessing the complexity of the current topology, and it is aimed at ensuring the predominance of genes with a positive effect over random genes. Since introducing new structural elements usually worsens the fitness function in the short term, with a positive effect achieved only after the corresponding weights are adjusted, the NEAT algorithm uses innovation numbers to form niches of genomes that share structural innovations and have similar topologies. Over a given number of iterations, genomes compete only within their niches rather than with the entire population. This protects new topological elements while the weight coefficients of these elements are being adjusted. The network structure grows gradually as structural mutations occur that lead to a better value of the quality function. The aim of this work is the further development of the NEAT algorithm to provide a higher learning speed and to arrive at a resulting neural network structure with a small number of neurons and connections.
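The crossover rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: a genome is assumed to be a dict mapping innovation numbers to genes, and `fitter` is the parent with the larger fitness value.

```python
import random

def crossover(fitter: dict, weaker: dict) -> dict:
    """Sketch of NEAT crossover driven by innovation numbers.

    Matching genes (same innovation number in both parents) are inherited
    at random from either parent; disjoint and excess genes are inherited
    from the fitter parent only.
    """
    child = {}
    for innov, gene in fitter.items():
        if innov in weaker:
            # matching gene: pick randomly from either parent
            child[innov] = random.choice([gene, weaker[innov]])
        else:
            # disjoint or excess gene: take from the fitter parent
            child[innov] = gene
    return child
```

Because only innovation numbers are compared, no measure of topological complexity is needed to align the two parent networks.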
The NEAT algorithm is based on the ideology of the genetic approach [10] and accordingly involves the following steps [6]:
1) generation of an initial population of individuals;
2) calculation of each individual's suitability using a fitness function (quality function);
3) removal of weak individuals;
4) crossover of the best individuals to produce a new population;
5) mutation of the descendants by changing weight coefficients, adding/removing connections between neurons, and adding/removing neurons;
6) repetition of steps 2-5 until the desired value of the quality function is reached.
Elite selection is used to remove weak individuals: a fixed percentage of the best individuals is kept, and the rest are discarded.
The NEAT algorithm uses three types of mutation:
1. Mutation of weights (Fig. 1).
Fig. 1. Change of weights in existing connections.
2. Removing/adding connections between neurons (Fig. 2).
Fig. 2. Adding a connection between existing neurons.
3. Adding/removing neurons (Fig. 3).
Fig. 3. Adding a neuron to the network.
The variety of mutations determines the complex way in which the learning algorithm moves over the surface of the quality function. The essence of the proposed modification of the NEAT algorithm is the addition of a mechanism for changing the intensity of mutations. It is configured with the following four factors:
1. mutation_intensity - the initial (and subsequently the current) value of the mutation intensity;
2. mutation_intensity_max - the maximum allowable value of the intensity;
3. mutation_intensity_min - the minimum allowable value of the intensity, to which the current intensity is reset when the network exceeds the previous value of the fitness function;
4. mutation_intensity_step - the step by which the current intensity increases if the network remains at the same value of the fitness function during mutations.
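The four factors above suggest a simple controller that is updated once per generation. The sketch below is a straightforward reading of the described mechanism, not the authors' code; the class and method names are illustrative, and the defaults are assumptions.

```python
class MutationIntensity:
    """Adaptive mutation-intensity controller for the modified NEAT loop."""

    def __init__(self, start=1, max_=5, min_=1, step=1):
        self.mutation_intensity = start       # current intensity
        self.mutation_intensity_max = max_    # upper bound on intensity
        self.mutation_intensity_min = min_    # value used on reset
        self.mutation_intensity_step = step   # increment when stalled
        self._best_fitness = float("-inf")    # best fitness seen so far

    def update(self, best_fitness: float) -> int:
        """Return the mutation count to use for the next generation."""
        if best_fitness > self._best_fitness:
            # fitness improved: reset intensity so the network does not
            # accumulate random extra neurons and connections
            self.mutation_intensity = self.mutation_intensity_min
            self._best_fitness = best_fitness
        else:
            # fitness stalled: raise intensity to push structural growth
            self.mutation_intensity = min(
                self.mutation_intensity + self.mutation_intensity_step,
                self.mutation_intensity_max,
            )
        return self.mutation_intensity
```

Each individual then receives between 1 and the returned value of mutations in that generation.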
The number of mutations applied to one individual varies from 1 to the value of mutation_intensity, the current mutation intensity. If the network gets stuck during learning, the mutation intensity slowly increases, raising the number of mutations per generation in order to accelerate the growth of the network. As soon as the network reaches a structure of higher complexity and the value of the fitness function begins to increase again, the mutation intensity is reset to its minimum value, so that the network does not create random extra connections and neurons.
We verify the performance of the modified NEAT algorithm on the example of modelling the logical operation "exclusive OR" (XOR) [2]. The truth table of the XOR operator is shown in Table 1.

Table 1
X1  X2  XOR
0   0   0
1   0   1
0   1   1
1   1   0

The fitness function was

f = f00 + f01 + f10 + f11, where f_ij = min(1 / (ans_ij - out_ij)^2, 50),

ans_ij is the response of the neural network to the input combination ij, and out_ij is the correct answer of the XOR operator for the input combination ij.
Figure 4 shows how the structure of the neural network changes as it is trained by the modified NEAT algorithm, together with the corresponding values of the fitness function (fitness parameter).
Fig. 4. The structure of the neural network: a - the starting structure; b - for fitness = 150; c - fitness = 172; d - fitness = 200.
Comparing the configurations of the neural network at different stages of training shows that the weights are adjusted first, and when the growth of the fitness function slows down significantly, new neurons and connections are added; after that, adjusting the weights again becomes the priority. To obtain averaged characteristics, the learning process was run 10 times for the NEAT algorithm and for the modified NEAT algorithm.
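The fitness function above can be written directly in code. This is a sketch: `net` is assumed to be any callable mapping an input pair to a scalar response, and the explicit handling of a zero error (where the formula divides by zero but is capped at 50) is our reading of the cap, not something stated in the paper.

```python
# The four XOR input combinations and their correct outputs (Table 1)
XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def fitness(net) -> float:
    """f = sum over ij of min(1 / (ans_ij - out_ij)^2, 50).

    A perfect network saturates every term at the cap of 50, giving the
    maximum fitness of 200 seen in Fig. 4.
    """
    total = 0.0
    for inputs, expected in XOR_CASES:
        ans = net(inputs)                 # network's response for this pair
        err = (ans - expected) ** 2
        total += 50.0 if err == 0 else min(1.0 / err, 50.0)
    return total
```

The cap at 50 keeps the fitness finite as the per-case error approaches zero, so the maximum value of 200 is reached exactly when all four cases are answered correctly.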
The population numbers at which full network training was achieved in each run are shown in Figs. 5 and 6.
Fig. 5. The population number at which the network is fully trained in modelling the XOR operator, for each run, when learning with the NEAT algorithm.
Fig. 6. The population number at which the network is fully trained in modelling the XOR operator, for each run, when learning with the modified NEAT algorithm.
A comparison of the figures shows that the modified algorithm achieves full network training in a smaller number of populations. Since the learning algorithms start from random initial conditions and use random number generators in the crossover and mutation operations, the NEAT algorithm in some experiments also required only a small number of populations. Figure 7 shows the learning curves averaged over 10 runs for the NEAT algorithm and the modified NEAT algorithm when the neural network models the XOR operator.
Fig. 7. Comparison of the learning speed of the neural network when modelling the XOR operator with the NEAT algorithm and the modified NEAT algorithm.
A comparison of the learning curves in Figure 7 shows a significant advantage of the modified version of the algorithm. In practice, two processes are usually distinguished: training a neural network and using it. Training can be carried out many times on a dedicated powerful computer, but the trained network will typically be used on another machine that is either less powerful or loaded with other tasks. In practice, therefore, one performs a certain number of training runs and takes the minimum configuration found for use. On this basis, one can compare the average number of runs, with the number of population generations in each (more precisely, the total number of generations), that must be performed to obtain the minimum configuration.
Figure 8 shows the average number of generations needed to find a solution as a function of the mutation intensity, and Figure 9 shows the average number of generations needed to find both the maximum of the quality function and the minimum configuration as a function of the mutation intensity.
Fig. 8. The average number of generations required to find a solution, depending on the mutation intensity.
Fig. 9. The average number of generations required to find a solution and the minimum configuration, depending on the mutation intensity.
Figures 8 and 9 show that as the number of mutations increases from 1 to 5, the average number of new population generations needed to find the maximum of the quality function, without regard to the size of the resulting neural network, decreases by more than a factor of 7. Naturally, reaching the minimum configuration of the neural network requires on average 3-4 times more generations. Figure 9 shows that as the number of mutations increases from 1 to 5, finding the minimum configuration requires on average 4 times fewer population generations.

Conclusions

The additional mechanism for controlling the intensity of mutations in the NEAT algorithm significantly increases the learning speed of the neural network in the task of modelling the XOR operator. This, in turn, reduces the total amount of computation required to fully configure the neural network and reach a minimum configuration. However, because of the very large search space in complex problems, further development of the NEAT algorithm is necessary in the direction of increasing the learning speed and parallelizing operations for implementation on multiprocessor computers.

References
1. Goodfellow I., Bengio Y., Courville A. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org
2. Angeline P. J., Saunders G. M., Pollack J. B. An evolutionary algorithm that constructs recurrent neural networks. IEEE Transactions on Neural Networks, 1993, 5:54-65.
3. Braun H., Weisbrod J. Evolving feedforward neural networks. In: Albrecht R. F., Reeves C. R., Steele N. C. (eds.), Proceedings of ANNGA93, International Conference on Artificial Neural Networks and Genetic Algorithms, pp. 25-32. Springer-Verlag, Innsbruck, 1993.
4. Dasgupta D., McGregor D. Designing application-specific neural networks using the structured genetic algorithm. In: Whitley D., Schaffer J. D. (eds.), Proceedings of the International Conference on Combinations of Genetic Algorithms and Neural Networks, pp. 87-96. IEEE Press, Piscataway, New Jersey, 1992.
5. Opitz D. W., Shavlik J. W. Connectionist theory refinement: Genetically searching the space of network topologies. Journal of Artificial Intelligence Research, 1997, 6:177-209.
6. Stanley K., Miikkulainen R. Evolving neural networks through augmenting topologies. Evolutionary Computation, 2002, 10(2):99-127. http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf
7. Miconi T. Neural networks with differentiable structure. Preprint, 2016. https://arxiv.org/abs/1606.06216
8. Salimans T., Ho J., Chen X., Sidor S., Sutskever I. Evolution strategies as a scalable alternative to reinforcement learning. Preprint, 2017. https://arxiv.org/abs/1703.03864
9. Bhushan S., Antoshchuk S., Teslenko P., Kuzmenko V., Wojcik W., Mamyrbaev O., Zhunissova U. Hybrid approach to energy efficient clustering for heterogeneous wireless sensor network using biogeography based optimization and k-means. Przeglad Elektrotechniczny, 2019, 95(4):138-141. DOI: 10.15199/48.2019.04.24
10. Gen M., Cheng R. Genetic Algorithms and Engineering Optimization. John Wiley & Sons, 2000. 511 p.