=Paper=
{{Paper
|id=Vol-3312/paper13
|storemode=property
|title=Deep Learning of Neural Networks Using Genetic Algorithms
|pdfUrl=https://ceur-ws.org/Vol-3312/paper13.pdf
|volume=Vol-3312
|authors=Serhii Lienkov,Sergey Shvorov,Oleksandr Sieliukov,Igor Tolok,Nataliia Lytvynenko,Taras Davydenko
|dblpUrl=https://dblp.org/rec/conf/momlet/LienkovSSTLD22
}}
==Deep Learning of Neural Networks Using Genetic Algorithms==
<pdf width="1500px">https://ceur-ws.org/Vol-3312/paper13.pdf</pdf>
<pre>
Deep Learning of Neural Networks Using Genetic Algorithms
Serhii Lienkov (1), Sergey Shvorov (2), Oleksandr Sieliukov (3), Igor Tolok (4), Nataliia Lytvynenko (5)
and Taras Davydenko (6)
(1, 4, 5) Military Institute of Taras Shevchenko National University of Kyiv, 81 Lomonosova Str., Kyiv, 03189, Ukraine
(2, 6) National University of Life and Environmental Sciences, 15 Heroiv Oborony Str., Kyiv, 03041, Ukraine
(3) Kyiv National University of Construction and Architecture, 31 Povitroflotskyi Av., Kyiv, 03037, Ukraine

                 Abstract
                 Artificial intelligence (AI) technologies are aimed at creating a "thinking machine",
                 that is, a computer system with human-like intelligence. One of the current directions of
                 intellectualization is the use of neural networks trained by deep learning.
                 The paper analyzes modern approaches to training neural networks and investigates the
                 possibility of using genetic algorithms to solve the problems of deep learning of neural
                 networks. The purpose of the paper is to develop the scientific and methodological foundations
                 of training neural networks using genetic algorithms. To achieve this goal, the following tasks
                 were solved: the approach to training neural networks using genetic algorithms was
                 justified, and the problem of optimizing neural network training with a genetic
                 algorithm was solved using the example of forecasting the time series of the
                 environmental temperature by the method of steepest descent. A biotechnical complex exposed
                 to external disturbances (external temperature) was chosen as the object on which the
                 research was conducted.

                 Keywords
                 Genetic algorithm, genes, chromosomes, optimal solution, deep learning, search domain,
                 evolution time.

1. Introduction
    As is well known, an artificial neural network (ANN) is a set of interconnected neurons. The transfer
(activation) functions of all neurons in the network are fixed, and the weights are parameters of the
network and can change. Some neuron inputs are labeled as external inputs of the network, and some
outputs are labeled as external outputs of the network [1]. In general, applying any numbers to the
network inputs determines some set of numbers at the network outputs.
    Thus, the work of the neural network is to transform the input vector X into the output vector Y, and
this transformation is given by the weights of the network.
    Almost any problem can be reduced to a problem solved by a neural network. The
construction of an ANN proceeds in two stages:
    1) choosing the type (architecture) of the network;
    2) selecting the weights (training) of the network.
    At the first stage, the following should be determined:
    - what neurons we want to use (number of inputs, transfer functions);
    - how they should be connected to each other;
    - what to take as network inputs and outputs.


MoMLeT+DS 2022: 4th International Workshop on Modern Machine Learning Technologies and Data Science, November 25-26, 2022,
Leiden-Lviv, The Netherlands-Ukraine
EMAIL: lenkov_s@ukr.net (S. Lienkov); sosdoc@i.ua (S. Shvorov); selukov@3d.ua (O. Sieliukov); igortolok72@gmail.com (I. Tolok);
n123n@ukr.net (N. Lytvynenko); dtr55@ukr.net (T. Davydenko)
ORCID: 0000-0001-7689-239X (S. Lienkov); 0000-0003-3358-1297 (S. Shvorov); 0000-0001-7979-3434 (O. Sieliukov);
0000-0001-6309-9608 (I. Tolok); 0000-0002-2203-2746 (N. Lytvynenko); 0000-0003-0277-6892 (T. Davydenko)
            ©️ 2022 Copyright for this paper by its authors.
            Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
            CEUR Workshop Proceedings (CEUR-WS.org)
    At first glance, this problem seems unsolvable, but it is not necessary to invent a neural network
"from scratch": there are dozens of different neural network architectures, and the effectiveness of
many of them has been proven mathematically [2].
    At the second stage, the most difficult task is solved: the selected network must be "taught",
i.e. such values of its weights must be chosen that the network works as required. In neural networks
used in practice, the number of weights can reach several tens of thousands, so training is a complex
process. For many architectures, special learning algorithms have been developed that make it possible
to adjust the weights of the network in a certain way. However, for complex ANN architectures, solving
this problem requires further research. The problem of deep learning of neural networks is especially
relevant.

2. Problem Formulation
   As shown in [1,3], three types of neurons can be distinguished depending on the functions they
perform in the network:
   1) input neurons receive an input vector encoding an input action or an image of
the external environment; they usually do not carry out computational procedures, and information is
transferred from the input to the output of the neuron by changing its activation;
   2) output neurons are neurons whose output values represent the output of the network;
   3) intermediate neurons are neurons that form the basis of artificial neural networks.
   In most neural models, the type of a neuron is related to its placement in the network. If a neuron has
only output connections, it is an input neuron; conversely, if it has only input connections, it is an
output neuron.
   However, there may be a case where the output of a topologically internal neuron is considered as
part of the network output. In the process of functioning (evolution of the state) of the network, the
input vector is transformed into the output vector, i.e., some processing of information is carried out.
   The functioning of the neural network (NN) and the set of actions it can perform mainly
depend on the values of the synaptic connections. That is why, once the structure of the
network corresponding to the selected problem area has been determined, the optimization of the
weighting coefficients in real time is an urgent task [2,4]. The ability of the network to function
properly and adequately depends on how effectively the NN training is performed.
   The weighted sum 𝑟 computed by a neuron is:

   $$ r = \sum_{i=1}^{n} V_i x_i + f \qquad (1) $$

where: 𝑟 is the result of the summation;
𝑉𝑖 is the synapse weight (𝑖 = 1, … , 𝑛); 𝑓 is the displacement (bias) value;
𝑥𝑖 is a component of the input vector (input signal) (𝑖 = 1, … , 𝑛);
𝑛 is the number of neuron inputs.
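
As a minimal sketch of equation (1), the weighted sum of a single neuron can be computed as follows. The function name `neuron_sum` and the sample numbers are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def neuron_sum(weights: Sequence[float], inputs: Sequence[float], bias: float) -> float:
    """Return r = sum_i V_i * x_i + f for one neuron, following equation (1)."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length n")
    return sum(v * x for v, x in zip(weights, inputs)) + bias


# Hypothetical neuron with n = 3 inputs.
r = neuron_sum([0.5, -1.0, 2.0], [1.0, 2.0, 3.0], bias=0.25)
print(r)  # 0.5 - 2.0 + 6.0 + 0.25 = 4.75
```

The transfer (activation) function mentioned in the Introduction would then be applied to `r`; it is left out here because equation (1) covers only the summation.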
    Experiments on training neural networks have shown that the known methods of local and global
optimization (gradient, stochastic, Newton, Hessian, etc.) require a significant number of learning steps,
are sensitive to the accuracy of calculations, and require a significant number of additional variables.
The search for and development of new methods of training neural networks is therefore an urgent task [5-7].
    Hence, there is a need for approaches that do not have the mentioned disadvantages.
Among mathematical optimization tools, the genetic algorithm (GA) stands out in the context of the
given task.
    The idea of the GA was proposed by J. Holland in the late 1960s and early 1970s.
    The basic (classical, elementary or simple) genetic algorithm consists of the following steps:
    1) selection of the initial population of chromosomes;
    2) assessment of the fitness of chromosomes in the population;
    3) checking the condition of stopping the algorithm;
    4) selection of chromosomes;
    5) application of genetic operators;
    6) formation of a new population;
    7) selection of the "best" chromosome.
    The formation of the initial population consists in the random selection of a given number of
chromosomes (individuals), that are represented by binary sequences of a fixed length (that is, the
alleles of all genes in the chromosome are equal to 0 or 1).
    The assessment of chromosome fitness in a population consists in calculating the fitness function
for each chromosome of this population. The greater the value of this function, the higher the "quality"
of the chromosome. The form of the fitness function depends on the nature of the problem to be solved.
It’s assumed that the fitness function always takes non-negative values and, in addition, that this
function must be maximized to solve the optimization problem. If the original form of the fitness
function doesn’t satisfy these conditions, then an appropriate transformation is performed (for example,
the function minimization problem can be easily reduced to a maximization problem).
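
For illustration, two commonly used transformations that turn the minimization of a non-negative function $f(x)$ into the maximization of a non-negative fitness $F(x)$ are sketched below. They are textbook choices given here as an assumption; the paper does not state which transformation, if any, was applied, and the constant $C_{\max}$ is a hypothetical upper bound on $f$.

```latex
% Variant 1: subtract from an upper bound C_max of f over the search domain
F(x) = C_{\max} - f(x), \qquad C_{\max} \ge \max_{x} f(x)

% Variant 2: reciprocal form, valid when f(x) >= 0
F(x) = \frac{1}{1 + f(x)}, \qquad f(x) \ge 0
```

In both variants $F(x) \ge 0$, and the point that maximizes $F$ is the point that minimizes $f$.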
    The stopping condition of the genetic algorithm depends on its specific
application. In optimization tasks, if the maximum (or minimum) value of the fitness function is known,
then the algorithm can stop after reaching the expected optimal value, possibly with a given accuracy
[6-8]. The algorithm can also be stopped if its execution no longer improves the
value already achieved, after a certain execution time, or after performing
a given number of iterations. If the stopping condition is met, the algorithm passes to the final stage
of selecting the "best" chromosome. Otherwise, selection is performed in the next step.
    Chromosome selection consists in choosing (based on the values of the fitness function
calculated at the second stage) those chromosomes that will participate in the creation of offspring for
the next population, that is, for the next generation. The selection follows the principle
of natural selection, according to which chromosomes with the highest values of the fitness function have
the greatest chance of participating in the creation of new individuals. There are different methods of
selection. The most popular is the so-called roulette wheel method, which got its name by analogy with
the well-known gambling game.
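
The roulette wheel method can be sketched as below: each chromosome receives a sector of the wheel proportional to its fitness, and every spin picks one parent. The function name `roulette_select` and the sample fitness values are assumptions made for this illustration.

```python
import random
from typing import List, Sequence


def roulette_select(fitness: Sequence[float], n_parents: int, rng: random.Random) -> List[int]:
    """Pick n_parents indices with probability proportional to fitness."""
    total = sum(fitness)
    if total <= 0 or any(f < 0 for f in fitness):
        raise ValueError("fitness values must be non-negative with a positive sum")
    chosen = []
    for _ in range(n_parents):
        spin = rng.random() * total      # position where the wheel stops
        running = 0.0
        pick = len(fitness) - 1          # fallback guards against float round-off
        for index, f in enumerate(fitness):
            running += f
            if spin < running:
                pick = index
                break
        chosen.append(pick)
    return chosen


rng = random.Random(0)
fitness = [1.0, 3.0, 6.0]                # hypothetical fitness values
parents = roulette_select(fitness, n_parents=10_000, rng=rng)
shares = [parents.count(i) / len(parents) for i in range(len(fitness))]
print(shares)                            # roughly proportional to 0.1, 0.3, 0.6
```

Because the selection probability is the fitness divided by the total, the scheme relies on the non-negativity of the fitness function assumed in the text.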
    As a result of the selection process, a parent population (parental pool) is created whose size 𝑁
is equal to the size of the current population. Applying genetic operators to the
selected chromosomes leads to the formation of a new population of descendants from the parent
population created in the previous step.
    Application of genetic operators. In the classical genetic algorithm, two basic genetic operators are
used: the crossover operator and the mutation operator. However, it should be noted that the mutation
operator plays a secondary role compared to the crossover operator, since crossover in the classical
genetic algorithm occurs almost always, while mutation is quite rare.
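
A minimal sketch of the two operators on binary chromosomes is given below. One-point crossover and bit-flip mutation are the classical forms; the function names and the example chromosomes are assumptions for illustration.

```python
import random
from typing import List, Tuple

Chromosome = List[int]


def one_point_crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Swap the tails of two parents after a random cut point."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("parents must have equal length of at least 2")
    point = rng.randint(1, len(a) - 1)   # the cut never falls at either end
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chromosome: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


rng = random.Random(42)
child1, child2 = one_point_crossover([0] * 8, [1] * 8, rng)
print(child1, child2)
print(mutate(child1, rate=0.05, rng=rng))
```

A high crossover probability combined with a small mutation rate reproduces the relationship described above, where crossover occurs almost always and mutation is rare.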
    Formation of a new population. Chromosomes obtained as a result of applying genetic
operators to the chromosomes of the temporary parent population are included in
the new population, which becomes the current population for this iteration of the genetic algorithm. At each
subsequent iteration, the values of the fitness function are calculated for all chromosomes of this
population. After that, the condition for stopping the algorithm is checked, and either the result is fixed
in the form of the "best" chromosome (the one with the largest value of the fitness function), or the
algorithm passes to the next step, i.e. to selection. In the classical genetic algorithm, the
entire previous population of chromosomes is replaced by a new population of descendants of the
same size.
    The classical genetic algorithm uses binary coding, selection by the "roulette
wheel" method and one-point crossover. To increase its efficiency,
many modifications of the basic classical genetic algorithm have been created. They are associated with
the use of other selection methods, with the modification of genetic operators (first of all, the crossover
operator), with the transformation of the fitness function, as well as with other ways of encoding the
parameters of the problem in the form of chromosomes.
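
The seven steps of the classical algorithm can be put together in one short sketch. The test problem (maximizing the number of ones in a 16-bit string), the parameter values and the name `classical_ga` are assumptions chosen for illustration; they are not the settings used in the paper.

```python
import random
from typing import Callable, List

Chromosome = List[int]


def classical_ga(fitness: Callable[[Chromosome], float], length: int,
                 pop_size: int = 20, generations: int = 60,
                 p_cross: float = 0.9, p_mut: float = 0.01, seed: int = 0) -> Chromosome:
    """Maximize a non-negative fitness over fixed-length bit strings."""
    if pop_size % 2 or length < 2:
        raise ValueError("pop_size must be even and length at least 2")
    rng = random.Random(seed)
    # Step 1: random initial population of binary chromosomes.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    # Step 3: the stopping condition here is simply a fixed number of iterations.
    for _ in range(generations):
        # Step 2: fitness of every chromosome (tiny offset keeps the wheel valid if all are 0).
        weights = [fitness(c) + 1e-9 for c in population]
        # Step 4: roulette wheel selection of a parent pool of the same size.
        parents = rng.choices(population, weights=weights, k=pop_size)
        # Step 5: one-point crossover on consecutive pairs, then bit-flip mutation.
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < p_cross:
                point = rng.randint(1, length - 1)
                a, b = a[:point] + b[point:], b[:point] + a[point:]
            for child in (a, b):
                offspring.append([1 - g if rng.random() < p_mut else g for g in child])
        # Step 6: the offspring fully replace the previous population.
        population = offspring
        best = max(population + [best], key=fitness)
    # Step 7: report the best chromosome seen during the run.
    return best


best = classical_ga(fitness=sum, length=16)
print(best, sum(best))
```

Keeping track of `best` across generations is a convenience of this sketch: since the classical scheme replaces the whole population, the best chromosome could otherwise be lost between iterations.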
    These algorithms simulate evolutionary processes in the generations of such chromosomes. They
implement mechanisms of selection and reproduction similar to those found in natural evolution. Just
as in nature, genetic algorithms search for "good" chromosomes without using any information about
the nature of the problem being solved. All that is needed is some estimate of each chromosome reflecting
its fitness. The mechanism of selection consists in choosing chromosomes with the highest score (that is,
the most adapted), which reproduce more often than individuals with a lower score (worse
adapted). Reproduction means the creation of new chromosomes as a result of the recombination of
the genes of the parental chromosomes. Recombination is a process that results in new combinations
of genes. Two operations are used for this: crossover, which creates two completely new
offspring chromosomes by combining the genetic material of a pair of parents, and mutation, which can
cause changes in individual chromosomes.
      Genetic algorithms use a number of terms borrowed from genetics, primarily genes and
  chromosomes, as well as population, individual, allele, genotype, phenotype.
      Genetic algorithms are used in software development, artificial intelligence systems, optimization,
  artificial neural networks and other fields of knowledge. It should be noted that they can solve
  problems for which only neural networks were previously used. In this case, genetic
  algorithms act simply as an alternative method, independent of neural networks, designed to solve the
  same problem. Genetic algorithms are also often used in conjunction with neural networks: they can
  support neural networks or interact with them within the framework of a hybrid system designed to solve
  a specific task. Genetic algorithms are also used in conjunction with fuzzy systems.
      The genetic algorithm is a method that applies the principles of natural evolution to problem solving,
  primarily to optimization problems. Genetic algorithms are search procedures based on
  mechanisms of natural selection and heredity. They use the evolutionary principle of survival of the
  fittest. Genetic algorithms differ from traditional optimization methods in several main
  properties, which are listed later in this section.
      In terms of the speed of determining the optimal value of the objective function, genetic
  algorithms are several orders of magnitude ahead of random search. However, genetic algorithms
  are not the only way to solve optimization problems. In addition to them, there are two main approaches
  to solving such problems, exhaustive search and local gradient search, each of which has its advantages
  and disadvantages.
      The exhaustive search method is the easiest to program. To find the optimal solution, it is necessary to
  calculate the value of the objective function successively at all possible points, remembering the
  maximum (or minimum) of them. The disadvantage of the method is its high computational
  complexity; however, if it is possible to go through all options in a reasonable time, then the solution
  found is guaranteed to be optimal.
      The second approach is based on the gradient descent method. First, some random parameter values
  are chosen, and then these values are gradually changed in the direction of the fastest change of the
  objective function. When a local maximum (minimum) is reached, this method stops, so additional
  measures are required to find the global optimum.
      Gradient methods work quickly, but do not guarantee the optimality of the solution found. They
  are ideal for solving unimodal problems, where the objective function has a single local optimum
  (which is also the global one). Practical problems, as a rule, are multimodal and multidimensional; for
  them there are no universal methods that quickly find absolutely accurate solutions. By combining
  exhaustive search and gradient methods, one can obtain approximate solutions whose accuracy increases
  as the calculation time increases.
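
The difference between the two approaches can be seen on a small multimodal example. The objective function below, the grid size and the step size are assumptions chosen only to make the effect visible; they do not come from the paper.

```python
def f(x: float) -> float:
    """Hypothetical 1-D objective with a global minimum near -1.30 and a local one near 1.13."""
    return x**4 - 3 * x**2 + x


def df(x: float) -> float:
    """Analytical derivative of f."""
    return 4 * x**3 - 6 * x + 1


def exhaustive_min(lo: float, hi: float, n_points: int) -> float:
    """Evaluate f at every grid point and keep the best one."""
    grid = [lo + (hi - lo) * k / (n_points - 1) for k in range(n_points)]
    return min(grid, key=f)


def gradient_descent(x0: float, rate: float = 0.01, steps: int = 2000) -> float:
    """Follow the negative gradient from x0 with a fixed step size."""
    x = x0
    for _ in range(steps):
        x -= rate * df(x)
    return x


x_grid = exhaustive_min(-2.0, 2.0, 1001)
x_right = gradient_descent(2.0)    # starts in the basin of the local minimum
x_left = gradient_descent(-2.0)    # starts in the basin of the global minimum
print(x_grid, x_right, x_left)
```

Exhaustive search finds the global minimum on the grid at the cost of 1001 function evaluations, while gradient descent started from `x0 = 2.0` stops in the local minimum. This is the behavior that motivates the additional measures mentioned above.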
      The paper [9] shows the main differences between GA and standard optimization algorithms:
      - the search for a suboptimal solution is based on the optimization of a randomly given set of solutions,
  rather than one solution, which allows simultaneous analysis of several ways of approaching the
  extremum; evaluation of such solutions at each step allows new solutions to be synthesized on the basis
  of old ones, i.e. evolutionary development of optimal solutions takes place;
      - the solutions are considered as coded structures, and not as a set of parameters, which in some
  cases significantly reduces the time of data transformation, i.e. increases the speed of finding
  optimal solutions;
      - to assess the "suitability" of a solution for further evolutionary development, along with the use
  of the objective function, "rules of survival" are additionally modeled, which increase the diversity
  of the set of solutions and determine the evolutionary development;
      - the initialization, transformation and other operations on solutions use probabilistic rather
  than deterministic rules, which introduce elements of randomness into the genetic search and thereby
  solve the problem of escaping local optima;
      - there is no need to calculate the derivatives of the objective function (as in gradient methods) or the
  matrix of second-order derivatives (as in quasi-Newton methods);
      - the method is not critical to the number of components of the admissible solution vector.
      The authors of [4-6] consider operator adaptation in evolutionary computation and its
  application to optimizing the structure of neural networks, as well as efficient multi-objective neural
  architecture search using Lamarckian evolution. Such works indicate the need for further research,
  especially in the direction of applying genetic algorithms to the training of neural networks.
      The purpose of the paper is to develop the scientific and methodological foundations of learning
  neural networks using genetic algorithms.
      To achieve the goal, the following tasks were solved:
      1. The justification of the approach to learning neural networks using genetic algorithms was carried
  out;
      2. The problem of optimizing training with the genetic algorithm was solved using the example of
  forecasting the time series of the temperature of the natural environment by the method of steepest
  descent.

  3. Research materials and methodology
      In the classical formulation, the task of training a neural network is considered as the task of finding
  the minimum of the learning error, which depends on the parameters of the network [7,8]. The quality of
  learning directly affects the prediction capabilities of the neural network, and therefore the accuracy of
  the solutions obtained. Once the network structure that corresponds to the chosen problem has been
  determined, the optimization of the weighting factors in real time becomes relevant, since the adequacy
  of the network's functioning depends on it.
      The main functional purpose of an artificial neural network is the transformation of input signals
  (some scattered information about the external environment) into output signals (concepts about the
  external environment). Based on (1), the neural network in this case is represented as some
  multidimensional function 𝐹: 𝑋 → 𝑌.
      If the set of weight coefficients 𝑊𝑖 of the input signals of the neurons of the network has not been
  adjusted, then the function 𝐹 implemented by the network is arbitrary. The set of all weights of all neurons
  corresponds to the vector 𝑊. The set of vectors 𝑊 forms the state space of the neural network. Let the
  initial state of the network correspond to some arbitrary vector 𝑊0. Then the trained neural
  network corresponds to 𝑊∗, i.e. a state in which the single-valued mapping 𝐹: 𝑋 → 𝑌 is realized. In this
  case, the task of training the NN is formally reduced to the task of transition from some initial state of the
  network, corresponding to 𝑊0, to the final state corresponding to 𝑊∗.
      When training a neural network, the problem of minimizing the error is set:

      $$ \alpha(W) = \sum_{i=1}^{p} (z_i - b_i)^2 \qquad (2) $$

  where 𝑧𝑖 is the value of the 𝑖-th output of the neural network; 𝑏𝑖 is the known value of the 𝑖-th output;
  𝑝 is the number of neurons in the output layer.

      If the network does not make errors, then 𝛼 = 0; that is, the goal of training the neural network is
  to find the minimum of the error function (2) in the state space 𝑊.
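
As a small illustration, the error function can be evaluated for one training example as follows. The sketch assumes that the differences between network outputs and known values enter as squares (which keeps the error non-negative, so that its minimum is zero); the function name `learning_error` and the sample numbers are hypothetical.

```python
from typing import Sequence


def learning_error(outputs: Sequence[float], targets: Sequence[float]) -> float:
    """Sum of squared differences between the p network outputs z_i and known values b_i."""
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must both have length p")
    return sum((z - b) ** 2 for z, b in zip(outputs, targets))


print(learning_error([1.0, 2.0], [1.0, 2.0]))  # 0.0: the network makes no errors
print(learning_error([1.0, 0.0], [0.0, 2.0]))  # 1.0 + 4.0 = 5.0
```

In training, this value depends on the weight vector through the outputs, so searching the state space of weights means searching for the vector that drives the error to its minimum.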
      In order to increase the reliability of decisions made on the basis of a neural network, it is necessary
  to investigate alternative optimization algorithms that allow finding the global extremum. Therefore,
  approaches are needed that do not have the indicated disadvantages. The genetic optimization
  algorithms considered above are the most promising in this regard [7-8].
      To perform the optimization procedure using the genetic algorithm, it is necessary:
      1) to choose a representation of the optimization parameters in the form of a certain data format: string,
  vector, table, array, etc.;
      2) to develop or choose from a set of genetic operators those that best take into account the
  features of the search space;
      3) to determine the size of the initial population;
      4) to develop a technique for using genetic operators;
      5) to set the fitness function (the objective function by which variants are selected in the population);
      6) to develop a method for selecting variants into a new population;
      7) to set the criterion for stopping the evolutionary process.
    To minimize the learning error of neural networks with the genetic algorithm, each variant of
the vector of weighting coefficients 𝑊 is matched with a chromosome presented in the form of a
bit string. In the selection process, a directed search is made for chromosomes that provide the
extremum of a given objective function; in the neural network training procedure, the learning error
function 𝛼 serves as this objective function.
    The concept of training a neural network with a genetic algorithm, in contrast to traditional
learning methods, has a different meaning: learning here consists in applying genetic operators to the
genotype of the vector 𝑊𝑖, i.e. to the chromosome, and the training sample serves to calculate the
learning error 𝛼 of the neural network with specific values of the weighting coefficients 𝑊𝑖.
    Thus, the procedure for optimizing the training of a neural network using a genetic algorithm
is also iterative and includes the stages of synthesis of new chromosomes and their selection into the
new population.
    The scheme of the genetic algorithm of the neural network training procedure is shown in Fig. 1.


[Flowchart: START -> selection of the initial population of chromosomes -> calculation of the values of
the weighting coefficients -> assessment of the membership function (fitness of chromosomes in the
population) -> "Is the condition ended?". No: a new population is created by applying genetic
operators and the cycle repeats. Yes: the "best" combination of weighting factors -> END.]
Figure 1: Scheme of the genetic algorithm of the learning neural network procedure
    The process continues until an optimal solution is obtained or a given number of generations is reached.
At the same time, each subsequent population must be better than the previous one. The solution to this
problem is the chromosome with the minimum value of the membership function, which determines the
optimal vector of weighting coefficients 𝑊𝑖, while the learning error 𝛼 becomes less than the given
value 𝛿. If the optimal solution is not found and the given number of generations is reached, then the
learning procedure ends with the selection of an elite chromosome in one or more generations.
    Depending on the type of genetic operators and selection schemes used, different genetic algorithms
can be constructed, each with its own effectiveness in terms of convergence speed and closeness of approach
to the extremum when solving real problems.
    To solve the problem of forecasting the time series of the temperature of the natural
environment, appropriate neural networks were synthesized, in which the input and output values are
temperature values.
    As a result of solving the optimization problem by the gradient method of steepest descent, the
best NNs were selected (errors in °C):
    - radial basis function: training 2.617, control 2.617, test 2.06;
    - linear with two neurons in the input layer: training 0.103, control 0.086, test 0.097;
    - linear with three neurons in the input layer: training 0.103, control 0.086, test 0.096;
    - multilayer perceptron with five neurons in the hidden layer: training 0.077, control 0.068, test 0.074;
    - multilayer perceptron with two neurons in the hidden layer: training 0.073, control 0.065, test 0.07.
    To implement the learning algorithm of the neural network with the help of genetic algorithms, we
use the network with the smallest errors, a multilayer perceptron with two neurons in the hidden
layer (Fig. 2).


Figure 2: Architecture of NN forecasting of temperature time series: multilayer perceptron with two
neurons in the hidden layer (MLP 2)

   The parameters of the problem being solved are the weighting coefficients 𝑊; that is, the problem
has 17 parameters, and the set of these parameters determines a point of the search space and,
accordingly, a possible solution.
   We assume that the solution search area 𝐷 is a finite set of solutions, in which each admissible solution
𝑊 ∈ 𝐷 is an 𝑛-dimensional vector $\bar{W} = (W_1, W_2, \dots, W_n)$, where 𝑛 = 17. Suppose that each component
$W_i$, $i = 1, \dots, n$, of the vector 𝑊 ∈ 𝐷 can take values from 0 to 1 with a step of 0.004; then the
value of the weighting coefficient 𝑊𝑖 is encoded using a non-negative integer $x_i \in [0, K_j]$, where
$j = 0, \dots, 250$ and $K_j$ is the number of possible discrete values of the 𝑖-th variable in the search area 𝐷 [5-7]. To
minimize the learning error of the neural network with the genetic algorithm, we match each
variant of the vector of weighting coefficients with a chromosome presented in the form of a bit string.
    That is, let us match each value 𝑊𝑖 with an integer 𝑥𝑖. To represent it in binary code,
it is necessary to determine the number of binary symbols 𝑔 that is sufficient to represent
any value 𝑥𝑖 from the range of its permissible values $x_i \in [0, K_j]$. The value 𝑔 must meet the
requirement $k \le 2^g$, where 𝑘 = 251 is the number of possible discrete values of the variable 𝑥𝑖; hence
𝑔 = 8 (Fig. 3).
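
The arithmetic behind 𝑔 = 8 and the encoding of a single weight can be checked with a short sketch. The helper names `encode_weight` and `decode_gene` are assumptions; the step 0.004 and the count of 251 values come from the text.

```python
STEP = 0.004                        # discretization step of a weight in [0, 1]
K = round(1.0 / STEP) + 1           # 251 possible discrete values (codes 0..250)
G = (K - 1).bit_length()            # smallest g with K <= 2**g


def encode_weight(w: float) -> str:
    """Map a weight in [0, 1] to the g-bit binary record of its integer code."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    code = round(w / STEP)          # nearest grid point, an integer in 0..250
    return format(code, f"0{G}b")


def decode_gene(bits: str) -> float:
    """Inverse mapping: binary record -> integer code -> weight value."""
    return int(bits, 2) * STEP


print(K, G)                         # 251 8
print(encode_weight(0.5))           # code 125 -> '01111101'
print(decode_gene("11111010"))      # code 250 -> weight 1.0 (up to rounding)
```

Since $2^7 = 128 < 251 \le 256 = 2^8$, seven bits are too few and eight are enough; the codes 251 to 255 remain unused by this encoding.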


Figure 3: Symbolic representation of fixed variable value 𝑥𝑖 in binary code

   Here $g_m$ is the binary symbol 0 or 1 ($m = 1, \dots, g$), and $h(x_i)$ is the symbolic record of the variable 𝑥𝑖. To
present an admissible solution 𝑥 ∈ 𝐷, it is necessary to combine the symbolic records of the code sequences
describing all 𝑛 components of the vector $\bar{x} = (x_1, x_2, \dots, x_n)$, where 𝑛 = 17, in the form of a linear
sequence of bit strings of binary symbols (Fig. 4).
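
A sketch of how the 17 code sequences can be joined into one bit string and split back is shown below. The function names and the sample integer codes are assumptions for illustration; the sizes (17 genes of 8 bits, codes 0 to 250) follow the text.

```python
from typing import List

N_GENES = 17      # number of weighting coefficients
G = 8             # bits per gene
MAX_CODE = 250    # largest admissible integer code


def to_chromosome(codes: List[int]) -> str:
    """Concatenate the 8-bit records of all codes into one 136-bit string."""
    if len(codes) != N_GENES or any(not 0 <= c <= MAX_CODE for c in codes):
        raise ValueError("expected 17 integer codes in the range 0..250")
    return "".join(format(c, f"0{G}b") for c in codes)


def from_chromosome(bits: str) -> List[int]:
    """Cut the bit string into 8-bit genes and read each one as an integer."""
    if len(bits) != N_GENES * G:
        raise ValueError("chromosome must contain 17 * 8 = 136 bits")
    return [int(bits[i:i + G], 2) for i in range(0, len(bits), G)]


codes = [(15 * i) % 251 for i in range(N_GENES)]   # hypothetical solution vector
chromosome = to_chromosome(codes)
print(len(chromosome))                             # 136
print(from_chromosome(chromosome) == codes)        # True
```

Because every gene has the same fixed width, the position of a bit in the string identifies both the weight it belongs to and its binary significance.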


Figure 4: Symbolic notation of the solution vector

    Thus, the symbolic model of the solution vector 𝑥 ∈ 𝐷 of the given problem can be represented in
the form of a bit string, which is used to specify the set of admissible solutions 𝑥𝑖 belonging to the search
domain 𝐷.
    A binary combination ℎ(𝑥𝑖), which determines the fixed value of the parameter 𝑊𝑖
in binary code, is taken as a gene. The smallest indivisible unit amenable to evolution is an individual $H_r^t$
(𝑟 is the number of the individual in the population, 𝑡 is the moment in time of the evolutionary process),
characterized by 𝑛 genes, each of which is responsible for the corresponding variable (Fig. 5).


Figure 5: Genotypic chromosome

   A chromosome that has specific values of alleles in its loci is called a genotype; it contains all
the hereditary genetic information of an individual $H_r^t$. The finite set of all admissible genotypes is the
gene pool.
   The fitness of the chromosomes in the population is assessed by calculating
the membership function (fitness function) for each chromosome of this population. In our case, the
membership function determines the difference between the
calculated and real output values for the same input action and represents the numerical value of the
function calculated for an admissible solution to the problem 𝑥 ∈ 𝐷: $\alpha(H_r^t) = (Y - Y_p)^2$. The
smaller the value of the membership function, the better the quality of the chromosome. The fitness
function always takes a non-negative value, and to solve the optimization problem this function
needs to be minimized.
   A collection of individuals $H_r^t$, $r = 1, \dots, p$, forms a population $P^t$, where 𝑝 is the size of the
population and 𝑡 = 0, 1, … , 𝑇, where 𝑇 determines the period of its evolution. The goal of population
evolution is to increase the average value of the membership function of the population as a whole:

   $$ \mathit{Sit}_{CP}(t) = \frac{1}{p} \sum_{i=1}^{p} \mathit{Sit}_i \qquad (3) $$
    The effectiveness of the genetic algorithm, the quality of the obtained solution and the evolution as
a whole largely depends on the structure and quality of the initial population.
    Thus, the optimization procedure using the genetic algorithm is iterative and includes two stages:
    - the synthesis of new chromosomes (crossing and mutation);
    - the selection of chromosomes in a new population.
    The process continues until the optimal solution is obtained or the given number of generations is
reached. At the same time, the goal of population evolution (2) is taken into account, that is, each
subsequent population should be better than the previous one. The solution of the problem corresponds to
the chromosome with the minimum value of the fitness function, which determines the optimal vector of
weighting coefficients $W_i$, provided that the learning error (1) is less than the specified value
$\alpha_{min}$. If this stopping condition cannot be met, the procedure ends after the given number of
generations, and the best chromosome found in one or more generations is selected.
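    The two-stage iterative procedure and its stopping rule can be sketched as below. This is an
illustrative sketch under stated assumptions rather than the authors' code: the function names, the
gene range [-1, 1], the default parameter values and the test function `sphere` are all hypothetical.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[float]


def one_point_crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    # Child takes the head of one parent and the tail of the other.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(h: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    # Assumption: a mutated gene is redrawn uniformly from [-1, 1].
    return [rng.uniform(-1.0, 1.0) if rng.random() < rate else g for g in h]


def evolve(fitness: Callable[[Chromosome], float], n_genes: int,
           pop_size: int = 10, max_generations: int = 100,
           alpha_min: float = 1e-6, mutation_rate: float = 0.05,
           seed: int = 0) -> Tuple[Chromosome, List[float]]:
    """Minimize `fitness`; requires n_genes >= 2 for one-point crossover."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    population.sort(key=fitness)
    history = [fitness(population[0])]  # best fitness per generation
    for _ in range(max_generations):
        if history[-1] < alpha_min:  # stop: error below the specified value
            break
        # Stage 1: synthesis of new chromosomes (crossover and mutation).
        children = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(population, 2)
            children.append(mutate(one_point_crossover(p1, p2, rng), mutation_rate, rng))
        # Stage 2: elite selection of the best chromosomes for the new population.
        population = sorted(population + children, key=fitness)[:pop_size]
        history.append(fitness(population[0]))
    # If the error threshold was never reached, the best chromosome found is returned.
    return population[0], history


def sphere(h: Chromosome) -> float:
    # Hypothetical test function with its minimum of 0 at the origin.
    return sum(g * g for g in h)


best, history = evolve(sphere, n_genes=4)
```

    Because the elite selection keeps the parents in the candidate pool, the best fitness value recorded
in `history` never gets worse from one generation to the next, which matches the requirement that each
subsequent population be no worse than the previous one.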
    The optimal weighting coefficients of a multilayer perceptron with two neurons in the hidden layer
(Fig. 1) were determined using a genetic algorithm with the following parameters: number of chromosomes
in the population: 10; number of populations: from 20 to 100; crossover operator: one-point crossover;
percentage of gene mutation: 0.001; selection: elite. Figure 6 shows the probability of finding the
extrema of the function as a function of the number of populations.
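    To illustrate how a chromosome can encode the weighting coefficients of such a network, the sketch
below decodes a flat list of genes into a perceptron with two hidden neurons and computes its error.
The number of inputs, the use of bias terms, the tanh activation and the linear output neuron are
assumptions made for this example; the actual structure is the one shown in Fig. 1.

```python
import math
from typing import List, Sequence, Tuple

N_INPUTS = 2   # assumption: the number of inputs is not taken from the paper
N_HIDDEN = 2   # two neurons in the hidden layer, as stated in the text
# Genes: (weights + bias) per hidden neuron, then (weights + bias) of the output neuron.
N_GENES = N_HIDDEN * (N_INPUTS + 1) + (N_HIDDEN + 1)


def forward(chromosome: Sequence[float], x: Sequence[float]) -> float:
    """Decode the chromosome into network weights and compute the output for input x."""
    if len(chromosome) != N_GENES:
        raise ValueError(f"expected {N_GENES} genes, got {len(chromosome)}")
    genes = iter(chromosome)
    hidden: List[float] = []
    for _ in range(N_HIDDEN):
        w = [next(genes) for _ in range(N_INPUTS)]
        b = next(genes)
        hidden.append(math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b))
    w_out = [next(genes) for _ in range(N_HIDDEN)]
    b_out = next(genes)
    return sum(wi * hi for wi, hi in zip(w_out, hidden)) + b_out


def learning_error(chromosome: Sequence[float],
                   samples: Sequence[Tuple[Sequence[float], float]]) -> float:
    """Mean squared difference between calculated and real outputs over the samples."""
    return sum((forward(chromosome, x) - y) ** 2 for x, y in samples) / len(samples)
```

    A function such as `learning_error` can serve as the fitness function minimized by the genetic
algorithm, with each chromosome holding `N_GENES` real-valued genes.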
    The obtained results show that the extremum is approached quickly and that the speed of approach
increases with the number of populations [9-13]. At the same time, the exact value of the minimum was
obtained in only a low percentage of experiments, and this percentage depends on the total number of
runs of the genetic algorithm, that is, on the number of populations [9-15].


Figure 6: Probability of finding extrema of the function for a given number of populations

    Analysis of the temperature time series forecasts (Fig. 7) [9] shows that the neural network
predicted the decrease, increase and stabilization of the temperature relatively correctly.


Figure 7: Graphs of deep learning prediction of temperature time series by neural networks

4. Conclusion
   1. Experiments on learning neural networks showed that known methods of local and global
optimization (gradient, stochastic, Newton, Hessian, etc.) require a significant number of learning steps,
are sensitive to the accuracy of calculations, and require a significant number of additional variables.
In most cases, they find a local rather than a global extremum. That is why the search for and
development of new methods of learning neural networks is an urgent task.
   2. To increase the reliability of decisions made on the basis of a neural network, it is
necessary to investigate alternative optimization algorithms that allow finding the global extremum.
Therefore, approaches are needed that do not have the disadvantages mentioned above. Genetic
optimization algorithms are the most promising in this regard.
   3. The scientific and methodological foundations of learning neural networks are developed in the
paper. In contrast to traditional learning methods, the concept of learning a neural network using a
genetic algorithm is defined, which is the most effective method of optimizing the weighting
coefficients so as to minimize the error of the network.
   4. The study of the effectiveness of applying the genetic algorithm to optimize the functioning and
deep learning of neural networks in operation established that this approach is promising for
biotechnical complexes that are exposed to external disturbances (external temperature).

5. References
[1] M. O. Korchemny, V. P. Lysenko, M. V. Chapny, V. M. Shtepa, Neural Networks, Kyiv, Agrar
     Media Group, 2010.
[2] M. O. Korchemny, V. P. Lysenko, M. V. Chapny, V. M. Shtepa, Neural Networks: Theory and
     Practice, 4th ed., Kyiv, 2013.
[3] I. Bratko, Algorithms of artificial intelligence in the PROLOG language, Moscow, Williams,
     2004.
[4] C. Igel, M. Kreutz, Operator adaptation in evolutionary computation and its application to structure
     optimization of neural networks, Neurocomputing, No. 55(1-2), 2003, pp. 347-361.
[5] T. Elsken, J. H. Metzen, F. Hutter, Efficient multi-objective neural architecture search via
     Lamarckian evolution, arXiv preprint arXiv:1804.09081, 2018.
[6] T. Elsken, J. H. Metzen, F. Hutter, The search for neural architecture: an overview, Journal of
     Machine Learning Research, No. 20(1), 2019.
[7] A. A. Ridkokasha, K. K. Golder, Basics of Artificial Intelligence Systems, Cherkasy, ECHO-
     PLUS, 2002.
[8] V. V. Kruglov, Artificial neural networks. Theory and practice, Moscow, Hotline – Telecom, 2002.
[9] V. P. Lysenko, N. A. Zayets, V. M. Shtepa, A. O. Dudnyk, Neural Network Forecasting of Time
     Series of Ambient Temperature, Bioresources and nature conservation 3. 3-4 (2011) 102-108.
[10] N. A. Zayets, V. M. Shtepa, The Use of A Genetic Algorithm For Solving Optimization Problems
     In Electrical Engineering, Scientific Bulletin of the National University of Bioresources and
     Nature Management of Ukraine 166. 4 (2011).
[11] N. A. Zayets, S. A. Shvorov, V. M. Shtepa, V. O. Osypa, The Use of Genetic Algorithms to
     Calculate the Optimal Settings for the Operation of a Robotic Complex, Collection of Scientific
     Works of the Military Institute of Taras Shevchenko Kyiv National University 38 (2012).
[12] N. A. Pasichnyk, S. A. Shvorov, Y. A. Gunchenko, I. Sharipova, T. M. Tereshchenko,
     Methodological Bases of Construction of Dispatchers Intensive Training Simulators of Air Traffic
     Control, in: Proceedings of IEEE 6th International Conference on Methods and Systems of
     Navigation and Motion Control, MSNMC, 2020, pp. 122–125.
[13] S. Shvorov, V. Lysenko, N. Pasichnyk, V. Lukin, A. Martsyfei, The method of determining the
     amount of yield based on the results of remote sensing obtained using UAV on the example of
     wheat, in: Proceedings of 15th International Conference on Advanced Trends in Radioelectronics,
     Telecommunications and Computer Engineering, TCSET 2020, 2020, pp. 245–248.
[14] Y. A. Gunchenko, S. A. Shvorov, N. D. Rudnichenko, V. D. Boyko, Methodical complex of
     accelerated training for operators of unmanned aerial vehicles, in: Proceedings of IEEE 4th
     International Conference Methods and Systems of Navigation and Motion Control, MSNMC 2016,
     2016, pp. 130–133.
[15] A. N. Voronin, A. G. Yasinsk, S. A. Shvorov, Synthesis of compromise-optimal trajectories of
     mobile objects in conflict environment, Journal of Automation and Information Sciences 34(2)
     (2002).

</pre>