=Paper= {{Paper |id=Vol-3013/20210074 |storemode=property |title=Automated Software Design for FPGAs on an Example of Developing a Genetic Algorithm |pdfUrl=https://ceur-ws.org/Vol-3013/20210074.pdf |volume=Vol-3013 |authors=Anatoliy Doroshenko,Volodymyr Shymkovych,Olena Yatsenko,Tural Mamedov |dblpUrl=https://dblp.org/rec/conf/icteri/DoroshenkoSYM21 }} ==Automated Software Design for FPGAs on an Example of Developing a Genetic Algorithm== https://ceur-ws.org/Vol-3013/20210074.pdf
Automated Software Design for FPGAs on an Example of
Developing a Genetic Algorithm
Anatoliy Doroshenko 1,2, Volodymyr Shymkovych 2, Olena Yatsenko 1, and Tural Mamedov 1
1 Institute of Software Systems of National Academy of Sciences of Ukraine, Glushkov prosp. 40, Kyiv, 03187, Ukraine
2 National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Peremohy prosp. 37, Kyiv, 03056, Ukraine


                 Abstract
                 The paper proposes a method and software tools for automated design and synthesis of
                 parallel programs for field-programmable gate arrays (FPGAs) based on the algebra-
                 algorithmic approach. The developed facilities provide the construction of parallel algorithm
                 schemes by superposition of language constructs of Glushkov’s system of algorithmic
                 algebra. Based on the schemes, the corresponding source code in VHDL is automatically
                 generated, which is further executed on an FPGA. The flexibility of the reconfigurable FPGA
                 architecture is very attractive for the realization of computationally complex algorithms and
                 allows synthesizing high-efficiency solutions that differ from other architectures by
                 substantially lower energy consumption at similar performance. The approach to the
                 automated design of parallel programs for FPGAs is illustrated by developing
                 a genetic algorithm used for training multilayer neural networks. The results of an
                 experiment in which the generated program code was executed on an FPGA are given.

                 Keywords
                 Algorithmic algebra, automated algorithm design, FPGA, genetic algorithm, parallel
                 computation, software synthesis, neural network

1. Introduction
    The rapid growth in the integration density and functional complexity of modern electronic devices
makes it necessary to improve and develop methods of designing and programming
integrated circuits, in particular, field-programmable gate arrays (FPGAs). FPGAs contain an array of
programmable logic blocks, and a hierarchy of reconfigurable interconnects that allow the blocks to
be wired together, like many logic gates that can be inter-wired in different configurations. Logic
blocks can be configured to perform complex combinational functions or merely simple logic gates.
In most FPGAs, logic blocks also include memory elements, which may be simple flip-flops or more
complete blocks of memory. Many FPGAs can be reprogrammed to implement different logic
functions, allowing flexible reconfigurable computing as performed in computer software. To define
the behavior of the FPGA, the user provides a design in a hardware description language (HDL) or as
a schematic design. The most common HDLs are VHDL [1] and Verilog [2] as well as extensions
such as SystemVerilog [3]. VHDL provides a high-level abstraction for describing hardware facilities
owing to the availability of a set of predefined data types and a possibility to create user-defined
hierarchically organized data types based on the basic ones built into the language. Designing in
HDLs is a rather complex process that has been compared to programming in
assembly languages. Therefore, there is a need to raise the abstraction level of design.

ICTERI-2021, Vol I: Main Conference, PhD Symposium, Posters and Demonstrations, September 28 – October 2, 2021, Kherson, Ukraine
EMAIL: doroshenkoanatoliy2@gmail.com (A. Doroshenko); v.shymkovych@kpi.ua (V. Shymkovych); oayat@ukr.net (O. Yatsenko);
tural.mamedov@outlook.com (T. Mamedov)
ORCID: 0000-0002-8435-1451 (A. Doroshenko); 0000-0003-4014-2786 (V. Shymkovych); 0000-0002-4700-6704 (O. Yatsenko);
0000-0003-3029-5834 (T. Mamedov)
            © 2021 Copyright for this paper by its authors.
            Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
            CEUR Workshop Proceedings (CEUR-WS.org)
   In previous works [4–7], we developed a theory, methodology, and tools for
automated program construction based on Glushkov’s system of algorithmic algebra (SAA) [4, 5].
The specific feature of the developed methodology is the formalization of the processes of
designing and synthesizing algorithms and programs. The algorithms are designed in terms of high-level
schemes represented in SAA. Formal facilities and software automation tools were developed for designing
parallel programs for multicore CPUs [6], Nvidia GPUs using CUDA [5], and heterogeneous
platforms using OpenCL [7]. The purpose of this paper is to apply our tools to the
automated design of parallel programs for FPGAs in the VHDL language and to illustrate the approach with
an example of designing a genetic algorithm used for training multilayer neural networks. The
results of an experiment in which the developed program was executed on an FPGA are also given.

2. Algebra of Algorithms and Designing of Parallel Programs for FPGA
   The proposed design of parallel programs for FPGA is based on the system of algorithmic algebra [4],
which is focused on high-level construction and transformation of algorithms represented in the form of
schemes. SAA is the two-sorted algebra $GA = \langle \{Pr, Op\}; \Omega_{GA} \rangle$, where Pr and Op are the sets of
predicates and operators defined on an information set; $\Omega_{GA}$ is the signature consisting of logic
operations (disjunction, conjunction, negation) and operator constructs, in particular:
    •    serial execution of operators: “operator1”; “operator2”;
    •    branching: IF ‘condition’ THEN “operator1” ELSE “operator2” END IF;
    •    for loop: FOR (counter FROM start TO fin) “operator” END OF LOOP.
    In SAA, identifiers of basic and compound predicates are enclosed in single quotes and identifiers
of operators are written in double quotes. The representations of algorithms in SAA are called SAA
schemes. The main difference between Glushkov’s SAA and procedural programming languages is
that it allows programs to be specified in algebraic and natural linguistic form, and provides facilities for
formal program transformation. The developed Integrated toolkit for Designing and Synthesis of
programs (IDS) [4–6] provides automated construction of algorithm schemes and generation of
corresponding code in target programming languages (C, C++, Java). Algorithms are designed using a
list of SAA constructs and a tree. The user chooses the constructs from the list and adds them to an
algorithm tree. On each step of the construction process, the system allows a user to select only those
operations, the insertion of which into a scheme does not break its syntactical correctness. The
algorithm tree is then used for the automatic generation of SAA scheme text and program code. The
mapping of each SAA construct to a text in a programming language is specified as a code template in
the IDS database. In this paper, we add new SAA constructs intended for high-level design of
programs for FPGA in VHDL language.
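
The template mechanism described above can be illustrated with a minimal Python sketch. It is not the IDS implementation: the `TEMPLATES` dictionary, the tuple-based tree layout, and the C-like target text are assumptions made only for illustration. Each tree node names an SAA construct, and its code is obtained by filling the slots of the matching template with the code generated for its children.

```python
# Hypothetical code templates, one per SAA construct, with named slots.
# The real IDS database format is not described in the paper.
TEMPLATES = {
    "seq": "{first}\n{second}",
    "if": "if ({cond}) {{\n{then}\n}} else {{\n{orelse}\n}}",
    "for": "for (int {var} = {start}; {var} <= {fin}; {var}++) {{\n{body}\n}}",
}


def indent(text, prefix="    "):
    """Indent every line of a generated fragment."""
    return "\n".join(prefix + line for line in text.splitlines())


def generate(node):
    """Render an algorithm tree node to text by filling in its template."""
    if isinstance(node, str):  # leaf: a basic operator or predicate
        return node
    kind, slots = node
    rendered = {name: generate(child) for name, child in slots.items()}
    # Fragments nested inside braces are indented for readability.
    for name in ("then", "orelse", "body"):
        if name in rendered:
            rendered[name] = indent(rendered[name])
    return TEMPLATES[kind].format(**rendered)


# FOR (i FROM 1 TO n) IF 'a[i] < 0' THEN "negate" ELSE "count" END IF END OF LOOP
tree = ("for", {
    "var": "i", "start": "1", "fin": "n",
    "body": ("if", {"cond": "a[i] < 0",
                    "then": "a[i] = -a[i];",
                    "orelse": "count++;"}),
})
print(generate(tree))
```

Because code is produced only by walking a tree whose nodes are valid constructs, a scheme built this way cannot contain unbalanced or misplaced keywords, which mirrors the syntactic-correctness guarantee mentioned in the text.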
    VHDL [1] is a formal notation aimed at the description and logic organization of a digital system.
The function of the system is defined as the conversion of values at inputs into values at outputs. The
organization of the system is defined by a set of connected components. The language is intended
primarily for modeling at the gate level, the register-transfer level, and the chip level, and is used in the synthesis
of devices. VHDL has facilities for describing asynchronous parallel processes.
    The entity and the architecture are among the main concepts of the VHDL language.
    The entity is defined as an interface of a project object. It is a description of a project component
having well-defined inputs and outputs and performing a certain function. It can represent the whole
system being designed, some subsystem, device, node, chipboard, macrocell, logic unit, etc. The
description of the entity in the SAA language is the following:

   ENTITY entity_name IS
    PORT (“operator”)
   END OF ENTITY,

where entity_name is the identifier of the project object; “operator” are basic operator(s) declaring
input and output ports. The examples of operators declaring input and output ports can be the
following:
   “Signal (name) direction (dir) of type (tp)”;
   “Signal (name) direction (dir) of type (tp) and range (rng) with initial value (val)”,

where dir can be in, out, or inout.
    An example of an entity declaration for a combinational circuit implementing a logical function
f = (x1 and x2) or x3 (see Figure 1) is the following:

   ENTITY and_or IS
    PORT (
     “Signals (x1, x2, x3) direction (in) of type (bit)”;
     “Signal (f) direction (out) of type (bit)”);
   END OF ENTITY.
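
To make the mapping from port operators to VHDL concrete, the following Python sketch renders the and_or entity as VHDL text. The helper names `port_clause` and `entity_to_vhdl` and the tuple layout of the port list are hypothetical; the paper does not show the templates IDS actually uses, so the output is only what such generated code might look like.

```python
def port_clause(names, direction, type_name):
    """Render one port operator, e.g. (('x1', 'x2'), 'in', 'bit')."""
    return f"{', '.join(names)} : {direction} {type_name}"


def entity_to_vhdl(entity_name, ports):
    """Render an entity declaration from a list of port descriptions."""
    clauses = [port_clause(*p) for p in ports]
    lines = [f"entity {entity_name} is", "  port ("]
    # VHDL separates port declarations with ';' and has none after the last.
    lines += [f"    {c};" for c in clauses[:-1]]
    lines += [f"    {clauses[-1]}"]
    lines += ["  );", f"end entity {entity_name};"]
    return "\n".join(lines)


# Port list of the and_or example: three inputs and one output of type bit.
and_or_ports = [(("x1", "x2", "x3"), "in", "bit"), (("f",), "out", "bit")]
print(entity_to_vhdl("and_or", and_or_ports))
```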

   The architecture defines the behavior of the system or its structure on a functional level of its
description. The description of the architecture in the SAA language is the following:

   ARCHITECTURE arch_name of entity_name IS
    DECLARATIONS (“operator1”);
    “operator2”
   END OF ARCHITECTURE,

where arch_name is the identifier of the architecture; entity_name is the name of the system (entity)
for which the architecture is defined; “operator1” defines architecture declarations which may
typically be any of the following: type, subtype, signal, constant, file, alias, component, attribute,
function, procedure, configuration specification; “operator2” are operator(s) describing the system
behavior.



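   [Figure: inputs x1 and x2 feed an AND gate (&) producing the internal signal w; w and x3 feed an OR gate (1) producing the output f. The figure labels the gate network as the architecture and its interface as the entity.]

                                 Figure 1: Combinational logic circuit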

   The process operator is one of the parallel (concurrent) operators of VHDL. It defines the independent
sequential behavior of some part of a project, described by an ordered set of sequential operators. The
construct defining a process in SAA is the following:

   PROCESS name (signal1, signal2, …) IS
    DECLARATIONS (“operator1”);
    “operator2”
   END OF PROCESS,

where name is the identifier of the process followed by an optional list of signals which cause the
process to be activated; “operator2” defines the process body.
   As an example, consider the architectures a1 and a2 defining the behavior of the above
combinational circuit:

   ARCHITECTURE a1 of and_or IS
    (f <= (x1 and x2) or x3);
   END OF ARCHITECTURE;

   ARCHITECTURE a2 of and_or IS
    DECLARATIONS (“Signal (w) of type (bit)”);
    (w <= x1 and x2);
    PROCESS p1 (w, x3) IS
      (f <= w or x3);
    END OF PROCESS;
   END OF ARCHITECTURE,

where <= is the operation assigning a value to a signal.
  Other examples of application of the above operations are given in Section 3.
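
As a quick sanity check of the two architectures, the following Python sketch models their steady-state behavior and compares them over all input combinations. It is only a functional model under the assumption that signals have settled: it does not simulate VHDL delta cycles or the process sensitivity list, and the function names `a1` and `a2` simply mirror the architecture names.

```python
from itertools import product


def a1(x1, x2, x3):
    """Architecture a1: a single concurrent assignment."""
    return (x1 & x2) | x3


def a2(x1, x2, x3):
    """Architecture a2: internal signal w, then process p1 on (w, x3)."""
    w = x1 & x2      # concurrent assignment w <= x1 and x2
    return w | x3    # body of process p1: f <= w or x3


# Truth table of f for all eight input combinations.
truth_table = {bits: a1(*bits) for bits in product((0, 1), repeat=3)}
equivalent = all(a1(*bits) == a2(*bits) for bits in truth_table)
print(equivalent)
```

Both architectures describe the same function f; they differ only in how the description is structured (one dataflow assignment versus an assignment plus a process).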

3. Designing a Parallel Genetic Algorithm for FPGA
    A genetic algorithm is a simple model of evolution in nature, implemented as a computer
program [8, 9]. It reflects the processes of genetic inheritance and natural selection. It imitates the
evolution of the population as a cyclic process of crossover and generation change. Genetic
algorithms in various forms are used for solving scientific and technical problems [10–19]. In
machine learning, they are used in the design of neural networks and in manipulation robotics.
    One of the main features of neural networks is parallel processing of signals. Multilayer neural
networks are homogeneous computing environments. In neuroinformatics terminology,
they are universal parallel computing structures intended for solving various classes of tasks. When
neural networks are implemented on an FPGA, each network layer works in parallel with the others, which
allows the pipeline principle to be used. Neurons in each layer also work in parallel, following the
principle of multiprocessor data processing. That is, every neuron is a separate process, and
information is processed in all neurons simultaneously. Each neuron is represented as a
separate block consisting of several parallel processes, and the neural network is a multiprocessor
system. VHDL allows the signals that launch a process to be defined explicitly. The input
signal of a neuron is used to launch its computing process. In our previous
works [11–13], a method for implementing nonlinear neural network activation functions on FPGA
was developed. Examples of implementing sigmoidal activation functions and
Gaussian activation functions for radial-basis neural networks were considered. Hardware
implementations of activation functions, artificial neurons, and a series of neural networks were
produced and compared with existing analogues in terms of speed, hardware resource usage, and
accuracy.
    In this section, a parallel genetic algorithm for training neural networks on FPGA is designed in
SAA with the further generation of VHDL code. Every neuron corresponds to a process in the VHDL
language. Based on this, the neurons (separate processes) are naturally combined into a network:
the outputs of the preceding network layer launch the processes of the next layer. The training of an
artificial neural network consists in tuning the weight coefficients $w_{i,j}$ of its basic elements, as a
result of which the network performs tasks such as recognition, optimization, approximation, and
control.
   In developing a parallel computing system for optimizing the weight coefficients of neural
networks, it is necessary to explore the characteristics of the algorithm being implemented [10].
The hardware implementation of neural network training procedures with the genetic
algorithm is prepared on the basis of the graph
                                          $G_{DF} = (A, D)$,
where A is the set of vertices corresponding to operations and D is the set of edges representing data flows.
   Figure 2 shows the graph of computations in a genetic algorithm with tasks highlighted that can be
executed in parallel.
                 Figure 2: The graph of computations in a parallel genetic algorithm
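
One common way to find tasks that can run in parallel in a graph of the form (A, D) is to group the vertices into dependency levels. The Python sketch below shows this technique on a small illustrative graph. The vertex names and edges are hypothetical and do not reproduce Figure 2; the function `parallel_levels` is likewise an assumption, not part of the authors' tools.

```python
def parallel_levels(vertices, edges):
    """Group the vertices of an acyclic data-flow graph into levels.

    Vertices in the same level do not depend on one another, so the
    operations they represent can be executed in parallel.
    """
    preds = {v: set() for v in vertices}
    for src, dst in edges:
        preds[dst].add(src)
    levels, done = [], set()
    while len(done) < len(vertices):
        # A vertex is ready once all operations it depends on are done.
        ready = sorted(v for v in vertices
                       if v not in done and preds[v] <= done)
        if not ready:
            raise ValueError("graph has a cycle")
        levels.append(ready)
        done.update(ready)
    return levels


# Hypothetical graph: three chromosome evaluations depend only on the
# initialization and can therefore run side by side.
A = ["init", "eval1", "eval2", "eval3", "select", "crossover", "mutate"]
D = [("init", "eval1"), ("init", "eval2"), ("init", "eval3"),
     ("eval1", "select"), ("eval2", "select"), ("eval3", "select"),
     ("select", "crossover"), ("crossover", "mutate")]
levels = parallel_levels(A, D)
print(levels)
```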

    The implementation of the genetic algorithm on FPGA consists of the following steps.
    Step 1. Initialization of the initial population with chromosomes that contain information about the
values of the weight coefficients of the network with a given structure. A set of chromosomes is
represented as a two-dimensional array (see Table 1). Each row of the table corresponds to a
chromosome and contains information about the whole set of weight coefficients of the network.
    Step 2. Evaluation of chromosomes of the current population. Each chromosome is decoded to a
set of weight coefficients of the neural network. Values of the fitness function are calculated, which
takes into account an error and network complexity. This function defines the difference between the
obtained network output and the required one.
    Step 3. If one of the values satisfies the problem’s condition, then go to step 7.
    Step 4. Selection of chromosomes for further crossover and mutation, which is done by sorting
according to the value of the fitness function.
    Steps 5 and 6. Application of the operators of crossover and mutation to the chromosomes selected
in the previous step.
Table 1
Representation of chromosomes’ population with a two-dimensional array
 Chromosome’s     Weight coefficients
 number           1st neuron           2nd neuron           n-th neuron
                  1      2      3      4      5      ...    n–1    n
 1                -1     -1     -1     -1     -1     ...    -1     -1
 2                -1     -1     -1     0      0      ...    0      0
 3                1      1      1      1      1      ...    1      1
 …
 k–1              0      0      0      0.5    0.5    ...    -1     -1
 k                0.5    0.5    0.5    0.5    0.5    ...    0.5    0.5

   The following values are calculated:
$$n_2 = \frac{1 + \sqrt{1 + 8N}}{2}, \qquad n_1 = n_2 - 0.5 - \sqrt{(n_2 - 0.5)^2 - 2N},$$
$$H' = \sum_{i=1}^{n_1} \sum_{j=i+1}^{n_2} \sum_{m=1}^{mMax} \left( \sum_{k=1}^{b-1} H_{imk} 2^k + H_{jmb} 2^b + \sum_{k=b+1}^{B} H_{imk} 2^k \right), \quad n_{1,2} \in \mathbb{N},$$
where $H'$ is the new set of chromosomes and $H$ is the previous one. Each chromosome is considered as a set
of bits. The first sum selects the first chromosome for crossover (its index is i), and the second sum
selects the second chromosome (index j). The expression in brackets defines the operations on the bits
of the new chromosome, where b and k are bit indexes; $2^b$ and $2^k$ denote the position of a bit in a
binary number; $H_{imk}$ is the k-th bit of the i-th chromosome; $H_{jmb}$ is the bit with number b in the j-th chromosome; m
is the index of a weight coefficient in a chromosome.
    According to the formula, all bits of the new chromosome are copied from the i-th chromosome of the
preceding generation, except for the bit with number b, which is copied from the j-th chromosome. This
operation is implemented using for loops. After the operation is performed, the new generation is
formed and the algorithm returns to step 2.
    Step 7. The genetic algorithm is stopped and obtained values are substituted into the neural
network.
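
The crossover formula of steps 5 and 6 can be checked numerically with a short Python sketch. It reads the formulas as n2 = (1 + sqrt(1 + 8N)) / 2 and n1 = n2 − 0.5 − sqrt((n2 − 0.5)^2 − 2N), and it interprets N as the number of chromosome pairs (i, j) to be crossed; the paper does not define N explicitly, so this reading is an assumption. The names `pair_bounds` and `crossover_bit` are hypothetical, and bit positions are counted from zero here.

```python
import math


def pair_bounds(N):
    """Upper bounds n1, n2 of the pair indices i and j for N pairs."""
    n2 = (1 + math.sqrt(1 + 8 * N)) / 2
    n1 = n2 - 0.5 - math.sqrt((n2 - 0.5) ** 2 - 2 * N)
    return n1, n2


def crossover_bit(h_i, h_j, b):
    """Copy every bit of weight h_i except bit b, which comes from h_j."""
    mask = 1 << b
    return (h_i & ~mask) | (h_j & mask)


n1, n2 = pair_bounds(6)
# Enumerate the pairs exactly as the two outer sums do: i <= n1, i < j <= n2.
pairs = [(i, j)
         for i in range(1, round(n1) + 1)
         for j in range(i + 1, round(n2) + 1)]
child = crossover_bit(0b10101010, 0b01010101, 0)
print(n1, n2, len(pairs), bin(child))
```

For N = 6 the bounds are n1 = 3 and n2 = 4, which matches the loop limits `k FROM 1 TO 3` and `i FROM k + 1 TO 4` used in the crossover scheme of this section and the six chromosomes mentioned in Section 4.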
   The genetic algorithm block (see Figure 3) has ports that receive the input signals in1,
in2, in3, the port zout holding the value to be obtained as a result of training, and the output port
Nout, which shows the current value of the output after each training iteration. The signal CLC creates
delays of one picosecond.


   [Figure: block with ports in1(0:6), in2(0:6), in3(0:6), zout(0:6), CLC, and Nout(0:20)]

                                 Figure 3: The genetic algorithm block



   This block is defined as the following entity in SAA:

   ENTITY genVHDL IS
    PORT (
     “Signal (in1) direction (in) of type (integer) and range (-100 to 100)
     with initial value (63)”;
     “Signal (in2) direction (in) of type (integer) and range (-100 to 100)
     with initial value (-82)”;
     “Signal (in3) direction (in) of type (integer) and range (-100 to 100)
     with initial value (70)”;
     “Signal (zout) direction (in) of type (integer) and range (0 to 100)
     with initial value (77)”;
     “Signal (CLC) direction (inout) of type (bit) with initial value (‘1’)”;
     “Signal (Nout) direction (inout) of type (integer) and range
      (-1048575 to 1048576)”);
   END OF ENTITY.

   The block implements operators of crossover, mutation, and selection of the next chromosome
generation. The architecture of the entity contains four main processes: neuron, ChrsToW, main, and
Genetic. The neuron process starts as soon as the start signal changes its value, i.e. after
the training begins. The SAA scheme of this process is the following:

   PROCESS neuron (start) is
    DECLARATIONS (
     “Variable (LocalOut) of type (integer) and range (0 to 1023)”;
     “Variable (lin) of type (integer)”);
    (LocalOut := 100);
    (lin := (in1 * W(1) + in2 * W(2) + in3 * W(3)) / 16);
    FOR (a FROM 0 TO 49)
      IF (abs(lin) >= x(a)) AND (abs(lin) < x(a + 1))
      THEN (LocalOut := 50 + a); EXIT LOOP;
      END IF
    END OF LOOP;
    IF (lin < 0) THEN (LocalOut := 100 - LocalOut);
    END IF;
    IF (NOT (finishTeaching)) THEN
      (lout(i) := LocalOut);
      NFinish <= NOT NFinish;
    END IF;
    Nout <= LocalOut;
   END OF PROCESS,

where LocalOut is the current neuron output, which is compared with the expected result of variable
zout; W(1), W(2), W(3) are neuron weights; i is the number of the chromosome on the basis of which
weight coefficients were formed before launching the process which forms the output of the neural
network.
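
The computation inside the neuron process can be modeled in Python as follows. This is a sketch of the scheme's arithmetic only: signals, the `lout` array, and the `NFinish` handshake are omitted. The threshold table `X` is hypothetical, because the paper does not list the values of x(a); any ascending table of 51 thresholds would fit the scheme. The helper `trunc_div` reproduces the truncation toward zero of VHDL integer division.

```python
def trunc_div(a, b):
    """Integer division truncating toward zero, like '/' on VHDL integers."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def neuron(inputs, weights, x):
    """Return LocalOut for the given inputs, weights and threshold table x."""
    local_out = 100  # value kept when abs(lin) is beyond the last threshold
    lin = trunc_div(sum(i * w for i, w in zip(inputs, weights)), 16)
    for a in range(50):
        # Find the interval [x(a), x(a + 1)) that contains abs(lin).
        if x[a] <= abs(lin) < x[a + 1]:
            local_out = 50 + a
            break
    if lin < 0:
        # Mirror the output around 50 for negative weighted sums.
        local_out = 100 - local_out
    return local_out


X = [4 * a for a in range(51)]  # hypothetical thresholds, not from the paper
print(neuron((63, -82, 70), (1, 1, 1), X))
```

Under this reading the loop acts as a piecewise-constant lookup of an activation function scaled to the range 0 to 100, with 50 as the output for a weighted sum near zero.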
   The process ChrsToW converts the value of a chromosome object into the values of synapse weights.
The process Genetic performs sorting, crossover, and mutation of the chromosomes. In particular,
crossover and mutation are represented by the following scheme:

   (num3 := 0);
   FOR (k FROM 1 TO 3)
     FOR (i FROM k + 1 TO 4)
       (num3 := num3 + 1);
       (chrs(num3) := chrs2(k));
       FOR (j FROM 1 TO 3)
         FOR (m FROM 1 TO 2)
           (num5 := num5 * 29 / 8 rem 1048576);
           (num2 := (num5) rem 8);
           (chrs(num3)(j)(num2) := chrs2(i)(j)(num2));
         END OF LOOP;
         FOR (m FROM 1 TO 4)
           (num5 := num5 * 29 / 8 rem 1048576);
           (num2 := abs(num5) rem 8);
           (chrs(num3)(j)(num2) := NOT(chrs(num3)(j)(num2)));
         END OF LOOP;
       END OF LOOP;
     END OF LOOP;
   END OF LOOP.

   Here chrs and chrs2 are bit arrays storing weight coefficients of chromosomes.
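
For readers who prefer executable code, the scheme above can be transcribed into Python roughly as follows. Several points are assumptions: indices are zero-based, each chromosome is modeled as a list of 3 weights of 8 bits, `num5` is taken to be non-negative, and the `seed`, `n_cross` and `n_mut` parameters are added for illustration (the scheme fixes the loop counts at 2 and 4 and does not show how `num5` is initialized).

```python
import copy


def next_random(num5):
    """One step of the scheme's pseudo-random update (num5 >= 0 assumed)."""
    return (num5 * 29 // 8) % 1048576


def crossover_and_mutate(parents, seed, n_cross=2, n_mut=4):
    """Build one child per pair (k, i), k < i, of the parent chromosomes."""
    num5, children = seed, []
    for k in range(len(parents) - 1):
        for i in range(k + 1, len(parents)):
            child = copy.deepcopy(parents[k])  # chrs(num3) := chrs2(k)
            for j in range(len(child)):
                for _ in range(n_cross):
                    # Crossover: copy one pseudo-random bit from parent i.
                    num5 = next_random(num5)
                    bit = num5 % 8
                    child[j][bit] = parents[i][j][bit]
                for _ in range(n_mut):
                    # Mutation: invert one pseudo-random bit of the child.
                    num5 = next_random(num5)
                    bit = num5 % 8
                    child[j][bit] = 1 - child[j][bit]
            children.append(child)
    return children


# Four all-zero parents, each with 3 weights of 8 bits.
parents = [[[0] * 8 for _ in range(3)] for _ in range(4)]
children = crossover_and_mutate(parents, seed=12345)
print(len(children))
```

Note that a seed of 0 would keep `num5` at 0 forever under this update rule, so every selected bit index would be 0; a non-zero starting value is needed for the bit positions to vary.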
   IDS generated VHDL code based on the designed SAA schemes.

4. Experiment Results
   Consider the example of training one neuron with three inputs and, correspondingly, three weight
coefficients. The hardware implementation of the genetic algorithm according to the proposed
methodology was done on Xilinx Spartan 3 XC3S200 FPGA in Xilinx ISE Design Suite 13.2
environment and modeled using ISE Simulator (ISim). The process of training is represented in
Figure 4.




             Figure 4: The process of neural network training with the genetic algorithm

    Signals in1, in2, in3, zout, clc, nout were described in Section 3. The purpose of the other signals
is the following:
    •     finishteaching stores the information that training of the network is finished;
    •     start launches the process that forms the output of the network for each new set of weight
    coefficients;
    •     i is the number of the chromosome on the basis of which the weight coefficients were formed before
    the launch of the process forming the network’s output (before setting signal start to 1);
    •     noutsforming is set to 1 while chromosomes are run through the neural network. It transits
    from 1 to 0 when all 6 chromosomes are processed, and the process can proceed to the analysis of
    results: finish the algorithm or form new chromosomes using crossover and mutation. The
    transition of the signal from 0 to 1 means the start of a new iteration of the genetic algorithm;
    •     analysis is set to 1 while the information obtained by running chromosomes through the neural
    network is analyzed.
    The adjustment of the weight coefficients of the neural network was completed in three
iterations, as shown in Figure 4. In the ISE Simulator (ISim) environment, a step of
1 picosecond per operation of the genetic algorithm is defined programmatically for debugging
and for demonstrating the operation. The training time is 0.15 milliseconds.
    Processes of tuning of weight coefficients for neural networks with 2 and 3 neurons were also
modeled. The first network consisted of two sequentially connected neurons and four synapses. The
time of training was 0.2 milliseconds.
    The second neural network consisted of two neurons in the first (input) layer and one neuron in the
second (output) layer. Neurons are connected with four synapses. The time of training was 0.23
milliseconds.
    The comparison of the obtained results with equivalent results found in similar works [14, 15,
17, 18] is presented in Table 2.

Table 2
A comparison of the results obtained by other authors (T1) and obtained in this work (T2)
 Reference to     N      K      Training time    Training time    Speedup,
 similar work                   T1, ms           T2, ms           T1 / T2
 [14]             64     500    0.941            0.625            1.5
 [15]             16     256    0.800            0.230            3.5
 [17]             20     380    74.000           0.285            260
 [18]             32     200    1.600            0.200            8.0

   The comparisons were made for parameter settings as similar as possible and for implementations on the
same chip. The first column shows the references to the works. The next two columns indicate the
parameters of the genetic algorithm in the corresponding works: N is the size of the chromosome, K is
the number of epochs in the genetic algorithm. The following columns show the training time
obtained in other papers, the time obtained in this work, and the respective speedup.
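
The speedup column of Table 2 is simply the ratio T1 / T2 and can be recomputed from the reported training times with a few lines of Python. The dictionary `rows` just restates the table values; nothing here is measured anew.

```python
# Training times from Table 2 in milliseconds: reference -> (T1, T2).
rows = {
    "[14]": (0.941, 0.625),
    "[15]": (0.800, 0.230),
    "[17]": (74.000, 0.285),
    "[18]": (1.600, 0.200),
}

# Speedup of this work over each referenced implementation.
speedup = {ref: t1 / t2 for ref, (t1, t2) in rows.items()}
for ref, s in speedup.items():
    print(ref, round(s, 1))
```

The recomputed ratios agree with the table after rounding, with the value for [17] being slightly below the reported 260.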
   The modeling results show that the developed method of training neural networks with a
genetic algorithm implemented in hardware on an FPGA makes it possible to significantly speed up the adaptation
of neural network components of control systems and thus increase their efficiency.

5. Related Work
    The proposed approach is related to works on the synthesis of programs from
specifications [20, 21], automated generation of VHDL code [22–24], and implementation of genetic
algorithms on FPGA [14–19, 25–27]. In paper [22], a Java library to read, manipulate, and write
(generate) VHDL code is presented. Paper [23] describes an automatic process of converting XSG
(Xilinx System Generator) specifications into efficient VHDL code. The process involves customized
fixed-point hardware definition, data flow graph extraction, resource-constrained and latency-
constrained scheduling, and VHDL specification of the system. Work [24] presents a generator of
high-speed input (parser) and output (deparser) network blocks from the P4 language which is
designed for the description of modern packet processing devices. The tool converts a P4 description
to a synthesizable VHDL code suitable for the FPGA implementation.
    The main difference of our approach from the mentioned works is that it uses algebraic
specifications, based on Glushkov’s algebra of algorithms. Specifications are represented in a natural
linguistic form simplifying the understanding of algorithms and facilitating the achievement of
demanded software quality. Another advantage of our tools is the method of automated design of
syntactically correct algorithm specifications, which eliminates syntax errors during the construction
of algorithm schemes.
    Implementations of genetic algorithms on FPGAs are considered in works [14–19, 25–27].
Paper [14] proposes the modular realization of a genetic algorithm. However, the implementation
does not use a parallel processing strategy and uses several loops for each generation. In each
generation, it is necessary to read and write the generation from/to the memory. Implementations
considered in [15, 16] are applications of genetic algorithms in systems of digital signal processing
and control built into FPGA. Paper [15] presents a real-time genetic algorithm for adaptive filtering
program with all modules implemented in hardware, such as fitness function, selection, crossover,
mutation, and random number generator functions. The speed of 320 thousand generations per second
was reached. Paper [16] proposes a genetic algorithm for dynamic systems based on blocks of filters.
Several approaches to high-speed general-purpose hardware for accelerating genetic algorithms are
proposed in [17–19]. These approaches improve the configuration of parameters of genetic algorithms
in hardware architecture, but decrease the parallelization of hardware and reduce the high-speed
performance of a genetic algorithm. Paper [25] proposes a genetic algorithm for sequential and
parallel pipeline solutions on FPGA using Verilog HDL, which is applied for solving travelling
salesman problem (TSP). Paper [26] presents a hardware implementation of the crossover module in
the genetic algorithm for TSP. A combination of pipelining and parallelization with a genetic
algorithm processor to improve processing speed is employed. Work [27] describes a parallel
implementation of a genetic algorithm on FPGA which can optimize a wide range of functions in a
viable time for critical applications that require short time constraints or a large amount of data to be
processed in a short interval. Articles [25–27] are aimed at implementing the classical genetic
algorithm and solving problems of finding the extremum of a function. Our work differs from them
by using the SAA-based toolkit for automated design of such hardware. The hardware is implemented
according to the method developed in our previous works [10, 12] to optimize the weights of neural
networks using a genetic algorithm.

6. Conclusion
    The paper proposes the method and software tools for automated design and synthesis of parallel
programs for field-programmable gate arrays based on the algebra-algorithmic approach. The
developed facilities provide the construction of parallel algorithm schemes by superposition of
language constructs of Glushkov’s system of algorithmic algebra. Based on the schemes, the
corresponding source code in VHDL is automatically generated, which is further executed on an
FPGA. The particular feature of the approach consists in using high-level algebra-algorithmic
program specifications represented in a natural linguistic form. The specifications are the basis for the
automatic generation of source code in a programming language. The approach is illustrated by
developing a genetic algorithm applied to the training of multilayer neural networks. The experimental
results showed that with the developed genetic algorithm implemented on FPGA, neural network
training is faster than in the related works compared (speedups from 1.5 to 260 times, see Table 2).

References
[1] P. P. Chu, RTL Hardware Design Using VHDL: Coding for Efficiency, Portability and
    Scalability, Wiley-Interscience, Hoboken, NJ, 2006.
[2] S. Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, Prentice Hall PTR, Upper
    Saddle River, NJ, 2003.
[3] S. Sutherland, S. Davidmann, P. Flake, SystemVerilog for Design: A Guide to Using
    Systemverilog for Hardware Design and Modeling, Springer-Verlag, Berlin, 2006.
[4] A. Doroshenko, O. Yatsenko, Formal and Adaptive Methods for Automation of Parallel
    Programs Construction: Emerging Research and Opportunities, IGI Global, Hershey, 2021.
[5] P. I. Andon, A. Yu. Doroshenko, K. A. Zhereb, O. A. Yatsenko, Algebra-Algorithmic Models
    and Methods of Parallel Programming, Akademperiodyka, Kyiv, 2018.
[6] A. Doroshenko, K. Zhereb, O. Yatsenko, Developing and optimizing parallel programs with
    algebra-algorithmic and term rewriting tools, in: V. Ermolayev, H. C. Mayr, M. Nikitchenko,
    A. Spivakovsky, G. Zholtkevych (Eds.), ICTERI 2013, volume 412 of Communications in
    Computer and Information Science, Springer-Verlag, Cham, 2013, pp. 70–92.
    doi:10.1007/978-3-319-03998-5_5.
[7] A. Doroshenko, O. Beketov, M. Bondarenko, O. Yatsenko, Automated design of parallel
    programs for heterogeneous platforms using algebra-algorithmic tools, in: V. Ermolayev,
    F. Mallet, V. Yakovyna, H. Mayr, A. Spivakovsky (Eds.), ICTERI 2019, volume 1175 of
    Communications in Computer and Information Science, Springer-Verlag, Cham, 2020, pp. 3–23.
    doi:10.1007/978-3-030-39459-2_1.
[8] E. Dumesnil, P.-O. Beaulieu, M. Boukadoum, Fully parallel FPGA implementation of an
    artificial neural network tuned by genetic algorithm, in: Proceedings of the 16th IEEE
    International New Circuits and Systems Conference, NEWCAS 2018, IEEE, New York, NY,
    2018, pp. 365–369. doi:10.1109/NEWCAS.2018.8585580.
[9] M. F. Torquato, M. A. C. Fernandes, High-performance parallel implementation of genetic
    algorithm on FPGA, Circuits, Systems, and Signal Processing 38 (2019) 4014–4039.
    doi:10.1007/s00034-019-01037-w.
[10] P. I. Kravets, V. M. Shymkovych, A method for optimizing the weighting coefficients of neural
     networks using a genetic algorithm when implemented on programmable logic integrated
     circuits, Èlektron. model. 35.3 (2013) 65–74.
[11] P. I. Kravets, V. N. Shimkovich, D. A. Ferens, Method and algorithms of implementation on
     PLIS the activation function for artificial neuron chains, Èlektron. model. 37.4 (2015) 63–74.
[12] V. Shymkovych, P. Kravets, Hardware implementation neural network controller on FPGA for
     stability ball on the platform, in: Z. Hu, S. Petoukhov, I. Dychka, M. He (Eds.), Proceedings of
     the 2nd International Conference on Computer Science, Engineering and Education
     Applications, volume 938 of ICCSEEA 2019, Springer Nature, Cham, 2020, pp. 247–256.
     doi:10.1007/978-3-030-16621-2_23.
[13] V. Shymkovych, S. Telenyk, P. Kravets, Hardware implementation of radial-basis neural
     networks with Gaussian activation functions on FPGA, Neural Comput. & Applic. 33
     (2021) 9467–9479. doi:10.1007/s00521-021-05706-3.
[14] F. Mengxu, T. Bin, FPGA implementation of an adaptive genetic algorithm, in: Proceedings of
     the 12th International Conference on Service Systems and Service Management, ICSSSM 2015,
     IEEE, New York, NY, 2015, pp. 1–5. doi:10.1109/ICSSSM.2015.7170318.
[15] H. Merabti, D. Massicotte, Hardware implementation of a real-time genetic algorithm for
     adaptive filtering applications, in: Proceedings of the 27th Canadian Conference on Electrical
     and Computer Engineering, CCECE 2014, IEEE, New York, NY, 2014, pp. 1–5.
     doi:10.1109/CCECE.2014.6901026.
[16] N. Sehatbakhsh, M. Aliasgari, S. M. Fakhraie, FPGA implementation of genetic algorithm for
     dynamic filter-bank-based multicarrier systems, in: Proceedings of the 8th International
     Conference on Design & Technology of Integrated Systems in Nanoscale Era, DTIS 2013, IEEE,
     New York, NY, 2013, pp. 72–77. doi:10.1109/DTIS.2013.6527781.
[17] M. S. B. Ameur, A. Sakly, FPGA based hardware implementation of Bat algorithm, Appl. Soft
     Comput. 58 (2017) 378–387. doi:10.1016/j.asoc.2017.04.015.
[18] L. Guo, A. I. Funie, Z. Xie, D. Thomas, W. Luk, A general-purpose framework for
     FPGA-accelerated genetic algorithms, Int. J. Bio-Inspir. Comput. 7.6 (2015) 361–375.
     doi:10.1504/IJBIC.2015.073183.
[19] M. Peker, A fully customizable hardware implementation for general purpose genetic algorithms,
     Appl. Soft Comput. 62 (2018) 1066–1076. doi:10.1016/j.asoc.2017.09.044.
[20] P. Flener, Achievements and prospects of program synthesis, in: A. C. Kakas, F. Sadri (Eds.),
     Computational Logic: Logic Programming and Beyond, Essays in Honour of Robert A.
     Kowalski, volume 2407 of Lecture Notes in Artificial Intelligence, Springer-Verlag, London,
     2002, pp. 310–346. doi:10.1007/3-540-45628-7_13.
[21] S. Gulwani, Dimensions in program synthesis, in: Proceedings of the 12th International ACM
     SIGPLAN symposium on Principles and practice of declarative programming, PPDP ’10, ACM,
     New York, NY, 2010, pp. 13–24. doi:10.1145/1836089.1836091.
[22] C. Pohl, C. Paiz, M. Porrmann, vMAGIC — automatic code generation for VHDL, International
     Journal of Reconfigurable Computing 2009 (2009) 1–9. doi:10.1155/2009/205149.
[23] P. Martín, E. Bueno, Fco. J. Rodríguez, O. Machado, B. Vuksanovic, An FPGA-based approach
     to the automatic generation of VHDL code for industrial control systems applications: a case
     study of MSOGIs implementation, Mathematics and Computers in Simulation 91 (2013)
     178–192. doi:10.1016/j.matcom.2012.07.004.
[24] P. Benáček, V. Puš, H. Kubátová, T. Čejka, P4-To-VHDL: automatic generation of high-speed
     input and output network blocks, Microprocessors and Microsystems 56 (2018)
     22–33. doi:10.1016/j.micpro.2017.10.012.
[25] X. Sun, J. Li, F. Tian, Y. Chen, J. Yang, Design of FPGA hardware based on genetic algorithm,
     in: Proceedings of the 3rd International Conference on Computer Engineering, Information
     Science & Application Technology, ICCIA 2019, volume 90 of Advances in Computer Science
     Research, Atlantis Press, Dordrecht, 2019, pp. 102–108. doi:10.2991/iccia-19.2019.15.
[26] N. Attarmoghaddam, K. F. Li, A. Kanan, FPGA implementation of crossover module of genetic
     algorithm, Information 10.6 (2019) 1–11. doi:10.3390/info10060184.
[27] M. F. Torquato, M. A. C. Fernandes, High-performance parallel implementation of genetic
     algorithm on FPGA, Circuits, Systems, and Signal Processing 38 (2019) 4014–4039.
     doi:10.1007/s00034-019-01037-w.