Multicriteria optimization of hybrid convolutional neural network structural synthesis using evolutionary algorithms

Illia Boryndo1, Victor Sineglazov2 and Michael Z. Zgurovsky3
1 National Aviation University, Kyiv, Ukraine
2 National Aviation University, Kyiv, Ukraine
3 National Technical University of Ukraine, Kyiv, Ukraine

Abstract
This paper defines and describes promising architectural solutions for convolutional neural networks and considers their key parameters for further structural and parametric synthesis. It has been shown that, to achieve better qualitative results, these networks should add functional blocks such as the SRU (Spatial Reconstruction Unit), the CRU (Channel Reconstruction Unit), the dense residual attention unit, etc., to the traditional supportive layers (batch normalization layer, 1x1 convolutional layer, dropout layer, etc.). The advantages and disadvantages of the different blocks are described, together with the reasoning for their use during structural synthesis. A genetic evolutionary algorithm is proposed for structural-parametric synthesis, and modern approaches are reviewed. The process of configuring the evolutionary algorithm is shown and described. Based on the optimization criteria, the fitness function and the selection, mutation, and crossover approaches were defined. The results of the experimental evolutionary process are presented and analyzed. An example is considered of a model generated by the evolutionary algorithm that uses functional blocks aggregated from different CNN architectural approaches. The performance criteria for each model during the synthesis process are calculated, including the reduction in average training time, along with their advantages and architectural integration details. The experimental results prove that using complex structural blocks instead of traditional layers, with a flexible configuration of the fitness function covering both quality and performance criteria, yields a significant improvement in the resulting model.

Keywords
structural-parametric synthesis; convolutional neural networks; genetic algorithm.

1. Introduction
Nowadays, even with significant progress in computer vision and the use of advanced convolutional networks, especially visual transformers [1], many image processing challenges remain unresolved and need tailored solutions. This is largely because of the unique characteristics of training datasets and the high computational demands of complex neural network topologies, such as transformers, which face hardware constraints. Most CNN architectures suffer from low performance and slow learning rates while simultaneously demanding high-quality, well-balanced training datasets. The natural way out of this situation is to create hybrid convolutional neural networks [2, 3]. However, many problems arise in this process: the optimal choice of the basic topology of the convolutional neural network, the optimal choice of the various structural blocks to be used when synthesizing the structure of the hybrid convolutional neural network, and the optimal choice of their locations. From the machine learning point of view, there are the problems of selecting learning criteria (whether we have a single-criteria or a multi-criteria optimization problem), hitting a local extremum, overtraining, vanishing gradients, pre-training, and creating a hybrid learning algorithm.
This paper will outline and discuss the research findings on the application and architectural characteristics of hybrid convolutional neural networks (HCNN) and the various building blocks essential for their synthesis. It involves studying the performance of individual components, examining contemporary CNN architectures, and applying various training data to address performance challenges. The primary focus of this study is to identify the best kinds and arrangements of current CNNs, extract operational components, and incorporate them into the HCNN synthesis using evolutionary algorithms to attain favorable performance, accuracy outcomes, and other optimization criteria. The main goal of this paper is to develop an evolutionary mechanism that utilizes structural components of different CNN architectures to create a model that satisfies predefined optimization criteria.

2. Related works and existing approaches
Convolutional Neural Networks have revolutionized the field of computer vision since their inception, offering unparalleled performance in image classification, object detection, and segmentation. Despite their success, the quest for improved accuracy and versatility has led researchers to explore hybrid models that combine CNNs with other neural network architectures. These hybrid models leverage the unique strengths of different networks, addressing the limitations of pure CNNs and enhancing their applicability across a wider range of tasks. For the structural synthesis of HCNNs there are two main components: the evolutionary algorithm (EA) itself and the building components to be utilized by this algorithm. There are many applications of evolutionary algorithms in integration with CNNs, but most of them have their own limitations. Modern evolutionary algorithms are inspired by the process of natural selection and genetic evolution. They operate on a population of potential solutions, iteratively selecting, recombining, and mutating individuals to produce new generations of solutions. Key modern EAs include:
Genetic Algorithms (GAs):
• Operate with a population of candidate solutions.
• Use crossover and mutation to explore new architectures.
• Selection mechanisms favor better-performing individuals.
Genetic Programming (GP) [4]:
• Similar to GAs but operates on tree-like structures.
• Can evolve entire programs or architectures.
NeuroEvolution of Augmenting Topologies (NEAT) [5]:
• Evolves both the weights and the architecture of neural networks.
• Introduces innovations by tracking genes and structures through generations.
Evolutionary Strategies (ES) [6]:
• Focus on the optimization of continuous parameters.
• Use strategies like covariance matrix adaptation to guide the search.
The architecture of Hybrid Convolutional Neural Networks (HCNNs) [3] integrates CNNs with various other neural network models, capitalizing on their respective strengths to create more powerful and versatile models.
Here, we delve into the specific architectures and mechanisms of three prominent types of HCNNs: CNN-RNN hybrids, CNN-GNN hybrids, and CNN-Transformer hybrids. The architecture of Hybrid Convolutional Neural Networks exemplifies the synergy achieved by combining the complementary strengths of different neural network models. CNNs excel at spatial feature extraction, making them a crucial component in these hybrid architectures. When paired with RNNs, they can model temporal sequences effectively; when combined with GNNs, they can handle relational data adeptly; and when integrated with transformers, they can capture both local and global dependencies in data. These hybrid architectures have demonstrated superior performance across various domains, from video classification and image captioning to social network analysis and molecular graph prediction, showcasing their versatility and efficacy. The goal of this paper is to define the main criteria for CNN synthesis, such as accuracy, computational cost, model robustness, etc., to analyze and extract structural blocks of modern CNN architectures, and to modify an existing evolutionary algorithm and apply it to synthesize an optimal HCNN architecture.

3. Problem Statement
Hybrid Convolutional Neural Networks have demonstrated significant improvements in performance across various complex tasks by integrating the strengths of Convolutional Neural Networks with other neural network architectures. Despite their potential, designing optimal HCNN architectures remains a challenging task due to the vast search space of possible configurations and the intricate balance required between different components. To address this, multicriteria optimization [2, 7] provides a framework for systematically evaluating and optimizing multiple performance metrics simultaneously. Evolutionary algorithms (EAs) are particularly well-suited for this task due to their ability to explore large, complex search spaces and their flexibility in handling multiple objectives. The goal of this research is to develop and configure an evolutionary algorithm for the multicriteria optimization of CNN structural synthesis. The key objectives include:
Defining the Optimization Criteria: Establishing a set of relevant performance metrics that reflect the quality and efficiency of HCNN architectures. These criteria typically include [8, 9]:
• Accuracy: The ability of the HCNN to correctly classify or predict data.
• Computational Efficiency: Metrics such as inference time and memory usage.
• Robustness: The network's resilience to adversarial attacks or noisy data.
• Scalability: The ability to maintain performance when scaled to larger datasets or more complex tasks.
CNN Structural Design Space: Identifying and parameterizing the components and configurations of HCNNs, including:
• Type and Depth of CNN Layers: The number of convolutional, pooling, and fully connected layers.
• Type and Integration Method of Hybrid Components: The choice between structural blocks and components used for structural synthesis, and the types of integration components (e.g., SRU, CRU, LSTM, GRU, GCN, GAT, etc.).
• Connectivity Patterns: How different components are connected and the flow of data between them.
• Hyperparameters: Parameters such as learning rates, batch sizes, and dropout rates.
Develop Evolutionary Algorithm: Design and implement an evolutionary algorithm tailored for CNN structural synthesis. This involves:
• Encoding CNNs: Developing an encoding scheme for representing HCNN architectures in a manner suitable for evolutionary operations.
• Fitness Function: Formulating a multi-objective fitness function that balances accuracy, efficiency, robustness, and scalability.
• Selection Mechanism: Implementing a selection process to maintain diversity and guide the search towards optimal solutions.
• Crossover and Mutation Operators: Creating operators to generate new HCNN architectures by recombining and modifying existing ones based on preselected structural blocks.
By addressing these objectives, this research aims to significantly advance the state-of-the-art in hybrid neural network design and optimization, providing a powerful tool for the development of more efficient, robust, and scalable deep learning models.

4. Decomposition of modern CNN architectures and evaluation of their structural blocks
Currently, there is continuous advancement in the topology analysis of modern convolutional neural networks, with new architectural solutions and applications being continuously proposed. Driven by the growing range of applied tasks, most modern convolutional neural network architectures have started to implement functional structural components instead of iteratively increasing complexity. These functional structural components incorporate sets of basic layers enhanced with unique connectivity approaches, combinations, and functional postprocessing. Such components can greatly improve the qualitative parameters of networks and can be used either independently or as part of a more complex structure. For the structural synthesis process of CNNs, the main components are the structural blocks that will be utilized to form the generated architecture. To define the list of suitable blocks and to label them based on their qualitative parameters and functional purpose, it is necessary to analyze and identify such blocks within modern CNN architectures. Most of them are well known, so we will look at modern corner cases instead. Each of these blocks possesses its own distinct conceptual framework and attributes, necessitating their evaluation both as individual entities and as paired combinations within a single neural network. Subsequently, it is essential to conduct performance evaluations to analyze their intrinsic properties, effects on the overall system performance, and variations in accuracy [10]. In our experimental study, we will examine these individual blocks as well as pairs of blocks while employing genetic algorithms for HCNN synthesis. The designated CNN should be extremely simple and easy to understand. Simplicity reduces outside influences and randomness, making it easier to focus on each block's internal impact. The testing and training will be conducted using the CIFAR-100 dataset. Following the application of genetic algorithms to produce a basic CNN lacking distinct blocks, the system achieves an initial accuracy of 86.3% with a learning duration of 5.3 hours. These values will act as the baseline for future performance test comparisons. It is important to recognize that the overall training time depends on the hardware utilized, so only the differences in time should be taken into account. The basic test-driven CNN structure is created using a series of predefined blocks, as per the algorithm. Next, the generated result model undergoes performance analysis [11]. With structural blocks added to the generation process, the resulting values are listed in Table 1.
Table 1
Result parameter comparison of single-use blocks

Block type | Top-1 Error (%) | Top-5 Error (%) | Time (H) | GFLOPS | Diff (~)
Densely connected layer | 22.80 | 7.8 | 5.88 | 3.8 | 3.2
SCConv block | 22.96 | 7.1 | 5.1 | 3.91 | 2.2
SCConv-A block | 22.1 | 6.67 | 5.5 | 3.94 | 2.6
SE-BN-Inception module | 22.68 | 6.94 | 4.92 | 2.87 | 1.8
Convolutional block attention module | 24.66 | 8.34 | 5.42 | 3.71 | 5.9
DenRes-Att module | 23.21 | 8.04 | 8.81 | 4.41 | 1.7
Inception-ResNet-V2 | 19.91 | 6.88 | 12 | 11.2 | 4.12
PolyInception module | 24.48 | 8.25 | 8 | 3.98 | 2.7
Non-local Block | 23.11 | 7.94 | 8.45 | 4.17 | 3.1

When facing issues with low learning performance, several solutions can be implemented. One effective approach is to enhance the current system's architecture by incorporating supportive blocks. The main ones are: the batch normalization layer [15]; the 1x1 convolution layer; the dropout layer [12]; the residual block.

5. Proposed structural synthesis of HCNN using evolutionary algorithm with functional modules and optimization-based construction blocks
5.1. Optimal combination and placement of structural blocks
The structural synthesis of Hybrid Convolutional Neural Networks involves the integration of various advanced building blocks to enhance network performance, efficiency, and robustness. This process requires a careful selection of building blocks such as Channel Boosting-Based CNNs (CB-CNN), Squeeze-and-Excitation (SE) blocks [13], Spatial and Channel Reconstruction Convolutions (SCConv), and attention-based blocks. Each of these components brings unique strengths to the architecture, and their optimal combination can significantly improve the overall capabilities of HCNNs.
Comprehensive Integration: The optimal synthesis of HCNNs involves a strategic combination and placement of the aforementioned building blocks to leverage their strengths synergistically. For instance, an HCNN might start with CB-CNN layers to enhance initial feature extraction, followed by SCConv layers to capture diverse patterns. SE blocks can be interspersed throughout the network to recalibrate channel-wise features, while attention-based blocks can be integrated into deeper layers to focus on important contextual information.
Task-Specific Configuration: The choice and arrangement of these building blocks should be tailored to the specific requirements of the task at hand. For instance, in image classification, a configuration emphasizing CB-CNN and SE blocks might be optimal, whereas in object detection, a combination of SCConv and attention-based blocks could provide the best performance [1, 16]. Computational resources and efficiency should also guide the integration strategy. SCConv and attention-based blocks can be computationally intensive, so their use should be balanced with the overall resource budget. This also involves deciding how to scale the network:
• Scaling Depth: Increasing the number of layers to capture more complex features.
• Scaling Width: Increasing the number of channels in each layer to improve feature representation.
• Scaling Resolution: Using higher resolution input images to capture more detail.

5.2. Evolutionary algorithm for structural synthesis of Hybrid Convolutional Neural Networks
Genetic algorithms are part of evolutionary computing, a field of artificial intelligence. They are inspired by evolution and natural selection, where the strongest traits are passed down from generation to generation. The multicriteria genetic algorithm (MCGA) is an extension of this process. It focuses on optimizing multiple objectives simultaneously.

Figure 1: Algorithmic scheme of applying a multicriteria evolutionary algorithm to obtain an optimal topology of CNN
Each solution provided by the algorithm is associated with a set of objective function values. The MCGA optimizes these values and provides a set of Pareto-optimal solutions. In multi-objective optimization problems, there are several conflicting objectives that need to be optimized. This results in a set of possible solutions, known as Pareto solutions, where no other solution can improve all objectives simultaneously. Therefore, the goal is not to find a single optimal solution, but to generate a set of Pareto-optimal solutions that provide a trade-off between the conflicting objectives [2]. Multi-criteria genetic algorithms, such as NSGA-II and SPEA-3, have demonstrated strong performance in various engineering optimization problems. By utilizing selection, mutation, and crossover operators iteratively, competitive individuals can be generated, drawing inspiration from the evolutionary theory of "survival of the fittest." These individuals, which cannot surpass each other in every aspect, form a group known as the non-dominated front. In practical optimization problems, in which evaluations are always computationally expensive, the population size in an MCGA is usually small due to limited computing resources. Figure 1 presents the logical flow for applying the evolutionary algorithm to structurally synthesize an optimal CNN structure. This is a high-level block diagram and does not reflect low-level logic. For the structural synthesis of target CNNs, we propose to use a modified SPEA-3 [8] algorithm to overcome the aforementioned problems.

5.2.1. Defining an Individual
In the context of this paper, an individual represents a specific CNN architecture. Based on the criteria defined in the problem statement section, the target individual is encoded directly into a string that explicitly describes the architecture. The genome structure is considered to encapsulate the following:
• number of layers;
• types of layers/blocks (SCConv, SE-BN-Inc, Dense block, standard convolutional, pooling, 1x1, batch normalization, etc.);
• kernel sizes;
• number of filters;
• stride and padding;
• activation functions;
• block-related specific parameters;
• learning rate, batch size, etc.
We will use a mixed encoding scheme where each individual (genome) consists of a series of structural blocks and hyperparameters. Each gene in the genome represents either a layer or a block, with specific parameters encoded within it. The qualitative criteria on which the final individual is selected are defined and described in detail in the next section. A simplified example of the genome, represented in a unified JSON format, could be the following:
[{"type": "Conv", "filters": 32, "kernel_size": 3, "stride": 1, "padding": "same", "activation": "relu"},
{"type": "DenseBlock", "num_layers": 4, "growth_rate": 12, "bottleneck_size": 4},
{"type": "SEBlock", "reduction_ratio": 16},
{"type": "SCConv", "filters": 64, "kernel_size": 3, "stride": 1, "padding": "same"},
{"type": "Pooling", "pool_size": 2, "stride": 2, "pool_type": "MaxPooling"},
{"type": "FC", "units": 10, "activation": "softmax"}]
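To make the encoding concrete, the following minimal Python sketch samples a random individual under this mixed encoding scheme. The parameter ranges and helper names (BLOCK_SPACE, random_genome) are illustrative assumptions for exposition, not the exact search space used in the experiments.

```python
import json
import random

# Hypothetical parameter ranges per block type; the concrete values are
# illustrative assumptions, not the exact search space of the paper.
BLOCK_SPACE = {
    "Conv":       lambda: {"filters": random.choice([32, 64, 128]),
                           "kernel_size": random.choice([3, 5]),
                           "stride": 1, "padding": "same", "activation": "relu"},
    "DenseBlock": lambda: {"num_layers": random.choice([4, 6]),
                           "growth_rate": random.choice([12, 16]),
                           "bottleneck_size": 4},
    "SEBlock":    lambda: {"reduction_ratio": random.choice([8, 16])},
    "SCConv":     lambda: {"filters": random.choice([32, 64]),
                           "kernel_size": 3, "stride": 1, "padding": "same"},
    "Pooling":    lambda: {"pool_size": 2, "stride": 2,
                           "pool_type": random.choice(["MaxPooling", "AvgPooling"])},
}

def random_genome(n_blocks=5, n_classes=100):
    """Sample a random individual: a list of block genes plus a classifier head."""
    genome = [{"type": t, **BLOCK_SPACE[t]()}
              for t in random.choices(list(BLOCK_SPACE), k=n_blocks)]
    genome.append({"type": "FC", "units": n_classes, "activation": "softmax"})
    return genome

if __name__ == "__main__":
    print(json.dumps(random_genome(), indent=2))
```

Sampling individuals this way is also one natural way to seed the initial population described in the experiment section.

5.2.2. Formulating a Multi-Objective Fitness function and evaluation criteria
To quantify the performance of Hybrid Convolutional Neural Networks based on multiple objectives, we need to construct a multi-objective fitness function that integrates various performance metrics. These metrics typically include accuracy, computational efficiency, robustness, and scalability.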
The proposed fitness function balances these aspects to provide a comprehensive evaluation of HCNN architectures. The criteria considered for formulating the fitness function are the following:
• Accuracy (A): The primary measure of how well the network performs on the task, such as classification accuracy on a validation dataset.
• Computational Efficiency (E): Metrics like inference time and memory usage indicate the efficiency of the network.
• Robustness (R): The network's resilience to adversarial attacks or noisy data, often measured by accuracy under adversarial conditions or performance degradation.
• Scalability (S): The ability to maintain performance when scaled to larger datasets or more complex tasks, often evaluated by the change in accuracy and efficiency when scaling the network.
Let $w_A$, $w_E$, $w_R$, $w_S$ be the weights assigned to each performance metric to reflect their relative importance. The fitness function $F$ is formulated as a weighted sum of the normalized performance metrics:

\[
\begin{aligned}
F ={}& w_A \cdot \frac{A - A_{\min}}{A_{\max} - A_{\min}}
  + w_E \cdot \left( \alpha \cdot \frac{T_{\max} - T}{T_{\max} - T_{\min}} + \beta \cdot \frac{M_{\max} - M}{M_{\max} - M_{\min}} \right) \\
&+ w_R \cdot \frac{A_{adv} - A_{adv,\min}}{A_{adv,\max} - A_{adv,\min}}
  + w_S \cdot \left( \gamma \cdot \frac{\Delta A - \Delta A_{\min}}{\Delta A_{\max} - \Delta A_{\min}} + \delta \cdot \frac{\Delta E - \Delta E_{\min}}{\Delta E_{\max} - \Delta E_{\min}} \right)
\end{aligned} \tag{1}
\]

Accuracy Normalization: $\frac{A - A_{\min}}{A_{\max} - A_{\min}}$ normalizes the accuracy $A$ between its minimum $A_{\min}$ and maximum $A_{\max}$ possible values.
Inference Time Normalization: $\frac{T_{\max} - T}{T_{\max} - T_{\min}}$ normalizes the inference time $T$, where $T_{\min}$ and $T_{\max}$ are the minimum and maximum inference times.
Memory Usage Normalization: $\frac{M_{\max} - M}{M_{\max} - M_{\min}}$ normalizes the memory usage $M$, where $M_{\min}$ and $M_{\max}$ are the minimum and maximum memory usages. $\alpha$ and $\beta$ are weights to balance the contributions of inference time and memory usage.
Robustness Normalization: $\frac{A_{adv} - A_{adv,\min}}{A_{adv,\max} - A_{adv,\min}}$ normalizes the robustness metric $A_{adv}$, which measures accuracy under adversarial conditions.
Accuracy Scaling Normalization: $\frac{\Delta A - \Delta A_{\min}}{\Delta A_{\max} - \Delta A_{\min}}$ normalizes the change in accuracy ($\Delta A$) when scaling the network.
Efficiency Scaling Normalization: $\frac{\Delta E - \Delta E_{\min}}{\Delta E_{\max} - \Delta E_{\min}}$ normalizes the change in efficiency ($\Delta E$) when scaling the network. $\gamma$ and $\delta$ are weights to balance the contributions of accuracy and efficiency scaling.
The weights $w_A$, $w_E$, $w_R$, $w_S$, as well as the sub-weights $\alpha$, $\beta$, $\gamma$, and $\delta$, are chosen based on the specific requirements and priorities of the task. For example, if accuracy is paramount, $w_A$ should be set higher relative to the other weights. The values of these weights vary from 0 to 1.
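As a minimal illustration, equation (1) can be computed from pre-measured metrics as in the following sketch. The metric collection itself (training, timing, adversarial evaluation, scaling runs) is assumed to happen elsewhere, and the default weights are placeholders rather than the values used in the experiments.

```python
def norm(x, lo, hi, invert=False):
    """Min-max normalization to [0, 1]; invert for lower-is-better metrics."""
    v = (x - lo) / (hi - lo) if hi > lo else 0.0
    return 1.0 - v if invert else v

def fitness(m, w_A=0.5, w_E=0.2, w_R=0.2, w_S=0.1,
            alpha=0.5, beta=0.5, gamma=0.5, delta=0.5):
    """Weighted-sum fitness of equation (1); m holds raw metrics and their bounds."""
    return (w_A * norm(m["A"], m["A_min"], m["A_max"])
            + w_E * (alpha * norm(m["T"], m["T_min"], m["T_max"], invert=True)
                     + beta * norm(m["M"], m["M_min"], m["M_max"], invert=True))
            + w_R * norm(m["A_adv"], m["A_adv_min"], m["A_adv_max"])
            + w_S * (gamma * norm(m["dA"], m["dA_min"], m["dA_max"])
                     + delta * norm(m["dE"], m["dE_min"], m["dE_max"])))
```

5.2.3. Selection of Individual
Selection is the process of choosing individuals from the current population to serve as parents for the next generation. The goal is to favor individuals with higher fitness scores, ensuring that better-performing CNN architectures are more likely to propagate their genetic material. Fitness Evaluation: Evaluate each individual's fitness using the predefined fitness function described in the previous section. Regarding the selection method, there are several, such as:
• Roulette Wheel Selection: Assign a selection probability to each individual proportional to its fitness. Randomly select individuals based on these probabilities.
• Tournament Selection: Randomly select a subset of individuals (tournament) and choose the best-performing one as a parent. Repeat to select the required number of parents.
• Rank-Based Selection: Rank individuals by fitness and assign selection probabilities based on rank, giving higher-ranked individuals a better chance of being selected.
As a sketch of the variant adopted below, rank-based selection can be implemented as follows; the linear rank-to-probability mapping is an assumption, since the exact scheme is not specified here.

```python
import random

def rank_based_select(population, fitnesses, n_parents):
    """Pick parents with probability proportional to fitness rank."""
    ranked = sorted(zip(population, fitnesses), key=lambda p: p[1])  # worst -> best
    n = len(ranked)
    total = n * (n + 1) / 2                                  # sum of ranks 1..n
    weights = [(r + 1) / total for r in range(n)]            # best gets largest weight
    return random.choices([ind for ind, _ in ranked], weights=weights, k=n_parents)
```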
In our approach, we choose rank-based selection due to its implementation simplicity and straightforward evaluation-based approach.

5.2.4. Defining crossover approach
Crossover combines the genetic material of two parent solutions to produce one or more offspring. This process creates new architectures by mixing and matching structural blocks and hyperparameters from the parents. For our implementation we decided to utilize a modified multi-point crossover. This method enhances genetic diversity and allows for a more extensive exploration of the solution space compared to single-point crossover. In the context of CNN structural synthesis, multi-point crossover involves combining different structural blocks and hyperparameters from two parent CNN architectures to create new offspring architectures. The proposed crossover process is the following [1, 17, 18]:
1. Selection of Crossover Points:
• Choose multiple crossover points randomly within the genomes of the parent CNN architectures.
• The number of crossover points is typically predefined and can vary depending on the complexity of the problem.
2. Splitting and Combining:
• Split the parent genomes at the chosen crossover points.
• Alternate the segments from each parent to form the offspring genomes.
3. Reconstruction of Offspring:
• Reconstruct the offspring genomes by merging the alternating segments.
• Ensure that the resulting offspring are valid CNN architectures.
Example:
Parent 1: [Conv, 32, 3x3, relu]-[DenseBlock, 4, 12, 4]-[SEBlock, 16]-[SCConv, 64, 3x3]
Parent 2: [Conv, 64, 5x5, relu]-[SCConv, 32, 3x3]-[SEBlock, 8]-[DenseBlock, 6, 16, 4]
Offspring 1: [Conv, 32, 3x3, relu]-[SCConv, 32, 3x3]-[SEBlock, 16]-[DenseBlock, 6, 16, 4]
Offspring 2: [Conv, 64, 5x5, relu]-[DenseBlock, 4, 12, 4]-[SEBlock, 8]-[SCConv, 64, 3x3]

5.2.5. Mutation of Individuals
Mutation introduces random changes to the genetic material of individuals to maintain diversity in the population and explore new regions of the search space. For the proposed implementation we use the following transformations of individuals:
• Parameter Mutation: Randomly alter the parameters of a structural block. Example: Change the number of filters in a Conv layer from 32 to 64.
• Block Mutation: Replace one structural block with another one with its own parameters. Example: Replace [DenseBlock, 32, 3x3, relu] with [SCConv, 64, 5x5].
• Layer Addition/Removal: Randomly add or remove a structural block in the genome. Example: Add [SEBlock, 16] after a Conv layer or remove an existing DenseBlock.
• Swap Mutation: Swap the positions of two structural blocks. Example: Swap [DenseBlock, 4, 12, 4] with [SEBlock, 16].

6. Experiment
Like other implementations of evolutionary algorithms, our implementation requires large computational capability, which makes it hard to evaluate on large and complex datasets. We therefore use CIFAR-100 to synthesize a set of potentially good architectures and then migrate them to a larger-scale environment.

6.1. Settings and results
The CIFAR-100 dataset consists of 60,000 32x32 color images in 100 classes. The task is to obtain the most suitable CNN model based on the criteria described in the fitness function definition section. The population size is set to 10, the number of generations to 50, and the mutation rate to 5.5%; the initial population consists of randomly generated CNN architectures with a selective mix of convolutional layers, Dense blocks, SE blocks, SCConv blocks, attention-based modules, SRU, CRU, Batch Normalization layers, 1x1 Conv layers, Dropout layers, and Fully Connected layers.
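Putting the configured pieces together, the condensed sketch below shows how the overall loop can be organized with these settings. It reuses random_genome and rank_based_select from the earlier sketches; evaluate_fitness and mutate stand in for the full train-and-measure procedure of equation (1) and the mutation operators of Section 5.2.5 (mutate is assumed to return the modified genome), so this is an illustrative outline rather than the exact implementation.

```python
import copy
import random

POP_SIZE, GENERATIONS, MUTATION_RATE = 10, 50, 0.055  # settings from Section 6.1

def multi_point_crossover(p1, p2, n_points=2):
    """Alternate genome segments between two parents at sorted random cut points."""
    cuts = sorted(random.sample(range(1, min(len(p1), len(p2))), n_points))
    child1, child2 = [], []
    prev, src = 0, 0
    for cut in cuts + [max(len(p1), len(p2))]:   # sentinel covers the tail segment
        a, b = (p1, p2) if src == 0 else (p2, p1)
        child1 += a[prev:cut]
        child2 += b[prev:cut]
        prev, src = cut, 1 - src
    return child1, child2

def evolve(evaluate_fitness, mutate):
    population = [random_genome() for _ in range(POP_SIZE)]       # initial population
    for _ in range(GENERATIONS):
        fits = [evaluate_fitness(g) for g in population]          # equation (1)
        parents = rank_based_select(population, fits, POP_SIZE)   # Section 5.2.3
        offspring = []
        for i in range(0, POP_SIZE, 2):                           # pair up parents
            offspring += multi_point_crossover(parents[i], parents[i + 1])
        population = [mutate(copy.deepcopy(g)) if random.random() < MUTATION_RATE
                      else g for g in offspring]                  # 5.5% mutation rate
    return population
```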
The results based on the described evolutionary algorithm are shown in Table 2, and the accuracy change dynamics are shown in Figure 2.

Figure 2: Recognition accuracy (%) change over generations on the CIFAR-100 dataset and example of accuracy change for each individual as a parent-to-child relation

Table 2
Recognition accuracy (%) change over generations on the CIFAR-100 dataset

Gen | Max(%) | Min(%) | Avg(%) | Med(%) | Time(H) | Network Structure
01 | 75.13 | 72.53 | 73.83 | 73.61 | 6.31 | Conv(64,5x5,relu)-SCConv(32,3x3)-SEBlock(8)-DenseBlock(6,16,4)-Pooling(2,Max)-Conv(128,3x3,relu)-Dropout(0.5)-BatchNorm-SRU(32,3x3)-SEBlock(16)-Pooling(2,Avg)-FC(512,relu)-FC(100,softmax)
02 | 75.65 | 72.75 | 74.2 | 74.48 | 5.93 | Conv(32,3x3,relu)-DenseBlock(4,12,4)-SEBlock(16)-Conv(64,3x3,relu)-Pooling(2,Max)-Conv(128,3x3,relu)-Dropout(0.5)-BatchNorm-CRU(64,3x3)-SEBlock(8)-Pooling(2,Avg)-FC(256,relu)-FC(100,softmax)
05 | 76.22 | 74.19 | 75.20 | 75.32 | 6.02 | Conv(64,3x3,relu)-SEBlock(16)-SCConv(32,3x3)-DenseBlock(4,12,4)-Pooling(2,Max)-Conv(128,3x3,relu)-Dropout(0.5)-BatchNorm-DenseBlock(6,16,4)-SEBlock(8)-Pooling(2,Avg)-FC(256,relu)-FC(100,softmax)
10 | 78.91 | 75.88 | 77.39 | 77.44 | 5.51 | Conv(32,3x3,relu)-DenseBlock(4,12,4)-SEBlock(16)-Conv(64,3x3,relu)-Pooling(2,Max)-Conv(128,3x3,relu)-Dropout(0.5)-BatchNorm-SRU(32,3x3)-SEBlock(8)-Pooling(2,Avg)-FC(256,relu)-FC(100,softmax)
20 | 81.03 | 79.74 | 80.38 | 80.24 | 5.37 | Conv(64,5x5,relu)-SCConv(32,3x3)-SEBlock(8)-DenseBlock(6,16,4)-Pooling(2,Max)-Conv(128,3x3,relu)-Dropout(0.5)-BatchNorm-CRU(64,3x3)-SEBlock(16)-Pooling(2,Avg)-FC(512,relu)-FC(100,softmax)
30 | 83.24 | 82.15 | 82.69 | 82.23 | 5.13 | Conv(32,3x3,relu)-DenseBlock(4,12,4)-SEBlock(16)-Conv(64,3x3,relu)-Pooling(2,Max)-Conv(128,3x3,relu)-Dropout(0.5)-BatchNorm-DenseBlock(6,16,4)-SEBlock(8)-Pooling(2,Avg)-FC(256,relu)-FC(100,softmax)
50 | 84.91 | 83.69 | 84.05 | 84.13 | 4.87 | Conv(64,3x3,relu)-SEBlock(16)-SCConv(32,3x3)-DenseBlock(4,12,4)-Pooling(2,Max)-Dropout(0.5)-BatchNorm-CRU(64,3x3)-Pooling(2,Avg)-FC(256,relu)-FC(100,softmax)

The maximum error percentage decreased significantly from 37% to 16% over 50 generations. This indicates that even the worst-performing models in the population improved significantly. The median error rate also saw a substantial improvement, reflecting overall population improvement.

7. Conclusions
The implementation of a multi-objective fitness function enabled a comprehensive evaluation of CNN architectures. This function incorporated normalized metrics for accuracy, inference time, memory usage, robustness under adversarial conditions, and scalability. By weighting these metrics appropriately, we could balance their contributions and achieve a holistic assessment of network performance. Our experimental results, shown in Table 2 and Figure 2 and obtained through training on the CIFAR-100 dataset, illustrate the significant impact of different structural blocks on HCNN performance. By testing blocks both individually and in combination, we identified optimal configurations that enhanced feature representation and reduced computational load. Based on the experimental results, we obtained an optimal CNN architecture for recognition on the CIFAR-100 dataset. Due to computational and time restrictions, the experiment was executed in a low-resource environment; we plan to continue the research under more complex conditions with more advanced hardware support in the future. In conclusion, this research advances the state-of-the-art in hybrid neural network design and optimization.
The proposed hybrid learning algorithm and multi-objective optimization framework offer powerful tools for developing more efficient, robust, and scalable deep learning models.

References
[1] J. Li, Y. Wen and L. He, "SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy," in: 2023 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 2023, pp. 6153-6162. doi: 10.1109/CVPR52729.2023.00596.
[2] S. Jiang and S. Yang, "A Strength Pareto Evolutionary Algorithm Based on Reference Direction for Multiobjective and Many-Objective Optimization," in: IEEE Transactions on Evolutionary Computation, vol. 21, no. 3, 2017, pp. 329-346. doi: 10.1109/TEVC.2016.2592479.
[3] M. Zgurovsky, V. Sineglazov, E. Chumachenko, "Classification and Analysis of Multicriteria Optimization Methods," in: Artificial Intelligence Systems Based on Hybrid Neural Networks, vol. 904, 2021, pp. 59-174. doi: 10.1007/978-3-030-48453-8_2.
[4] M. Meza-Sánchez, E. Clemente, M. C. Rodríguez-Liñán, G. Olague, "Synthetic-analytic behavior-based control framework: Constraining velocity in tracking for nonholonomic wheeled mobile robots," Information Sciences 501 (2019) 436-459. doi: 10.1016/j.ins.2019.06.025.
[5] In: Parallel Problem Solving from Nature, Springer, Berlin, Heidelberg, vol. 4193, 2006. doi: 10.1007/11844297_68.
[6] N. Hansen, D. V. Arnold and A. Auger, "Evolution Strategies," in: Springer Handbook of Computational Intelligence, Springer, 2015. doi: 10.1007/978-3-319-07124-4_13.
[7] V. Sineglazov, K. Riazanovskiy, A. Klanovets, et al., Comput. Biol. Med. 147 (2022) 105800. doi: 10.1016/j.compbiomed.2022.105800.
[8] "Optimization," in: International Conference on Learning Representations, 2017, pp. 1-15.
[9] C. Nagpal and S. R. Dubey, "A Performance Evaluation of Convolutional Neural Networks for Face Anti Spoofing," in: International Joint Conference on Neural Networks, Budapest, Hungary, 2019, pp. 1-8. doi: 10.1109/IJCNN.2019.8852422.
[10] N. Shone, T. N. Ngoc, V. D. Phai and Q. Shi, "A Deep Learning Approach to Network Intrusion Detection," in: IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 1, 2018, pp. 41-50. doi: 10.1109/TETCI.2017.2772792.
[11] M. K. Yadav and K. P. Sharma, "Intrusion Detection System using Machine Learning Algorithms: A Comparative Study," in: 2nd International Conference on Secure Cyber Computing and Communications, Jalandhar, India, 2021, pp. 415-420. doi: 10.1109/ICSCCC51823.2021.9478086.
[12] C. Cao, Y. Huang, Y. Yang, L. Wang, Z. Wang and T. Tan, "Feedback Convolutional Neural Network for Visual Localization and Segmentation," in: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, 2019, pp. 1627-1640. doi: 10.1109/TPAMI.2018.2843329.
[13] J. Hu, L. Shen, S. Albanie, G. Sun and E. Wu, "Squeeze-and-Excitation Networks," in: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 8, 2020, pp. 2011-2023. doi: 10.1109/TPAMI.2019.2913372.
[14] F. Chollet, "Xception: Deep Learning with Depthwise Separable Convolutions," in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, pp. 1800-1807. doi: 10.1109/CVPR.2017.195.
[15] S. Ioffe, C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in: Proceedings of the 32nd International Conference on Machine Learning, PMLR 37 (2015) 448-456.
[16] S. Bell, C. Zitnick, K. Bala, R. Girshick, "Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks," in: 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, pp. 2874-2883. doi: 10.1109/CVPR.2016.314.
[17] A. Newell, K. Yang, J. Deng, "Stacked Hourglass Networks for Human Pose Estimation," in: ECCV (2016). doi: 10.1007/978-3-319-46484-8_29.
[18] V. Sineglazov, A. Kot, "Design of hybrid neural networks of the ensemble structure," Eastern-European Journal of Enterprise Technologies 77 (2022) 31-45. doi: 10.15587/1729-4061.2021.225301.