<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Diagnosis Based on Gene Expression Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergii Babichev</string-name>
          <email>sergii.babichev@ujep.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ihor Liakh</string-name>
          <email>ihor.lyah@uzhnu.edu.ua</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasyl Morokhovych</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrii Honcharuk</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anatolii Balanda</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksandr Zaitsev</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Jan Evangelista Purkyne University in Usti nad Labem</institution>
          ,
          <addr-line>Pasteurova, 15, Usti nad Labem, 400 96</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kherson State University</institution>
          ,
          <addr-line>University street, 27, Kherson, 73000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Military Academy named after Eugene Bereznyak</institution>
          ,
          <addr-line>Yria Il'enka street, 81, Kyiv, 04050</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Uzhhorod National University</institution>
          ,
          <addr-line>University street, 14, Uzhhorod, 88000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Applying deep learning techniques, such as convolutional or recurrent neural networks, to gene expression data in order to develop diagnostic systems for complex diseases is one of the current focuses of modern bioinformatics. Deep learning algorithms can identify specific patterns in the hierarchical representation of data and derive features that allow the studied subjects to be identified precisely. In this paper, we present our research findings on applying a convolutional neural network (CNN) to diagnose various types of cancer based on gene expression data. The experimental data were sourced from The Cancer Genome Atlas (TCGA) and comprised 3269 samples, which can be divided into nine classes according to the type of cancer. We introduced an ordered grid search algorithm to determine the optimal set of CNN hyperparameters. We assessed the model's efficacy using classification quality metrics that consider type I and II errors. Furthermore, we introduced an integrated F1-score index based on the Harrington desirability function. The simulation results show that the single-layer CNN is the most efficient architecture for this type of data by all classification quality criteria: 955 of the 981 test samples were correctly identified, which corresponds to a classification accuracy of 97.3%. These results demonstrate the high efficacy of the proposed approach for diagnosing cancer based on gene expression data.</p>
      </abstract>
      <kwd-group>
        <kwd>Gene expression profiles</kwd>
        <kwd>cancer disease</kwd>
        <kwd>Harrington desirability function</kwd>
        <kwd>convolutional neural network</kwd>
        <kwd>classification quality criteria</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction and literature review</title>
      <p>The suitability of deep learning methods for gene expression data processing follows from the structure of the experimental data and their large volume: typically, such data contain thousands of objects and more than ten thousand attributes. One of the primary advantages of deep learning methods is their ability to process complex and unstructured data. Deep learning algorithms can identify specific patterns in the hierarchical representation of data and derive features that allow the objects under investigation to be identified with high precision. Another significant advantage of deep learning models is their high accuracy and efficiency. Moreover, deep learning models can learn the relevant features directly from raw data, enabling the discovery of hidden patterns and intricate relationships in the data that are difficult to uncover using traditional methods. Deep learning models are also scalable: they can process large volumes of data efficiently on parallel or distributed computing architectures, which significantly accelerates training and inference. In the case of gene expression data, the correct application of deep learning methods enhances the effectiveness of diagnostic systems for complex objects, on the one hand by identifying the studied objects more accurately and, on the other hand, by increasing the objectivity of determining the state of an object through parallel data processing. All of the above highlights the relevance of the current research.</p>
      <p>© 2023 Copyright for this paper by its authors. CEUR Workshop Proceedings (ceur-ws.org).</p>
      <p>At present, there are several deep learning (DL) methods that can be applied to gene expression data
to extract hidden patterns and make predictions regarding the state of the respective object [1]. Figure
1 illustrates a block diagram of the most common deep learning methods focused on processing gene
expression data and analyzing genomic sequences, as well as possible directions for their application.</p>
      <p>As can be seen from Figure 1, the main DL methods are the following:
1. Convolutional Neural Networks (CNN) [2-4]. They are used to analyze gene expression data represented as a vector (one-dimensional CNNs) or as images or heat maps (two-dimensional CNNs). Their advantages include the ability to detect hidden dependencies and to form a vector of informative features from genomic data. Depending on the problem formulation, CNNs can be applied to the analysis of genomic sequences, the analysis of gene expression heat maps, disease diagnosis, the identification of genetic markers, and the reconstruction and modelling of gene regulatory networks (GRN).
2. Recurrent Neural Networks (RNN) [5,6]. They are a powerful tool for analyzing and processing gene expression data, including time series of gene expression values. With gene expression data, RNNs are typically used for time series analysis, disease prediction, and the generation of new sequences.
3. Graph Convolutional Networks (GCN) [7,8]. GCNs process gene expression data represented as graphs, in which genes are nodes and relationships between genes (such as expression interconnections or regulatory interactions) are edges. Possible applications of GCNs include predicting gene functions, imputing missing gene expression values, reconstructing gene regulatory networks, clustering and community detection, and predicting drug responses.
4. Variational Autoencoders (VAE) [9,10]. VAEs are generative models that can discern patterns of gene interactions from a low-dimensional representation of gene expression data and generate new samples with similar expression profiles. In gene expression data processing, VAEs can be applied to dimensionality reduction, the generation of new samples, the discovery of latent structures, and the interpolation and modification of expression profiles.
5. Deep Belief Networks (DBN) [11,12]. DBNs consist of several layers of Restricted Boltzmann Machines (RBMs) and can represent the distribution of gene expression data as a hierarchical structure. Their potential applications in this context include dimensionality reduction, clustering and classification, the generation of new samples, and identifying the nature of interactions between genes.</p>
      <p>In the current research, we address the problem of improving the efficiency of diagnostic systems for complex diseases based on gene expression data. The solution involves identifying co-expressed genes in the first stage and classifying objects based on the resulting subsets of gene expression profiles in the second stage. This two-stage formulation limits the range of DL methods that can be applied. For instance, objects can be classified based on gene expression data using convolutional or recurrent neural networks; in this case, the challenge is to determine the optimal network structure and the hyperparameter vector that governs the network's performance. Subsets of co-expressed gene expression profiles can be identified using a deep belief network. However, in addition to determining the optimal network structure and hyperparameters, one must then demonstrate its advantage over the classic gene expression profile clustering algorithms currently used in this domain. Graph convolutional neural networks can also be used in classification systems. However, they require the gene regulatory network to be reconstructed in a preliminary stage so that it can be represented as a graph, which in turn requires identifying subsets of co-expressed gene expression profiles by clustering the gene expression data. This process can be implemented by hybridizing models, i.e., by using different DL methods at the relevant data processing stages. Doing so requires thorough research to assess the efficiency of each method and to determine the optimal network structure and hyperparameter vector.</p>
      <p>The choice of a CNN for gene expression data processing is motivated by its ability to learn spatial hierarchies of features from input data automatically and adaptively. CNNs can capture complex patterns and interactions between genes, which aids in tasks such as classification, clustering, and prediction of gene functions or disease associations. This ability to learn and generalize from data makes CNNs a powerful tool for extracting meaningful insights from gene expression datasets, potentially leading to new biological discoveries and advances in personalized medicine.</p>
      <p>Numerous studies have used CNNs to diagnose various objects. For instance, in [13], the authors introduced a fault diagnosis model for rolling bearings based on a multi-dimensional input convolutional neural network (MDI-CNN). The model had multiple input layers, which made it possible to combine the original and processed signals while leveraging the CNN's ability to learn the characteristics of the original signal automatically. This, in turn, enhanced the recognition accuracy and anti-jamming capability. In [14], the authors presented findings on predicting cancer types using hybrid CNN + BiLSTM models that analyzed microarray gene expression data. This approach enabled them to distinguish between different forms of cancer, and their results surpassed those of existing CNN and RNN classifiers in terms of classification quality criteria.</p>
      <p>Despite these advances, the effective application of deep learning techniques to gene expression data remains an open problem. A primary issue is the objective selection of hyperparameters for specific deep learning models, taking into account the relevant quality criteria. In this study, we build upon previous research on gene expression data processing [15,16] and on the use of CNNs for disease diagnosis based on such data [17,18].</p>
      <p>The primary objective of our research is to devise a method for determining the optimal list of
hyperparameters for CNN when analyzing gene expression data.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Convolutional neural network</title>
      <p>The general architecture of a multilayer CNN is depicted in Figure 2 [13]. It usually includes the following main components:
1. Input layer. It accepts the input data, which can be represented as a one-dimensional vector (a vector of gene expression values that define the state of the object) or as a two-dimensional matrix (a heat map of gene expression values, an image, etc.). Depending on the type of input data, one-dimensional (1D) or two-dimensional (2D) convolutional layers are formed.
2. Convolutional layers. They detect local features in the input data. Each convolutional layer consists of a set of filters that perform the convolution operation on its input. Convolution is the basic operation of a CNN: in a convolutional layer, the feature maps of the previous layer are convolved with convolutional kernels, a bias is added, and a nonlinear activation function produces the output feature maps. This computation can be expressed as follows [13]:
<disp-formula><label>(1)</label><tex-math>x_j^{l} = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right)</tex-math></disp-formula>
where <inline-formula><tex-math>x_j^{l}</tex-math></inline-formula> and <inline-formula><tex-math>x_i^{l-1}</tex-math></inline-formula> are the j-th and i-th feature maps at levels l and l-1, respectively; <inline-formula><tex-math>M_j</tex-math></inline-formula> is the set of input feature maps (determined by the filter applied to the input data at the corresponding convolutional level); <inline-formula><tex-math>k_{ij}^{l}</tex-math></inline-formula> is the convolutional kernel connecting the i-th input feature map with the j-th feature map at convolutional level l; <inline-formula><tex-math>b_j^{l}</tex-math></inline-formula> is the bias; f(·) is a nonlinear activation function; and ∗ denotes the convolution operation.
3. Pooling layers. They reduce the spatial dimensions of the feature vector or matrix and thus the number of parameters of the subsequent layers. A max pooling layer replaces each window (region) of the feature vector or matrix with the maximum value within that window.
4. Fully connected layers. After several convolutional and pooling layers, the data are passed to the fully connected layers, in which every neuron is connected to every neuron of the previous layer. These layers perform classification or regression based on the extracted features: they take the flattened feature maps as input and generate the output vector of the model.
5. Activation functions. An activation function is applied after each convolutional layer. In most cases these functions are nonlinear, which allows the network to learn complex dependencies in the data. The most common activation functions are ReLU (Rectified Linear Unit), sigmoid, and hyperbolic tangent (tanh).
6. Loss function. It measures the difference between the predicted and expected values. The derivatives of the loss function with respect to the weights and biases are used to update them during error backpropagation, which allows the model to assess its accuracy and adjust its weights during training.
      </p>
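      <p>Equation (1) and the max pooling operation can be illustrated with a short NumPy sketch of a single 1D convolutional layer. It is a didactic, loop-based illustration rather than an efficient or library-specific implementation, and it makes several assumptions not stated in the text: the function names conv1d_layer and max_pool1d are hypothetical, ReLU serves as an example activation f(·), every input map is connected to every output map (M_j contains all input maps), and the layer uses stride 1 without padding.</p>
      <preformat>
```python
import numpy as np


def relu(z):
    """ReLU activation, used here as an example choice of f(.) in eq. (1)."""
    return np.maximum(z, 0.0)


def conv1d_layer(x, kernels, biases, activation=relu):
    """Forward pass of one 1D convolutional layer following eq. (1).

    x       -- input feature maps x_i^(l-1), shape (n_in, length)
    kernels -- kernels k_ij^l, shape (n_out, n_in, k)
    biases  -- biases b_j^l, shape (n_out,)
    Returns the output maps x_j^l, shape (n_out, length - k + 1)
    ("valid" mode: stride 1, no padding).
    """
    n_out, n_in, k = kernels.shape
    out = np.empty((n_out, x.shape[1] - k + 1))
    for j in range(n_out):
        # Start from the bias b_j^l, then add one term per input map.
        acc = np.full(out.shape[1], float(biases[j]))
        for i in range(n_in):
            # Like most DL libraries, this computes cross-correlation (the
            # kernel is not flipped); for learned kernels this is equivalent.
            acc += np.correlate(x[i], kernels[j, i], mode="valid")
        out[j] = activation(acc)
    return out


def max_pool1d(x, size):
    """Non-overlapping max pooling over the last axis; leftover elements are dropped."""
    n = x.shape[-1] // size
    windows = x[..., : n * size].reshape(*x.shape[:-1], n, size)
    return windows.max(axis=-1)
```
      </preformat>
      <p>For example, a single input map [1, 2, 3, 4] with the kernel [1, 1], zero bias, and ReLU gives the output map [3, 5, 7]; max pooling with a window of 2 then keeps only the first full window and returns [5].</p>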
      <p>As mentioned above, the hyperparameters of a CNN determine the network's architecture and training parameters. They are set when the network is initialized and affect its learning and generalization capabilities. The key hyperparameters of a CNN include:
• Number of convolutional layers. It defines the number of layers in which convolutional filters detect features in the input data. More layers can help the model learn more complex dependencies, but they also increase the model's complexity and training time and can lead to overfitting. Overfitting can be detected by comparing the accuracy and loss values calculated on the training and validation data during training: if the training values keep improving while the validation values stagnate or deteriorate, the network is overfitting.
• Size of convolutional filters. It determines the size (width and height) of the filters that move over the input data to perform convolution. Larger filters can detect larger patterns but increase the computational load.
• Number of filters in a convolutional layer. It determines the number of filters applied to the input data in each convolutional layer. Each filter generates a feature map corresponding to a specific feature. Typically, the number of filters increases with each subsequent convolutional layer.
• Size of the pooling window. This is the size (width and height) of the window that moves across the feature map to perform the pooling operation.
• Activation function. This function introduces nonlinearity into the network after each layer. Common activation functions include ReLU (Rectified Linear Unit), sigmoid, and hyperbolic tangent (tanh).
• Number of fully connected layers. This is the number of fully connected layers added after the convolutional and pooling layers. These layers connect every neuron in one layer to every neuron in the next layer and are typically used for classification or regression based on the features extracted by the preceding layers.</p>
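      <p>The kernel size, the number of filters, and the pooling window size together determine how many values reach the fully connected layer. The pure-Python sketch below shows this arithmetic with the standard output-length formula for 1D convolution. It assumes, for illustration only, a single-channel input of length 19,300 (the padded vector length reported in Section 3), stride 1, no padding, and non-overlapping pooling; the function names are hypothetical, and the input arrangement actually used in the experiments may differ.</p>
      <preformat>
```python
def conv1d_output_length(length, kernel_size, stride=1, padding=0):
    """Output length of a 1D convolution (standard formula)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def pool1d_output_length(length, pool_size):
    """Output length of non-overlapping 1D pooling (stride equals the window size)."""
    return length // pool_size


def flattened_features(length, kernel_size, n_filters, pool_size):
    """Number of values passed from one conv + pooling block to the dense layer."""
    conv_len = conv1d_output_length(length, kernel_size)
    return pool1d_output_length(conv_len, pool_size) * n_filters
```
      </preformat>
      <p>With a kernel size of 3, 32 filters, and a pooling window of 2, a 19,300-element input yields 19,298 values per filter after convolution, 9,649 after pooling, and 9,649 × 32 = 308,768 flattened features.</p>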
      <p>Within the framework of the current research, the optimal combination of CNN hyperparameters
was determined using the ordered empirical grid search method by evaluating all possible combinations
of hyperparameter values within predefined ranges.</p>
      <p>The implementation of this procedure involves the following stages:
1. Defining the ranges of variation of the hyperparameter values to be optimized.
2. Choosing the metrics for evaluating the efficiency of each combination of hyperparameter values during their sequential enumeration. Since the current research involves classifying objects based on gene expression data, metrics based on the assessment of type I and type II errors were applied [14]:
• Classification accuracy, which determines the proportion of correctly identified samples among all samples:
<disp-formula><label>(2)</label><tex-math>Accuracy = \frac{TP + TN}{TP + TN + FP + FN}</tex-math></disp-formula>
• F1-score, which measures the correctness of the distribution of samples into the relevant class and is calculated as the harmonic mean of precision (PR) and recall (RC):
<disp-formula><label>(3)</label><tex-math>F1 = \frac{2 \cdot PR \cdot RC}{PR + RC}</tex-math></disp-formula>
where precision is defined as the probability of correctly identifying samples of the relevant class and is calculated as the ratio of the number of correctly identified samples in this class to the total number of samples assigned to this class:
<disp-formula><label>(4)</label><tex-math>PR = \frac{TP}{TP + FP}</tex-math></disp-formula>
Recall is defined as the probability of correctly identifying the true positive cases of the relevant class and is calculated as the ratio of the number of correctly identified samples in this class to the total number of samples that actually belong to this class:
<disp-formula><label>(5)</label><tex-math>RC = \frac{TP}{TP + FN}</tex-math></disp-formula>
In the formulas above, TP (True Positive) and TN (True Negative) are the numbers of objects correctly classified to their respective classes, while FP (False Positive) and FN (False Negative) are the numbers of misclassified objects. The maximum value of criteria (2) and (3) is 1, which corresponds to perfect classification. Note that in a multi-class problem, criterion (2) defines the overall accuracy of sample classification across all classes, while criterion (3) defines the quality of sample classification in each class individually.
3. Creating a grid of all possible combinations of hyperparameters within the ranges specified in stage 1. Each cell of this grid represents a unique combination of the model's hyperparameters.
4. For each combination of hyperparameters:
4.1. Constructing a neural network model whose architecture and parameters correspond to the current combination of hyperparameters.
4.2. Training, validating, and testing the model.
4.3. Calculating the quality criteria of sample identification according to formulas (2) and (3).
5. Analyzing the obtained values of the sample classification quality criteria and selecting the combination of hyperparameters that corresponds to their maximum values.
      </p>
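      <p>Criteria (2)-(5) can be computed directly from the true and predicted class labels. The pure-Python sketch below is an illustration, not the authors' implementation: the function name classification_metrics is hypothetical, precision, recall, and F1-score are computed one-vs-rest for each class, a zero denominator is assumed to give 0, and the overall accuracy is computed in its multi-class form as the share of correctly classified samples (the form behind the reported 955 of 981 samples, i.e., 97.3%).</p>
      <preformat>
```python
def classification_metrics(y_true, y_pred, classes):
    """Overall accuracy and per-class precision, recall and F1-score.

    Accuracy: correctly classified samples divided by all samples (multi-class
    reading of eq. (2)). PR, RC and F1 follow eqs. (4), (5) and (3), computed
    one-vs-rest for each class.
    """
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Assumed convention: a ratio with a zero denominator is taken as 0.
        pr = tp / (tp + fp) if tp + fp else 0.0
        rc = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * pr * rc / (pr + rc) if pr + rc else 0.0
        per_class[c] = {"precision": pr, "recall": rc, "f1": f1}
    return accuracy, per_class
```
      </preformat>
      <p>For example, with true labels A, A, B, B, C and predictions A, B, B, B, C, the accuracy is 0.8; class A gets precision 1.0, recall 0.5, and F1-score 2/3, while class B gets precision 2/3, recall 1.0, and F1-score 0.8.</p>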
      <p>The main drawback of the empirical grid search method is its significant computational cost: the number of models to be trained equals the product of the numbers of candidate values of all hyperparameters. However, the method ensures a systematic exploration of the hyperparameter space and helps choose the optimal combination for the neural network model, taking into account both the research objective and the type of experimental data.</p>
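      <p>Stages 3-5 of the procedure, which enumerate every cell of the grid, can be sketched with itertools.product. This is a minimal illustration under stated assumptions: grid_search is a hypothetical name, and evaluate is a hypothetical stand-in for stages 4.1-4.3 (building, training, and testing the CNN) that returns a single score to maximize, such as the test accuracy.</p>
      <preformat>
```python
import itertools


def grid_search(param_grid, evaluate):
    """Exhaustive grid search over every combination in param_grid.

    param_grid -- dict mapping a hyperparameter name to its candidate values
    evaluate   -- callable taking a dict of hyperparameters and returning a
                  score to maximize (e.g. accuracy of the trained model)
    Returns (best_params, best_score, n_evaluated).
    """
    names = list(param_grid)
    best_params, best_score, n_evaluated = None, float("-inf"), 0
    # Stage 3: each tuple from the Cartesian product is one cell of the grid.
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # stages 4.1-4.3: build, train, score
        n_evaluated += 1
        # Stage 5: keep the combination with the highest score seen so far.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated
```
      </preformat>
      <p>The loop makes the cost explicit: a hypothetical grid with the six convolutional-layer activation functions, pooling windows of 2 and 3, dense kernel sizes of 32, 64, 128, and 256, and 8 or 32 filters already contains 6 × 2 × 4 × 2 = 96 cells, each requiring a full training run.</p>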
    </sec>
    <sec id="sec-4">
      <title>3. Experiment, results, and discussion</title>
      <p>The modeling was carried out using gene expression data from patients studied for various types of cancer. The data are freely available in The Cancer Genome Atlas (TCGA) [19]. We used gene expression data obtained by RNA sequencing on the Illumina platform [20]; for each sample, the data contain the read counts of the genes that determine the state of the sample under study. In the initial state, the experimental data contained 3269 samples and 19947 genes and had the following structure:
• Adrenocortical carcinoma – ACC (79 samples).
• Glioblastoma multiforme – GB (169 samples).
• Sarcoma – SARC (263 samples).
• Lung squamous cell carcinoma – LUSC (502 samples).
• Lung adenocarcinoma – LUAD (541 samples).
• Stomach adenocarcinoma – STAD (415 samples).
• Kidney renal clear cell carcinoma – KIRC (542 samples).
• Brain Lower Grade Glioma – LGG (534 samples).
• Cancer not identified – Normal (224 samples).</p>
      <p>Following the methodology presented in [17,21], in the first stage the absolute gene counts were transformed into counts per million (CPM), a range more convenient for further processing:
<disp-formula><label>(6)</label><tex-math>CPM_{sp} = \frac{x_{sp}}{\sum_{s=1}^{m} x_{sp}} \cdot 10^{6}</tex-math></disp-formula>
where <inline-formula><tex-math>x_{sp}</tex-math></inline-formula> is the count of the s-th gene in the p-th sample and m is the total number of genes measured in the experiment.</p>
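      <p>The preprocessing pipeline, starting with the CPM transformation (6) and continuing with the log2 normalization, removal of non-expressed genes, replacement of negative values with zeros, and zero-padding to 19,300 profiles described in the text, can be sketched with NumPy. This is an illustration under stated assumptions: preprocess_counts is a hypothetical name, samples are rows and genes are columns, and zero counts are simply mapped to log2(0) = -inf and later to 0, because the text does not say whether a pseudo-count was added.</p>
      <preformat>
```python
import numpy as np


def preprocess_counts(counts, target_genes=19300):
    """Sketch of the preprocessing stages applied to raw read counts.

    counts -- raw read counts, shape (n_samples, n_genes)
    Returns a matrix of shape (n_samples, target_genes).
    """
    counts = np.asarray(counts, dtype=float)
    # Stage 1, eq. (6): counts per million, using each sample's total count.
    cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
    # Stage 2: log2 transform; zero counts become -inf (no pseudo-count assumed).
    with np.errstate(divide="ignore"):
        log_cpm = np.log2(cpm)
    # Stage 3: drop genes whose log2(CPM) is at most 0 in every sample.
    log_cpm = log_cpm[:, (log_cpm > 0).any(axis=0)]
    # Stage 4: replace the remaining negative values (and -inf) with zeros.
    log_cpm = np.maximum(log_cpm, 0.0)
    # Padding: append all-zero gene profiles up to target_genes columns.
    if log_cpm.shape[1] > target_genes:
        raise ValueError("more expressed genes than target_genes")
    return np.pad(log_cpm, ((0, 0), (0, target_genes - log_cpm.shape[1])))
```
      </preformat>
      <p>Applied to the 3269 × 19947 count matrix described above, the same steps would produce the 3269 × 19265 matrix after filtering and a 3269 × 19300 matrix after padding, provided that the filtering rule matches the one used by the authors.</p>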
      <p>This step substantially reduced the range of variation of the absolute values that determine the expression (activity level) of the respective genes. In the second stage, the data were normalized by applying the <inline-formula><tex-math>\log_2(CPM)</tex-math></inline-formula> function to all values. In the third stage, non-expressed genes were removed according to the condition <inline-formula><tex-math>\log_2(CPM) \le 0</tex-math></inline-formula> for all samples under study. This reduced the number of genes by 682, and the experimental gene expression matrix took the form <inline-formula><tex-math>X = (3269 \times 19265)</tex-math></inline-formula>. In the final stage, the remaining negative gene expression values, which represent genes that are not expressed in particular samples, were replaced with zeros. For proper initialization of the CNN filters, the number of gene expression profiles was then increased to 19,300 by appending profiles with zero expression.</p>
      <p>… stages of the neural network's operation. To determine the optimal combination of hyperparameters within the grid search framework, we proposed a heuristic search algorithm. Its heuristic functions were the data classification accuracy across all classes (a coarse estimate) and the quality of the sample distribution across individual classes, assessed by the F1-score of each class (a detailed analysis). Because choosing the optimal alternative from the hyperparameter list by inspecting many per-class F1-score values can be problematic when the number of classes is large, an integrated F1-score was calculated from these values using Harrington's desirability method, an effective technique for solving multi-criteria problems that is successfully used in various fields of scientific research [22].</p>
      <p>… case, during the first step, the coefficients a and b of the linear equation are calculated as follows:
<disp-formula><label>(7)</label><tex-math>\begin{cases} Y_{\min} = a + b \cdot F1_{\min} \\ Y_{\max} = a + b \cdot F1_{\max} \end{cases}</tex-math></disp-formula>
where the subscripts min and max denote the boundary values of the F1-score and the corresponding values of the dimensionless Harrington scale Y. Then, the F1-score values are directly transformed into Y values:
<disp-formula><label>(8)</label><tex-math>Y_{ij} = a + b \cdot F1_{ij}</tex-math></disp-formula>
2.3. Calculation of the private desirabilities d for each F1-score value:
<disp-formula><label>(9)</label><tex-math>d_{ij} = \exp\left(-\exp\left(-Y_{ij}\right)\right)</tex-math></disp-formula>
Step 3. Calculation of the integrated F1-score value.
3.1. For each column of the matrix obtained in Step 2, the integrated F1-score value is calculated as the geometric mean of all private desirabilities:
<disp-formula><tex-math>F1_{j}^{int} = \sqrt[9]{\prod_{i=1}^{9} d_{ij}}</tex-math></disp-formula>
where j denotes the corresponding column of the matrix of private desirabilities and i = 1, …, 9 indexes the classes.</p>
      <p>Step 4. Analysis of the obtained results.
4.1. Creation of a diagram showing the dependency of the integrated F1-score value on the
corresponding hyperparameter values. Selection of the optimal hyperparameter value
corresponding to the maximum of the integrated F1-score.</p>
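      <p>Steps 2-4 of the integrated F1-score computation can be sketched in pure Python. This is a hedged illustration, not the authors' code: the function names are hypothetical, and because the boundary values used to calibrate the linear map Y = a + b·F1 are not recoverable from this text, the example uses placeholder boundary points (F1 = 0 mapped to Y = -2 and F1 = 1 mapped to Y = 5) chosen only for illustration.</p>
      <preformat>
```python
import math


def harrington_coefficients(f1_min, f1_max, y_min, y_max):
    """Solve the two-point linear system for a and b in Y = a + b * F1."""
    b = (y_max - y_min) / (f1_max - f1_min)
    a = y_min - b * f1_min
    return a, b


def integrated_f1(f1_column, a, b):
    """Integrated F1-score for one column (one hyperparameter value, one F1 per class)."""
    # Linear transform to the Harrington scale, then d = exp(-exp(-Y)).
    d = [math.exp(-math.exp(-(a + b * f1))) for f1 in f1_column]
    # Geometric mean of the private desirabilities.
    return math.prod(d) ** (1.0 / len(d))


def best_value(f1_by_value, a, b):
    """Step 4: the hyperparameter value with the largest integrated F1-score."""
    scores = {v: integrated_f1(col, a, b) for v, col in f1_by_value.items()}
    return max(scores, key=scores.get), scores
```
      </preformat>
      <p>For instance, with the placeholder boundaries above (a = -2, b = 7) and hypothetical per-class F1-scores of 0.95 for every class under relu and 0.90 under elu, best_value selects relu. Because the geometric mean is used, a single poorly classified class lowers the integrated score much more than an arithmetic mean would.</p>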
      <p>During the simulation, the following activation functions were applied to the output layer of neurons: softmax, softplus, softsign, and swish. For the inner fully connected layer, the elu, gelu, linear, relu, and selu activation functions were applied in turn; other activation functions applied to this layer gave unsatisfactory results. For the inner convolutional layer, the elu, gelu, sigmoid, linear, relu, and selu functions were applied. At the initial data preprocessing stage, the data were split into two subsets in a 0.7/0.3 ratio (2288/981 samples). The first subset (2288 samples) was further divided in a 0.8/0.2 ratio (1830/458): 1830 samples were used for training the network and 458 for validating the model during training, while the remaining 981 samples were used for testing. The quality of the model was assessed by the loss function value calculated on the validation data and by the classification accuracy and F1-score calculated on the test data. The simulation results for determining the optimal activation function of the output layer are shown in Figure 4.</p>
      <p>Figure 4: … activation function for the output layer of neurons in the neural network model (CNN).</p>
      <p>As seen from Figure 4, the softmax function yields the best classification results both in terms of accuracy, calculated on the test subset, and in terms of the loss function, calculated on the validation data. With the softplus function the results are slightly worse, and with the other functions they are unsatisfactory. The F1-score values calculated for each of the nine classes confirm these conclusions; because the results are unambiguous, the diagrams showing the dependence of the F1-score values on the activation function are not shown.</p>
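      <p>The two-step data split described above can be sketched as follows. The text does not state whether the split was stratified by class or how fractional subset sizes were rounded, so this sketch assumes a plain seeded random shuffle and integer truncation, which reproduces the reported subset sizes; split_indices is a hypothetical name.</p>
      <preformat>
```python
import random


def split_indices(n_samples, first_frac=0.7, train_frac=0.8, seed=0):
    """Two-step random split: (train + validation) vs test, then train vs validation.

    Subset sizes are truncated to integers.
    Returns (train, validation, test) lists of sample indices.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_first = int(n_samples * first_frac)
    first, test = idx[:n_first], idx[n_first:]
    n_train = int(n_first * train_frac)
    return first[:n_train], first[n_train:], test
```
      </preformat>
      <p>For 3269 samples, this gives 2288 samples in the first subset and 981 test samples, and then 1830 training and 458 validation samples, matching the sizes reported above.</p>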
      <p>Figure 5 presents the simulation results for determining the optimal activation function of the CNN's dense-layer neurons. In terms of sample classification accuracy (Figure 5a) and loss function value (Figure 5b), the optimal activation functions are elu and selu, which partly contradicts the results of the F1-score analysis (Figure 5c, d). The integrated F1-score values (Figure 5d) are highest for the relu and gelu functions, with slightly lower values for selu, and the distribution of the F1-score values for individual classes (Figure 5c) confirms this. Since selu is among the best functions by accuracy and loss and only slightly behind relu and gelu by the integrated F1-score, it was selected as the optimal activation function at this stage of the research, taking all criteria into account.</p>
      <p>Figure 6 shows the simulation results for selecting the optimal activation function of the convolutional-layer neurons.</p>
      <p>As can be seen, in terms of sample classification accuracy and the integrated F1-score, the sigmoid and linear activation functions are optimal. In terms of the loss function value, however, the sigmoid function is preferable.</p>
      <p>Figures 7-10 show similar results for the remaining CNN hyperparameters. In terms of sample classification accuracy (Figure 7a) and the integrated F1-score (Figure 7d), the optimal max-pooling window size is 2 or 3. In terms of the loss function value, however, a window size of 2 gives better results.</p>
      <p>The diagrams in Figure 8, which show the dependence of the classification quality criteria on the kernel size of the dense-layer neurons (dense kernel), indicate that choosing the optimal kernel size based on the F1-score is problematic, since the results for the values 32, 64, 128, and 256 are almost indistinguishable (Figure 8d). In terms of classification accuracy, the optimal values are 64 and 128 (Figure 8a), while in terms of the loss function value, 64 appears to be more attractive (Figure 8b).</p>
      <p>The simulation results in Figure 9 show that, by all classification quality criteria, the optimal
kernel size for the convolutional layer neurons is 3. Figure 10 shows the distributions of the
classification quality criteria for different numbers of convolutional layer filters. By the accuracy
criterion, 8 and 32 filters are the best options. The loss value is slightly lower with 32 filters,
whereas the integrated F1-score slightly favors 8 filters. As a compromise, 32 filters were chosen,
since reducing the number of filters can decrease the sensitivity of the CNN, which is not acceptable
within the current research framework.</p>
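      <p>The two convolutional hyperparameters tuned above, the kernel size and the number of filters, determine the shape of the feature maps. The NumPy sketch below assumes a single-channel input, stride 1, and 'valid' padding. It computes the cross-correlation that deep learning libraries call convolution. With a kernel size of 3 and 32 filters, an input of length L yields an (L - 2) × 32 feature map. The random weights are hypothetical placeholders, not trained values.</p>
      <preformat preformat-type="python">
```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv1d_valid(x, kernels, biases, activation=sigmoid):
    """Single-channel 1D convolution layer (stride 1, 'valid' padding).

    x:       (length,) input signal
    kernels: (n_filters, kernel_size) weights, one row per filter
    biases:  (n_filters,) bias per filter
    returns: (length - kernel_size + 1, n_filters) activated feature maps
    """
    x = np.asarray(x, dtype=float)
    kernels = np.asarray(kernels, dtype=float)
    kernel_size = kernels.shape[1]
    # Each row holds one sliding window of kernel_size consecutive inputs.
    windows = np.stack(
        [x[i : i + kernel_size] for i in range(len(x) - kernel_size + 1)]
    )
    # Dot every window with every filter (no kernel flip: cross-correlation).
    return activation(windows @ kernels.T + biases)


rng = np.random.default_rng(0)
signal = rng.normal(size=20)        # stand-in for part of an expression vector
kernels = rng.normal(size=(32, 3))  # 32 filters, kernel size 3 (placeholders)
biases = np.zeros(32)
print(conv1d_valid(signal, kernels, biases).shape)  # (18, 32)
```
      </preformat>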
      <p>These simulation results allowed us to compile the list of optimal hyperparameters for a 1D
single-layer CNN presented in Table 1. The next simulation step compares the efficiency of 1D
single-layer, two-layer, and three-layer CNNs that use the hyperparameters determined in the previous
stage. A filter of (100 × 193) was applied to the gene expression vector on the first convolutional
layer, (50 × 386) on the second, and (25 × 772) on the third.</p>
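      <p>The three shapes quoted above all hold the same number of values: 100 × 193 = 50 × 386 = 25 × 772 = 19,300. At each successive layer the first dimension is halved and the second doubled. The short check below verifies this arithmetic. The text does not say how the shapes are used. If, purely as an assumption, they describe how a single expression vector is arranged for each layer, then the same vector can be reshaped into all three without loss. The random vector is a hypothetical stand-in for real data.</p>
      <preformat preformat-type="python">
```python
import numpy as np

shapes = [(100, 193), (50, 386), (25, 772)]
sizes = [rows * cols for rows, cols in shapes]
print(sizes)  # [19300, 19300, 19300]

# Hypothetical stand-in for one sample's expression vector of 19,300 values.
expression = np.random.default_rng(0).random(sizes[0])
views = [expression.reshape(shape) for shape in shapes]
# Every view contains exactly the same values in the same (row-major) order.
print(all(np.array_equal(v.ravel(), expression) for v in views))  # True
```
      </preformat>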
      <p>The simulation results are presented in Figure 11. The model training time was almost the same in
all cases, approximately 83 seconds. Figure 11a also shows the total number of samples in the test
subset and the number of samples correctly identified in each case. By all criteria, the single-layer
CNN is the most efficient for this type of data. It correctly identified 955 of 981 samples, a
classification accuracy of 97.3%. The F1-scores calculated for all classes with the single-layer
network are also higher than those of the two- and three-layer networks. The loss function value
computed on the validation data is likewise the lowest.</p>
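      <p>The reported accuracy follows directly from the counts in Figure 11a. The arithmetic is reproduced below as a quick check.</p>
      <preformat preformat-type="python">
```python
correct, total = 955, 981
misclassified = total - correct
accuracy = correct / total
print(misclassified)      # 26
print(f"{accuracy:.1%}")  # 97.3%
```
      </preformat>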
    </sec>
    <sec id="sec-5">
      <title>4. Conclusions</title>
      <p>In this paper, we have presented the results of research on applying a convolutional neural network
(CNN) to the classification of samples based on gene expression data. Various architectures of
one-dimensional CNNs were considered. The following hyperparameters were optimized: the activation
functions of the output, convolutional, and dense layers, the number of filters, the kernel sizes of the
convolutional and dense layer neurons, and max pooling. Three criteria were used to evaluate the
quality of each model. The first was the sample classification accuracy (Accuracy). The second was
the loss function value calculated on the validation subset. The third was the F1-score, which
accounts for both type I and type II errors through its components, precision and recall
(sensitivity), and is one of the effective criteria for assessing the distribution of samples into
separate classes. We proposed an integrated F1-score criterion, calculated by applying Harrington's
desirability function to the partial F1-scores of the individual classes. The simulation results for
different CNN architectures applied to gene expression data classification are presented. The
experimental data were gene expression data from patients examined for various types of cancer.
They comprised eight classes of samples from patients with the corresponding cancer type and a
ninth group of samples from patients in whom no cancer was detected. According to the simulation
results, the single-layer CNN was the most effective across the set of quality criteria. The ordered
grid search algorithm identified the following hyperparameters as optimal: softmax activation for the
output layer neurons, selu activation for the dense layer neurons, sigmoid activation for the
convolutional layer neurons, a max pooling size of 2, a dense layer kernel size of 64, a convolutional
layer kernel size of 3, and 32 filters in the convolutional layer.</p>
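      <p>The integrated criterion summarized above combines per-class F1-scores through Harrington's desirability function. The sketch below illustrates one way to compute it. It rests on four assumptions that should be checked against the method section:</p>
      <list list-type="bullet">
        <list-item><p>Each class's F1-score is computed one-vs-rest from a confusion matrix as 2TP / (2TP + FP + FN).</p></list-item>
        <list-item><p>Each score is mapped linearly onto a coded scale y in [-2, 5]; these bounds are a hypothetical choice.</p></list-item>
        <list-item><p>The one-sided Harrington function d = exp(-exp(-y)) converts each coded score to a desirability.</p></list-item>
        <list-item><p>The per-class desirabilities are combined by their geometric mean, as in Harrington's generalized desirability.</p></list-item>
      </list>
      <p>The confusion matrix is hypothetical.</p>
      <preformat preformat-type="python">
```python
import numpy as np


def per_class_f1(confusion):
    """One-vs-rest F1 for each class.

    confusion[i, j] = number of samples of true class i predicted as class j.
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as this class, but truly another class
    fn = cm.sum(axis=1) - tp  # truly this class, but predicted as another class
    denom = 2 * tp + fp + fn
    # Classes with no true or predicted samples get F1 = 0 instead of 0/0.
    return np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)


def harrington_desirability(y):
    """One-sided Harrington desirability d = exp(-exp(-y))."""
    return np.exp(-np.exp(-np.asarray(y, dtype=float)))


def integrated_f1(f1_scores, y_low=-2.0, y_high=5.0):
    """Hypothetical integrated F1: linear coding of each F1 onto [y_low, y_high],
    Harrington desirability, then the geometric mean over classes."""
    f1 = np.asarray(f1_scores, dtype=float)
    y = y_low + (y_high - y_low) * f1
    d = harrington_desirability(y)
    return float(np.exp(np.mean(np.log(d))))


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
cm = [[50, 2, 0],
      [3, 45, 2],
      [0, 1, 47]]
f1 = per_class_f1(cm)
print(np.round(f1, 3))
print(round(integrated_f1(f1), 3))
```
      </preformat>
      <p>The geometric mean is what makes the criterion demanding. A single poorly classified class pulls the integrated value down far more than it would lower an arithmetic mean of the F1-scores.</p>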
      <p>Future directions for the authors' research include exploring alternative algorithms for neural
network hyperparameter optimization and incorporating other deep learning methods within the
complex object diagnosis system.</p>
    </sec>
    <sec id="sec-6">
      <title>5. References</title>
      <p>[9] L. Comanducci, D. Gioiosa, M. Zanoni, F. Antonacci, A. Sarti. Variational Autoencoders for chord
sequence generation conditioned on Western harmonic music complexity. EURASIP Journal on Audio,
Speech, and Music Processing, 2023, vol. 2023(1), art. no. 24. DOI: 10.1186/s13636-023-00288-5.</p>
      <p>[10] G. Nikolentzos, M. Vazirgiannis, C. Xypolopoulos, M. Lingman, E.G. Brandt. Synthetic electronic
health records generated with variational graph autoencoders. npj Digital Medicine, 2023, vol. 6(1),
art. no. 83. DOI: 10.1038/s41746-023-00822-x.</p>
      <p>[11] X. Lu, P. Li. Research on gearbox temperature field image fault diagnosis method based on transfer
learning and deep belief network. Scientific Reports, 2023, vol. 13(1), art. no. 6664.
DOI: 10.1038/s41598-023-33858-w.</p>
      <p>[12] D. Ma, P. Jiang, L. Shu, Y. Qiu, Y. Zhang, S. Geng. DBN-based online identification of porosity
regions during laser welding of aluminum alloys using coherent optical diagnosis. Optics and
Laser Technology, 2023, vol. 165, art. no. 109597. DOI: 10.1016/j.optlastec.2023.109597.</p>
      <p>[13] T. Zan, H. Wang, M. Wang, et al. Application of Multi-Dimension Input Convolutional Neural
Network in Fault Diagnosis of Rolling Bearings. Applied Sciences, 2019, vol. 9, art. no. 2690.
DOI: 10.3390/app9132690.</p>
      <p>[14] P. Metipatil, P. Bhuvaneshwari, S.M. Basha, S.S. Patil. An Efficient Framework for Predicting
Cancer Type Based on Microarray Gene Expressions Using CNN-BiLSTM Technique. SN
Computer Science, 2023, vol. 4(4), art. no. 381. DOI: 10.1007/s42979-023-01774-5.</p>
      <p>[15] S. Babichev, J. Krejci, J. Bicanek, V. Lytvynenko. Gene expression sequences clustering based on
the internal and external clustering quality criteria. Proceedings of the 12th International Scientific
and Technical Conference on Computer Sciences and Information Technologies, CSIT 2017,
2017, vol. 1, art. no. 8098744, pp. 91-94. DOI: 10.1109/STC-CSIT.2017.8098744.</p>
      <p>[16] S. Babichev, V. Osypenko, V. Lytvynenko, M. Voronenko, M. Korobchynskyi. Comparison
Analysis of Biclustering Algorithms with the use of Artificial Data and Gene Expression Profiles.
2018 IEEE 38th International Conference on Electronics and Nanotechnology, ELNANO 2018
Proceedings, 2018, art. no. 8477439, pp. 298-304. DOI: 10.1109/ELNANO.2018.8477439.</p>
      <p>[17] S. Babichev, L. Yasinska-Damri, I. Liakh, J. Škvor. Hybrid Inductive Model of Differentially and
Co-Expressed Gene Expression Profile Extraction Based on the Joint Use of Clustering Technique
and Convolutional Neural Network. Applied Sciences (Switzerland), 2022, vol. 12(22), art.
no. 11795. DOI: 10.3390/app122211795.</p>
      <p>[18] L. Yasinska-Damri, S. Babichev, B. Durnyak, T. Goncharenko. Application of Convolutional
Neural Network for Gene Expression Data Classification. Lecture Notes on Data Engineering and
Communications Technologies, 2023, vol. 149, pp. 3-24. DOI: 10.1007/978-3-031-16203-9_1.</p>
      <p>[19] The Cancer Genome Atlas Program (TCGA). National Cancer Institute. Center for Cancer
Genomics. URL:
https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga</p>
      <p>[20] Illumina. URL: https://www.illumina.com/</p>
      <p>[21] T. Girke. R &amp; Bioconductor Manual. Institute for Integrative Genome Biology. URL:
http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual</p>
      <p>[22] V. Koilo. Financial performance under stress: the case of the Norwegian maritime cluster. Public
and Municipal Finance, 2019, vol. 8(1), pp. 54-72. DOI: 10.21511/pmf.08(1).2019.05.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.M.</given-names>
            <surname>Nikolados</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.A.</given-names>
            <surname>Oyarzún</surname>
          </string-name>
          .
          <article-title>Deep learning for optimization of protein expression</article-title>
          .
          <source>Current Opinion in Biotechnology</source>
          ,
          <year>2023</year>
          , vol.
          <volume>81</volume>
          , art. no. 102941. DOI: 10.1016/j.copbio.2023.102941.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mustafa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.K.</given-names>
            <surname>Jadoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khaliq-uz-Zaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Humayun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maray</surname>
          </string-name>
          .
          <article-title>An Ensembled Framework for Human Breast Cancer Survivability Prediction Using Deep Learning</article-title>
          .
          <source>Diagnostics</source>
          , vol.
          <volume>13</volume>
          (
          <issue>10</issue>
          ), art. no. 1688. DOI: 10.3390/diagnostics13101688.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Gene Regulatory Network Inference Using Convolutional Neural Networks from scRNA-seq Data</article-title>
          .
          <source>Journal of Computational Biology</source>
          ,
          <year>2023</year>
          , vol.
          <volume>30</volume>
          (
          <issue>5</issue>
          ), pp.
          <fpage>619</fpage>
          -
          <lpage>631</lpage>
          . DOI: 10.1089/cmb.2022.0355.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kaur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.P.S.</given-names>
            <surname>Chauhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.K.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          .
          <article-title>Prediction of Enhancers in DNA Sequence Data using a Hybrid CNN-DLSTM Model</article-title>
          .
          <source>IEEE/ACM Transactions on Computational Biology and Bioinformatics</source>
          ,
          <year>2023</year>
          , vol.
          <volume>20</volume>
          (
          <issue>2</issue>
          ), pp.
          <fpage>1327</fpage>
          -
          <lpage>1336</lpage>
          . DOI: 10.1109/TCBB.2022.3167090.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.P.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vijayabaskar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Murali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramaswamy</surname>
          </string-name>
          .
          <article-title>Design of optimal Elman Recurrent Neural Network based prediction approach for biofuel production</article-title>
          .
          <source>Scientific Reports</source>
          ,
          <year>2023</year>
          , vol.
          <volume>13</volume>
          (
          <issue>1</issue>
          ), art. no. 8565. DOI: 10.1038/s41598-023-34764-x.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mauro</surname>
          </string-name>
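          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>LeShane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Densmore</surname>
          </string-name>
          .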
          <article-title>ICOR: improving codon optimization with recurrent neural networks</article-title>
          .
          <source>BMC Bioinformatics</source>
          ,
          <year>2023</year>
          , vol.
          <volume>24</volume>
          (
          <issue>1</issue>
          ), art. no. 132. DOI: 10.1186/s12859-023-05246-8.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mo</surname>
          </string-name>
          .
          <article-title>Retention time prediction for chromatographic enantioseparation by quantile geometry-enhanced graph neural network</article-title>
          .
          <source>Nature Communications</source>
          ,
          <year>2023</year>
          , vol.
          <volume>14</volume>
          (
          <issue>1</issue>
          ), art. no. 3095. DOI: 10.1038/s41467-023-38853-3.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Du</surname>
          </string-name>
          , et al.
          <article-title>Chemistry-intuitive explanation of graph neural networks for molecular property prediction with substructure masking</article-title>
          .
          <source>Nature Communications</source>
          ,
          <year>2023</year>
          , vol.
          <volume>14</volume>
          (
          <issue>1</issue>
          ), art. no. 2585. DOI: 10.1038/s41467-023-38192-3.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Comanducci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gioiosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zanoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Antonacci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarti</surname>
          </string-name>
          .
          <article-title>Variational Autoencoders for chord sequence generation conditioned on Western harmonic music complexity</article-title>
          .
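          <source>EURASIP Journal on Audio, Speech, and Music Processing</source>
          ,
          <year>2023</year>
          , vol.
          <volume>2023</volume>
          (
          <issue>1</issue>
          ), art. no. 24. DOI: 10.1186/s13636-023-00288-5.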
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>