Applying Convolutional Neural Network for Cancer Disease Diagnosis Based on Gene Expression Data

Sergii Babichev (a, b), Ihor Liakh (c), Vasyl Morokhovych (c), Andrii Honcharuk (d), Anatolii Balanda (d), Oleksandr Zaitsev (d)

a Kherson State University, University street, 27, Kherson, 73000, Ukraine
b Jan Evangelista Purkyne University in Usti nad Labem, Pasteurova, 15, Usti nad Labem, 400 96, Czech Republic
c Uzhhorod National University, University street, 14, Uzhhorod, 88000, Ukraine
d Military Academy named after Eugene Bereznyak, Yria Il'enka street, 81, Kyiv, 04050, Ukraine

Abstract

Applying deep learning techniques, such as convolutional or recurrent neural networks, to process gene expression data for developing complex disease diagnostic systems is one of the current focuses of modern bioinformatics. Deep learning algorithms can identify specific patterns in the hierarchical representation of data and craft distinct functions that allow for precise identification of the subjects being studied. In this paper, we present our research findings on applying a convolutional neural network (CNN) to diagnose various types of cancer based on gene expression data. The experimental data were sourced from The Cancer Genome Atlas (TCGA) and comprised 3269 samples, which fall into nine classes according to the type of cancer. We introduce an ordered grid-search algorithm to pinpoint the optimal set of hyperparameters for the CNN. We assessed the model's efficacy using classification quality metrics that account for type I and type II errors. Furthermore, we introduced an integrated F1-score index based on the Harrington desirability function. The obtained results demonstrate the high efficacy of the proposed approach in diagnosing cancer based on gene expression data. The simulation results have shown that a single-layer CNN is more efficient for this type of data by all classification quality criteria: 955 of the 981 test samples were identified correctly, corresponding to a classification accuracy of 97.3%.

Keywords

Gene expression profiles, cancer disease, Harrington desirability function, convolutional neural network, classification quality criteria

1. Introduction and literature review

The appropriateness of using deep learning methods for gene expression data processing is determined by the structure of the experimental data and its large volume. Typically, experimental data contain thousands of objects and more than ten thousand attributes. One of the primary advantages of deep learning methods is their ability to process complex and unstructured data. Deep learning algorithms can identify specific patterns in the hierarchical representation of data and formulate functions that allow for high-precision identification of the objects under investigation. Another significant advantage of deep learning models is their high accuracy and efficiency. Moreover, such models can formulate appropriate functions directly from raw data, enabling the discovery of hidden patterns and intricate relationships that are challenging to uncover using traditional methods. Deep learning-based models are also scalable: they can be efficiently scaled to process large volumes of data, benefiting from parallel or distributed computing architectures that significantly accelerate training and inference.
In the case of gene expression data, the correct application of deep learning methods enhances the effectiveness of diagnostic systems for complex objects: it yields higher accuracy in identifying the studied objects on the one hand and increases the objectivity of determining an object's state through parallel data processing on the other. All the above points highlight the relevance of the current research.

At present, several deep learning (DL) methods can be applied to gene expression data to extract hidden patterns and make predictions regarding the state of the respective object [1]. Figure 1 illustrates a block diagram of the most common deep learning methods focused on processing gene expression data and analyzing genomic sequences, as well as possible directions for their application.

Figure 1: Block diagram of existing deep learning methods and their application directions for analyzing gene expression data and genomic sequences

As can be seen from Figure 1, the main DL methods can be listed as follows:

1. Convolutional Neural Networks (CNN) [2-4]. They are used for analyzing gene expression data that can be represented as a vector (one-dimensional CNNs) or as images or heat maps (two-dimensional CNNs). Among the advantages of CNNs are their ability to detect hidden dependencies and to form a vector of useful features from genomic data. Depending on the problem formulation, the following application directions for CNNs can be distinguished: analysis of genomic sequences, analysis of gene expression heat maps, disease diagnosis, identification of genetic markers, and reconstruction and modelling of gene regulatory networks (GRN).

2. Recurrent Neural Networks (RNN) [5,6]. They are a powerful tool for analyzing and processing gene expression data, including time series of gene expression values. Typically, RNNs are applied to such data for time series analysis, disease prediction, and generation of new sequences.

3. Graph Convolutional Networks (GCN) [7,8]. GCN is a method for processing gene expression data represented in the form of graphs, where genes are represented as nodes and relationships between genes (such as expression interconnections or regulatory interactions) are depicted as edges. In this case, the following application directions for GCNs are possible: predicting gene functions, imputing missing gene expression values, reconstructing gene regulatory networks, clustering and community detection, and predicting drug responses.

4. Variational Autoencoders (VAE) [9,10]. VAEs are generative models that are capable of discerning patterns of gene interactions based on a low-dimensional representation of gene expression data and generating new samples with similar expression profiles.
In the context of gene expression data processing, VAEs can be applied to address the following tasks: data dimensionality reduction, generation of new samples, discovery of latent structures, and interpolation and modification of expression profiles.

5. Deep Belief Networks (DBN) [11,12]. DBNs consist of several layers of Restricted Boltzmann Machines (RBMs) and can represent the distribution of gene expression data in the form of a hierarchical structure. The potential applications of DBNs in this context include data dimensionality reduction, clustering and classification, generation of new samples, and identification of the nature of interactions between genes.

Within the framework of the current research, the problem of improving the efficiency of diagnostic systems for complex diseases based on gene expression data is being addressed. The solution involves identifying co-expressed genes in the first stage and classifying objects based on the formed subsets of gene expression profiles in the second stage. This limits the number of DL methods that can be applied to solve the stated problem. For instance, the classification of objects based on gene expression data can be performed using convolutional or recurrent neural networks. In this case, the challenge is to determine the optimal network structure and the vector of hyperparameters that govern the network's performance. Identifying subsets of co-expressed gene expression profiles is possible using a deep belief network; still, in addition to determining the optimal structure and network hyperparameters, there is the challenge of proving its advantage over the classic gene expression profile clustering algorithms currently used in this domain. Graph convolutional networks can also be used in classification systems. However, their application requires a gene regulatory network reconstruction step in the preliminary stage to represent the data as a graph, which in turn requires identifying subsets of co-expressed gene expression profiles by applying a clustering procedure to the gene expression data. Implementing this process is possible through model hybridization, using different DL methods at the relevant data processing stages. This requires thorough research to assess the efficiency of the appropriate method and to determine the optimal network structure and hyperparameter vector.

The choice of CNNs for gene expression data processing is determined by their ability to automatically and adaptively learn spatial hierarchies of features from input data. CNNs can capture complex patterns and interactions between genes, aiding in tasks such as classification, clustering, and prediction of gene functions or disease associations. This ability to learn and generalize from the data makes CNNs a powerful tool for extracting meaningful insights from gene expression datasets, potentially leading to new biological discoveries and advancements in personalized medicine.

Numerous studies have focused on utilizing CNNs for diagnosing various objects. For instance, in [13], the authors introduced a fault diagnosis model for rolling bearings based on a multi-dimensional input convolutional neural network (MDI-CNN). The model featured multiple input layers, a design that enabled the authors to combine both original and processed signals, leveraging the strengths of the CNN to automatically learn the characteristics of the original signal. This, in turn, enhanced the recognition accuracy and anti-jamming capability.
In [14], the authors shared research findings on predicting cancer types using hybrid CNN + BiLSTM models, which analyzed microarray gene expressions. This approach enabled them to distinguish between different forms of cancer, and the results surpassed those of existing CNN and RNN classifiers in terms of classification quality criteria. However, it is worth noting that, despite these advances, the challenge of effectively implementing deep learning techniques for gene expression data remains unresolved. A primary issue is the objective selection of hyperparameters for specific deep learning models, considering the relevant quality criteria.

In this study, we build upon previous research on gene expression data processing [15,16] and the use of CNNs for disease diagnosis based on such data [17,18]. The primary objective of our research is to devise a method for determining the optimal list of hyperparameters for a CNN when analyzing gene expression data.

2. Convolutional neural network

The general architecture of a multilayer CNN is depicted in Figure 2 [13]. Usually, it includes the following main components:

1. Input layer: Accepts input data, which can be represented as a one-dimensional data vector (a vector of gene expression values that define the state of the object) or a two-dimensional matrix (a heatmap of gene expression values, images, etc.). Depending on the type of input data, one-dimensional (1D) or two-dimensional (2D) convolutional layers are formed.

2. Convolutional layers: Used to detect local features in the input data. Each convolutional layer consists of a set of filters that perform the convolution operation on the input data. Convolution is the basic operation in a CNN. Typically, in a convolutional layer, the feature maps of the previous layer are convolved with convolutional kernels, and a nonlinear activation function creates the output feature map. The computational process can be expressed as follows [13] (a minimal numerical sketch of this operation is given after this list):

$$X_j^l = f\left(\sum_{i \in M_i} X_i^{l-1} * \omega_{ij}^l + b_j^l\right) \qquad (1)$$

where $X_j^l$ and $X_i^{l-1}$ are the j-th and i-th feature maps of the data at levels l and l-1, respectively; $M_i$ is the set of input feature maps (determined by the filter applied to the input data at the corresponding convolutional level); $\omega_{ij}^l$ is the convolutional kernel connecting the i-th feature map of the input data with the j-th feature map at convolution level l; $b_j^l$ is the bias; $f(\cdot)$ is a nonlinear activation function; and $*$ denotes the convolution operation.

Figure 2: The general architecture of the multilayer convolutional neural network

3. Pooling layers: Used to reduce the spatial dimensions of the feature vector or matrix and thereby decrease the number of parameters. The max pooling layer replaces each region of the data vector or matrix with a single value equal to the maximum value within that region.

4. Fully connected layers: The data is passed to the fully connected layers after several convolutional and pooling layers. Every neuron in a fully connected layer is connected to every neuron of the previous layer. The fully connected layers are used for classification or regression based on the features obtained: they take the features from the flattened layer and generate an output vector that is presented as the model's output.

5. Activation functions: After each convolutional layer, an activation function is applied. In most cases, these are nonlinear, allowing the network to detect complex dependencies in the data during the learning process. The most common activation functions are ReLU (Rectified Linear Unit), sigmoid, and hyperbolic tangent (tanh).

6. Loss function: It determines the difference between the predicted and expected values. The derivatives of the loss function are used to update the weights and biases in the network during the backpropagation of the error. This allows the model to assess its accuracy and adjust its weights during training.
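To make formula (1) concrete, the following NumPy sketch computes one output feature map of a 1D convolutional layer. The array shapes, the kernel values, and the choice of ReLU as f(·) are illustrative assumptions rather than the configuration used later in the paper; as in most deep learning libraries, the convolution is implemented as cross-correlation.

```python
import numpy as np

def relu(x):
    # Nonlinear activation function f(.) from formula (1); ReLU is assumed here.
    return np.maximum(0.0, x)

def conv1d_feature_map(X_prev, W, b):
    """Compute one output feature map X_j^l according to formula (1).

    X_prev : array (n_in_maps, length) -- feature maps X_i^{l-1} of layer l-1
    W      : array (n_in_maps, k)      -- kernels w_ij^l for the j-th output map
    b      : float                     -- bias b_j^l
    """
    n_in, length = X_prev.shape
    k = W.shape[1]
    out_len = length - k + 1              # 'valid' convolution, no padding
    out = np.full(out_len, b)             # start from the bias b_j^l
    for i in range(n_in):                 # sum over the set of input maps M_i
        for t in range(out_len):
            # Sliding dot product of the kernel with the input window
            # (cross-correlation, the usual deep learning convention for *).
            out[t] += np.dot(X_prev[i, t:t + k], W[i])
    return relu(out)

# Toy usage: 2 input feature maps of length 8, kernel size 3 (illustrative values).
rng = np.random.default_rng(0)
X_prev = rng.normal(size=(2, 8))
W = rng.normal(size=(2, 3))
print(conv1d_feature_map(X_prev, W, b=0.1))   # one feature map of length 6
```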
As mentioned above, the hyperparameters of a CNN determine the network's architecture and training parameters. They are set during the initialization of the network and affect its learning and generalization capabilities. Some of the key hyperparameters of a CNN include:

• Number of convolutional layers: Defines the number of layers in which convolutional filters detect features in the input data. Having more layers can help the model learn more complex dependencies, but it can also lead to greater complexity, longer training times, and overfitting of the network. Overfitting can be detected by evaluating the convergence of the accuracy values and the loss function calculated on the training and validation data during network training.

• Size of convolutional filters: Determines the size (width and height) of the filters that move over the input data to perform convolution. Larger filters can detect larger patterns but may also increase the computational load.

• Number of filters in a convolutional layer: Determines the number of filters applied to the input data in each convolutional layer. Each filter generates a feature map corresponding to a specific feature. Typically, the number of filters increases with each subsequent convolutional layer.

• Size of the pooling window: The window size (width and height) that moves across the feature map to perform pooling operations.

• Activation function: The function used to introduce non-linearity into the network after each layer. Common activation functions include ReLU (Rectified Linear Unit), sigmoid, and hyperbolic tangent (tanh).

• Number of fully connected layers: Determines the number of fully connected layers added after the convolutional and pooling layers. These layers connect every neuron in one layer to every neuron in the next and are typically used for classification or regression based on the features extracted by the preceding layers.

Within the framework of the current research, the optimal combination of CNN hyperparameters was determined using the ordered empirical grid-search method, which evaluates all possible combinations of hyperparameter values within predefined ranges. The implementation of this procedure involves the following stages:

1. Definition of the ranges of variation for the hyperparameter values subject to optimization.

2. Determination of the metric for evaluating the efficiency of a particular combination of hyperparameter values during their sequential enumeration.
Since the current research involved classifying objects based on gene expression data, metrics based on the assessment of type I and type II errors were applied [14]:

• Classification accuracy determines the proportion of the total number of samples that are correctly identified:

$$ACC = \frac{TP + TN}{TP + FP + TN + FN} \qquad (2)$$

• F1-score is a measure of the correctness of the distribution of samples into the relevant class and is calculated as the harmonic mean of precision (PR) and recall (RC):

$$F1 = \frac{2 \cdot PR \cdot RC}{PR + RC} \qquad (3)$$

where precision is defined as the probability of correctly identifying samples of the relevant class and is calculated as the ratio of correctly identified samples in the relevant class to the total number of samples assigned to this class:

$$PR = \frac{TP}{TP + FP} \qquad (4)$$

Recall is defined as the probability of correctly identifying true positive cases of the relevant class and is calculated as the ratio of correctly identified samples in the relevant class to the total number of samples that really belong to that class:

$$RC = \frac{TP}{TP + FN} \qquad (5)$$

In the formulas presented above, TP (True Positive) and TN (True Negative) represent the numbers of objects correctly classified into their respective classes, while FP (False Positive) and FN (False Negative) represent the numbers of misclassified objects. Obviously, the maximum value of criteria (2) and (3) is equal to 1, corresponding to perfect classification. It should be noted that, when solving a multi-class problem, criterion (2) defines the overall accuracy of sample classification among classes, while criterion (3) defines the accuracy of sample classification in each class individually.

3. Creation of a grid of all possible combinations of hyperparameters within the ranges specified in stage 1. Each cell of this grid structure represents a unique combination of the model's hyperparameters.

4. For each combination of hyperparameters:
4.1. Construct a neural network model whose architecture and parameters correspond to the current combination of hyperparameters.
4.2. Train, validate, and test the model.
4.3. Calculate the quality criteria for sample identification according to formulas (2) and (3).

5. Analysis of the obtained quality criteria values for sample classification and selection of the combination of hyperparameters that corresponds to their maximum values.

It should be noted that a drawback of the empirical grid-search method is the significant computational time it requires. However, it ensures systematic exploration of the hyperparameter space and helps to choose the optimal combination for the neural network model, considering both the research objective and the type of experimental data.
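As an illustration of stages 1-5, the following Python sketch enumerates a small grid over two of the hyperparameters considered later (the number of filters and the kernel size) and evaluates criteria (2) and (3) on the test subset. The use of TensorFlow/Keras and scikit-learn, the fixed remaining architecture, the chosen ranges, and the training settings are assumptions made for illustration, not the authors' exact implementation.

```python
import itertools
import tensorflow as tf
from sklearn.metrics import accuracy_score, f1_score

def build_model(n_filters, kernel_size, n_genes, n_classes):
    # Stage 4.1: a 1D CNN whose architecture follows the current combination.
    return tf.keras.Sequential([
        tf.keras.layers.Conv1D(n_filters, kernel_size, activation="sigmoid",
                               input_shape=(n_genes, 1)),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="selu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

def grid_search(X_train, y_train, X_val, y_val, X_test, y_test, n_classes):
    # X_* arrays are assumed to be shaped (samples, n_genes, 1); y_* are int labels.
    # Stage 1: ranges of the hyperparameter values (illustrative choices).
    grid = {"n_filters": [8, 16, 32], "kernel_size": [3, 5, 7]}
    results = []
    # Stage 3: grid of all combinations of hyperparameter values.
    for n_filters, kernel_size in itertools.product(*grid.values()):
        model = build_model(n_filters, kernel_size, X_train.shape[1], n_classes)
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy", metrics=["accuracy"])
        # Stage 4.2: training with validation.
        model.fit(X_train, y_train, epochs=20, batch_size=32,
                  validation_data=(X_val, y_val), verbose=0)
        # Stage 4.3: quality criteria on the test subset.
        y_pred = model.predict(X_test, verbose=0).argmax(axis=1)
        acc = accuracy_score(y_test, y_pred)           # formula (2)
        f1 = f1_score(y_test, y_pred, average=None)    # formula (3), per class
        results.append(((n_filters, kernel_size), acc, f1))
    # Stage 5: select the combination with the maximum quality criterion.
    return max(results, key=lambda r: r[1])
```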
3. Experiment, results, and discussion

The modeling process was carried out using gene expression data from patients who were studied for various types of cancer. The data are freely available in The Cancer Genome Atlas (TCGA) [19]. The gene expression data were obtained on the Illumina platform [20] by applying genomic sequencing of RNA molecules, and for each sample the counts of the respective genes determining the state of the sample under study were identified. In the initial state, the experimental data contained 3269 samples and 19947 genes. The structure of the experimental data was the following:

• Adrenocortical carcinoma – ACC (79 samples).
• Glioblastoma multiforme – GB (169 samples).
• Sarcoma – SARC (263 samples).
• Lung squamous cell carcinoma – LUSC (502 samples).
• Lung adenocarcinoma – LUAD (541 samples).
• Stomach adenocarcinoma – STAD (415 samples).
• Kidney renal clear cell carcinoma – KIRC (542 samples).
• Brain lower grade glioma – LGG (534 samples).
• Cancer not identified – Normal (224 samples).

According to the methodology presented in [17,21], in the first stage the absolute gene counts were transformed into a range more convenient for further processing (Counts Per Million, CPM) according to the formula:

$$CPM_{ps} = \frac{count_{ps}}{\sum_{s=1}^{m} count_{ps}} \cdot 10^6 \qquad (6)$$

where $count_{ps}$ is the count of the s-th gene in the p-th sample, and m is the total number of different genes studied during the experiment. This step significantly reduced the range of variation of the absolute values determining the expression (activity level) of the respective genes. In the second stage, data normalization was carried out by applying the $\log_2(CPM)$ function to all values. In the third stage, non-expressed genes were removed according to the condition $\log_2(CPM) \le 0$ for all samples under study. The number of genes at this stage was reduced by 682, and the matrix of experimental gene expression data took the form $E = (3269 \times 19265)$. In the final stage, negative gene expression values, representing genes not expressed in some samples, were replaced with zeros. For proper initialization of the CNN filters, the number of gene expression profiles was increased to 19300 by supplementing the data with profiles of zero expression.
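The described preprocessing pipeline can be summarized by the following NumPy sketch. The stage order, the threshold log2(CPM) <= 0, and the padding to 19300 columns follow the text above; the handling of zero counts and the function interface are illustrative assumptions.

```python
import numpy as np

def preprocess(counts, target_n_genes=19300):
    """Preprocess a (samples x genes) matrix of raw gene counts.

    counts : array of non-negative values, e.g. shape (3269, 19947) for the TCGA data.
    """
    # Stage 1: Counts Per Million, formula (6): per-sample scaling to 1e6.
    cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
    # Stage 2: log2 normalization; zero counts yield -inf, handled in stage 4.
    with np.errstate(divide="ignore"):
        log_cpm = np.log2(cpm)
    # Stage 3: remove genes with log2(CPM) <= 0 for all samples (non-expressed genes).
    expressed = (log_cpm > 0).any(axis=0)
    log_cpm = log_cpm[:, expressed]       # 19947 -> 19265 genes in the paper's data
    # Stage 4: replace negative expression values (including -inf) with zeros.
    log_cpm = np.maximum(log_cpm, 0.0)
    # Pad with zero-expression profiles for proper initialization of the CNN filters.
    pad = target_n_genes - log_cpm.shape[1]
    if pad > 0:
        log_cpm = np.hstack([log_cpm, np.zeros((log_cpm.shape[0], pad))])
    return log_cpm
```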
Figure 3 depicts the flowchart of a 1D single-layer CNN with the relevant hyperparameters at different stages of the neural network's operation. To determine the optimal combination of hyperparameters within the grid-search concept, a heuristic search algorithm was proposed. The heuristic functions used were the data classification accuracy across all classes (a coarse estimate) and the precision of sample distribution across individual classes, assessed by calculating the F1-score for each class (detailed analysis). Considering that, when dealing with a large number of classes, analyzing the corresponding F1-score values to choose the optimal alternative from the hyperparameter list can be problematic, an integrated F1-score value was calculated from the previously obtained values using Harrington's desirability method, which is one of the effective methods for solving multi-criteria problems and is currently successfully used in various fields of scientific research [22].

Figure 3: Flowchart of a 1D single-layer CNN for determining the optimal hyperparameter vector of the neural network

The algorithm for implementing this procedure involves the following steps (a minimal sketch is given after the list):

Step 1. Initialization.
1.1. Representation of the F1-score values in the form of a matrix whose rows are classes and whose columns are the hyperparameter values whose combination is being studied at this stage.

Step 2. Calculation of private desirabilities.
2.1. Determination of the minimum and maximum values of the F1-score at the corresponding stage of the CNN operation (when applying the corresponding combination of hyperparameters).
2.2. Transformation of the F1-score scale into the linear scale of the dimensionless indicator Y, considering the boundary values of the F1-score determined in the previous step (according to the desirability method, Y varies in the range from -2 to 5). In this case, the coefficients of the linear equation are calculated first:

$$Y_{min} = a + b \cdot F1_{min}, \quad Y_{max} = a + b \cdot F1_{max} \qquad (7)$$

Then, a direct transformation of F1-score values into Y values is carried out:

$$Y = a + b \cdot F1 \qquad (8)$$

2.3. Calculation of the private desirability d for each F1-score value:

$$d = \exp(-\exp(-Y)) \qquad (9)$$

Step 3. Calculation of the integrated F1-score value.
3.1. For each column of the matrix obtained in Step 2, calculate the integrated F1-score value as the geometric mean of all private desirabilities:

$$F1_{int}^{j} = \sqrt[9]{\prod_{i=1}^{9} d_{ij}} \qquad (10)$$

where j denotes the corresponding column of the matrix of private desirabilities.

Step 4. Analysis of the obtained results.
4.1. Creation of a diagram showing the dependence of the integrated F1-score value on the corresponding hyperparameter values, and selection of the optimal hyperparameter value corresponding to the maximum of the integrated F1-score.
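Below is a minimal NumPy sketch of Steps 1-4, assuming the F1-score matrix has nine rows (one per class) and one column per candidate hyperparameter value. The boundary levels Y_min = -2 and Y_max = 5 follow the range quoted in Step 2.2; the toy data and the function interface are illustrative assumptions.

```python
import numpy as np

def integrated_f1(f1_matrix, y_min=-2.0, y_max=5.0):
    """Integrated F1-score per hyperparameter value via Harrington's desirability.

    f1_matrix : array (n_classes, n_hyperparams) of per-class F1-scores (Step 1.1).
    """
    f1_lo, f1_hi = f1_matrix.min(), f1_matrix.max()    # Step 2.1
    # Step 2.2: solve Y = a + b*F1 so that F1_lo -> y_min and F1_hi -> y_max
    # (the two linear equations of formula (7)).
    b = (y_max - y_min) / (f1_hi - f1_lo)
    a = y_min - b * f1_lo
    Y = a + b * f1_matrix                               # formula (8)
    d = np.exp(-np.exp(-Y))                             # formula (9): desirabilities
    # Step 3.1: geometric mean over classes for each column, formula (10).
    return d.prod(axis=0) ** (1.0 / d.shape[0])

# Toy usage: F1-scores of 9 classes for three candidate hyperparameter values.
rng = np.random.default_rng(1)
f1 = rng.uniform(0.8, 1.0, size=(9, 3))
scores = integrated_f1(f1)
print(scores, "-> best column:", scores.argmax())       # Step 4: pick the maximum
```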
During the simulation process, the following activation functions were applied to the output layer of neurons: softmax, softplus, softsign, and swish. For the fully connected inner layer of neurons, the elu, gelu, linear, relu, and selu activation functions were sequentially applied; other activation functions applied to this layer showed unsatisfactory results. The following activation functions were applied to the inner convolutional layer: elu, gelu, sigmoid, linear, relu, and selu.

In the initial data preprocessing stage, the data was split into two subsets in a 0.7/0.3 ratio (2288/981 samples). The first subset (2288 samples) was further divided in a 0.8/0.2 ratio (1830/458): 1830 samples were used for training the network, 458 for validating the model during its training, and 981 samples were used for testing the model. The model's quality assessment was based on the analysis of the loss function value calculated during the model's validation, the classification accuracy, and the F1-score value calculated on the test data. The simulation results for determining the optimal activation function of the neurons' output layer are shown in Figure 4.

Figure 4: Distribution diagrams of classification quality criteria when determining the optimal activation function for the output layer of neurons in the neural network model (CNN)

As seen from Figure 4, the use of the softmax function yields the best classification results in terms of accuracy, calculated on the test data subset, and in terms of the loss function, calculated on the validation data. When using the softplus function, the classification results are slightly worse; with the other functions, the classification results are unsatisfactory. These conclusions are confirmed by the analysis of the F1-score values calculated for each of the nine classes. Because the results are clear-cut, the diagrams showing the dependence of the F1-score values on the type of activation function are not shown.

In Figure 5, the simulation results are presented concerning the determination of the optimal activation function of the CNN's neurons for the dense layer. The analysis of the simulation results allows concluding that in terms of sample classification accuracy (Figure 5a) and loss function value (Figure 5b), the optimal activation functions are elu and selu, which to some extent does not match the results based on the analysis of F1-score values (Figure 5c, d). The analysis of the integrated F1-score criterion values (Figure 5d) allows concluding that the highest values of this criterion correspond to the relu and gelu functions. Slightly lower values are achieved when using the selu function. The analysis of the distribution of the F1-score values for individual classes (Figure 5c) confirms this conclusion. Thus, based on the analysis of the values of all the criteria, the selu function was determined as optimal at this stage of the research. In Figure 6, the simulation results for the selection of the optimal activation function for the neurons of the convolutional layer are shown.

Figure 5: Simulation results regarding determining the optimal activation function for the CNN's dense layer neurons

Figure 6: Simulation results for determining the optimal activation function for the neurons of the convolutional layer of the CNN model

As can be seen, in terms of sample classification accuracy and the integrated F1-score value, the sigmoid and linear activation functions are optimal. However, in terms of the loss function value, the sigmoid function is preferable. In Figures 7-10, similar results are depicted for determining the other optimal hyperparameters of the CNN. The analysis of the obtained results suggests that, in terms of the classification accuracy of the samples (Figure 7a) and the integrated F1-score value (Figure 7d), the optimal value of the max-pooling hyperparameter could be 2 or 3. However, in terms of the loss function value, 2 corresponds to better results.

Figure 7: Results of simulation to determine the optimal value of maximal pooling for neurons of the convolutional layer

Figure 8: Results of simulation to determine the optimal kernel size for the dense layer neurons (dense kernel)

The analysis of the diagrams in Figure 8, which depict the dependence of the classification quality criteria on the kernel size for neurons of the dense layer (dense kernel), indicates that choosing the optimal kernel size based on the F1-score is problematic, since the results are almost indistinguishable for the values 32, 64, 128, and 256 (Figure 8d). In terms of classification accuracy, the optimal values are 64 and 128 (Figure 8a). In terms of the loss function value, 64 appears to be more attractive (Figure 8b).

Figure 9: Results of simulation to determine the optimal kernel size for the convolutional layer neurons (kernel size)

Figure 10: Results of simulation to determine the optimal number of filters for the convolutional layer neurons

The analysis of the simulation results shown in Figure 9 suggests that, by all quality criteria of sample classification, the optimal kernel size for the convolutional layer neurons is 3. The analysis of the distribution diagrams of the classification quality criteria at different numbers of convolutional layer filters (Figure 10) indicates that, according to the accuracy criterion, the optimal options are 8 and 32 filters. The loss value when using 32 filters is slightly lower, while the integrated F1-score value suggests a slightly higher attractiveness of eight filters. In this case, a compromise decision was made to use 32 filters, as reducing the number of filters can decrease the sensitivity of the CNN, which is not acceptable within the current research framework.
The obtained simulation results allowed us to compile a list of optimal hyperparameters for a 1D single-layer CNN; their values are presented in Table 1. The next step in the simulation process was to compare the efficiency of 1D single-layer, two-layer, and three-layer CNNs using the hyperparameters determined in the previous simulation stage. On the first convolutional layer, a filter (100 × 193) was applied to the gene expression value vector; on the second, (50 × 386); and on the third, (25 × 772).

Table 1
Optimal hyperparameter values for a 1D single-layer CNN

Number of filters: 32
Kernel size (convolutional layer): 3
Dense kernel: 64
Maximal pooling: 2
Activation function of convolutional layer: sigmoid
Activation function of dense layer: selu
Activation function of output layer: softmax

The simulation results are presented in Figure 11. The training time of the model was almost the same in all cases, approximately 83 seconds.

Figure 11: Simulation results for determining the number of convolutional layers in the 1D CNN

Figure 11a also displays the total number of samples that made up the test data subset and the number of samples correctly identified in each case. An analysis of the results allows us to conclude that, by all criteria, the single-layer CNN is more efficient for this type of data. The number of correctly identified samples is 955 out of 981, and the classification accuracy is 97.3%. The F1-score values calculated for all classes using the single-layer network are also higher than those of the two- and three-layer networks, and the loss function value computed on the validation data is the lowest in this case.
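For reference, the following sketch assembles the single-layer 1D CNN with the Table 1 hyperparameters and reproduces the 0.7/0.3 and 0.8/0.2 data splits described above. The use of TensorFlow/Keras and scikit-learn, the adam optimizer, the loss, and the number of epochs are illustrative assumptions rather than the authors' exact configuration.

```python
import tensorflow as tf
from sklearn.model_selection import train_test_split

def train_single_layer_cnn(X, y, n_classes=9):
    # X: (3269, 19300) preprocessed log2(CPM) matrix; y: integer labels (9 classes).
    # 0.7/0.3 split into training+validation (2288) and test (981) subsets.
    X_trv, X_test, y_trv, y_test = train_test_split(X, y, test_size=0.3, stratify=y)
    # 0.8/0.2 split of the first subset into training (1830) and validation (458).
    X_train, X_val, y_train, y_val = train_test_split(X_trv, y_trv, test_size=0.2,
                                                      stratify=y_trv)
    model = tf.keras.Sequential([
        # Single convolutional layer with the Table 1 hyperparameters.
        tf.keras.layers.Conv1D(filters=32, kernel_size=3, activation="sigmoid",
                               input_shape=(X.shape[1], 1)),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="selu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # The trailing [..., None] adds the channel axis expected by Conv1D.
    model.fit(X_train[..., None], y_train, epochs=30, batch_size=32,
              validation_data=(X_val[..., None], y_val), verbose=0)
    test_loss, test_acc = model.evaluate(X_test[..., None], y_test, verbose=0)
    return model, test_acc
```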
4. Conclusions

In this paper, we have presented the research results on applying a convolutional neural network (CNN) to the classification of samples based on gene expression data. Various architectures of one-dimensional CNNs were considered. The following hyperparameters were studied for optimization: the activation functions of the output, convolutional, and dense layers; the number of filters; the kernel sizes of the convolutional and dense layer neurons; and max pooling. As criteria for evaluating the quality of the corresponding model, we used the classification accuracy of samples (Accuracy), the loss function value calculated on the data subset for model validation, and the F1-score, which includes errors of the first and second kind (sensitivity and specificity) as components and is one of the effective criteria for the quality of sample distribution into separate classes. An integrated F1-score criterion has been proposed, calculated by applying Harrington's desirability function to the partial F1-score values computed for the individual classes.

The simulation results regarding the application of different CNN architectures to the classification of gene expression data are presented. The experimental data were gene expression data from patients who were studied for various types of cancer and contained eight classes of samples taken from patients with the corresponding type of cancer; the ninth class included samples from patients in whom no cancer was detected. According to the simulation results, a single-layer CNN demonstrated higher effectiveness across the set of quality criteria. When using the ordered grid-search algorithm, the following hyperparameters were identified as optimal: the softmax activation function for the output layer neurons, the selu activation function for the dense layer neurons, the sigmoid activation function for the convolutional layer neurons, max pooling of 2, a kernel size of 64 for the dense layer neurons, a kernel size of 3 for the convolutional layer neurons, and 32 filters in the convolutional layer.

Future directions of the authors' research include exploring alternative algorithms for neural network hyperparameter optimization and incorporating other deep learning methods within the complex object diagnosis system.

5. References

[1] E.M. Nikolados, D.A. Oyarzún. Deep learning for optimization of protein expression. Current Opinion in Biotechnology, 2023, vol. 81, art. no. 102941. DOI: 10.1016/j.copbio.2023.102941.
[2] E. Mustafa, E.K. Jadoon, S. Khaliq-uz-Zaman, M.A. Humayun, M. Maray. An Ensembled Framework for Human Breast Cancer Survivability Prediction Using Deep Learning. Diagnostics, 2023, vol. 13(10), art. no. 1688. DOI: 10.3390/diagnostics13101688.
[3] G. Mao, Z. Pang, K. Zuo, J. Liu. Gene Regulatory Network Inference Using Convolutional Neural Networks from scRNA-seq Data. Journal of Computational Biology, 2023, vol. 30(5), pp. 619-631. DOI: 10.1089/cmb.2022.0355.
[4] A. Kaur, A.P.S. Chauhan, A.K. Aggarwal. Prediction of Enhancers in DNA Sequence Data using a Hybrid CNN-DLSTM Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2023, vol. 20(2), pp. 1327-1336. DOI: 10.1109/TCBB.2022.3167090.
[5] N.P. Kumar, S. Vijayabaskar, L. Murali, K. Ramaswamy. Design of optimal Elman Recurrent Neural Network based prediction approach for biofuel production. Scientific Reports, 2023, vol. 13(1), art. no. 8565. DOI: 10.1038/s41598-023-34764-x.
[6] R. Jain, A. Jain, E. Mauro, K. LeShane, D. Densmore. ICOR: improving codon optimization with recurrent neural networks. BMC Bioinformatics, 2023, vol. 24(1), art. no. 132. DOI: 10.1186/s12859-023-05246-8.
[7] H. Xu, J. Lin, D. Zhang, F. Mo. Retention time prediction for chromatographic enantioseparation by quantile geometry-enhanced graph neural network. Nature Communications, 2023, vol. 14(1), art. no. 3095. DOI: 10.1038/s41467-023-38853-3.
[8] Z. Wu, J. Wang, H. Du, et al. Chemistry-intuitive explanation of graph neural networks for molecular property prediction with substructure masking. Nature Communications, 2023, vol. 14(1), art. no. 2585. DOI: 10.1038/s41467-023-38192-3.
[9] L. Comanducci, D. Gioiosa, M. Zanoni, F. Antonacci, A. Sarti. Variational Autoencoders for chord sequence generation conditioned on Western harmonic music complexity. Eurasip Journal on Audio, Speech, and Music Processing, 2023, vol. 2023(1), art. no. 24. DOI: 10.1186/s13636-023-00288-5.
[10] G. Nikolentzos, M. Vazirgiannis, C. Xypolopoulos, M. Lingman, E.G. Brandt. Synthetic electronic health records generated with variational graph autoencoders. Digital Medicine, 2023, vol. 6(1), art. no. 83. DOI: 10.1038/s41746-023-00822-x.
[11] X. Lu, P. Li. Research on gearbox temperature field image fault diagnosis method based on transfer learning and deep belief network. Scientific Reports, 2023, vol. 13(1), art. no. 6664. DOI: 10.1038/s41598-023-33858-w.
[12] D. Ma, P. Jiang, L. Shu, Y. Qiu, Y. Zhang, S. Geng. DBN-based online identification of porosity regions during laser welding of aluminum alloys using coherent optical diagnosis. Optics and Laser Technology, 2023, vol. 165, art. no. 109597. DOI: 10.1016/j.optlastec.2023.109597.
[13] T. Zan, H. Wang, M. Wang, et al. Application of Multi-Dimension Input Convolutional Neural Network in Fault Diagnosis of Rolling Bearings. Applied Sciences, 2019, vol. 9, art. no. 2690. DOI: 10.3390/app9132690.
[14] P. Metipatil, P. Bhuvaneshwari, S.M. Basha, S.S. Patil. An Efficient Framework for Predicting Cancer Type Based on Microarray Gene Expressions Using CNN-BiLSTM Technique. SN Computer Science, 2023, vol. 4(4), art. no. 381. DOI: 10.1007/s42979-023-01774-5.
[15] S. Babichev, J. Krejci, J. Bicanek, V. Lytvynenko. Gene expression sequences clustering based on the internal and external clustering quality criteria. Proceedings of the 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2017, 2017, vol. 1, art. no. 8098744, pp. 91-94. DOI: 10.1109/STC-CSIT.2017.8098744.
[16] S. Babichev, V. Osypenko, V. Lytvynenko, M. Voronenko, M. Korobchynskyi. Comparison Analysis of Biclustering Algorithms with the use of Artificial Data and Gene Expression Profiles. 2018 IEEE 38th International Conference on Electronics and Nanotechnology, ELNANO 2018 - Proceedings, 2018, art. no. 8477439, pp. 298-304. DOI: 10.1109/ELNANO.2018.8477439.
[17] S. Babichev, L. Yasinska-Damri, I. Liakh, J. Škvor. Hybrid Inductive Model of Differentially and Co-Expressed Gene Expression Profile Extraction Based on the Joint Use of Clustering Technique and Convolutional Neural Network. Applied Sciences (Switzerland), 2022, vol. 12(22), art. no. 11795. DOI: 10.3390/app122211795.
[18] L. Yasinska-Damri, S. Babichev, B. Durnyak, T. Goncharenko. Application of Convolutional Neural Network for Gene Expression Data Classification. Lecture Notes on Data Engineering and Communications Technologies, 2023, vol. 149, pp. 3-24. DOI: 10.1007/978-3-031-16203-9_1.
[19] The Cancer Genome Atlas Program (TCGA). National Cancer Institute, Center for Cancer Genomics. URL: https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga
[20] Illumina. URL: https://www.illumina.com/
[21] T. Girke. R & Bioconductor Manual. Institute for Integrative Genome Biology. URL: http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual
[22] V. Koilo. Financial performance under stress: the case of the Norwegian maritime cluster. Public and Municipal Finance, 2019, vol. 8(1), pp. 54-72. DOI: 10.21511/pmf.08(1).2019.05.