=Paper=
{{Paper
|id=Vol-3682/Paper7
|storemode=property
|title=Breast Cancer Classification Using Seahorse Swarm Optimization
|pdfUrl=https://ceur-ws.org/Vol-3682/Paper7.pdf
|volume=Vol-3682
|authors=Tanya Dixit,Arunima Jaiswal,Manaswini De
|dblpUrl=https://dblp.org/rec/conf/sci2/DixitJD24
}}
==Breast Cancer Classification Using Seahorse Swarm Optimization==
Tanya Dixit1,*, Arunima Jaiswal1 and Manaswini De1

1 Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Kashmere Gate, Delhi - 110006

Abstract
Globally, breast cancer ranks as the most widespread form of cancer among women. Machine learning and deep learning approaches provide more effective means for detecting and managing this condition than conventional detection methods. Advanced deep learning techniques, including Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU) and Deep Belief Networks (DBN), have been used for cancer classification. In this paper, the publicly available Wisconsin breast cancer dataset is employed to investigate the efficacy of these deep learning techniques for breast cancer classification. Further, the network architecture parameters are tuned using one of the latest swarm intelligence techniques, namely Sea Horse Optimization, to achieve better results. Success rates of 95.61%, 96.49% and 98.25% have been achieved by the proposed SHO-LSTM, SHO-GRU and SHO-DBN models, respectively, when applied to the Wisconsin dataset.

Keywords
Breast Cancer Classification, Long Short-Term Memory Network, Deep Belief Network, Gated Recurrent Unit, Sea Horse Optimization

Symposium on Computing & Intelligent Systems (SCI), May 10, 2024, New Delhi, INDIA
∗ Corresponding author.
† These authors contributed equally.
tanya011btcse20@igdtuw.ac.in (T. Dixit); arunimajaiswal@igdtuw.ac.in (A. Jaiswal); manaswini080btcse20@igdtuw.ac.in (M. De)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Breast cancer is the most common cancer type among women globally. According to the Global Cancer Observatory (GLOBOCAN 2020) report [1], India ranked third worldwide in terms of cancer cases, and the number of cases is predicted to increase drastically in the future. Therefore, studies on the prediction of breast cancer are of great importance for the prevention and control of this disease. Various studies have been carried out highlighting the present scenario, challenges and cancer awareness among women in India [2-4]. Leveraging machine learning (ML) and deep learning (DL) techniques for breast cancer classification not only enables faster detection at an early stage with improved accuracy but also contributes to better patient care by reducing the subjectivity of human interpretation of medical information.

The authors of this paper have earlier studied the impact of feature reduction on breast cancer classification by employing classical and quantum machine learning algorithms on the Wisconsin dataset [5]. In this paper, the effect of tuning the hyper-parameters of deep learning models such as Long Short-Term Memory (LSTM) networks, Gated Recurrent Unit (GRU) networks and Deep Belief Networks (DBN) is investigated. Various deep learning techniques, e.g., MLP, RNN, LSTM, GRU and DBN, play crucial roles in breast cancer detection by capturing context and features in sequential data. LSTM and GRU are two variants of Recurrent Neural Networks (RNN) that use gating mechanisms to selectively update information over time, with GRU being simpler and faster than LSTM [6]. DBNs, a form of artificial neural network, are adept at unsupervised feature learning.
They consist of layers of Restricted Boltzmann Machines (RBMs), which are generative models capable of learning valuable features from raw input data without supervision [7].

In machine learning, the selection of optimal values for the various parameters of a model is termed hyper-parameter tuning. Hyper-parameters are the configuration settings that control the learning process of a model, such as the learning rate, the number of neural network layers, the model architecture, the batch size and the activation functions, specific to the model employed. Various strategies such as grid search, random search, meta-heuristic optimization, swarm optimization and Bayesian optimization are generally used for hyper-parameter tuning. Among these approaches, swarm intelligence plays a significant role in achieving optimal solutions and reducing detection time for efficient control of the disease. Sea Horse Optimization (SHO) is a new meta-heuristic approach rooted in swarm intelligence, drawing inspiration from the captivating behaviors observed in sea horses in their natural environment [8]. In this study, the detection of breast cancer using the deep learning models LSTM, GRU and DBN is attempted, and the results of the optimized SHO-LSTM, SHO-GRU and SHO-DBN models are compared when applied to the Wisconsin dataset for breast cancer classification.

The structure of this paper is organized as follows: Section 2 offers a concise literature review covering diverse techniques employed for breast cancer detection using ML, DL and hyper-parameter optimization. Section 3 provides details of the dataset employed in this study. The methodology is elucidated in Section 4, followed by a discussion of the results in Section 5. Lastly, conclusions based on the findings are drawn in Section 6.

2. Related Work

A brief overview of a few relevant research works on the detection of breast cancer using various techniques is presented in this section. The authors of [6] proposed a stacked GRU-LSTM-BRNN deep learning model using Recurrent Neural Networks (RNNs) to classify patients’ health records as benign or malignant for breast cancer diagnosis using the Wisconsin Breast Cancer Dataset. The paper compares three baseline models (RNN, stacked LSTM and stacked GRU) using metrics like accuracy, MSE and the Cohen-Kappa score. The proposed model outperforms the baseline models on all metrics, achieving 97.34% accuracy, 0.97 F1-score, 0.03 MSE and 0.94 Cohen-Kappa score. In [8], the researchers proposed a swarm intelligence-based meta-heuristic technique called the Sea Horse Optimizer (SHO). SHO replicates the diverse movement patterns and the probabilistic predation mechanism observed in sea horses. The performance of SHO has been assessed across 23 established functions and the CEC2014 benchmark functions. Experimental findings showcase SHO as a proficient optimizer capable of effectively addressing constrained problems. The authors of [9] introduced a novel Chaotic Sea Horse Optimization with deep learning model (CSHODL-PDC) for the classification of pneumonia on chest X-ray (CXR) images. CSHODL-PDC makes use of the NASNet Large model and a Fuzzy Deep Neural Network. The proposed model achieved a maximum accuracy of 99.22%, precision of 98.96% and recall of 99.22%. The researchers of [10] proposed a Particle Swarm Optimization (PSO) optimized Multilayer Perceptron (MLP) neural network.
This model is compared with other machine learning models like K-Nearest Neighbors, Decision Tree and Naïve Bayes, and shows higher accuracy, sensitivity and specificity. In [11], a stacked GRU (SGRU) for deep transfer learning is used for breast cancer classification along with a Chaotic Sparrow Search Algorithm (CSSA) for hyper-parameter optimization. The proposed model achieves an accuracy of 98.61%, surpassing existing models when applied to a benchmark image dataset. The paper [12] aims to overcome the limitations of the back-propagation learning algorithm for RNNs, such as slow convergence, local minima and long-term dependencies, using the R programming language to implement the proposed models and compare them with standard RNN and LSTM. It employs four distinct meta-heuristic algorithms (Harmony Search, Ant Lion Optimization, Sine Cosine and Grey Wolf Optimizer) to train the LSTM model for classification tasks using real and medical time-series datasets, including the Breast Cancer Wisconsin and Epileptic Seizure Recognition datasets. The proposed models are reported to achieve higher accuracy rates than the standard ones on both datasets. The study outlined in [13] presents an arithmetic optimization algorithm combined with deep-learning-based histopathological breast cancer classification (AOADL-HBCC), which comprises four sequential steps: noise elimination and contrast enhancement, feature extraction utilizing AOA and SqueezeNet, feature selection employing a DBN, and classification utilizing the Adamax optimizer. The results show that the proposed model achieves the highest accuracy of 96.77% on the 100x dataset and 96.4% on the 200x dataset, outperforming the other models. The study in [7] introduces a hybrid model that individually trains Random Forest (RF), MLP and DBN on the Wisconsin Breast Cancer dataset. These models are then integrated using a weighted average method to achieve the final classification. The proposed model achieves 96.5% accuracy against individual accuracies of 93.9%, 91.3% and 97.5%, respectively. The paper [14] proposes an Enhanced Sea Horse Optimization (ESHO) combined with sine-cosine operators and Tent chaotic mapping to adaptively tune the ResNet-50 parameters and optimize its performance on two agricultural image datasets: jade fungus and corn diseases. The ResNet-50 model optimized by ESHO achieves an accuracy of 96.7% for corn disease image recognition and 96.4% for jade fungus image recognition. In [15], a novel hybrid PSO-SHO algorithm combining the advantages of PSO and SHO is proposed. The approach strives to minimize the real power losses and the voltage deviation of the power system by optimizing the generator voltages, transformer tap settings and reactive power compensators.

3. Data

The dataset employed is the Wisconsin Breast Cancer Dataset (WBC) [16]. The WBC dataset consists of 569 samples of breast cancer patients, with a distribution of 357 benign and 212 malignant cases. 30 characteristic features are quantified from digitized images of breast masses. They are numerical measures of cell nuclei attributes: the means, standard errors and worst values of radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry and fractal dimension. The class labels are denoted by 0 (benign) and 1 (malignant) under the attribute ‘diagnosis’.

Figure 1. Data Visualization

3.1. Data Pre-Processing

The data preprocessing involves cleaning, scaling features, and encoding the labels of the target variable. Subsequently, the data is partitioned into training and testing sets, with 80% allocated for training and 20% for testing. Further, the training and testing feature arrays are reshaped to add an additional dimension representing the number of channels. LSTMs and GRUs expect input data in a specific 3D shape, typically represented as (batch_size, timesteps, features). For deep learning models, particularly those using RNNs, the input data should be reshaped to meet the input requirements of the model [17].
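As an illustration, the following is a minimal sketch of this preprocessing pipeline in Python. It assumes the copy of the WBC dataset bundled with scikit-learn (whose label encoding is 0 = malignant and 1 = benign, the reverse of the UCI convention described above); the stratification and the random seed are our additions, not stated by the paper.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the 569-sample, 30-feature Wisconsin Diagnostic dataset.
# Note: scikit-learn encodes 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# 80/20 train/test split; stratification (an assumption) preserves the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scale every feature to zero mean and unit variance.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Reshape to the 3D layout expected by recurrent layers:
# (batch_size, timesteps, features) = (n, 30, 1), i.e. one channel per timestep.
X_train = X_train.reshape((X_train.shape[0], 30, 1))
X_test = X_test.reshape((X_test.shape[0], 30, 1))
```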
4. Proposed Method

In this paper, Gated Recurrent Unit (GRU) networks, Long Short-Term Memory (LSTM) networks and Deep Belief Networks (DBN) have been used to predict breast cancer. Further, we have designed these models with optimized parameters for our classification problem. To achieve this, we have used the Seahorse Optimization algorithm to optimize the hyper-parameters and obtain better accuracy. The optimization focuses on various parameters of these neural networks, such as the learning rate, the number of filters, the number of neurons and the number of epochs. The fitness function is based on the classification accuracy. Figure 2 presents a flowchart of our proposed method.

Figure 2. Proposed Method Flowchart

The process begins by selecting a deep learning model, such as GRU, LSTM or DBN. The input data is preprocessed. The Seahorse Optimizer's (SHO) parameters, such as the number of epochs and the population size, are initialized along with the objective function, which optimizes the model's accuracy. SHO then generates hyper-parameters for the chosen model, and the respective model is trained with these hyper-parameters. These steps are repeated until the maximum number of iterations is completed. Over the iterations, SHO produces the highest global accuracy, and we subsequently assess the model's performance using various traditional performance metrics based on the hyper-parameters associated with this accuracy. The models and the optimizer involved are discussed in detail in the following sections.

4.1. Long Short-Term Memory (LSTM)

Long Short-Term Memory (LSTM) is a deep learning model that addresses the challenge of learning long-term dependencies [18], which traditional recurrent neural networks struggle with. LSTMs are specialized neural networks designed to handle sequential data, and they extend the basic RNN cell [19]. The basic RNN cell takes an input at each step and computes a hidden state based on this input and its previous hidden state; the output from the cell is used for training and prediction. Instead of this simple cell, an LSTM cell contains three key components: (1) the input gate, which controls the amount of new information entering the cell; (2) the forget gate, which determines which information from the previous cell state needs to be forgotten; and (3) the output gate, which sets the output based on the current input and hidden state. In addition to the hidden state (similar to an RNN's hidden state), LSTMs have a cell state, essentially a separate memory component that stores long-term information. The cell state is updated with the combined information of the input, forget and output gates.

4.2. Gated Recurrent Units (GRU)

Gated Recurrent Units (GRUs) stem from RNNs, incorporating a gating mechanism initially introduced in [20]. Similar to LSTMs, GRUs employ a gating mechanism to selectively incorporate or forget certain features, albeit lacking an output gate, thereby resulting in fewer parameters than LSTMs. GRUs also find use in processing sequential data such as text, speech and time-series data. A GRU can control the flow of information from the previous activation state while computing the new activation state [6]. Compared to LSTM, GRU has a superior convergence rate, as it has fewer parameters, and can outperform LSTM models [18]. The GRU comprises two gating mechanisms: the reset gate and the update gate. The reset gate regulates the extent to which the previous hidden state is forgotten, while the update gate determines the amount of new input used to update the hidden state.
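To make the two recurrent classifiers concrete, here is a minimal Keras sketch. The layer widths, the interpretation of the "filters" hyper-parameter from Table 1 as the recurrent layer width, and the (30, 1) input shape are illustrative assumptions, not the authors' exact architecture.

```python
import tensorflow as tf

def build_rnn(cell="lstm", filters=32, neurons=16):
    """Binary breast-cancer classifier over (30, 1)-shaped inputs (sizes illustrative)."""
    Recurrent = tf.keras.layers.LSTM if cell == "lstm" else tf.keras.layers.GRU
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(30, 1)),
        Recurrent(filters),                              # gated recurrent layer
        tf.keras.layers.Dense(neurons, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # benign/malignant score
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```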
4.3. Deep Belief Networks (DBN)

A Deep Belief Network (DBN) is a generative graphical model composed of multiple layers of latent variables, more popularly known as "hidden units". There are connections between the layers but no connections between units within each layer [19]. These systems can be visualized as intricate, multi-layer networks in which each layer processes information from the preceding one, progressively constructing a sophisticated comprehension of the entire dataset. DBNs are built by layering simple, unsupervised networks such as Restricted Boltzmann Machines (RBMs) or autoencoders. The configuration of the output layer in a DBN is contingent upon the specific task at hand. For instance, in a classification task involving k classes, the DBN would utilize k softmax units, each dedicated to one class [7]. The hidden layer of each sub-network serves as the visible layer for the next one in the sequence. The training process involves contrastive divergence applied layer by layer, starting from the lowest visible layer, which serves as the training set [21].

4.4. Hyper-parameter Optimization of Deep Learning Models

Hyper-parameters are external configuration variables set by programmers to steer model training. They are parameters that define the details of the learning process. Examples include the learning rate (which regulates the magnitude of the steps taken during gradient descent optimization), the batch size (which dictates the quantity of training examples utilized in each iteration of gradient descent), the number of hidden layers, the activation functions (like ReLU, sigmoid and tanh) and the dropout rate (the fraction of neurons dropped within dense layers), among others. Hyper-parameter optimization (HPO) is the process of selecting optimal values for a machine/deep learning model's hyper-parameters. HPO can be seen as the last step of model design and the initial step of neural network training [22]. It finds a tuple of hyper-parameters that gives an optimal model with enhanced accuracy or prediction. Over the years, many techniques have been employed for hyper-parameter optimization, such as grid search, random search, Bayesian optimization, genetic algorithms and even swarm-based techniques like Particle Swarm Optimization and Ant Colony Optimization [23]. For the optimization of the DL models proposed in this study, we have taken the hyper-parameters listed in Table 1, together with the new Seahorse Optimization algorithm.

Table 1. Hyper-parameters optimized using SHO
Model | Hyper-parameters
LSTM  | Filters, neurons, batch size, epochs
GRU   | Filters, neurons, batch size, epochs
DBN   | Hidden layers, learning rate, epochs
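As an illustration of how the optimizer and a model connect, the following is a hedged sketch of the accuracy-based objective function. The decoding bounds and the reuse of the build_rnn helper from the sketch above are our assumptions; the paper does not specify its search ranges.

```python
import numpy as np

def fitness(candidate, X_tr, y_tr, X_val, y_val):
    """Decode a real-valued SHO candidate into hyper-parameters, train the
    model, and return the validation accuracy that SHO maximizes."""
    filters    = int(np.clip(candidate[0], 8, 128))  # illustrative search ranges
    neurons    = int(np.clip(candidate[1], 4, 64))
    batch_size = int(np.clip(candidate[2], 8, 64))
    epochs     = int(np.clip(candidate[3], 5, 100))
    model = build_rnn("gru", filters=filters, neurons=neurons)
    model.fit(X_tr, y_tr, batch_size=batch_size, epochs=epochs, verbose=0)
    _, accuracy = model.evaluate(X_val, y_val, verbose=0)
    return accuracy
```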
4.5. Seahorse Optimization Algorithm

Introduced in 2022, the Seahorse Optimization (SHO) algorithm represents a novel swarm-based meta-heuristic optimization approach [8]. SHO replicates the natural movement, hunting and breeding patterns observed in seahorses. Seahorse movement behavior encompasses two scenarios: (1) spiral movement of the seahorse in conjunction with an ocean vortex, and (2) Brownian motion of the seahorse amidst the waves. Further, the predatory behavior consists of two situations: success and failure. The breeding behavior of seahorses is described by random mating, with offspring inheriting the traits of both parents. An equal mix of male and female seahorses is taken in the population. To enhance the balance of the SHO algorithm, global strategies are applied to the motion behavior and local strategies are applied to the predation behavior.

Equations (1) and (2) denote the spiral and Brownian movement behaviors of seahorses, respectively. SHO utilizes Lévy flight to emulate the spiral movement observed in seahorses, which aids in preventing SHO from becoming trapped in local optima. In the spiral movement, x, y and z denote the three-dimensional coordinates of the spiral. In the Brownian motion equation, $l$ denotes a constant coefficient and $\beta_t$ represents the random-walk coefficient of the motion.

$X_{new}^{1}(t+1) = X_i(t) + \mathrm{Levy}(\lambda)\left((X_{elite}(t) - X_i(t)) \times x \times y \times z + X_{elite}(t)\right)$ (1)

$X_{new}^{1}(t+1) = X_i(t) + rand \times l \times \beta_t \times (X_i(t) - \beta_t \times X_{elite})$ (2)

Equation (3) is the mathematical representation of predation, with $r_2$ representing a random number generated by SHO to distinguish between the success and failure scenarios. If $r_2$ exceeds 0.1, the predation by the seahorse is deemed successful; otherwise, it results in failure. $\alpha$ denotes the step size of the seahorse's movement in pursuit of prey.

$X_{new}^{2}(t+1) = \begin{cases} \alpha \times (X_{elite} - rand \times X_{new}^{1}(t)) + (1-\alpha) \times X_{elite}, & r_2 > 0.1 \\ (1-\alpha) \times (X_{new}^{1}(t) - rand \times X_{elite}) + \alpha \times X_{new}^{1}(t), & r_2 \le 0.1 \end{cases}$ (3)

Equations (4) and (5) are used to select the parent seahorses, where $X_{sort}^{2}$ is the population sorted by fitness value in ascending order. Equation (6) represents an offspring, where $r_3$ is a random number in [0, 1].

$father = X_{sort}^{2}(1 : pop/2)$ (4)

$mother = X_{sort}^{2}(pop/2 + 1 : pop)$ (5)

$X_i^{offspring} = r_3 X_i^{father} + (1 - r_3) X_i^{mother}$ (6)

The algorithm starts with a randomly generated population of seahorses, with each seahorse representing a potential solution. It uses a normal distribution to decide between the two movement behaviors of the seahorses. It mimics the high success rate of seahorses in hunting to enhance its exploitation capability, and it draws inspiration from the breeding behavior of seahorses to generate new solutions, hoping to improve upon the current best solution. If the hunt is successful, the seahorse (candidate solution) moves towards the prey (best solution); otherwise, exploration of the search space continues. The working of SHO is summarized in the following flowchart:

Figure 3. Flowchart illustrating the working of the SHO algorithm

The SHO algorithm maintains a balance between exploration (diversification) and exploitation (intensification) to mitigate the risk of getting stuck in local optima and to efficiently locate the global optimum. The algorithm has been applied to various engineering design problems and shows promising results.
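For concreteness, below is an illustrative NumPy sketch of one SHO iteration implementing Eqs. (1)-(3). The Lévy sampler, the spiral coordinates and the constants l and α are simplified assumptions rather than the exact formulation of [8]; breeding (Eqs. (4)-(6)) would follow by sorting the result by fitness and blending the best half with the rest.

```python
import math
import numpy as np

def levy(shape, lam=1.5):
    """Mantegna-style sampler for Levy(lambda) flight steps (a common
    approximation; the original SHO paper fixes its own constants)."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2) /
             (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = np.random.normal(0.0, sigma, shape)
    v = np.random.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / lam)

def sho_move_and_hunt(X, X_elite, alpha, l=0.05):
    """One simplified SHO update of a (pop, dim) population, per Eqs. (1)-(3)."""
    pop, dim = X.shape
    # Movement: a normal draw picks spiral (Eq. 1) or Brownian (Eq. 2) per seahorse.
    spiral = np.random.normal(size=(pop, 1)) > 0
    xyz = np.prod(np.random.rand(pop, 3), axis=1, keepdims=True)  # spiral coords x*y*z
    step_spiral = X + levy((pop, dim)) * ((X_elite - X) * xyz + X_elite)
    beta = np.random.normal(size=(pop, dim))                      # random-walk coefficient
    step_brown = X + np.random.rand(pop, 1) * l * beta * (X - beta * X_elite)
    X1 = np.where(spiral, step_spiral, step_brown)
    # Predation (Eq. 3): r2 > 0.1 is a successful hunt toward the elite.
    r2 = np.random.rand(pop, 1)
    rand = np.random.rand(pop, 1)
    success = alpha * (X_elite - rand * X1) + (1 - alpha) * X_elite
    failure = (1 - alpha) * (X1 - rand * X_elite) + alpha * X1
    return np.where(r2 > 0.1, success, failure)
```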
5. Results

Various performance metrics, namely accuracy, F1-score, recall, precision and specificity, have been used to compare the standard LSTM, GRU and DBN models with their SHO-optimized counterparts. These metrics are given in equations (7)-(11).

$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \times 100$ (7)

$Recall = \frac{TP}{TP + FN} \times 100$ (8)

$Specificity = \frac{TN}{TN + FP}$ (9)

$F1\ Score = \frac{2 \times precision \times recall}{precision + recall}$ (10)

$Precision = \frac{TP}{FP + TP}$ (11)

Accuracy is the ratio of correctly classified instances to the aggregate instances. TP (True Positives), TN (True Negatives), FP (False Positives) and FN (False Negatives) are the numbers of correctly and incorrectly classified positive and negative cases. Sensitivity, or recall, is the fraction of positive cases that the classifier correctly identifies, whereas specificity is the fraction of negative cases that the classifier correctly identifies. Precision is the proportion of true positives out of all predicted positives. The F1-score is a measure of performance that combines precision and recall, being the harmonic mean of the two metrics.
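A short worked sketch of these metrics computed from a confusion matrix with scikit-learn (the labels and predictions shown are placeholders, not the paper's outputs):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # placeholder labels (1 = malignant)
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 1])  # placeholder model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)        # Eq. (7), x100 for percent
recall      = tp / (tp + fn)                         # Eq. (8), x100 for percent
specificity = tn / (tn + fp)                         # Eq. (9)
precision   = tp / (fp + tp)                         # Eq. (11)
f1 = 2 * precision * recall / (precision + recall)   # Eq. (10)
print(f"acc={accuracy:.2%} rec={recall:.2%} spec={specificity:.2%} "
      f"prec={precision:.2%} f1={f1:.2%}")
```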
The results obtained from the DL models without and with SHO optimization are presented in Table 2 and Table 3, respectively.

Table 2. Results obtained without optimization on the DL models
Model | Accuracy | F1-Score | Recall | Precision | Specificity
LSTM  | 92.10 | 91.08 | 97.87 | 85.18 | 88.05
GRU   | 93.85 | 92.13 | 87.23 | 97.61 | 98.50
DBN   | 92.11 | 90.11 | 87.23 | 93.18 | 95.52

Table 3. Results obtained with Seahorse hyper-parameter optimization
Model    | Accuracy | F1-Score | Recall | Precision | Specificity
SHO-LSTM | 95.61 | 94.73 | 95.74  | 93.75 | 95.52
SHO-GRU  | 96.49 | 95.91 | 100.00 | 92.15 | 94.02
SHO-DBN  | 98.25 | 97.87 | 97.87  | 97.87 | 98.51

Before optimization, the LSTM and GRU models achieved accuracies of 92.10% and 93.85%, respectively, while the DBN model achieved an accuracy of 92.11%. However, after hyper-parameter optimization with SHO, the performance of all models improved significantly: SHO-LSTM achieved an accuracy of 95.61%, SHO-GRU an accuracy of 96.49% and SHO-DBN an accuracy of 98.25%. Furthermore, the SHO-DBN model consistently outperformed the other models across all metrics, demonstrating its effectiveness in breast cancer detection. Notably, the SHO-GRU model achieved perfect recall (100%), indicating its ability to correctly identify all positive cases of breast cancer. GRU achieved higher accuracy than LSTM both with and without optimization, indicating its greater efficiency owing to its smaller number of model parameters. Graphical comparisons of the performance metrics for the un-optimized and optimized DL models are presented in Figures 4(a), 4(b) and 4(c).

Figure 4. Performance metrics of (a) LSTM, (b) GRU and (c) DBN

6. Conclusion

Machine learning and deep learning techniques offer more efficient ways to detect and manage breast cancer compared to traditional methods. In this study, the advanced DL models LSTM, GRU and DBN have been successfully applied to the Wisconsin dataset for breast cancer classification, with the network architecture parameters fine-tuned using the recent swarm intelligence technique known as Sea Horse Optimization. We explored the effectiveness of these deep learning techniques, with and without optimization, for achieving improved performance on this breast cancer dataset. The SHO-DBN model emerges as the most effective model for this task, with the highest accuracy, precision, specificity and F1-score. These findings underscore the potential of optimized deep learning models as valuable tools in the early detection and management of breast cancer.

References

[1] H. Sung, J. Ferlay, R. L. Siegel, M. Laversanne, I. Soerjomataram, A. Jemal, F. Bray, “Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries”, CA: A Cancer Journal for Clinicians, vol. 71, no. 3, pp. 209-249, 2021.
[2] A. Gupta, K. Shridhar, P. K. Dhillon, “A review of breast cancer awareness among women in India: Cancer literate or awareness deficit?”, European Journal of Cancer, vol. 51, no. 14, pp. 2058-2066, 2015.
[3] R. Mehrotra, K. Yadav, “Breast cancer in India: Present scenario and the challenges ahead”, World Journal of Clinical Oncology, vol. 13, no. 3, pp. 209-218, 2022, doi: 10.5306/wjco.v13.i3.209.
[4] K. Sathishkumar, M. Chaturvedi, P. Das, S. Stephen, P. Mathur, “Cancer incidence estimates for 2022 & projection for 2025: Result from National Cancer Registry Programme”, The Indian Journal of Medical Research, vol. 156, no. 4 & 5, pp. 598-607, 2022, doi: 10.4103/ijmr.ijmr_1821_22.
[5] M. De, A. Jaiswal, T. Dixit, “Comparative analysis of classical and quantum machine learning for breast cancer classification”, International Conference on Computational Intelligence and Mathematical Applications, 2023.
[6] S. Dutta, J. K. Mandal, T. H. Kim, S. K. Bandyopadhyay, “Breast Cancer Prediction Using Stacked GRU-LSTM-BRNN”, Applied Computer Systems, vol. 25, no. 2, pp. 163-171, 2020, doi: 10.2478/acss-2020-0018.
[7] S. Yamani, Z. H. Choudhury, “Integrating Random Forest, MLP and DBN in a Hybrid Ensemble Model for Accurate Breast Cancer Detection”, International Journal of Innovative Science and Research Technology, vol. 8, no. 7, pp. 1556-1564, 2023, doi: 10.31219/osf.io/sdjqf.
[8] S. Zhao, T. Zhang, S. Ma, M. Wang, “Sea-horse optimizer: a novel nature-inspired meta-heuristic for global optimization problems”, Applied Intelligence, vol. 53, no. 10, pp. 11833-11860, 2023, doi: 10.1007/s10489-022-03994-3.
[9] V. Parthasarathy, S. Saravanan, “Chaotic Sea Horse Optimization with Deep Learning Model for lung disease pneumonia detection and classification on chest X-ray images”, Multimedia Tools and Applications, 2024, doi: 10.1007/s11042-024-18301-0.
[10] M. Alimardani, M. Almasi, “Investigating the application of particle swarm optimization algorithm in the neural network to increase the accuracy of breast cancer prediction”, International Journal of Computer Trends and Technology, vol. 68, no. 4, pp. 65-72, 2020, doi: 10.14445/22312803/IJCTT-V68I4P112.
[11] K. Shankar, A. K. Dutta, S. Kumar, G. P. Joshi, I. C. Doo, “Chaotic Sparrow Search Algorithm with Deep Transfer Learning Enabled Breast Cancer Classification on Histopathological Image”, Cancers, vol. 14, no. 11, p. 2770, 2022, doi: 10.3390/cancers14112770.
[12] T. A. Rashid, P. Fattah, D. K. Awla, “Using accuracy measure for improving the training of LSTM with metaheuristic algorithms”, Procedia Computer Science, vol. 140, pp. 324-333, 2018, doi: 10.1016/j.procs.2018.10.307.
[13] M. Obayya, S. Alkhalaf, F. Alrowais, S. Alshahrani, A. Alzahrani, A. Alzahrani, “Hyperparameter Optimizer with Deep Learning-Based Decision-Support Systems for Histopathological Breast Cancer Diagnosis”, Cancers, vol. 15, p. 885, 2023, doi: 10.3390/cancers15030885.
[14] Z. Li, S. Qu, Y. Xu, X. Hao, N. Lin, “Enhanced Sea Horse Optimization Algorithm for Hyperparameter Optimization of Agricultural Image Recognition”, Mathematics, vol. 12, no. 3, p. 368, 2024, doi: 10.3390/math12030368.
[15] H. M. Hasanien, I. Alsaleh, M. Tostado-Véliz, M. Zhang, A. Alateeq, F. Jurado, A. Alassaf, “Hybrid particle swarm and sea horse optimization algorithm-based optimal reactive power dispatch of power systems comprising electric vehicles”, Energy, vol. 286, p. 129583, 2024, doi: 10.1016/j.energy.2023.129583.
[16] “Diagnostic Wisconsin Breast Cancer Database”, UCI Machine Learning Repository. Available: https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic.
[17] M. Chhetri, S. Kumar, P. P. Roy, B.-G. Kim, “Deep BLSTM-GRU Model for Monthly Rainfall Prediction: A Case Study of Simtokha, Bhutan”, Remote Sensing, vol. 12, no. 19, p. 3174, 2020, doi: 10.3390/rs12193174.
[18] A. Sherstinsky, “Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network”, Physica D: Nonlinear Phenomena, vol. 404, p. 132306, 2020, doi: 10.1016/j.physd.2019.132306.
[19] G. E. Hinton, “Deep belief networks”, Scholarpedia, vol. 4, no. 5, p. 5947, 2009, doi: 10.4249/scholarpedia.5947. Available: http://www.scholarpedia.org/article/Deep_belief_networks.
[20] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling”, arXiv preprint arXiv:1412.3555, 2014, doi: 10.48550/arXiv.1412.3555.
[21] Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, “Greedy layer-wise training of deep networks”, in B. Schölkopf, J. Platt, T. Hoffman (Eds.), Advances in Neural Information Processing Systems, MIT Press, vol. 19, pp. 153-160, 2006. Available: https://deeplearning.cs.cmu.edu/pdfs/Bengio_et_al_NIPS_2006.pdf.
[22] T. Yu, H. Zhu, “Hyper-Parameter Optimization: A Review of Algorithms and Applications”, 2020. Available: https://arxiv.org/abs/2003.05689.
[23] W.-C. Yeh, Y.-P. Lin, Y.-C. Liang, C.-M. Lai, X.-Z. Gao, “Simplified Swarm Optimisation for the Hyperparameters of a Convolutional Neural Network”, 2023. Available: https://arxiv.org/ftp/arxiv/papers/2103/2103.03995.pdf.
[24] H. Alahmer, A. Alahmer, M. I. Alamayreh, M. Alrbai, R. Al-Rbaihat, A. Al-Manea, R. Alkhazaleh, “Optimal water addition in emulsion diesel fuel using machine learning and sea-horse optimizer to minimize exhaust pollutants from diesel engine”, Atmosphere, vol. 14, p. 449, 2023, doi: 10.3390/atmos14030449.