<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>HC@AIxIA</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Feature selection through autoencoder filtering and DeepSHAP: an iterative algorithm</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Edoardo De Rose</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlo Adornetto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Calimeri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianluigi Greco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics and Computer Science, University of Calabria</institution>
          ,
          <addr-line>Via Pietro Bucci, Rende</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>25</volume>
      <fpage>25</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>In many fields, such as functional genomics or finance, data analysis and predictive modeling are challenging because of the curse of dimensionality and noisy data. In these cases, effective feature selection algorithms, based on Machine and Deep Learning, can improve the identification of important features, leading to more tractable problems in terms of dimensionality. This paper proposes a novel algorithm to perform feature selection on high-dimensional data, which exploits the reconstruction capabilities of autoencoders and an ad-hoc defined Explainable Artificial Intelligence-based score to select the most informative features for prediction. We benchmark the approach on several state-of-the-art datasets and against algorithms previously proposed in the literature, showcasing its effectiveness.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep Learning</kwd>
        <kwd>Explainable AI</kwd>
        <kwd>Genomics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the field of functional genomics, starting from the results of the Human Genome Project, the evolution
of sequencing techniques provides large volumes of data for each single patient by taking advantage of
high-throughput and next-generation sequencing, i.e., a set of time- and cost-effective techniques for
sequencing DNA and RNA. By means of these, it is possible to measure the expression of thousands of
genes for each individual and hence to collect quantitative gene expression profiles (GEP) to be used for
research and clinical purposes. Although GEP datasets represent a valuable source of information in
healthcare (they are indeed used for diagnosis, prevention, and precision medicine), their analysis
is challenging for three main reasons. The first is the curse of dimensionality: a genomics
dataset typically consists of a very large number of features (genes) and a small number of samples
(patients). The second concerns imbalanced classes: in the analysis of different groups of
patients, genomics data are often stratified into classes according to different pathologies, and in most cases
there is a significant difference between the number of instances in each class. Finally, sequencing data
are typically collected from multiple sources, different laboratories, and sequencing tools. This results
in noisy datasets which are difficult to analyze [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        In recent years, Machine Learning (ML) and Deep Learning (DL) have been widely adopted in this field,
providing breakthrough results and meaningful insights into the relationship between genomics and
cancer [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Although very promising, DL models are in general not immediately interpretable,
meaning that it is difficult to understand the causal relationship between the inputs and the outcomes.
This is an even more severe problem in the bioinformatics domain, where it is crucial to understand, for
example in the case of genomics, how the expression of a gene can affect the progression of oncological
patients.
      </p>
      <p>We propose a new algorithm, based on DL and Explainable Artificial Intelligence (XAI), for genomics,
whose aim is threefold: first, to select the most meaningful genes for a regression/classification problem;
second, to provide a more accurate prediction model; third, to quantify and evaluate the effect of features
on the predictions, through XAI. We used our algorithm for the GEP analysis of acute lymphoblastic
leukemia (ALL) patients, identifying a meaningful subset of genes for the disease prognosis. The
remainder of the paper is organized as follows. First, we review the most relevant related works in
Section 2, and we then give a formal definition of the algorithm in Section 3. The application of the
algorithm to the ALL study and the results obtained are discussed in Section 4. Finally, directions for
further research are proposed in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        A number of recent studies propose and evaluate new approaches for feature selection (FS) on GEP
datasets for cancer diagnosis and prognosis [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Such methodologies mainly aim at selecting the most
informative genes, which are able to characterize classes and identify groups of patients. In this context,
the adoption of XAI methods has started to gain momentum for interpretability purposes as well as
to enhance FS [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ]. A widely used approach to overcome the curse of dimensionality problem is
to perform dimensionality reduction using AEs [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. While this has been proven to be effective, the
encoding is typically a non-linear projection of the variables into a lower-dimensional space, which
makes it difficult to provide proper interpretations of the results. In this work we propose a novel
approach, which uses AEs for selecting the most informative genes without any change to the original
feature space, hence enhancing the explainability of the results while still exploiting the representation
abilities of AEs.
      </p>
      <p>
        We moreover use an ad-hoc defined XAI-based score in order to iteratively select the features by taking
advantage of the SHapley Additive exPlanations method (SHAP) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], a cooperative game theory-based
approach for computing the Shapley values. Such values measure, locally (at the sample level), the
contribution of each feature to the predictions of an ML model. In particular, for a given sample x and
a set of features F, the contribution φ_i of a feature i ∈ F is defined as:

φ_i = Σ_{S ⊆ F∖{i}} [ |S|! (|F| − |S| − 1)! / |F|! ] · [ f_{S∪{i}}(x_{S∪{i}}) − f_S(x_S) ]   (1)

with φ_i ∈ ℝ, and where f_S and x_S denote the prediction model and the sample restricted to the
subset of features S, i.e., without the i-th one. In words, SHAP computes the contribution of a
feature by comparing the model predictions obtained with and without that feature, for all the possible
subsets S. Since the computation of Equation 1 is inefficient when the prediction model is a NN (the NN
should be re-trained for each of the 2^{|F|} combinations of features), the authors demonstrate in
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] that Shapley values can be computed by solving a weighted linear least squares regression with a
proper Shapley kernel. Although we used this alternative method, we omit the details and focus
only on the definition of the Shapley values.
      </p>
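For illustration, the Shapley values of Equation 1 can be computed exactly for a small model by brute-force enumeration of the feature subsets. The sketch below is ours (the function name and the zero baseline are assumptions, not from the paper); it is exponential in the number of features and is meant only to make the definition concrete:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values per Eq. (1): phi_i is the weighted average of the
    marginal contribution of feature i over all subsets S of the other features.
    Features outside a subset are replaced by their baseline value."""
    m = len(x)

    def masked(idx):
        # evaluate the model with only the features in idx taken from x
        z = list(baseline)
        for j in idx:
            z[j] = x[j]
        return f(z)

    phis = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        phi = 0.0
        for size in range(m):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(m - size - 1) / factorial(m)
                phi += weight * (masked(S + (i,)) - masked(S))
        phis.append(phi)
    return phis

# Toy linear model: the Shapley value of feature i recovers w_i * x_i
w = [2.0, -1.0, 0.5]
f = lambda z: sum(wi * zi for wi, zi in zip(w, z))
phi = shapley_values(f, x=[1.0, 1.0, 2.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # approximately [2.0, -1.0, 1.0]
```

For a linear model and a zero baseline, each marginal contribution equals w_i·x_i, so the subset weights (which sum to one) leave the value unchanged; this is a standard sanity check for a Shapley implementation.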
    </sec>
    <sec id="sec-3">
      <title>3. The Algorithm</title>
      <p>The proposed algorithm is based on two main ideas: (1) we use a clustered correlation matrix in order to
group features that enclose similar patterns, and we then filter the redundant information of each group
by using AEs. In contrast with previous works, in which AEs are used for dimensionality reduction,
we still work at the level of the original features. In particular, we take advantage of the encoding and
reconstruction abilities of AEs, assuming that the more accurate the reconstruction of a feature is,
the more that feature is representative of the cluster it belongs to. We hence provide a more tractable
dataset in terms of dimensionality, without loss of representativeness, by filtering redundant features;
(2) we train NNs and we iteratively select the most meaningful features using a new ad-hoc defined
SHAP score. We repeat the analysis by removing, at each iteration, the previously selected features. We
eventually use the set of selected features (from all the iterations) to train and explain a final model.
Figure 1 shows the main algorithm phases.</p>
      <p>[Figure 1: overview of the algorithm phases: Correlation Clustering of the patients' features into clusters k_1 ... k_q; Autoencoder (AE) Filtering with one AE per cluster (AE_1 ... AE_q); NN Training; Explainability-based Selection of the most meaningful genes according to an ad-hoc defined SHAP-based score. Selected genes are collected and removed from k_1 ... k_q, iterating while the number of genes is at least N.]</p>
      <sec id="sec-3-1">
        <title>3.1. Formal Setting</title>
        <p>Let D = {X, Y} be a dataset such that X ∈ ℝ^{n×m} is the matrix of inputs, and Y ∈ ℝ^{n×c} is the matrix
of the corresponding labels. Let us further assume m ≫ n, meaning that the dataset is characterized by
a far larger set of features with respect to the number of samples.</p>
        <p>As a novel contribution, we introduce a new impact score which, by means of the SHAP local
explanations, measures the global impact of each feature on the model predictions. We hence associate to
each feature (column) i of X, used to train a model N, a pair (r_{i,N}, I_{i,N}), where r_{i,N} is the correlation
between the i-th column of X and its Shapley values {φ_{1,i}, ..., φ_{n,i}}, and I_{i,N} is defined as follows:

I_{i,N} = Σ_{j=1}^{n} |φ_{j,i}| · φ_{j,i}² / ( Σ_{h=1}^{m} Σ_{j=1}^{n} |φ_{j,h}| · φ_{j,h}² )   (2)

With r_{i,N} and I_{i,N} we want to emphasize how and how much, respectively, a feature globally affects the
predictions of N.</p>
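A minimal sketch of how the pair (r_{i,N}, I_{i,N}) could be computed from a matrix of SHAP values, assuming NumPy and our reading of Equation 2 (the function name impact_scores is illustrative, not from the paper):

```python
import numpy as np

def impact_scores(X, Phi):
    """Per-feature impact scores from a matrix Phi (n samples x m features)
    of SHAP values: r[i] is the correlation between feature column X[:, i]
    and its SHAP values, and I[i] is the normalized intensity of Eq. (2),
    so that the intensities sum to one across features."""
    m = Phi.shape[1]
    r = np.array([np.corrcoef(X[:, i], Phi[:, i])[0, 1] for i in range(m)])
    w = np.abs(Phi) * Phi ** 2      # |phi_{j,i}| * phi_{j,i}^2, as in Eq. (2)
    I = w.sum(axis=0) / w.sum()
    return r, I
```

By construction I is a distribution over the features (non-negative, summing to one), so a feature's intensity can be compared directly against the mean intensity 1/m.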
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Algorithm</title>
        <p>For the sake of clarity, we introduce our algorithm by first defining a set of sub-procedures. The first one
(Algorithm 1) computes the pairwise correlation matrix C ∈ ℝ^{m×m} between the features (columns) of
a generic real-valued matrix X. It then clusters C in order to return a set K = {k_1, ..., k_q} such that,
for each i = 1, ..., q, k_i is a set of indexes, i.e., a partition (cluster) of the columns of X.
The second sub-procedure, defined in Algorithm 2, trains an AE for each cluster by using the transpose
of the input matrix X, meaning that, for the AE model, each feature represents a sample and vice
versa. The rationale here is that we assume the best-reconstructed feature (over the samples) to be
the most representative of the cluster it belongs to. We denote by X_k ∈ ℝ^{n×|k|} the matrix including
only the columns of X whose indexes are in k. The bestReconstructed function provides the column index
of X associated with the best-reconstructed feature. Finally, the sub-procedure returns a set S of q
indexes, one for each cluster.</p>
        <sec id="sec-3-2-1">
          <title>Algorithm 1 Corr. &amp; Clustering</title>
          <p>function CorrClustering(X)
C ← corr(X)
K ← clustering(C)
return K
end function</p>
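Section 4.1 instantiates the clustering step of Algorithm 1 by thresholding the correlation matrix and taking connected components; a minimal sketch under that reading (the threshold value and function name are assumptions) could look as follows:

```python
import numpy as np

def corr_clustering(X, threshold=0.8):
    """Sketch of Algorithm 1: build the pairwise feature correlation matrix,
    keep edges whose absolute correlation exceeds the threshold, and return
    the connected components as clusters of column indexes."""
    m = X.shape[1]
    C = np.abs(np.corrcoef(X, rowvar=False))   # m x m correlation matrix
    adj = C > threshold                        # adjacency between features

    # connected components via iterative flood fill
    labels = -np.ones(m, dtype=int)
    cluster = 0
    for start in range(m):
        if labels[start] != -1:
            continue
        frontier = {start}
        while frontier:
            i = frontier.pop()
            labels[i] = cluster
            frontier |= {j for j in np.nonzero(adj[i])[0] if labels[j] == -1}
        cluster += 1
    return [np.nonzero(labels == k)[0] for k in range(cluster)]
```

This matches the paper's stated design of grouping genes without fixing the number of clusters in advance: the number of components emerges from the threshold.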
        </sec>
        <sec id="sec-3-2-2">
          <title>Algorithm 2 AE Filtering</title>
          <p>function AEFiltering(X, K)
S ← ∅
for k ∈ K do
AE_k ← trainAE(X_k^T)
S ← S ∪ bestReconstructed(AE_k, X_k^T)
end for
return S
end function
The last sub-procedure, reported in Algorithm 3, takes as input the data, a matrix of Shapley values
Φ, and a threshold t ∈ [0, 1]. It first computes the correlation between each column
of X and the corresponding column of Φ. Subsequently, it computes the intensity of each feature
following the definition of Equation 2. It then selects the column indexes according to t and the mean
intensity, to finally provide a set S̃ of column indexes for X.</p>
          <p>Algorithm 3 Selection
function select(Φ, ,  )
 ← (Φ, )
 ←← |1|∑︀∈(Φ, )
˜ ← {  | | | &gt;  ∧   &gt; , ∀  ∈ , ∀  ∈ }
return ˜
end function</p>
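Under the same assumptions, the select procedure of Algorithm 3 might be sketched as follows (the function name and the use of our reading of Equation 2 as intensity are illustrative):

```python
import numpy as np

def select(Phi, X, t=0.85):
    """Sketch of Algorithm 3: keep the features whose correlation with their
    own SHAP values exceeds t in absolute value AND whose intensity (Eq. 2)
    is above the mean intensity. Returns the selected column indexes."""
    m = Phi.shape[1]
    r = np.array([np.corrcoef(X[:, i], Phi[:, i])[0, 1] for i in range(m)])
    w = np.abs(Phi) * Phi ** 2
    I = w.sum(axis=0) / w.sum()                  # intensities sum to one
    keep = np.logical_and(np.abs(r) > t, I > I.mean())
    return np.nonzero(keep)[0]
```

A feature passes only if it is both strongly aligned with its SHAP values (the "how") and carries an above-average share of the total attribution mass (the "how much").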
          <p>The main procedure is described by Algorithm 4. After clustering the correlation matrix, it selects
a set of meaningful feature indexes to be added to G. It then removes the selected indexes from their
corresponding clusters in K and proceeds by repeating the analysis. Here we denote by X_S ∈ ℝ^{n×|S|}
(and accordingly X_G) the matrix including the columns of X whose indexes are in S (resp. G), and by N_S (and
accordingly N_G) a NN trained on {X_S, Y}. The iterative analysis stops when N ∈ ℕ features have been
selected or on reaching a maximum number of iterations. The algorithm eventually trains and explains a
final NN using the set G.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. A Use Case: Leukemia-ALLAML</title>
      <sec id="sec-4-1">
        <title>4.1. Materials and Methods</title>
        <p>
          To validate the effectiveness of the method, we first tested it on a synthetic toy dataset. This allowed us
to verify that the method correctly selected the centroid features for each cluster, ensuring that the
most representative features were identified.
        </p>
        <p>
Algorithm 4
Require: D, t
K ← CorrClustering(X)
G ← ∅
while |G| &lt; N ∨ not maxIterations do
S ← AEFiltering(X, K)
X_S, Y ← dataBalancing(X_S, Y)
N_S ← findModel(X_S, Y)   ◁ Model Selection &amp; Training
Φ ← Shap(N_S, X_S)   ◁ matrix of Shapley values Φ ∈ ℝ^{n×|S|}
S̃ ← select(Φ, X_S)
G ← G ∪ { s ∈ S | s ∈ S̃ }
K ← K ∖ G   ◁ remove selected indexes from their corresponding clusters
end while
N_G ← findModel(X_G, Y)
Φ ← Shap(N_G, X_G)
G* ← select(Φ, X_G)
        </p>
        <p>
We applied our algorithm for analyzing the GEP of leukemia patients from the Kent Ridge biomedical data repository [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The leukemia dataset consists of
two classes of acute leukemia: acute lymphoblastic leukemia (ALL), arising from lymphoid
precursors, and acute myeloid leukemia (AML), arising from myeloid precursors. There are 72 bone
marrow samples in the dataset, with 47 ALL and 25 AML cases, each containing 7129 gene probes. We
used the proposed algorithm for training a NN to solve such a classification problem as well as to identify
a set of meaningful genes over the whole set of 7129. We additionally provide insight into the prognostic
power of such genes. The genes were initially clustered in groups based on their feature correlations.
First, we computed the correlation matrix of the features, capturing the pairwise correlations between
genes. We then applied a correlation threshold to define significant relationships between features.
Specifically, if the absolute value of the correlation between two genes exceeded a predefined threshold,
we considered them to be correlated. We then identified clusters of correlated genes by detecting the
connected components. Each connected component represents a group of genes that are strongly
correlated with each other. This method allowed us to group the genes into distinct clusters, capturing
the structure of the data without relying on predefined assumptions about the number of clusters. These
clusters were then used for further analysis and filtering. The AE filtering selects genes, and we further
applied a statistical filter in order to select 50 genes. After re-balancing the classes with the Synthetic
Minority Over-sampling Technique (SMOTE), we performed model selection with 10-fold cross-validation
in order to find the best (in terms of binary accuracy on the test set) NN for solving the classification
problem.
        </p>
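The SMOTE re-balancing step can be illustrated with a minimal re-implementation of its core idea (interpolating between minority-class samples and their nearest neighbours); this is a sketch of the idea, not the implementation used in the paper:

```python
import numpy as np

def smote_like(X_min, k=5, n_new=10, rng=None):
    """Minimal sketch of the SMOTE idea: synthesize new minority-class
    samples by linear interpolation between a randomly chosen sample and
    one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]       # skip the sample itself
        j = rng.choice(neighbours)
        lam = rng.random()                        # interpolation coefficient
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)
```

Every synthetic sample lies on a segment between two real minority samples, so the oversampled class stays inside the region occupied by the original data.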
        <p>We finally use our SHAP scores (defined in Section 3.1) to select the most meaningful genes, by
setting t = 0.85. After selecting a set of N = 50 genes through the iterations of the algorithm, we use
them to train and explain a final NN.</p>
        <p>
          The algorithm has been implemented using the Python (v3.8.11) programming language. NNs have
been implemented by taking advantage of the Pytorch (v2.4.1) framework. XAI analysis was performed
by means of the SHAP library [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>The overall results are reported in Table 1. In particular, for each iteration of the algorithm, we measured
the accuracy of all the models obtained during cross-validation, for which we report the confidence
interval. As we expected, the classification accuracy decreases over the algorithm iterations: the reason
is that the previously chosen features, expected to be the most representative of each cluster, are no
longer considered in the subsequent analysis. An improvement in accuracy is instead reported for the
final step of the algorithm, in which a model is trained using the set of genes selected during each
iteration. The accuracy of the best final model is 100%.</p>
        <p>
          Figure 3 reports, on the left side, a summarized representation of the SHAP values and, on the right side,
the values of correlation and intensity for the most interesting genes found by our algorithm. In this
context, it is important to compare our findings with the works of Al-Azani et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and Bennet et al.
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Al-Azani et al. conducted an empirical study utilizing a feature selection technique that combined
Chi-square (ChiS) and Information Gain (IG) methods. Their evaluation of various ensemble-based
learning models, including bagging, random forests, stacking, voting, and boosting, culminated in a
best classification accuracy of 96.88%. This study emphasizes the effectiveness of ensemble methods in
improving model performance.
        </p>
        <p>Conversely, Bennet et al. introduced a hybrid gene selection technique that integrates Support
Vector Machine-Recursive Feature Elimination (SVM-RFE) with the Based Bayes Error Filter (BBF).
Their approach involved ranking attributes with SVM-RFE and subsequently using BBF to eliminate
redundant attributes, followed by classification with the SVM algorithm. Their efforts yielded an
impressive classification accuracy of 97.2% on the Leukaemia dataset, underscoring the power of hybrid
techniques in attribute selection.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The algorithm proposed in this work can be used as a valuable tool in genomics to identify protective
(or otherwise) sets of genes for a disease, suggesting potential pathways for further medical investigation.
A natural direction for future development is to perform a large-scale assessment of the algorithm's
performance, using state-of-the-art benchmark GEP datasets.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The research reported in the paper was partially supported by the PNRR projects “FAIR (PE00000013)
- Spoke 9” and “Tech4You (ECS00000009) - Spoke 6”, under the NRRP MUR program funded by
NextGenerationEU, and by the National Plan for NRRP Complementary Investments (PNC, established with
decree-law 6 May 2021, n. 59, converted by law n. 101 of 2021), in the call for the funding of research
initiatives for technologies and innovative trajectories in the health and care sectors (Directorial Decree
n. 931 of 06-06-2022), project n. PNC0000003 - AdvaNced Technologies for Human-centrEd Medicine
(project acronym: ANTHEM).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Koumakis</surname>
          </string-name>
          ,
          <article-title>Deep learning models in genomics; are we there yet?</article-title>
          ,
          <source>Computational and Structural Biotechnology Journal</source>
          <volume>18</volume>
          (
          <year>2020</year>
          )
          <fpage>1466</fpage>
          -
          <lpage>1473</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Alhenawi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Sayyed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hudaib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mirjalili</surname>
          </string-name>
          ,
          <article-title>Feature selection methods on gene expression microarray data for cancer classification: A systematic review</article-title>
          ,
          <source>Computers in Biology and Medicine</source>
          <volume>140</volume>
          (
          <year>2022</year>
          )
          <fpage>105051</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bruno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Calimeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Kitanidis</surname>
          </string-name>
          , E. De Momi,
          <article-title>Data reduction and data visualization for automatic diagnosis using gene expression and clinical data</article-title>
          ,
          <source>Artificial Intelligence in Medicine</source>
          <volume>107</volume>
          (
          <year>2020</year>
          )
          <fpage>101884</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Graham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Csicsery</surname>
          </string-name>
          , E. Stasiowski, G. Thouvenin,
          <string-name>
            <given-names>W. H.</given-names>
            <surname>Mather</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ferry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cookson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hasty</surname>
          </string-name>
          ,
          <article-title>Genome-scale transcriptional dynamics and environmental biosensing</article-title>
          ,
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>117</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Meena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hasija</surname>
          </string-name>
          ,
          <article-title>Application of explainable artificial intelligence in the identification of squamous cell carcinoma biomarkers</article-title>
          ,
          <source>Computers in Biology and Medicine</source>
          <volume>146</volume>
          (
          <year>2022</year>
          )
          <fpage>105505</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Karim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Beyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <article-title>Onconetexplainer: explainable predictions of cancer types based on gene expression data</article-title>
          ,
          <source>in: 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>415</fpage>
          -
          <lpage>422</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Danaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ghaeini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Hendrix</surname>
          </string-name>
          ,
          <article-title>A deep learning approach for cancer detection and relevant gene identification</article-title>
          ,
          <source>in: Pacific symposium on biocomputing 2017</source>
          , World Scientific,
          <year>2017</year>
          , pp.
          <fpage>219</fpage>
          -
          <lpage>229</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A unified approach to interpreting model predictions</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Kent ridge biomedical data set repository</article-title>
          . School of Computer Engineering, Nanyang Technological University, Singapore,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Al-Azani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. S.</given-names>
            <surname>Alkhnbashi</surname>
          </string-name>
          , E. Ramadan,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alfarraj</surname>
          </string-name>
          ,
          <article-title>Gene expression-based cancer classification for handling the class imbalance problem and curse of dimensionality</article-title>
          ,
          <source>International Journal of Molecular Sciences</source>
          <volume>25</volume>
          (
          <year>2024</year>
          )
          <fpage>2102</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bennet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ganaprakasam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>A hybrid approach for gene selection and classification using support vector machine</article-title>
          .,
          <source>International Arab Journal of Information Technology (IAJIT) 12</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>