=Paper= {{Paper |id=Vol-2589/Paper3 |storemode=property |title=Stacked Sparse Autoencoder For Unsupervised Features Learning in PanCancer miRNA Cancer Classification |pdfUrl=https://ceur-ws.org/Vol-2589/Paper3.pdf |volume=Vol-2589 |authors=Imene Zenbout,Abdelkrim Bouramoul,Souham Meshoul |dblpUrl=https://dblp.org/rec/conf/citsc/ZenboutBM19 }} ==Stacked Sparse Autoencoder For Unsupervised Features Learning in PanCancer miRNA Cancer Classification== https://ceur-ws.org/Vol-2589/Paper3.pdf
 Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)




        Stacked Sparse Autoencoder for Unsupervised
       Features Learning in PanCancer miRNA Cancer
                        Classification

1st Imene Zenbout
IFA department, NTIC faculty, Constantine 2 University
CRBT, CERIST
Constantine, Algeria
imene.zenbout@univ-constantine2.dz

2nd Abdelkrim Bouramoul
IFA department, NTIC faculty, Constantine 2 University
MISC laboratory
Constantine, Algeria
abdelkrim.bouramoul@univ-constantine2.dz

3rd Souham Meshoul
Princess Nourah bint Abdulrahman University
Riyadh, Saudi Arabia
sbmeshoul@pnu.edu.sa


   Abstract—The recent progress in cancer diagnosis is genomic data analysis oriented. miRNA plays an important role as a cancer biomarker, moving cancer diagnosis and therapy towards personalized medicine with the ultimate goal of improving survival rates and disease prevention. The recent explosion in genomic data generation has motivated the use of miRNA to enhance diagnosis, prognosis and treatment. In this work we have explored the integrated Atlas PanCancer miRNA profiles using deep feature learning based on an unsupervised Stacked Sparse AutoEncoder (SSAE). The proposed SSAE model learns a feature representation from the used data. The consistency of the learned features has been tested by classifying samples according to 31 cancer types. The model performance has been compared to state-of-the-art unsupervised feature learning models. The obtained results exhibit the competitiveness and promising performance of our model, where an accuracy rate of about 95% has been achieved.
   Index Terms—Deep learning, Bioinformatics, features learning, Sparse autoencoders, miRNA, PanCancer.

                       I. INTRODUCTION

   The recent and tremendous advances in high-throughput sequencing technologies [1] have fostered the role of genomic data across the transcriptome as a key answer to different biology-related questions, and precisely in disease genetics. With this new availability and transparency of genomic and genetic data, the role of miRNA has moved from that of noisy particles to that of highly engaged genomic instances in gene regulation and post-protein function. This has led to a direct involvement of miRNA in the occurrence or the suppression of cancer [2].
   microRNAs (miRNA) are classified as non-coding regulatory genes [3] that can be found in small fragments of non-coding RNA regions (about 21-23 nucleotides) [3], [4]. Since the discovery of miRNA in 1993 by R. C. Lee [5], the generation of miRNA data using high-throughput technologies [6], [7] to explore the direct role of miRNA in cancer diagnosis and gene impact has become intensive. The particularity of miRNA profiles is their ability to serve as a direct tool in cancer analysis, therapy and post-treatment [8], which represents the main motivation of this work. miRNA data share the same issue as gene expression data, namely a very small sample size with regard to the high dimensionality of the profiles; i.e. some profiles are irrelevant to cancer diagnosis and related decisions, while the number of patient samples is low. Obviously, this lends itself to a dimensionality reduction problem, where it is required to extract a miRNA signature representation that can be a relevant predictor in cancer diagnosis.
   In this work we propose a deep unsupervised feature learning model, based on stacking three sparse autoencoders, to learn new features from the initial noisy miRNA profile inputs. The features learned through the different abstraction levels have been used to train classifiers that predict the cancer type of a specific sample according to 31 different cancer types. The proposed unsupervised and supervised models have been trained on the Atlas PanCancer [9] data set. The particularity of this data set is that it combines different cancer types. This may help us to draw information from the well-explored cancer types, which have a large number of samples and/or a high correlation between the different miRNA profiles, and apply this information to classify, or understand, the cancer types with a poor exploration rate. The feature learning model has been compared to some of the best-known unsupervised feature learning and dimensionality reduction models; here we used principal component analysis (PCA) and kernel principal component analysis (KPCA). The rest of the paper is organized as follows: a literature review in Section II. Section III is devoted to a brief introduction to sparse autoencoders. Section IV describes the data set and the preprocessing steps. Our proposal is presented in Section V along with the set of experimental results and discussion.
            II. MIRNA CANCER CLASSIFICATION

   Recently, the exploration of the role of non-coding regions in cancer diagnosis and therapy has been attracting a large community of scientists, and miRNA data set analysis using statistical and machine learning methods has become one of the trending problems in bioinformatics [3]. In cancer diagnosis and classification, we cite the work of J. Lu et al. [10], where the authors analysed mammalian miRNA using k-nearest neighbors and probabilistic neural network algorithms. Kotlarchyk et al. [11] used an ensemble methodology to classify different cancer types based on miRNA profiles. A statistical support vector machine / k-nearest neighbors approach was proposed by D. Ting-ting et al. [12], where they used t-statistics to select relevant miRNA features and a combination of kNN and SVM as classifiers to distinguish between positive and negative samples in different cancer type data sets. For multiclass cancer classification, P. Yongjun [13] used a subset-based ensemble feature selection method, generating multiple miRNA subsets based on the correlation among miRNAs, using classifiers to learn valuable knowledge from each subset, and finally combining the results of each classifier by averaging probabilities. A fuzzy normalization based approach was proposed by M. Anidha et al. [14], where the authors used relevant information gain and F-score to select the most important features in cancer diagnosis, yet in this work the experiments covered binary classification tasks only. A web advisor consisting of semi-supervised classifiers, with Pearson correlation, Kappa statistics and recursive feature elimination for selecting the best miRNA profiles, was built by N. Cheerla et al. [8] to predict cancer type and treatment recommendation based on the Atlas PanCancer data set. In paper [15], the authors used deep belief nets and active learning to apply multi-level gene/miRNA feature selection, to visualize the impact between genes and miRNAs, and to select the most discriminating miRNA profiles; the paper tested the performance of the proposed approach in classifying 3 cancer types. L. Fu et al. [16] used stacked autoencoders to enhance cancer diagnosis and treatment by building both miRNA-miRNA and human disease-disease similarity networks and then using a stacked autoencoder to extract the best feature set from the similarity results, in order to employ it in predicting cancer type. Convolutional neural networks (CNN) were also used by A. L. Rincon et al. [17] to classify the PanCancer data types, where the authors applied an evolutionary algorithm to optimize the architecture of the CNN model.

               III. SPARSE AUTOENCODERS

   An autoencoder is a symmetric neural network which copies the input of the network to its output, passing through a bottleneck layer that represents the latent feature space (Figure 1). A sparse autoencoder is an autoencoder that applies a sparsity penalty σ(h) during the training of the encoder part, in addition to the reconstruction loss [18]. This sparsity penalty deactivates the low-value nodes, which leads to the extraction of a more relevant feature representation. The training objective is

                       L(x, g(h)) + σ(h)                     (1)

where g(h) is the decoder output and h = f(x) is the encoder output. A detailed description of the autoencoder architecture is given in Section V. Sparse autoencoders have been intensively used for feature learning problems in different domains: emotion detection and robotics [21], medical imaging [20], and medical diagnosis [22], among others.

           Fig. 1: Sparse Autoencoder Architecture

         IV. DATA COLLECTION AND PREPROCESSING

   We collected the Atlas PanCancer [9] miRNA data set used for predicting cancer type from the TCGA database repository (accessed 10/12/2018 18:14). The miRNA data set was generated using next-generation sequencing on around 33 types of cancer in US hospitals. The initial data set consists of more than 10 thousand patients and around 800 short non-coding RNA profiles. We applied a preprocessing to the data matrix by eliminating the miRNA instances with more than 20% zero values; we also used a log transformation to correct the skewness of the data, and finally data imputation to replace the missing values. Afterwards, we divided our final data matrix into 70% of the samples, used to train the supervised model, and 30% of the samples, used to evaluate the performance of the trained classifier. Table I exhibits the data set description before and after preprocessing, and Table II illustrates the distribution of samples over the different cancer types.

TABLE I: Data set description before/after preprocessing and number of training/testing samples

                          Before    After    Cancer types
   Number of patients     10824     10783         31
   Number of regions        743       494         31
   Training samples: 7548            Testing samples: 3235

               V. SSAE FEATURES LEARNING

   We can denote the tackled problem as a matrix X of dimension N ∗ M, where N represents the number of samples and M represents the set of non-coding regions, and each xij corresponds to the value of miRNA j for a sample i. The proposed architecture (Figure 2) consists of two phases: a dimensionality reduction phase and a predictive phase.
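The building block of the dimensionality reduction phase is the sparse-autoencoder objective of Eq. (1). The following is a minimal NumPy sketch of that objective only; the dimensions, the random untrained weights and the penalty weight `lam` are illustrative assumptions, not our Keras implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Toy dimensions standing in for the real setting:
# m input miRNA features, k latent (bottleneck) units.
m, k = 8, 3
W_enc = rng.normal(scale=0.1, size=(m, k))
W_dec = rng.normal(scale=0.1, size=(k, m))

def sparse_ae_loss(x, lam=1e-3):
    """Eq. (1): reconstruction loss plus a sparsity penalty sigma(h)
    on the code, here taken as an L2 penalty on the activations."""
    h = relu(x @ W_enc)                        # encoder output h = f(x)
    x_rec = relu(h @ W_dec)                    # decoder output g(h)
    mae = float(np.mean(np.abs(x - x_rec)))    # reconstruction error
    sparsity = lam * float(np.sum(h ** 2))     # sigma(h)
    return mae + sparsity

print(sparse_ae_loss(rng.random(m)))
```

During training, minimizing this combined loss drives low-value code units toward zero, which is the deactivation effect described above.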
In phase one we have used unsupervised feature learning to train a stacked sparse autoencoder (SSAE), in which we have piled three sparse autoencoders [SAE1, SAE2, SAE3] such that the input of SAEi is the output of SAEi−1; the particularity of an autoencoder's output is that it is a reconstruction of the input with less noise. The feature vectors generated by the three AEs have been concatenated to train predictive models, which are trained using supervised learning to predict the cancer type. The two steps of our analytical architecture have been implemented using Python 3.5 and Keras [23] with a TensorFlow backend. The experimental results have been processed on an HP-bs0xx machine with an Intel Core i7-7500U CPU @ 2.70GHz x4 and 8 GB of memory.

      TABLE II: Distribution of samples among cancer types

                    Cancer Type   Number of Samples
              1-      BRCA             1164
              2-      KIRC              570
              3-      THCA              569
              4-      HNSC              565
              5-      LUAD              555
              6-      PRAD              544
              7-      UCEC              542
              8-      LGG               527
              9-      LUSC              511
              10-     OV                486
              11-     STAD              474
              12-     SKCM              452
              13-     COAD              429
              14-     BLCA              429
              15-     LIHC              421
              16-     KIRP              321
              17-     CESC              311
              18-     SARC              260
              19-     ESCA              195
              20-     LAML              188
              21-     PCPG              186
              22-     PAAD              182
              23-     READ              155
              24-     TGCT              138
              25-     THYM              126
              26-     KICH               89
              27-     MESO               87
              28-     UVM                80
              29-     ACC                79
              30-     UCS                56
              31-     DLBC               47

Fig. 2: Stacked Sparse Autoencoder architecture for miRNA-based cancer classification

A. First Phase

   In this step we have used the SSAE to extract a new feature representation that is more accurate for multi-class cancer diagnosis. The first sparse autoencoder, SAE1, takes the feature vector S of the matrix X, of dimension M, and feeds it to the encoder; in the bottleneck layer a new latent space F1 of dimension K, where K < M, is generated, and based on this latent space the decoder tries to reconstruct the input S as closely as possible at its output, so that S ≈ S′. The output S′ of SAE1 becomes the input of SAE2, and the same steps are followed to generate a latent space F2, the decoder trying to reconstruct S′ at its output S′′, so that S′ ≈ S′′. Equally, S′′ is the new input of SAE3, whose bottleneck generates the last latent feature vector F3. The consistency of each autoencoder and its final architecture settings has been evaluated by computing the reconstruction error between the input of the encoder and the output of the decoder for each SAEi; in our proposal we have used the mean absolute error loss function (Eq. 2). The three feature representations [F1, F2, F3] generated by the sparse autoencoders have been concatenated into one feature vector F4 to be used to train the classifiers.

          mae = (1/n) Σ_{i=1}^{n} |xi − x′i| = (1/n) Σ_{i=1}^{n} |ei|          (2)

Zooming in on the architecture of each autoencoder in phase one (Table III), we describe it as follows:
   * SAE1: We have used a deep architecture, where the encoder consists of two fully connected layers (494 and 250 nodes) with an L2 regularization as sparsity penalty, a latent space layer with 50 nodes that generates the new feature space F1, and a symmetric decoder that reconstructs the encoder input, with 250 and 494 nodes in its two layers respectively.
   * SAE2: Equally, we have used a deep autoencoder with two fully connected layers of 494 and 150 nodes forming the encoder, on which we applied a sparse L2 regularization penalty, a 50-node bottleneck layer that generates the new feature representation F2, and a symmetrical decoder.
   * SAE3: In the last step we used the simplest form of a sparse autoencoder. Since our data had already been purified from most of the noise by the two previous sparse autoencoders, we needed to avoid the overfitting/underfitting trap in which the autoencoder merely copies the input to the output without learning a new feature representation. So our SAE3 is composed of only one fully connected sparse layer as the encoder (494 nodes), a bottleneck layer of 50 nodes that yields the last feature vector F3, and a 494-node fully connected layer as the mirror decoder.

In order to tune the weights of each layer of the autoencoders (Table III), we have used the ReLU nonlinear function, while the bottleneck layers have been tuned using a Softplus activation function.
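The stacking scheme just described, where each SAE is fed the previous SAE's reconstruction and the three bottleneck codes are concatenated, can be sketched as follows. The random weights are untrained stand-ins for the trained encoders/decoders; only the shapes (494 inputs, 50-node bottlenecks, three SAEs) follow Table III:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(z, 0.0)

def make_sae(n_in, n_latent):
    """Stand-in for one trained sparse autoencoder; the real SAEs are
    trained with an MAE loss, L2 activity regularization and Adam."""
    W_e = rng.normal(scale=0.1, size=(n_in, n_latent))
    W_d = rng.normal(scale=0.1, size=(n_latent, n_in))
    return (lambda x: relu(x @ W_e)), (lambda h: relu(h @ W_d))

M, K = 494, 50             # 494 preprocessed miRNA regions, 50-node bottlenecks
s = rng.random(M)          # one (toy) sample S

codes = []
for _ in range(3):         # SAE1 -> SAE2 -> SAE3
    encode, decode = make_sae(M, K)
    h = encode(s)          # latent space F_i
    codes.append(h)
    s = decode(h)          # reconstruction S' feeds the next autoencoder

F = np.concatenate(codes)  # [F1, F2, F3], the classifier input
print(F.shape)             # (150,)
```

The concatenated vector F is what the second phase's classifiers consume in place of the raw 494-dimensional profile.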
We trained the stacked autoencoder using mini-batch gradient descent and the Adam optimizer, as follows:
   1- We trained SAE1 for 200 epochs with a batch size of 180 samples on the initial input data set, which represents the values of the non-coding regions of all the available patients, obtaining an experimental reconstruction loss of 0.56.
   2- SAE2 was trained for 150 epochs with a batch size of 150 using the reconstructed input from SAE1; the experimental reconstruction loss after training was 0.32.
   3- The output of SAE2 was used to train SAE3 for 100 epochs with a batch size of 130; the reconstruction loss after training was 0.21.

Fig. 3: Training performance of the SSAE across each autoencoder

Figure 3 shows the training process of each encoder, where we can see that SAE1 converged toward its best performance around 150 epochs, SAE2 stabilized around epoch 125, whereas SAE3 converged rapidly to its best performance around epoch 80. After training the three autoencoders we extracted the latent space of each one and concatenated the three vectors as the new miRNA feature space to be used in the second phase.

B. Second Phase

   The second phase is for classification, where we have used four classifiers to predict the class of a cancer sample according to 31 cancer types. Support vector machines (SVM), decision trees (DT), random forests (RF) and k-nearest neighbors (KNN) were the models chosen to fulfill the diagnosis task. The performance of the models has been assessed through hold-out validation, where we split our data into 70% training and 30% testing. Besides, to evaluate the performance of our SSAE in learning new feature representations, we have compared the performances of the trained classifiers with classifiers trained on features generated by state-of-the-art unsupervised dimensionality reduction methods, namely principal component analysis (PCA) and kernel principal component analysis (KPCA).

Fig. 4: Accuracy score of the classifiers on SSAE and the other dimensionality reduction methods

   The overall accuracy score of each classifier (Figure 4) shows that the predictive models trained on the feature representation extracted by the SSAE are more powerful at predicting the class of each sample. SVM/SSAE scored the highest accuracy in discriminating between the different cancer types, with a performance that reaches approximately 95%. With DT, KPCA was able to overcome our approach with a difference of 0.02. With KNN and RF the performance of the classifiers on each dimensionality reduction approach was very close, with a superiority of our approach at accuracies of 92% and 89% respectively.

   Since accuracy is not enough to evaluate a classifier, and since our problem is a multi-class classification problem, we have chosen additional metrics to evaluate the performance of our models across the trained classifiers. We have used micro, macro and weighted average values to evaluate the consistency of each classifier in the prediction of each class (Tables IV-VII). Table IV represents the case with the best performance among the classifiers. We conclude from the results that SVM/SSAE is the best performing model; the micro average scores reflect the ability of the model to predict positive samples at a high rate (95%) for both micro average precision and micro average recall. Equally, the macro average and weighted average results are very promising, despite the fact that our class sizes vary widely. Tables V and VII exhibit the overall performance of the classifiers, where our feature representation learning model was able to slightly overcome those trained on PCA and KPCA. Table VI shows the case where our DT/SSAE model was not able to perform better than the DT/KPCA classifier. Overall, the collection of result tables exhibits the high consistency of the SSAE features: across most of the classifiers our model scored the highest values, and in all the experiments we tested, PCA features were not able to perform better than ours; KNN/PCA came very close to KNN/SSAE, with equal micro average and weighted average values, and only a small difference captured by the macro average values.

   Compared to the results published in [8] and [17], we can say that our model was very powerful in discriminating between the 31 cancer types, despite the fact that some of the cancer types have very low sample counts. Cheerla et al. [8] addressed this problem by eliminating the types with smaller numbers of patients, so they worked on only 21 cancer types, using semi-supervised learning to raise the accuracy score to 97%. A. L. Rincon et al. [17] also dealt with only 29 cancer types to reach a training accuracy of 96%. We also assume that by integrating more characteristics, like stage and gender, into our analytical strategy we may improve the results for the 31 predicted cancer types.

        TABLE III: Stacked Sparse Autoencoders description

                            SAE1             SAE2             SAE3
   Architecture      Enc: 494, 250    Enc: 494, 150    Enc: 494
   (nodes)           LS: 50           LS: 50           LS: 50
                     Dec: 250, 494    Dec: 150, 494    Dec: 494
   Epochs            200              150              100
   Batch size        180              150              130
   Activation function            [ReLU - Softplus]
   Regularizers      L2(0.001)        L2(0.0001)       L2(0.00001)
   Loss function     mae              mae              mae
   Reconstruction error  0.56         0.23             0.19

TABLE IV: SVM classifier micro/macro/weighted average scores; P: Precision, R: Recall, f1-s: f1-score

           Metric   micro-Av   macro-Av   weighted-Av    Acc
             P        0.95       0.95        0.95
   SSAE      R        0.95       0.92        0.95        0.947
           f1-s       0.95       0.94        0.95
             P        0.89       0.94        0.92
   PCA       R        0.89       0.79        0.89        0.894
           f1-s       0.89       0.83        0.89
             P        0.80       0.63        0.78
   KPCA      R        0.80       0.59        0.80        0.803
           f1-s       0.80       0.59        0.77
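The micro, macro and weighted averaging schemes behind Tables IV-VII can be reproduced with a small helper; the labels below are toy values for illustration, not our predictions:

```python
import numpy as np

def prf(y_true, y_pred, average):
    """Precision/recall under the micro, macro and weighted averaging
    schemes used in the result tables."""
    classes = np.unique(y_true)
    tp = np.array([np.sum((y_pred == c) & (y_true == c)) for c in classes])
    pred = np.array([np.sum(y_pred == c) for c in classes])  # predicted per class
    true = np.array([np.sum(y_true == c) for c in classes])  # support per class
    if average == "micro":                 # pool counts over all classes
        return tp.sum() / pred.sum(), tp.sum() / true.sum()
    p = np.divide(tp, pred, out=np.zeros(len(classes)), where=pred > 0)
    r = np.divide(tp, true, out=np.zeros(len(classes)), where=true > 0)
    if average == "macro":                 # unweighted mean over classes
        return p.mean(), r.mean()
    w = true / true.sum()                  # weighted: support-weighted mean
    return (p * w).sum(), (r * w).sum()

y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2])
print(prf(y_true, y_pred, "micro"))        # micro P == micro R == accuracy
```

This also shows why the micro average precision and recall coincide with the accuracy in the tables, while the macro average is the one sensitive to the rare classes.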
                      VI. CONCLUSION

   In this paper we have implemented a stacked sparse unsupervised autoencoder to learn a new feature representation that may help in promoting cancer genetic diagnosis based on the short non-coding RNA regions, which play a significant role in silencing, regulating and managing the transcription biological process in the human body. The learned features have been evaluated through supervised models, where our proposed unsupervised feature learning model was able to generate a new discriminant data representation, leading to a method that is competitive with the state-of-the-art methods. We believe that collecting new samples, moving toward semi-supervised classification, or integrating some clinical information may enhance the results obtained in this work; the use of the PanCancer data set may also give our model the flexibility to be easily applied to other cancer types generated from different genomic data banks for further research.

TABLE V: RF classifier micro/macro/weighted average scores; P: Precision, R: Recall, f1-s: f1-score

           Metric   micro-Av   macro-Av   weighted-Av    Acc
             P        0.90       0.92        0.90
   SSAE      R        0.90       0.85        0.90        0.899
           f1-s       0.90       0.86        0.89
             P        0.87       0.90        0.88
   PCA       R        0.87       0.81        0.87        0.874
           f1-s       0.87       0.82        0.86
             P        0.88       0.89        0.88
   KPCA      R        0.88       0.83        0.86        0.881
           f1-s       0.88       0.84        0.87

TABLE VI: DT classifier micro/macro/weighted average scores; P: Precision, R: Recall, f1-s: f1-score

           Metric   micro-Av   macro-Av   weighted-Av    Acc
             P        0.84       0.86        0.84
   SSAE      R        0.84       0.79        0.84        0.838
           f1-s       0.84       0.81        0.83
             P        0.82       0.84        0.82
   PCA       R        0.82       0.77        0.82        0.818
           f1-s       0.82       0.79        0.82
             P        0.85       0.87        0.85
   KPCA      R        0.85       0.80        0.85        0.851
           f1-s       0.85       0.82        0.85

TABLE VII: KNN classifier micro/macro/weighted average scores; P: Precision, R: Recall, f1-s: f1-score

           Metric   micro-Av   macro-Av   weighted-Av    Acc
             P        0.92       0.91        0.92
   SSAE      R        0.92       0.91        0.92        0.923
           f1-s       0.92       0.91        0.92
             P        0.92       0.92        0.92
   PCA       R        0.92       0.90        0.92        0.919
           f1-s       0.92       0.90        0.92
             P        0.92       0.90        0.92
   KPCA      R        0.92       0.90        0.92        0.918
           f1-s       0.92       0.90        0.92

                         REFERENCES

 [1] F. Cristiano, P. Veltri. "Methods and techniques for miRNA data analysis", in Microarray Data Analysis. Humana Press, New York, NY, 2015, pp. 11-23.
 [2] S. Tam, M. S. Tsao, J. D. McPherson. "Optimization of miRNA-seq data preprocessing". Briefings in Bioinformatics, 2015, pp. 950-963.
 [3] S. Sing, et al. "Machine learning techniques in exploring microRNA gene discovery, targets, and functions", in Bioinformatics in MicroRNA Research. Humana Press, New York, NY, 2017, pp. 211-224.
 [4] P. H. Gunaratne, C. Coarfa, B. Soibam, A. Tandon. "miRNA Data Analysis: Next-Gen Sequencing", in J. B. Fan (ed.), Next-Generation MicroRNA Expression Profiling Technology, Methods in Molecular Biology (Methods and Protocols). Humana Press, 2012.
 [5] R. C. Lee, R. L. Feinbaum, V. Ambros. "The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14". Cell, 1993, pp. 843-854.
 [6] J. Xuan, Y. Yu, T. Qing, L. Guo, L. Shi. "Next-generation sequencing in the clinic: promises and challenges". Cancer Letters, 2013, pp. 284-295.
 [7] K. R. Kukurba, S. B. Montgomery. "RNA sequencing and analysis". Cold Spring Harbor Protocols, 2015.
 [8] N. Cheerla, O. Gevaert. "MicroRNA based Pan-Cancer diagnosis and treatment recommendation". BMC Bioinformatics, 2017.
 [9] J. Liu, et al. "An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics". Cell, 2018, pp. 400-416.
[10] J. Lu, et al. "MicroRNA expression profiles classify human cancers". Nature, 2005.
[11] A. Kotlarchyk, T. Khoshgoftaar, M. Pavlovic, H. Zhuang, A. S. Pandya. "Identification of microRNA biomarkers for cancer by combining multiple feature selection techniques". Journal of Computational Methods in Sciences and Engineering, 2011, pp. 283-298.
[12] D. Ting-ting, S. Chang-ji, D. Yan-shou, B. Yi-duo. "Analysis of miRNA expression profile based on SVM algorithm", in IOP Conference Series: Earth and Environmental Science. IOP Publishing, 2018.
[13] P. Yongjun, P. Minghao, R. Keun Ho. "Multiclass cancer classification using a feature subset-based ensemble from microRNA expression profiles". Computers in Biology and Medicine, 2017, pp. 39-44.
[14] M. Anidha, K. Premalatha. "An application of fuzzy normalization in miRNA data for novel feature selection in cancer classification". Biomed. Res., 2017, 28.9: 4187-4195.
[15] R. Ibrahim, N. A. Yousri, M. A. Ismail, N. M. El-Makky. "Multi-level gene/MiRNA feature selection using deep belief nets and active learning", in 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 2014, pp. 3957-3960.
[16] L. Fu, Q. Peng. "A deep ensemble model to predict miRNA-disease association". Scientific Reports, 2017.
[17] A. L. Rincon, et al. "Evolutionary optimization of convolutional neural networks for cancer miRNA biomarkers classification". Applied Soft Computing, 2018, pp. 91-100.
[18] I. Goodfellow, Y. Bengio, A. Courville. "Deep Learning". MIT Press, 2016.
[19] M. Tschannen, O. Bachem, M. Lucic. "Recent advances in autoencoder-based representation learning". arXiv preprint arXiv:1812.05069, 2018.
[20] Y.-D. Zhang, et al. "Seven-layer deep neural network based on sparse autoencoder for voxelwise detection of cerebral microbleed". Multimedia Tools and Applications, 2018, pp. 10521-10538.
[21] L. Chen, et al. "Softmax regression based deep sparse autoencoder network for facial emotion recognition in human-robot interaction". Information Sciences, 2018, pp. 49-61.
[22] C. Zhang, et al. "Deep Sparse Autoencoder for Feature Extraction and Diagnosis of Locomotive Adhesion Status". Journal of Control Science and Engineering, 2018.
[23] F. Chollet, et al. "Keras". https://keras.io, 2015.