Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Stacked Sparse Autoencoder for Unsupervised Features Learning in PanCancer miRNA Cancer Classification

1st Imene Zenbout
IFA department, NTIC faculty, Constantine 2 University
CRBT, Constantine, Algeria
imene.zenbout@univ-constantine2.dz

2nd Abdelkrim Bouramoul
IFA department, NTIC faculty, Constantine 2 University
CERIST Misc laboratory, Constantine, Algeria
abdelkrim.bouramoul@univ-constantine2.dz

3rd Souham Meshoul
Princess Nourah bint Abderahmen University
Riyadh, Saudi Arabia
sbmeshoul@pnu.edu.sa

Abstract—Recent progress in cancer diagnosis is driven by genomic data analysis. miRNA plays an important role as a cancer biomarker, moving cancer diagnosis and therapy towards personalized medicine, with the ultimate goal of improving survival rates and disease prevention. The recent explosion in genomic data generation has motivated the use of miRNA to enhance diagnosis, prognosis and treatment. In this work we explore the integrated Atlas PanCancer miRNA profiles using deep feature learning based on an unsupervised Stacked Sparse AutoEncoder (SSAE). The proposed SSAE model learns a feature representation from the input data. The consistency of the learned features has been tested by classifying samples according to 31 cancer types. The model's performance has been compared to state-of-the-art unsupervised feature learning models. The obtained results exhibit the competitive and promising performance of our model, with an accuracy rate of about 95%.

Index Terms—Deep learning, Bioinformatics, features learning, Sparse autoencoders, miRNA, PanCancer.

I. INTRODUCTION

The recent and tremendous advances in high-throughput sequencing technologies [1] have fostered the role of genomic data across the whole transcriptome as a key answer to different biology-related questions, particularly in disease genetics. With this new availability and transparency of genomic and genetic data, the role of miRNA has moved from that of noisy particles to highly engaged genomic instances in gene regulation and post-protein function. This has led to the direct involvement of miRNA in the occurrence or the suppression of cancer [2].

microRNAs (miRNAs) are classified as non-coding regulatory genes [3] that can be found in small fragments of non-coding RNA regions (about 21-23 nucleotides) [3], [4]. Since the discovery of miRNA in 1993 by R.C. Lee [5], the generation of miRNA data using high-throughput technologies [6], [7] to explore the direct role of miRNA in cancer diagnosis and gene impact has become intensive. The particularity of miRNA profiles is their ability to serve as a direct tool in cancer analysis, therapy and post treatment [8], which represents the main motivation of this work. miRNA data share the same issue as gene expression data, namely a very small sample size with regard to the high dimensionality of the profiles; i.e., some profiles are irrelevant to cancer diagnosis and related decisions, compared to the low number of patient samples. Obviously, this lends itself to a dimensionality reduction problem, in which it is required to extract a miRNA signature representation that can serve as a relevant predictor in cancer diagnosis.

In this work we propose a deep unsupervised feature learning model, based on stacking three sparse autoencoders, to learn new features from the initial noisy miRNA profile inputs. The features learned through the different abstraction levels have been used to train classifiers that predict the cancer type of a specific sample according to 31 different cancer types. The proposed unsupervised and supervised models have been trained on the Atlas PanCancer [9] data set. The particularity of this data set is that it combines different cancer types. This may help us draw information from the well-explored cancer types, which have a large number of samples and/or a high correlation between the different miRNA profiles, and apply this information to classify, or understand, the cancer types with a poor exploration rate.
The feature learning model has been compared to some of the best known unsupervised feature learning and dimensionality reduction models; here we used principal component analysis (PCA) and kernel principal component analysis (KPCA). The rest of the paper is organized as follows: a literature review is given in section II; section III is devoted to a brief introduction to sparse autoencoders; section IV describes the data set and the preprocessing steps; our proposal is presented in section V, along with the set of experimental results and discussion.

II. MIRNA CANCER CLASSIFICATION

Recently, the exploration of the role of non-coding regions in cancer diagnosis and therapy has been attracting a large community of scientists, and the analysis of miRNA data sets using statistical and machine learning methods has become one of the trending problems in bioinformatics [3]. In cancer diagnosis and classification, we cite the work of J. Lu et al. [10], where the authors analysed mammalian miRNA using k-nearest neighbors and a probabilistic neural network algorithm. Kotlarchyk et al. [11] used an ensemble methodology to classify different cancer types based on miRNA profiles. A statistical support vector machine / k-nearest neighbors approach was proposed by D. Ting-ting et al. [12], where t-statistics were used to select relevant miRNA features and a combination of kNN and SVM served as classifiers to distinguish between positive and negative samples in different cancer type data sets. For multiclass cancer classification, P. Yongjun [13] used a subset-based ensemble method for feature selection, generating multiple miRNA subsets based on the correlation among miRNAs, using classifiers to learn valuable knowledge from each subset, and finally combining the results of the classifiers by averaging probabilities. A fuzzy normalization based approach was proposed by M. Anidha et al. [14], where the authors used relevant information gain and F-score to select the most important features in cancer diagnosis, yet in that work the experiments covered binary classification tasks only. A web advisor consisting of semi-supervised classifiers, with Pearson correlation, Kappa statistics and recursive feature elimination for selecting the best miRNA profiles, was built by N. Cheerla et al. [8] to predict cancer type and treatment recommendations based on the Atlas PanCancer data set. In paper [15], the authors used deep belief nets and active learning to apply multi-level gene/miRNA feature selection, to visualize the impact between genes and miRNAs, and to select the most discriminating miRNA profiles; the paper tested the performance of the proposed approach in classifying 3 cancer types. L. Fu et al. [16] used stacked autoencoders to enhance cancer diagnosis and treatment, building both miRNA-miRNA and human disease-disease similarity networks and then using a stacked autoencoder to extract the best feature set from the similarity results, in order to employ it in predicting cancer type. Convolutional neural networks (CNN) were also used by A. L. Rincon et al. [17] to classify the PanCancer data types, where the authors applied an evolutionary algorithm to optimize the architecture of the CNN model.

III. SPARSE AUTOENCODERS

An autoencoder is a symmetric neural network that copies the input of the network to its output, passing through a bottleneck layer that represents the latent feature space (figure 1). A sparse autoencoder is an autoencoder that applies a sparsity penalty σ(h) to the training of the encoder part, in addition to the reconstruction loss [18], giving the training objective

L(x, g(h)) + σ(h)   (1)

where g(h) is the decoder output and h = f(x) is the encoder output. The sparsity penalty deactivates the low-value nodes, which leads to the extraction of a more relevant feature representation. A detailed description of the autoencoder architecture used in this work is given in section V.

Fig. 1: Sparse Autoencoder Architecture

Sparse autoencoders have been used intensively for feature learning problems in different domains: emotion detection and robotics [21], medical imaging [20], and medical diagnosis [22].
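To make Eq. (1) concrete, the following minimal Keras sketch builds a single sparse autoencoder in which the sparsity term σ(h) is realized as an L2 activity regularizer on the code h = f(x). The layer sizes echo this paper (494 inputs, 50 latent units), but the penalty weight and the commented training settings are illustrative assumptions, not the authors' exact code.

```python
# A minimal sparse autoencoder in Keras (TensorFlow backend) following Eq. (1):
# the reconstruction loss L(x, g(h)) is augmented with a sparsity penalty
# sigma(h), realized here as an L2 activity regularizer on the code h = f(x).
from keras.layers import Input, Dense
from keras.models import Model
from keras import regularizers

n_features = 494   # miRNA regions kept after preprocessing (Table I)
n_latent = 50      # bottleneck size used throughout the paper

x_in = Input(shape=(n_features,))
# Encoder f(x): the activity regularizer adds sigma(h) to the training loss.
h = Dense(n_latent, activation='relu',
          activity_regularizer=regularizers.l2(1e-4))(x_in)
# Decoder g(h): reconstructs the input from the sparse code.
x_out = Dense(n_features, activation='relu')(h)

autoencoder = Model(x_in, x_out)
encoder = Model(x_in, h)               # used to extract the learned features
autoencoder.compile(optimizer='adam', loss='mae')

# X is assumed to be the preprocessed (samples x miRNA regions) matrix:
# autoencoder.fit(X, X, epochs=100, batch_size=128, shuffle=True)
# F = encoder.predict(X)               # latent feature representation
```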
IV. DATA COLLECTION AND PREPROCESSING

We collected the Atlas PanCancer [9] miRNA data set used for predicting cancer type from the TCGA data base repository (10/12/2018 18:14). The miRNA data have been generated using next generation sequencing on around 33 types of cancer in US hospitals. The initial miRNA data set consists of more than 10 thousand patients and around 800 short non-coding RNA profiles. We applied a preprocessing to the data matrix by eliminating the miRNA instances with more than 20% zero values; we also used a log transformation to reduce the skewness of the data, and finally data imputation to replace the missing values. Afterwards, we divided the final data matrix into 70% of samples used to train the supervised model and 30% of samples used to evaluate the performance of the trained classifier. Table I exhibits the data set description before and after preprocessing, and table II illustrates the distribution of samples over the different cancer types.

TABLE I: Data set description before/after preprocessing, and number of training/testing samples

                      Before    After
Number of patients    10824     10783
Number of regions     743       494
Cancer types          31        31
Training samples                7548
Testing samples                 3235

TABLE II: Distribution of samples among cancer types

Cancer type    Number of samples
1- BRCA        1164
2- KIRC        570
3- THCA        569
4- HNSC        565
5- LUAD        555
6- PRAD        544
7- UCEC        542
8- LGG         527
9- LUSC        511
10- OV         486
11- STAD       474
12- SKCM       452
13- COAD       429
14- BLCA       429
15- LIHC       421
16- KIRP       321
17- CESC       311
18- SARC       260
19- ESCA       195
20- LAML       188
21- PCPG       186
22- PAAD       182
23- READ       155
24- TGCT       138
25- THYM       126
26- KICH       89
27- MESO       87
28- UVM        80
29- ACC        79
30- UCS        56
31- DLBC       47
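The preprocessing steps above can be sketched with pandas and scikit-learn as follows. The file name, the per-region mean imputation and the stratified split are our assumptions; the 20% zero-value threshold, the log transformation and the 70/30 split come from the paper.

```python
# A sketch of the preprocessing pipeline, assuming the raw TCGA PanCancer
# miRNA matrix is loaded as a pandas DataFrame with patients as rows and
# miRNA regions as columns.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('pancancer_mirna.csv', index_col=0)   # hypothetical file name

# 1) Drop miRNA regions with more than 20% zero values (743 -> 494 regions).
zero_ratio = (df == 0).mean(axis=0)
df = df.loc[:, zero_ratio <= 0.20]

# 2) Log-transform to reduce the skewness of the expression values.
df = np.log2(df + 1)

# 3) Impute the remaining missing values (per-region mean, an assumption).
df = df.fillna(df.mean())

# 4) Hold-out split: 70% of the samples to train, 30% to test (Table I);
#    y is assumed to hold the 31 cancer-type labels of the same patients.
# X_train, X_test, y_train, y_test = train_test_split(
#     df.values, y, test_size=0.30, stratify=y, random_state=0)
```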
V. SSAE FEATURES LEARNING

We can denote the tackled problem as a matrix X of dimension N × M, where N represents the number of samples and M represents the set of non-coding regions, and where each x_ij corresponds to the value of miRNA profile j for sample i. The proposed architecture (Figure 2) consists of two phases: a dimensionality reduction phase and a predictive phase. In phase one we used unsupervised feature learning to train a stacked sparse autoencoder (SSAE), in which we pile three sparse autoencoders [SAE1, SAE2, SAE3] such that the input of SAE_i is the output of SAE_(i-1); the particularity of the output of an autoencoder is that it is a reconstruction of the input with less noise. The feature vectors generated by the three autoencoders are concatenated and used to train predictive models. These models are trained with supervised learning to predict the cancer type.

Fig. 2: Stacked Sparse Autoencoder architecture for miRNA based cancer classification

A. First Phase

In this step, we used the SSAE to extract a new feature representation that is more accurate for multi-class cancer diagnosis. The first sparse autoencoder SAE1 takes the feature vector S of the matrix X, of range M, and feeds it to the encoder; in the bottleneck layer a new latent space F1 of range K, where K < M, is generated, and based on this latent space the decoder tries to reconstruct the input S as closely as possible at its output, so that S ≈ S'. The output S' of SAE1 becomes the input of SAE2, and the same steps are followed to generate a latent space F2, with the decoder reconstructing S' at its output S'', where S' ≈ S''. Equally, S'' is the input of SAE3, and the bottleneck of this third sparse autoencoder generates the last latent feature space vector F3. The consistency of each autoencoder and its final architecture settings have been evaluated by computing the reconstruction error loss between the input of the encoder and the output of the decoder for each SAE_i. In our proposal we used the mean absolute error loss function (eq. 2):

mae = (1/n) Σ_{i=1}^{n} |x_i - x'_i| = (1/n) Σ_{i=1}^{n} |e_i|   (2)

The three feature representations [F1, F2, F3] generated by the sparse autoencoders are concatenated into one feature vector F, which is used to train the classifiers.

Zooming in on the architecture of each autoencoder in phase one (table III), we describe it as follows:
* SAE1: We used a deep architecture, in which the encoder consists of two fully connected layers (494 and 250 nodes) with an L2 regularization as sparsity penalty, a latent space layer of 50 nodes that generates the new feature space F1, and a symmetric decoder that reconstructs the encoder input with 250 and 494 nodes in its two layers, respectively.
* SAE2: Equally, we used a deep autoencoder with two fully connected layers of 494 and 150 nodes representing the encoder, on which we applied a sparse L2 regularization penalty, a 50-node bottleneck layer generating the new feature representation F2, and a symmetrical decoder.
* SAE3: In the last step we used the simplest form of sparse autoencoder. Since the data have already been purified from most of the noise by the two previous sparse autoencoders, we need to avoid falling into overfitting or underfitting, where the autoencoder would only copy the input to the output without learning a new feature representation. Our SAE3 is therefore composed of a single fully connected sparse layer representing the encoder (494 nodes), a bottleneck layer of 50 nodes that represents the last feature vector F3, and a 494-node fully connected layer as the mirror decoder.

TABLE III: Stacked sparse autoencoders description

                        SAE1                 SAE2                 SAE3
Architecture (nodes)    494-250-50-250-494   494-150-50-150-494   494-50-494
Epochs                  200                  150                  100
Batch size              180                  150                  130
Activation functions    [ReLU-Softplus]      [ReLU-Softplus]      [ReLU-Softplus]
Regularizer             L2(0.001)            L2(0.0001)           L2(0.00001)
Loss function           mae                  mae                  mae
Reconstruction error    0.56                 0.23                 0.19

To tune the weights of each layer of the autoencoders (table III) we used the ReLU nonlinear function, while the bottleneck layers were tuned using a Softplus activation function. We trained the stacked autoencoder using mini-batch gradient descent and the Adam optimizer, as follows:
1- We trained SAE1 for 200 epochs with a batch size of 180 samples on the initial input data set, which holds the values of the non-coding regions of all the available patients, obtaining an experimental reconstruction loss of 0.56.
2- SAE2 was trained for 150 epochs with a batch size of 150 on the reconstructed input from SAE1; the experimental reconstruction loss after training was 0.32.
3- The output of SAE2 was used to train SAE3 for 100 epochs with a batch size of 130; the reconstruction loss after training was 0.21.

Fig. 3: Training performance of the SSAE across each autoencoder

Figure 3 shows the training process of each autoencoder: SAE1 converged toward its best performance at around 150 epochs, SAE2 stabilized around epoch 125, whereas SAE3 converged rapidly to its best performance around epoch 80. After training the three autoencoders, we extracted the latent space of each one and concatenated the three vectors into the new miRNA feature space used in the second phase.
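Putting the pieces together, the sketch below assembles and trains the three sparse autoencoders of Table III in Keras. The build_sae helper is ours, and attaching the L2 penalty to the bottleneck as an activity regularizer is our reading of the paper, not a detail it specifies.

```python
# A sketch of the three-stage SSAE of Table III: ReLU hidden layers, Softplus
# bottleneck, L2 sparsity penalty, MAE loss (Eq. 2) and the Adam optimizer.
# SAE_i is trained on the reconstruction produced by SAE_(i-1), and the three
# 50-dimensional codes F1, F2, F3 are concatenated into the final features.
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model
from keras import regularizers

def build_sae(dims, l2_weight):
    """dims = [input, hidden..., bottleneck], e.g. [494, 250, 50]."""
    x_in = Input(shape=(dims[0],))
    h = x_in
    for units in dims[1:-1]:                       # encoder hidden layer(s)
        h = Dense(units, activation='relu')(h)
    code = Dense(dims[-1], activation='softplus',  # sparse bottleneck
                 activity_regularizer=regularizers.l2(l2_weight))(h)
    d = code
    for units in reversed(dims[1:-1]):             # mirrored decoder
        d = Dense(units, activation='relu')(d)
    x_out = Dense(dims[0], activation='relu')(d)
    ae = Model(x_in, x_out)
    ae.compile(optimizer='adam', loss='mae')       # Eq. (2) as training loss
    return ae, Model(x_in, code)

sae1, enc1 = build_sae([494, 250, 50], 1e-3)       # SAE1: 494-250-50-250-494
sae2, enc2 = build_sae([494, 150, 50], 1e-4)       # SAE2: 494-150-50-150-494
sae3, enc3 = build_sae([494, 50], 1e-5)            # SAE3: 494-50-494

# Training schedule from the paper (epochs / batch size), X preprocessed:
# sae1.fit(X, X, epochs=200, batch_size=180);   X1 = sae1.predict(X)
# sae2.fit(X1, X1, epochs=150, batch_size=150); X2 = sae2.predict(X1)
# sae3.fit(X2, X2, epochs=100, batch_size=130)
# F = np.concatenate([enc1.predict(X), enc2.predict(X1), enc3.predict(X2)],
#                    axis=1)                       # concatenated [F1, F2, F3]
```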
B. Second Phase

The second phase is the classification phase, in which we used four classifiers to predict the class of a cancer sample according to the 31 cancer types: support vector machine (SVM), decision tree (DT), random forest (RF) and k-nearest neighbors (KNN) were the models chosen and trained to fulfill the diagnosis task. The performance of the models was assessed through hold-out validation, where we split the data into 70% for training and 30% for testing. Besides, to evaluate the ability of our SSAE to learn new feature representations, we compared the performance of the trained classifiers with that of classifiers trained on features generated by state-of-the-art unsupervised dimensionality reduction methods, namely principal component analysis (PCA) and kernel principal component analysis (KPCA). The two phases of our analytical architecture were implemented using Python 3.5 and Keras [23] with the TensorFlow backend; the experiments were run on an HP-bs0xx machine with an Intel Core i7-7500U CPU @ 2.70GHz x 4 and 8 GB of memory.
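The second phase can be sketched with scikit-learn as follows. The classifier hyperparameters (library defaults) and the choice of 150 components for PCA/KPCA (matching the dimension of the concatenated SSAE features) are our assumptions; the paper does not report them.

```python
# A sketch of the predictive phase: the four classifiers are trained on the
# concatenated SSAE features and, for comparison, on PCA and KPCA projections
# of the same data.
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.decomposition import PCA, KernelPCA

classifiers = {'SVM': SVC(), 'DT': DecisionTreeClassifier(),
               'RF': RandomForestClassifier(), 'KNN': KNeighborsClassifier()}
reducers = {'SSAE': None,                    # F_train/F_test from the encoders
            'PCA': PCA(n_components=150),
            'KPCA': KernelPCA(n_components=150, kernel='rbf')}

# for red_name, reducer in reducers.items():
#     if reducer is None:
#         Xtr, Xte = F_train, F_test         # SSAE encoder outputs per split
#     else:
#         Xtr = reducer.fit_transform(X_train)
#         Xte = reducer.transform(X_test)
#     for clf_name, clf in classifiers.items():
#         clf.fit(Xtr, y_train)
#         print(red_name, clf_name, clf.score(Xte, y_test))
```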
Fig. 4: Accuracy score of the classifiers on SSAE and the other dimensionality reduction methods

The overall accuracy score of each classifier (figure 4) shows that the predictive models trained on the feature representation extracted from the SSAE are more powerful in predicting the class of each sample. SVM/SSAE scored the highest accuracy in discriminating between the different cancer types, with a performance that reaches approximately 95%. For DT, KPCA was able to overcome our approach with a difference of 0.02. For KNN and RF, the performance of the classifiers on each dimensionality reduction approach was very close, with a superiority of our approach at accuracies of 92% and 89%, respectively.

Since accuracy alone is not enough to evaluate a classifier, and since our problem is a multi-class classification problem, we chose additional metrics to evaluate the performance of our models across all the trained classifiers. We used micro, macro and weighted average values to evaluate the consistency of each classifier in the prediction of each class (tables IV, V, VI, VII). Table IV represents the case with the best performance among the classifiers. We conclude from the results that SVM/SSAE is the best performing model; the micro average scores reflect the ability of the model to predict positive samples at a high rate (95%) for both micro average precision and micro average recall. Equally, the macro average and weighted average results are very promising, despite the fact that our class sizes are imbalanced.

TABLE IV: SVM classifier micro/macro/weighted average scores; P: Precision, R: Recall, f1-s: f1-score

        Metric   micro-Av   macro-Av   weighted-Av   Acc
SSAE    P        0.95       0.95       0.95
        R        0.95       0.92       0.95          0.947
        f1-s     0.95       0.94       0.95
PCA     P        0.89       0.94       0.92
        R        0.89       0.79       0.89          0.894
        f1-s     0.89       0.83       0.89
KPCA    P        0.80       0.63       0.78
        R        0.80       0.59       0.80          0.803
        f1-s     0.80       0.59       0.77

TABLE V: RF classifier micro/macro/weighted average scores; P: Precision, R: Recall, f1-s: f1-score

        Metric   micro-Av   macro-Av   weighted-Av   Acc
SSAE    P        0.90       0.92       0.90
        R        0.90       0.85       0.90          0.899
        f1-s     0.90       0.86       0.89
PCA     P        0.87       0.90       0.88
        R        0.87       0.81       0.87          0.874
        f1-s     0.87       0.82       0.86
KPCA    P        0.88       0.89       0.88
        R        0.88       0.83       0.86          0.881
        f1-s     0.88       0.84       0.87

TABLE VI: DT classifier micro/macro/weighted average scores; P: Precision, R: Recall, f1-s: f1-score

        Metric   micro-Av   macro-Av   weighted-Av   Acc
SSAE    P        0.84       0.86       0.84
        R        0.84       0.79       0.84          0.838
        f1-s     0.84       0.81       0.83
PCA     P        0.82       0.84       0.82
        R        0.82       0.77       0.82          0.818
        f1-s     0.82       0.79       0.82
KPCA    P        0.85       0.87       0.85
        R        0.85       0.80       0.85          0.851
        f1-s     0.85       0.82       0.85

TABLE VII: KNN classifier micro/macro/weighted average scores; P: Precision, R: Recall, f1-s: f1-score

        Metric   micro-Av   macro-Av   weighted-Av   Acc
SSAE    P        0.92       0.91       0.92
        R        0.92       0.91       0.92          0.923
        f1-s     0.92       0.91       0.92
PCA     P        0.92       0.92       0.92
        R        0.92       0.90       0.92          0.919
        f1-s     0.92       0.90       0.92
KPCA    P        0.92       0.90       0.92
        R        0.92       0.90       0.92          0.918
        f1-s     0.92       0.90       0.92

Tables V and VII exhibit the overall performance of the RF and KNN classifiers, where our feature representation learning model was able to slightly overcome those trained on PCA and KPCA. Table VI shows the one case where our DT/SSAE model was not able to perform better than the DT/KPCA classifier. Overall, the collection of result tables exhibits the high consistency of the SSAE features: across most of the classifiers our model scored the highest values, and in all the experiments we tested, the PCA features were unable to perform better than ours, although KNN/PCA came very close to KNN/SSAE, with equal micro average and weighted average values; only a small difference was captured by the macro average values.

Compared to the results published in [8] and [17], we can say that our model is very powerful in discriminating between the 31 cancer types, despite the fact that some cancer types have very few samples. Cheerla et al. [8] addressed this problem by eliminating the types with smaller numbers of patients, so they worked on only 21 cancer types, using semi-supervised learning to raise the accuracy score to 97%. A. L. Rincon et al. [17] likewise dealt with 29 cancer types to reach a training accuracy of 96%. We also assume that integrating more characteristics, such as stage and gender, into our analytical strategy may improve the results for the 31 predicted cancer types.
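For reference, the micro/macro/weighted averages reported in Tables IV-VII can be reproduced with scikit-learn's metric utilities, as in this short sketch (y_test and y_pred denote the held-out labels and one trained classifier's predictions):

```python
# Per-class consistency metrics for one trained classifier.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_pred = clf.predict(Xte)
# print('Acc: %.3f' % accuracy_score(y_test, y_pred))
# for avg in ('micro', 'macro', 'weighted'):
#     p, r, f1, _ = precision_recall_fscore_support(y_test, y_pred, average=avg)
#     print('%s-Av  P=%.2f  R=%.2f  f1-s=%.2f' % (avg, p, r, f1))
```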
VI. CONCLUSION

In this paper we implemented a stacked sparse unsupervised autoencoder to learn a new feature representation that may help promote genetic cancer diagnosis based on the short non-coding RNA regions, which play a significant role in silencing, regulating and managing the transcription process in the human body. The learned features have been evaluated through supervised models, where our proposed unsupervised feature learning model was able to generate a new discriminant data representation, leading to a method that is competitive with the state-of-the-art. We believe that the collection of new samples, a move toward semi-supervised classification, or the integration of some clinical information may enhance the results obtained in this work; also, the use of the PanCancer data set may give our model the flexibility to be easily applied to other cancer types generated from different genomic data banks for further research.

REFERENCES

[1] F. Cristiano, P. Veltri. "Methods and techniques for miRNA data analysis", in Microarray Data Analysis. Humana Press, New York, NY, 2015, pp 11-23.
[2] S. Tam, M. S. Tsao, J. D. McPherson. "Optimization of miRNA-seq data preprocessing". Briefings in Bioinformatics, 2015, pp 950-963.
[3] S. Sing, et al. "Machine learning techniques in exploring microRNA gene discovery, targets, and functions", in Bioinformatics in MicroRNA Research. Humana Press, New York, NY, 2017, pp 211-224.
[4] P. H. Gunaratne, C. Coarfa, B. Soibam, A. Tandon. "miRNA Data Analysis: Next-Gen Sequencing", in Fan JB. (ed.) Next-Generation MicroRNA Expression Profiling Technology. Methods in Molecular Biology (Methods and Protocols). Humana Press, 2012.
[5] R. C. Lee, R. L. Feinbaum, V. Ambros. "The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14". Cell, 1993, pp 843-854.
[6] J. Xuan, Y. Yu, T. Qing, L. Guo, L. Shi. "Next-generation sequencing in the clinic: promises and challenges". Cancer Letters, 2013, pp 284-295.
[7] K. R. Kukurba, S. B. Montgomery. "RNA sequencing and analysis". Cold Spring Harbor Protocols, 2015.
[8] N. Cheerla, O. Gevaert. "MicroRNA based Pan-Cancer diagnosis and treatment recommendation". BMC Bioinformatics, 2017.
[9] J. Liu, et al. "An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics". Cell, 2018, pp 400-416.
[10] J. Lu, et al. "MicroRNA expression profiles classify human cancers". Nature, 2005.
[11] A. Kotlarchyk, T. Khoshgoftaar, M. Pavlovic, H. Zhuang, A. S. Pandya. "Identification of microRNA biomarkers for cancer by combining multiple feature selection techniques". Journal of Computational Methods in Sciences and Engineering, 2011, pp 283-298.
[12] D. Ting-ting, S. Chang-ji, D. Yan-shou, B. Yi-duo. "Analysis of miRNA expression profile based on SVM algorithm", in IOP Conference Series: Earth and Environmental Science. IOP Publishing, 2018.
[13] P. Yongjun, P. Minghao, R. Keun Ho. "Multiclass cancer classification using a feature subset-based ensemble from microRNA expression profiles". Computers in Biology and Medicine, 2017, pp 39-44.
[14] M. Anidha, K. Premalatha. "An application of fuzzy normalization in miRNA data for novel feature selection in cancer classification". Biomed. Res., 2017, 28.9: 4187-4195.
[15] R. Ibrahim, N. A. Yousri, M. A. Ismail, N. M. El-Makky. "Multi-level gene/MiRNA feature selection using deep belief nets and active learning", in 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 2014, pp 3957-3960.
[16] L. Fu, Q. Peng. "A deep ensemble model to predict miRNA-disease association". Scientific Reports, 2017.
[17] A. L. Rincon, et al. "Evolutionary optimization of convolutional neural networks for cancer miRNA biomarkers classification". Applied Soft Computing, 2018, pp 91-100.
[18] I. Goodfellow, Y. Bengio, A. Courville. "Deep Learning". MIT Press, 2016.
[19] M. Tschannen, O. Bachem, M. Lucic. "Recent advances in autoencoder-based representation learning". arXiv preprint arXiv:1812.05069, 2018.
[20] Y-D. Zhang, et al. "Seven-layer deep neural network based on sparse autoencoder for voxelwise detection of cerebral microbleed". Multimedia Tools and Applications, 2018, pp 10521-10538.
[21] L. Chen, et al. "Softmax regression based deep sparse autoencoder network for facial emotion recognition in human-robot interaction". Information Sciences, 2018, pp 49-61.
[22] C. Zhang, et al. "Deep Sparse Autoencoder for Feature Extraction and Diagnosis of Locomotive Adhesion Status". Journal of Control Science and Engineering, 2018.
[23] F. Chollet, et al. "Keras". https://keras.io, 2015.