<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>IICST</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>IMPROVING PERFORMANCE FOR PREDICTION OF HEPATOCELLULAR CARCINOMA USING STACKING METHOD OF SUPPORT VECTOR MACHINE</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lailil Muflikhah</string-name>
          <email>lailil@ub.ac.id</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Widodo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wayan Firdaus Mahmudy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Solimun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Brawijaya University</institution>
          ,
          <country country="ID">Indonesia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>5</volume>
      <fpage>41</fpage>
      <lpage>50</lpage>
      <abstract>
        <p>Hepatocellular Carcinoma (HCC) is a serious disease that can be caused by Hepatitis B virus infection and can lead to death. Support Vector Machine (SVM) is a robust classifier for predicting the disease. However, an unbalanced class distribution often degrades prediction performance, because the classifier tends to favor the majority class. Therefore, this research ensembles methods by stacking the SVM learning model with other classifier methods to increase the measured performance. In the proposed method, two or three algorithms (Random Forest with k-Nearest Neighbor, and/or a Generalized Linear Model) were combined as bottom-layer classifiers and applied to the SVM model as the top-layer classifier. As a result, the performance of the proposed method was higher than that of the conventional SVM: an accuracy of 89%, a sensitivity of 87.2%, and a specificity of 82%. Accuracy and sensitivity rose about 2% over the single SVM, while specificity increased significantly, reaching 82%.</p>
      </abstract>
      <kwd-group>
        <kwd>hepatocellular carcinoma</kwd>
        <kwd>SVM</kwd>
        <kwd>stacking model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Nowadays, Hepatocellular Carcinoma (HCC) is receiving extra government attention because the number of patients increases each year. HCC was the third-ranking cause of cancer death, at 8.2% (781,531 cases), according to the Global Cancer Observatory database of the International Agency for Research on Cancer (IARC) (Global Cancer Observatory, 2019). Many studies have shown the role of HBx in the pathogenesis of virus-induced HCC (Ali et al., 2014). Using a computational approach, research on HCC patients infected with the HBx Hepatitis B virus was conducted by profiling the DNA sequence of the Hepatitis B virus with a clustering method (Muflikhah et al., 2019). Support Vector Machine (SVM) is a robust classification method for HCC prediction. Much research has been conducted on many types of datasets, such as images, clinical data, microarray gene-expression data, and DNA sequences (Ali et al., 2014a; Bai et al., 2018; Radha and Divya, 2016; Shen and Liu, 2017). However, SVM has a drawback when applied to huge data volumes and unbalanced classes. Ensemble methods are advanced techniques often used to solve complex machine learning problems: different, independent models (weak learners) are combined to produce an outcome, on the hypothesis that combining multiple models can produce better results by decreasing generalization error. Stacking is an ensemble method that combines several machine learning algorithms as a base layer of classification to predict new data. Research on stacked machine learning models has been conducted to improve accuracy on parallel computers (Gunes et al., 2017). Many other studies have also constructed ensembles to obtain a classifier more robust than a single classifier, improving accuracy and ROC (Abawajy et al., 2012; Buzhou Tang et al., 2010). Therefore, this research aims to detect Hepatocellular Carcinoma from the nucleotide composition of the HBx HepB virus by stacking machine learning models onto SVM. This paper is organized as follows. The first section gives the background of this study and the development of hepatocellular carcinoma detection in biological and computational approaches. The second section presents the research method, including the proposed ensemble method that stacks the SVM learning model with other machine learning methods. The third section presents the results and discussion. The last section provides conclusions and recommendations for further studies.</p>
    </sec>
    <sec id="sec-2">
      <title>2. RESEARCH METHOD</title>
      <p>In general, the research was conducted in several steps, as shown in Figure 1. The nucleotide composition was the
result of transforming DNA sequences of the HBx Hepatitis B virus from the database at URL:
https://hbvdb.ibcp.fr/HBVdb/. First, the dataset was applied to three machine learning algorithms, i.e. Random
Forest, K-Nearest Neighbor (KNN), and Generalized Linear Model (GLM), in addition to the SVM algorithm. Second,
the correlation between SVM and the other machine learning methods was computed to make sure the correlation is not
high. Then, the proposed method was applied by stacking the machine learning algorithms onto the SVM algorithm.</p>
      <sec id="sec-2-1">
        <title>Nucleotide composition of HBx HepB</title>
      </sec>
      <sec id="sec-2-2">
        <title>Apply three other machine learning methods: Random forest, KNN, and GLM</title>
      </sec>
      <sec id="sec-2-3">
        <title>Get the correlation value to SVM</title>
      </sec>
      <sec id="sec-2-4">
        <title>The Proposed Method HCC or no HCC detection</title>
        <p>
          Random Forest (RF) is widely used as a robust machine learning method for classification, regression,
and other purposes. The classifier is built from an ensemble of unpruned decision trees grown randomly during training
          <xref ref-type="bibr" rid="ref5">(Breiman, 2001)</xref>
          . The trees in the forest are grown with the CART method to maximum size, without pruning.
The random subspace selection scheme resembles bagging (resampling the training data with replacement
each time a new tree is built). It has been pointed out that the strong performance of random forests comes from the
good quality of each tree together with the small correlation among the trees of the forest. To predict a
new data sample, the classifier aggregates the outputs of all trees. RF can deal with a large amount of data and
can be used when the number of variables is much larger than the number of observations
          <xref ref-type="bibr" rid="ref12">(Nguyen et al., 2015)</xref>
          .
        </p>
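        <p>The Random Forest step can be sketched with scikit-learn; the synthetic features below merely stand in for the nucleotide-composition data (the feature count, labels, and tree count are illustrative assumptions, not the paper's dataset or settings):</p>
        <preformat>
```python
# Minimal Random Forest sketch on synthetic stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# 300 samples x 12 features, e.g. T/C/A/G percentages at codon positions 1-3
X = rng.random((300, 12))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # toy label rule, not real HCC labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# An ensemble of unpruned CART trees, each grown on a bootstrap sample
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_tr, y_tr)
acc = rf.score(X_te, y_te)   # aggregated (majority-vote) prediction accuracy
```
        </preformat>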
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Generalized Linear Model</title>
      <p>Generalized Linear Models (GLM) extend the general linear model framework to address two issues: the
range of Y is restricted (e.g. binary, count) and the variance of Y depends on the mean. A generalized linear model
is made up of a linear predictor η_i = β_0 + β_1 x_{1i} + ⋯ + β_p x_{pi} and two functions, as below (Turner, n.d.):
- a link function that describes how the mean, E(Y_i) = μ_i, depends on the linear predictor: g(μ_i) = η_i;
- a variance function that describes how the variance, var(Y_i), depends on the mean: var(Y_i) = φ V(μ_i),
where the dispersion parameter φ is a constant.</p>
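      <p>For a binary response such as HCC/non-HCC, the usual GLM is logistic regression, where the logit link g(μ) = log(μ/(1−μ)) maps the mean onto the linear predictor. A hedged sketch on invented data (coefficients and sample size are assumptions for illustration):</p>
      <preformat>
```python
# GLM sketch: logistic regression with the logit link on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
eta = 0.5 + X @ np.array([1.0, -2.0, 0.5])   # linear predictor eta = b0 + b.x
mu = 1.0 / (1.0 + np.exp(-eta))              # inverse logit link: mean E(Y) = mu
y = (mu > rng.random(200)).astype(int)       # Bernoulli(mu) draws

glm = LogisticRegression(max_iter=1000).fit(X, y)
prob = glm.predict_proba(X)[:, 1]            # fitted means mu_hat
```
      </preformat>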
    </sec>
    <sec id="sec-4">
      <title>K-Nearest Neighbor (KNN)</title>
      <p>
        The K-Nearest Neighbours (KNN) algorithm predicts the classification of a new data point from data points
that are already separated into several classes. To decide which points
from the training set are similar enough to be considered when choosing the class for a new observation,
the algorithm picks the k closest data points to the new observation and takes the most common class among them
        <xref ref-type="bibr" rid="ref16">(Sutton,
2012)</xref>
        .
      </p>
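      <p>The rule above (majority vote among the k closest training points) can be sketched as follows; the two toy clusters and k = 5 are assumptions for illustration, not the paper's setting:</p>
      <preformat>
```python
# KNN sketch: predict the majority class among the k nearest neighbours.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
# Each query gets the most common class among its 5 closest training points
pred = knn.predict([[0.05, 0.05], [1.0, 0.95]])
```
      </preformat>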
    </sec>
    <sec id="sec-5">
      <title>Support Vector Machine (SVM)</title>
      <p>The Support Vector Machine (SVM) algorithm is initially a linear classification method that seeks the best
hyperplane function. The function divides two classes of the input space; the method is then developed into a non-linear classifier
by incorporating kernel tricks, transforming the data into a vector space of
high dimension. The kernel trick functions that can be used in non-linear SVM classification are Polynomial,
Gaussian (RBF), and Sigmoid. Each label is denoted y_i ∈ {−1, +1} for i = 1, 2, ..., n, where n is the number of data points.
The labels +1 and −1 denote classes that can be completely separated by the hyperplane defined in Equation (1):</p>
      <p>w · x + b = 0 (1)</p>
      <p>An object x_i is assigned to class −1 as in Equation (2), and to class +1 as in Equation (3):</p>
      <p>w · x_i + b ≤ −1 (2)</p>
      <p>w · x_i + b ≥ +1 (3)</p>
      <p>The largest margin is obtained by maximizing the distance between the hyperplane and the nearest points, i.e. by maximizing Equation (4):</p>
      <p>1 / ∥w∥ (4)</p>
      <p>In principle, the non-linear SVM concept maps the data x through a function Φ(x) into a high-dimensional vector space, and the objective function represents the data in the new vector space. In SVM, the learning process finds support vectors through dot products of the data in the new vector space; the kernel function serves to determine the support vectors for non-linear data. The kernel function can be stated as in Equation (5):</p>
      <p>K(x_i, x_j) = Φ(x_i) · Φ(x_j) (5)</p>
      <p>This research used the kernel trick as shown in Equation (6):</p>
      <p>K(x_i, x_j) = exp(−∥x_i − x_j∥² / (2σ²)) (6)</p>
      <p>
        The next step is to make predictions with the Sequential Support Vector Machine method, including
calculation of the Hessian matrix and iteration until the maximum update falls below the error tolerance, Max(|δα|) &lt; ε. After
that, the bias and the similarities between the testing data and training data are calculated, which
yields the positive or negative class
        <xref ref-type="bibr" rid="ref18">(Vijayakumar and Wu, 1999)</xref>
        .
      </p>
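      <p>The kernel trick can be illustrated with scikit-learn's SVC: a degree-2 polynomial kernel separates XOR-style data that no linear hyperplane can split. The toy points and kernel parameters below are assumptions for illustration, not the paper's configuration:</p>
      <preformat>
```python
# Non-linear SVM sketch: polynomial kernel on XOR-style data.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 10, dtype=float)
y = np.array([-1, 1, 1, -1] * 10)          # labels y_i in {-1, +1}

# Kernel K(x_i, x_j) = (gamma * x_i.x_j + coef0)^degree separates XOR
svm = SVC(kernel="poly", degree=2, coef0=1.0, C=100.0).fit(X, y)
pred = svm.predict(np.array([[0.0, 1.0], [1.0, 1.0]]))
```
      </preformat>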
    </sec>
    <sec id="sec-6">
      <title>The Proposed Method: SVM Stacking Learning Model</title>
      <p>
        An ensemble is a set of classifiers that learn a target function, and their individual predictions are combined to
classify the new data. Stacking is an ensemble learning technique that combines multiple classification or regression
models via a meta-classifier or a meta-regressor. The base-level models are trained based on a complete training
set, then the meta-model is trained on the outputs of the base level model-like features. The base-level often consists
of different learning algorithms and therefore stacking ensembles are often heterogeneous. The algorithm in Figure
2 summarizes stacking
        <xref ref-type="bibr" rid="ref19">(Zhou, 2012)</xref>
        .
      </p>
      <p>Input: training data D = {(x_i, y_i)}, i = 1, …, m.
Output: ensemble classifier H.
Step 1: learn the base-level classifiers: for t = 1 to T, learn h_t based on D.
Step 2: construct a new data set of predictions: for i = 1 to m, form x′_i = {h_1(x_i), …, h_T(x_i)} and collect D′ = {(x′_i, y_i)}.
Step 3: learn the meta-classifier H based on D′, and return H.</p>
    </sec>
    <sec id="sec-10">
      <title>Stacking multiple layers of models</title>
      <p>In stacking, multiple layers of machine learning models are placed one over another: each model
passes its predictions to the model in the layer above it, and the top-layer model makes decisions based on the
outputs of the models in the layers below it. The predictions of the various individual models should not be highly
correlated with one another.</p>
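      <p>The two-layer stack described above maps directly onto scikit-learn's StackingClassifier, sketched here on synthetic unbalanced data (the class weights and model parameters are illustrative assumptions, not the paper's settings):</p>
      <preformat>
```python
# Stacking sketch: RF, KNN and a GLM (logistic regression) as the bottom
# layer, SVM as the top-layer meta-classifier, per the architecture above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic unbalanced stand-in for the nucleotide-composition data
X, y = make_classification(n_samples=300, n_features=12,
                           weights=[0.85, 0.15], random_state=0)

bottom = [("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
          ("knn", KNeighborsClassifier()),
          ("glm", LogisticRegression(max_iter=1000))]
stack = StackingClassifier(estimators=bottom,
                           final_estimator=SVC(kernel="poly"),
                           cv=5)   # out-of-fold predictions feed the top layer
stack.fit(X, y)
acc = stack.score(X, y)
```
      </preformat>
      <p>Note that cv=5 makes the meta-classifier train on out-of-fold predictions, matching the out-of-fold scheme described for the training data.</p>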
      <sec id="sec-10-1">
        <title>This research used two-layer models as shown in Figure 3. The detail is as follows:</title>
        <p>• The bottom-layer models (d1, d2, d3), consisting of Random Forest, K-NN, and Generalized Linear Model,
receive the original input features (x) from the nucleotide composition dataset.</p>
        <p>• The top-layer model, a Support Vector Machine classifier f(), takes the outputs of the bottom-layer models
(d1, d2, d3) as its input and predicts the final output.</p>
      </sec>
      <sec id="sec-10-2">
        <title>Then, the out-of-fold predictions are used when predicting for the training data</title>
        <p>[Figure 3. The proposed method of stacking a multi-layer machine learning model: the original input features (x), the nucleotide composition of the HBx HepB virus, feed the bottom-layer models Random Forest (d1), K-Nearest Neighbor (d2), and General Linear Model (d3); the top-layer model, a Support Vector Machine f(), produces the output (y), the hepatocellular carcinoma prediction.]</p>
      </sec>
    </sec>
    <sec id="sec-12">
      <title>3. RESULTS AND DISCUSSION</title>
    </sec>
    <sec id="sec-13">
      <title>Data Sets</title>
      <p>
        This research used the nucleotide composition of the DNA sequence of the HBx Hepatitis B virus, genotype C, as the data
set. The regions of the DNA sequence are represented by the compositions of the nucleotides Thymine (T), Cytosine
(C), Adenine (A), and Guanine (G) at the first, second, and third positions of each codon. The relative frequencies of the
four nucleotides can be computed for one specific sequence or for all sequences. For the coding regions of DNA,
additional columns are presented for the nucleotide compositions at the first, second, and third codon positions
        <xref ref-type="bibr" rid="ref10">(Kumar et al., 2016)</xref>
        . In this research, we used the MEGA bioinformatics software tools to transform the sequences into
nucleotide compositions as percentages per codon position, as shown in Table 1.
      </p>
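      <p>The transformation can be illustrated as follows: counting the percentage of each base at one codon position of a sequence. The helper function and the short sequence are invented for illustration, not part of MEGA:</p>
      <preformat>
```python
# Sketch: nucleotide percentages at one codon position of a DNA sequence.
from collections import Counter

def codon_position_composition(seq, position):
    """Percentage of T, C, A, G at one codon position (1, 2 or 3)."""
    bases = seq[position - 1::3]           # every 3rd base from that offset
    counts = Counter(bases)
    total = len(bases)
    return {b: 100.0 * counts.get(b, 0) / total for b in "TCAG"}

# Codons ATG | GCA | CGT -> first-position bases are A, G, C
comp1 = codon_position_composition("ATGGCACGT", 1)
```
      </preformat>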
      <p>The data set used has an unbalanced class distribution, with 420 HCC and 2862 non-HCC samples, as
illustrated in Figure 4. The number of normal cases (non-HCC) far exceeds the number of carcinoma cases (HCC).</p>
      <p>True positive (tp): the cases are predicted as carcinoma, and they are actually carcinoma.
True negative (tn): the cases are predicted as not carcinoma (normal), and they are actually not
carcinoma.</p>
      <p>False positive (fp): the cases are predicted as carcinoma, but they are actually not carcinoma.
False negative (fn): the cases are predicted as not carcinoma, but they are actually carcinoma.</p>
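      <p>From these four counts, the evaluation measures follow directly; a small sketch with invented counts (not the paper's results):</p>
      <preformat>
```python
# Accuracy, sensitivity and specificity from confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall on the carcinoma (positive) class
    specificity = tn / (tn + fp)   # recall on the normal (negative) class
    return accuracy, sensitivity, specificity

# Toy counts for illustration only
acc, sens, spec = metrics(tp=40, tn=260, fp=20, fn=10)
```
      </preformat>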
      <p>
        Moreover, there are various measurements for performance evaluation, including accuracy, sensitivity,
specificity, and area under the curve (AUC), as presented in Table 3
        <xref ref-type="bibr" rid="ref7">(“Evaluating a Classification Model,” 2019)</xref>
        . Among the single classifiers, the Random Forest achieved the highest accuracy.
      </p>
      <p>[Figure: Accuracy and Kappa of the single classifiers, confidence level 0.95.]</p>
      <p>Then, the correlation among the machine learning algorithms (the sub-models) was evaluated to confirm that the
base-layer candidates are not highly correlated, as shown in Table 5 and Figure 6. The highest correlation, 0.475 between svmPoly and Logistic
Regression (GLM), is not high, so these algorithms are recommended for the sub-model layer of the classifier.</p>
      <p>[Figure 6. Correlation plot of the sub-models' accuracies.]</p>
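      <p>The correlation check can be sketched as follows; the prediction vectors and the 0.8 threshold are invented for illustration (the paper reports a highest observed correlation of 0.475):</p>
      <preformat>
```python
# Pairwise correlation between sub-model predictions: stacking works best
# when base-model outputs are not highly correlated.
import numpy as np

pred_svm = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # toy predictions
pred_glm = np.array([1, 1, 0, 1, 0, 1, 0, 1, 1, 0])

r = np.corrcoef(pred_svm, pred_glm)[0, 1]   # Pearson correlation
low_correlation = 0.8 > abs(r)              # heuristic threshold (assumption)
```
      </preformat>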
      <p>In this research, there are four scenarios of stacking ensemble methods for the bottom layer, as follows:
• Stack-1: Random Forest, Linear Regression, and KNN
• Stack-2: Random Forest, Linear Regression
• Stack-3: Linear Regression, KNN
• Stack-4: Random Forest, KNN</p>
      <p>Using the stacking methods as the bottom-layer predictor and SVM with a polynomial kernel as the top layer,
the accuracy rates are compared in Figure 7. Accuracy measures the ability
of the classifier model to predict correctly. The proposed methods Stack-1, Stack-2, and Stack-4 had higher
accuracy rates than the conventional SVM; the ensembles that include Random Forest increased the accuracy of the SVM
algorithm.</p>
      <p>[Figure 7. Accuracy rate comparison of SVM against the stacking ensemble methods.]</p>
      <p>Sensitivity measures the ability to predict positive carcinoma correctly, even though the related data is small
compared to the normal data (the negative class). The proposed stacking ensemble methods showed
higher sensitivity than the conventional SVM, as shown in Figure 8, which means the prediction does not depend
on the class distribution.</p>
      <p>[Figure 8. Sensitivity comparison of SVM against the stacking ensemble methods.]</p>
      <p>Furthermore, the specificity, the ability to predict the negative class, is highest for SVM, as shown in Figure 9.
The SVM method predicts the negative class (normal, non-HCC) best, due to the high data volume in this
class.</p>
      <p>[Figure 9. Specificity comparison of SVM against the stacking ensemble methods.]</p>
      <p>Finally, the AUC of the proposed method (Stack-1, Stack-2, and Stack-4) is higher than that of the conventional SVM
with a polynomial kernel, as shown in Figure 10. This implies that the method classifies well when predicting hepatocellular
carcinoma disease from the DNA sequences of HBx HepB.</p>
    </sec>
    <sec id="sec-14">
      <title>4. CONCLUSION</title>
      <p>Prediction of Hepatocellular Carcinoma disease was carried out with the proposed method: stacking a learning
model of Random Forest with k-Nearest Neighbor and/or a Generalized Linear Model as the base-layer classifiers
under SVM as the top-layer classifier. Because Random Forest has higher accuracy than the other methods, the
stacking ensembles that include it predict with the highest accuracy. In general, the performance evaluation
result of the proposed method is higher than that of the Support Vector Machine as a single classifier model.</p>
    </sec>
    <sec id="sec-15">
      <title>5. FUTURE WORK</title>
      <p>The proposed method needs heavy computation for the learning model, due to ensembling many algorithms.
In the future, it is possible to select algorithms with fast computation for the ensemble learning model.</p>
    </sec>
    <sec id="sec-16">
      <title>ACKNOWLEDGMENTS</title>
      <p>This research was financially supported by the Ministry of Research and Technology/National Agency for
Research and Innovation (RISTEK/DIKTI) through a Doctoral Dissertation Research Grant program.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Abawajy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kelarev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>A Multi-tier Ensemble Construction of Classifiers for Phishing Email Detection and Filtering</article-title>
          , In: Cyberspace Safety and Security, Lecture Notes in Computer Science. Xiang,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Kuo</surname>
          </string-name>
          , C.-C.J., and
          <string-name>
            <surname>Zhou</surname>
          </string-name>
          , W. (Eds.),
          <fpage>48</fpage>
          -
          <lpage>56</lpage>
          . Springer: Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35362-8_5
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abdel-Hafiz</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suhail</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Al-Mars</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakaria</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fatima</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahmad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azhar</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaudhary</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Qadri</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Hepatitis B virus, HBx mutants and their role in hepatocellular carcinoma</article-title>
          .
          <source>World J. Gastroenterol. WJG 20</source>
          ,
          <fpage>10238</fpage>
          -
          <lpage>10248</lpage>
          . https://doi.org/10.3748/wjg.v20.i30.10238
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hussain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sudhakr</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahmud</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakir</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Rajak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2014a</year>
          ).
          <article-title>Intelligent image processing techniques for cancer progression detection, recognition and prediction in the human liver</article-title>
          ,
          <source>In: 2014 IEEE Symposium on Computational Intelligence in Healthcare and E-Health (CICARE)</source>
          ,
          <fpage>25</fpage>
          -
          <lpage>31</lpage>
          . IEEE, Orlando, FL, USA. https://doi.org/10.1109/CICARE.2014.7007830
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Bai</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Zhang,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            , and
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Deep sequencing of HBV pre-S region reveals high heterogeneity of HBV genotypes and associations of word pattern frequencies with HCC</article-title>
          .
          <source>PLOS Genet</source>
          .
          <volume>14</volume>
          , e1007206. https://doi.org/10.1371/journal.pgen.1007206
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <source>Random Forests. Mach. Learn</source>
          .
          <volume>45</volume>
          ,
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          . https://doi.org/10.1023/A:1010933404324
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Buzhou</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qingcai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xuan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Xiaolong</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Reranking for Stacking Ensemble Learning</article-title>
          ,
          <source>In: International Conference on Neural Information Processing ICONIP 2010: Neural Information Processing. Theory and Algorithms</source>
          ,
          <fpage>575</fpage>
          -
          <lpage>584</lpage>
          . LNCS 6443. Springer: Berlin/Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>Evaluating a Classification Model</article-title>
          . (
          <year>2019</year>
          ). http://www.ritchieng.com/machine-learning-evaluate-classificationmodel/ (last accessed on November 01, 2019).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          Global Cancer Observatory (
          <year>2019</year>
          ). http://gco.iarc.fr/ (last accessed on July 10, 2019).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Gunes</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolfinger</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tan</surname>
          </string-name>
          , P.-Y. (
          <year>2017</year>
          ).
          <article-title>Stacked Ensemble Models for Improved Prediction Accuracy</article-title>
          ,
          <source>In: SAS Global Forum</source>
          <year>2017</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stecher</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tamura</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <source>MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets</source>
          .
          <source>Mol. Biol</source>
          . Evol.,
          <volume>33</volume>
          ,
          <fpage>1870</fpage>
          -
          <lpage>1874</lpage>
          . https://doi.org/10.1093/molbev/msw054
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Muflikhah</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Widodo</surname>
          </string-name>
          ,
          <string-name>
            <surname>Mahmudy</surname>
            ,
            <given-names>W.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solimun</surname>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>DNA Sequence of Hepatitis B Virus Clustering Using Hierarchical k-Means Algorithm</article-title>
          .
          <source>Presented at the 6th IEEE International Conference on Engineering Technologies and Applied Sciences (ICETAS)</source>
          , December 20-21,
          <year>2019</year>
          , Kuala Lumpur, Malaysia. https://icetas.etssm.org/ (last accessed on October 07, 2019).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T.-T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>J.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T.T.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests</article-title>
          .
          <source>BMC Genomics</source>
          ,
          <volume>16</volume>
          ,
          <fpage>S5</fpage>
          . https://doi.org/10.1186/1471-2164-16-S2-S5
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Radha</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Divya</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Multiple time series clinical data with frequency measurement and feature selection</article-title>
          ,
          <source>In: 2016 IEEE International Conference on Advances in Computer Applications (ICACA)</source>
          ,
          <fpage>250</fpage>
          -
          <lpage>254</lpage>
          . https://doi.org/10.1109/ICACA.2016.7887960
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Identifying module biomarkers of hepatocellular carcinoma from gene expression data</article-title>
          ,
          <source>In: 2017 Chinese Automation Congress (CAC)</source>
          ,
          <fpage>5404</fpage>
          -
          <lpage>5407</lpage>
          . IEEE: New York, NY. https://doi.org/10.1109/CAC.2017.8243741
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <article-title>Simple guide to confusion matrix terminology</article-title>
          (
          <year>2014</year>
          ).
          <source>Data School</source>
          . URL https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/ (last accessed on October 30, 2019).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Sutton</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Introduction to k Nearest Neighbour Classification and Condensed Nearest Neighbour Data Reduction</article-title>
          . (accessible at: http://www.math.le.ac.uk/people/ag153/homepage/KNN/OliverKNN_Talk.pdf)
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Turner</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <source>Introduction to Generalized Linear Models</source>
          . (accessible at: https://statmath.wu.ac.at/courses/heather_turner/glmCourse_001.pdf)
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Vijayakumar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Sequential Support Vector Classifiers and Regression</article-title>
          ,
          <source>In: Proc. Int. Conf. on Soft Computing</source>
          ,
          <volume>5</volume>
          ,
          <fpage>610</fpage>
          -
          <lpage>619</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Z.-H.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Ensemble Methods: Foundations and Algorithms</article-title>
          . Chapman &amp; Hall/CRC.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>