<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Rotation Forest in Software Defect Prediction</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Rotation Forest in Software Defect Prediction </article-title>
      </title-group>
      <kwd-group>
        <kwd>Rotation Forest</kwd>
        <kwd>Random Forest</kwd>
        <kwd>Logistic Regression</kwd>
        <kwd>Software Defect Prediction</kwd>
      </kwd-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>5</volume>
      <issue>37</issue>
      <fpage>5</fpage>
      <lpage>13</lpage>
      <abstract>
        <p>Software Defect Prediction (SDP) deals with the localization of potentially faulty areas of the source code. Classification models are the main tool for performing the prediction, and the search for a model of utmost performance is an ongoing activity. This paper explores the performance of the Rotation Forest classification algorithm in the SDP problem domain. Rotation Forest is a novel algorithm that exhibited excellent performance in several studies. However, it has not been systematically used in SDP. Furthermore, it is very important to perform case studies in various contexts. This study uses 5 subsequent releases of Eclipse JDT as the objects of the analysis. The performance evaluation is based on a comparison with two other, known classification models that have exhibited very good performance so far. The results of our case study concur with other studies that recognize the Rotation Forest to be a state of the art classification algorithm.</p>
        <p>Software Defect Prediction (SDP) is an evolving research area that aims to improve software quality assurance activities. It is in search of an effective predictive model that could lead the testing resource allocation towards the software modules that are more likely to contain defects. Empirical studies proved that there is a certain regularity in defect distribution. It follows the Pareto principle, meaning that a minority of the source code (20%) is responsible for the majority of defects (80%) [Galinac Grbac et al., 2013]. Many classification models have been used for SDP, with various outcomes. It is important to emphasize that the context of the data source may be the cause of inconsistent results in software engineering research [Galinac Grbac and Huljenić, 2014]. Therefore, we need to perform a large number of systematically defined case studies with as much data from various domains as possible in order to achieve generalizability of results.</p>
        <p>In this paper, we examine the potential of Rotation Forest (RttFor), a novel classification model. The RttFor achieved some promising results in the classification problem domain [Rodríguez et al., 2006]. However, its potential is scarcely examined in the SDP research area. That is why we compare its performance with two known classification models, the Logistic Regression (LogReg) and the Random Forest (RndFor). LogReg is an example of a reliable classification model of good performance in many application domains. RndFor is another novel approach that showed promising results and in many cases outperformed other classification models [Lessmann et al., 2008]. RttFor is similar to RndFor because it uses a number of decision trees to make the prediction. But unlike RndFor, each decision tree uses all the features. Furthermore, it weights each feature using the principal component analysis (PCA) method upon randomly selected groups of input features. That way it maximizes the variance between features and achieves better performance [Amasyali and Ersoy, 2014].</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        We perform the comparison in the context of five subsequent releases of an open source project for
Java development, the Eclipse JDT. The obtained results are evaluated in terms of Accuracy, TPR, FPR,
F-measure, Kappa statistics and AUC. The paired T-test indicated that RttFor performs equally well or
significantly better than the other two classifiers in all the evaluation metrics, with the exception of TPR.
Our findings are consistent with the few other case studies that obtained favorable results when comparing
RttFor to other classification and even regression models
        <xref ref-type="bibr" rid="ref6">[Pardo et al., 2013]</xref>
        .
      </p>
      <p>The structure of this paper is as follows: Section 2 provides the promising results achieved by RttFor
in the classification domain that motivated this study. Section 3 presents the algorithms of the 3 classifiers
we compare in more detail. The description of our case study is given in Section 4. The results are
presented and their threats to validity are examined in Section 5. Section 6 finally gives the conclusion.</p>
    </sec>
    <sec id="sec-2">
      <title>2. BACKGROUND</title>
      <p>
        There are some case studies that achieved promising results when using the RttFor algorithm. The authors of
the algorithm compared it with Bagging, AdaBoost and RndFor on a random selection of 33 benchmark
data sets from the UCI machine learning repository [Rodríguez et al., 2006]. These datasets contained from 4
up to 69 features, from 57 up to 20,000 instances and from 2 to 26 distinct classes. Their results indicated
that the RttFor outperformed the other algorithms, achieving 84 wins and only 2 losses in paired comparisons
of significant differences between the models’ accuracy levels. Amasyali and Ersoy
        <xref ref-type="bibr" rid="ref5">[Amasyali and Ersoy, 2014]</xref>
         performed a comparison of the Bagging, Random Subspaces, RndFor and RttFor algorithms on 43
datasets from the same UCI repository in terms of accuracy. Each algorithm was used with and without
the addition of new, artificially combined features. RttFor with the addition of new features outperformed
all the other algorithms and RttFor without the addition of new features was the second best.
      </p>
      <p>
        There are several studies that used the RttFor for different classification purposes, like image
classification
        <xref ref-type="bibr" rid="ref7">[Kuncheva et al., 2010]</xref>
        ,
        <xref ref-type="bibr" rid="ref2">[Xia et al., 2014]</xref>
        ,
        <xref ref-type="bibr" rid="ref10">[Zhang, 2013]</xref>
        . There, the RttFor was
outperformed by Random Subspace with Support Vector Machine (SVM) and several other algorithms in
the classification of brain images obtained through functional magnetic resonance imaging
        <xref ref-type="bibr" rid="ref7">[Kuncheva et al., 2010]</xref>
        . The impact of several feature transformation methods was analyzed in the RttFor: PCA, which is the
default method, maximum noise fraction, independent component analysis and local Fisher discriminant
analysis
        <xref ref-type="bibr" rid="ref2">[Xia et al., 2014]</xref>
        . Xia et al. also compared each of these variations of RttFor to CART, Bagging,
AdaBoost, SVM and LogReg via Variable Splitting and Augmented Lagrangian (LORSAL) in
hyperspectral remote sensing image classification. The default variant of RttFor with PCA outperformed
the others in terms of accuracy. An interesting cascade classifier ensemble was created by Zhang
        <xref ref-type="bibr" rid="ref10">[Zhang, 2013]</xref>
        .
He combined k-Nearest Neighbor (kNN), SVM, Multi-Layer Perceptrons (MLP) and RndFor as the first
cascade and RttFor with MLP as the second cascade. Each stage gives a majority vote that is supposed to
be above a predefined threshold value. The second cascade targets the rejected instances from the
previous cascade, for which the majority vote was not above that threshold, to further ensure
confidence. They achieved a great improvement in reducing the rate of rejected instances and, therefore,
minimizing the misclassification cost. All of these studies used the RttFor because its performance was
reported to be very good compared to other classifiers.
      </p>
      <p>
        To the best of our knowledge, the only use of RttFor in the SDP domain was done by
        <xref ref-type="bibr" rid="ref1">[Palivela et al., 2013]</xref>
        . However, it remained unexplained what their source of data is, what information is stored in it,
and how many features and instances it contains. They compared several classification
algorithms: C4.5, SMO, RttFor, Bagging, AdaBoost M1, RndFor and DBScan, and evaluated the performance
in terms of 8 different evaluation metrics: Accuracy, True Positive rate, False Positive rate, Recall,
F-measure, Kappa statistics and AUC, but lacked the comparison of statistical significance. Nevertheless,
their results also indicated the RttFor as the classifier of utmost performance.
      </p>
      <p>
        All the algorithms involve building models iteratively upon a training set. The training set contains
multiple independent variables and 1 dependent variable that we want to predict. The model is trained on
this dataset and then it is evaluated on previously unseen data, i.e. the testing set. In the classification
domain, the dependent variable is discrete, unlike in the regression domain, where it is continuous. In our
case, the dependent variable is a binary class, where 1 indicates the presence of a bug and 0 indicates the
absence of bugs, and the independent variables are source code features. We present the algorithms of all
three classification models in the remainder of this section.
      </p>
      <sec id="sec-2-1">
        <title>3.1 Logistic Regression</title>
        <p>
          LogReg is a statistical classifier. It is used in various classification problem domains and it is renowned as
a robust method. That quality makes this classification algorithm appealing to software engineering
domain where data are rarely normally distributed, usually are skewed and contain outliers and missing
values. The multivariate LogReg is used for classification problems with multiple independent variables
          <xref ref-type="bibr" rid="ref11">[Tabachnick and Fidell, 2001]</xref>
          . For a training set that consists of a set of features X of size (N x n), with N
being the number of instances and n being the number of features mk (with 1 ≤ k ≤ n), and of a dependent
variable Y of size N as the binary class vector, the LogReg classification algorithm can be explained in
these steps:
(1) Initiate the search for regression coefficients Ck, where C0 is the intercept and Ck are the weights for
each feature mk, and build the classification model as:
        </p>
        <p>P(Y = 1 | m1, m2, …, mn) = e^(C0 + C1·m1 + ⋯ + Cn·mn) / (1 + e^(C0 + C1·m1 + ⋯ + Cn·mn)) (1)
(2) Evaluate the model by assessing the natural log likelihood (NLL) between the actual (yj) and the
predicted (ŷj) outcomes for each j-th instance (with 1 ≤ j ≤ N) as:
NLL = Σ j=1..N [ yj · ln(ŷj) + (1 − yj) · ln(1 − ŷj) ] (2)
(3) Optimize the coefficients using the maximum likelihood procedure iteratively, until convergence of the
coefficients is achieved
(4) The output of classification is the probability that a given instance belongs to the class Y = 1</p>
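The two equations above can be sketched in code. A minimal illustration of the model of Eq. (1) and the log likelihood of Eq. (2); the coefficient values are hypothetical, and the iterative maximum-likelihood optimization of step (3) is omitted:

```python
import math

def logreg_probability(coeffs, features):
    """Eq. (1): P(Y=1 | m1..mn) with intercept coeffs[0] and weights coeffs[1:]."""
    z = coeffs[0] + sum(c * m for c, m in zip(coeffs[1:], features))
    return math.exp(z) / (1.0 + math.exp(z))

def log_likelihood(actual, predicted):
    """Eq. (2): sum of y*ln(p) + (1-y)*ln(1-p) over all instances."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for y, p in zip(actual, predicted))

# Hypothetical model: intercept C0 = -1.0, one feature weight C1 = 0.5
p = logreg_probability([-1.0, 0.5], [2.0])  # z = 0 -> probability 0.5
```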
      </sec>
      <sec id="sec-2-2">
        <title>3.2 Random Forest</title>
        <p>
          RndFor is an ensemble classifier, proposed by
          <xref ref-type="bibr" rid="ref11">[Breiman, 2001]</xref>
          , that takes advantage of the randomness
induced by splitting the instances and features for multiple classifiers. Decision trees are the classifiers
and each of them receives a different training subset. The final classification output is the majority
decision of all the trees. That way, the generalization error is reduced, the impact of outliers and noise is
minimized and the model’s performance is improved. For a training set that is defined according to
Subsection 3.1, the RndFor classification algorithm is presented in Figure 1a. With Ti indicating an
arbitrary tree and K as the number of trees, it contains the following steps:
(1) Assign each tree Ti a subset of features of size between 1 and √n from the training set X
(2) Take a bootstrap sample of instances from the training set (2/3 for training and 1/3 for error
estimation)
(3) Iteratively grow the trees using the CART methodology without pruning, selecting the best feature and
splitting the node into two daughters
(4) Average the trees’ errors by testing the subset of trees on the mutual error estimation subset and testing
each tree individually on its own error estimation subset
(5) The output of classification is the majority vote of all trees in the forest
        </p>
        <p>[Figure 1a — schematic of the RndFor algorithm: the training set X has N instances, n features m1 … mn and a binary class y; each tree Ti (1 ≤ i ≤ K) is given a random subset of M features (1 ≤ M ≤ √n) and a bootstrapped subset of instances (2/3 for learning, 1/3 for error estimation); the learning method is the CART methodology without pruning; error estimation is done individually and mutually for a subset of trees.]</p>
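The sampling scheme of steps (1), (2) and (5) can be sketched in Python. This is an illustrative stand-in, not Weka's RndFor implementation, and the CART tree-growing of steps (3) and (4) is omitted:

```python
import math
import random

def draw_tree_inputs(X, n_features, rng):
    """Steps (1)-(2): a random feature subset of size at most sqrt(n) and a
    bootstrapped 2/3 instance sample for one tree; the rest estimates error."""
    m = rng.randint(1, max(1, int(math.sqrt(n_features))))
    feature_subset = rng.sample(range(n_features), m)
    n = len(X)
    train_idx = [rng.randrange(n) for _ in range(2 * n // 3)]  # bootstrap
    oob_idx = [i for i in range(n) if i not in set(train_idx)]
    return feature_subset, train_idx, oob_idx

def majority_vote(votes):
    """Step (5): the forest predicts the class most trees voted for."""
    return max(set(votes), key=votes.count)

rng = random.Random(42)
X = [[0.1 * i, 1.0 - 0.1 * i] for i in range(9)]   # toy 9x2 training set
features, train_idx, oob_idx = draw_tree_inputs(X, n_features=2, rng=rng)
print(majority_vote([1, 0, 1, 1, 0]))  # -> 1
```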
      </sec>
      <sec id="sec-2-3">
        <title>3.3 Rotation Forest</title>
        <p>The RttFor is a novel ensemble classifier algorithm, proposed by [Rodríguez et al., 2006]. It is an
algorithm that involves randomized PCA performed upon a selection of features and instances of the
training set before building the decision trees. For a training set that is defined according to
Subsection 3.1, the RttFor classification algorithm is presented in Figure 1b. With Si indicating an
arbitrary subset of the training set and K as the total number of subsets, it contains the following steps:
(1) Split the features of the training set X into K non-overlapping subsets of equal size M = n/K
(2) From each subset Si, randomly remove 25% of the instances using the bootstrap method to ensure the
coefficients obtained by PCA are different for each tree
(3) Run PCA on the remaining 75% of the subset and obtain the ai,j coefficients for each i-th subset and j-th
feature inside the subset
(4) Organize the ai,j coefficients in a sparse rotation matrix and rearrange the coefficients so that they
match the positions of the n features they were built upon, to obtain the rotation matrix Rka
(5) Repeat the procedure in steps 1 – 4 for each tree in the RttFor and build the tree upon the rotated training set X · Rka, Y
(6) The output of classification is the majority vote of all the trees in the forest</p>
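Steps (1), (3) and (4) amount to running PCA on disjoint feature groups and assembling the loadings into a block-structured rotation matrix. A simplified numpy sketch, under the assumption that the PCA loadings are taken from an eigendecomposition of each group's covariance matrix; the per-tree instance removal of step (2) is omitted:

```python
import numpy as np

def rotation_matrix(X, n_groups, rng):
    """Split the n features into n_groups disjoint subsets, run PCA on each,
    and place each group's loadings into a sparse (block) rotation matrix."""
    n = X.shape[1]
    order = rng.permutation(n)                        # random feature grouping
    groups = np.array_split(order, n_groups)
    R = np.zeros((n, n))
    for g in groups:
        cov = np.cov(X[:, g], rowvar=False)           # PCA via covariance
        _, vecs = np.linalg.eigh(np.atleast_2d(cov))  # principal axes
        R[np.ix_(g, g)] = vecs                        # coefficients a_ij
    return R

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))          # toy training set: 30 instances, 6 features
R = rotation_matrix(X, n_groups=3, rng=rng)
X_rotated = X @ R                     # rotated training set used to grow one tree
```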
      </sec>
      <sec id="sec-2-4">
        <title>4. CASE STUDY</title>
        <p>The goal of this case study is to evaluate the performance of RttFor in SDP. We used LogReg, a robust and
trustworthy statistical classifier used in many other domains, and the RndFor, a newer ensemble
algorithm that achieves excellent results in SDP, as the basis of our comparison. Our research
question (RQ) is: how well does RttFor perform in SDP when compared to LogReg and RndFor?</p>
      </sec>
      <sec id="sec-2-5">
        <title>4.1 Datasets</title>
        <p>We collected datasets from five consecutive releases of Eclipse JDT project: 2.0, 2.1, 3.0, 3.1 and 3.2.
Eclipse is an integrated development environment (IDE), mostly written in Java, but offering a wide
range of programming languages for development. The Java development tools (JDT) is one of the largest
development environments and, due to its publicly available source code management repository in Git and
bug tracking repository in Bugzilla, it is often analyzed in SDP research. The number of source code
files for each release of the JDT project and the ratio of NFPr and FPr files is given in Table I.</p>
        <p>[Table I — the number of source code files and the ratio of NFPr and FPr files for the datasets JDT 2.0, JDT 2.1, JDT 3.0, JDT 3.1 and JDT 3.2.]</p>
        <p>
          We collected the data using the Bug Code (BuCo) Analyzer tool, which we developed for this purpose
          <xref ref-type="bibr" rid="ref2">[Mauša et al., 2014]</xref>
          . The data collection approach starts with the collection of fixed bugs of severity
greater than trivial, for each of the analyzed releases of the JDT project. Then it performs the bug-code
linking on the file level of granularity using a regular expression that defines the surrounding characters
of a Bug ID in the commit messages, and outperforms even some of the complex prediction techniques
          <xref ref-type="bibr" rid="ref2">[Mauša et al., 2014a]</xref>
          . Finally, it computes 50 software product metrics using LOC Metrics and JHawk
for each source code file. All the numerical metrics are used as independent variables for our experiment,
so we excluded only the name of the superclass. In order to make the dependent variable suitable for
classification, the number of bugs per file was transformed into binary attribute named Fault proneness.
Its value is set to 1 for all the files that contain at least 1 bug and set to 0 otherwise. The final structure of
dataset is a matrix of size N x n, where N represents the number of files and n represents the number of
features.
        </p>
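The transformation into the binary Fault proneness attribute described above is a one-liner; the bug counts below are hypothetical, standing in for the per-file counts produced by the BuCo tool:

```python
# Hypothetical bug counts per file; the real values come from bug-code linking.
bug_counts = [0, 3, 1, 0, 2]
# Fault proneness: 1 for files that contain at least 1 bug, 0 otherwise.
fault_prone = [1 if b >= 1 else 0 for b in bug_counts]
print(fault_prone)  # -> [0, 1, 1, 0, 1]
```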
      </sec>
      <sec id="sec-2-6">
        <title>4.2 Experiment Workflow</title>
        <p>We used the Weka Experimenter Environment (version 3.6.9) to perform the experiment. Weka is a
popular open source software package for machine learning written in Java, developed at the University of
Waikato, New Zealand. The experiment workflow includes the following steps:
(1) import the data in arff format and assign fault proneness to be the dependent variable
(2) split the training and testing sets using 10-times 10-fold cross validation
(3) build RttFor, LogReg and RndFor models
(4) evaluate the models’ performance in terms of Acc, TPR, FPR, F-measure, Kappa and AUC
(5) perform the paired T-test of significance</p>
        <p>The 10-times 10-fold cross validation is used to prepare the training and testing sets. The 10-fold cross
validation divides the dataset into 10 parts of equal size, randomly picking instances (rows) and
maintaining the number of features (columns). In each of the 10 steps, another 1/10 portion of the dataset is used
as the testing set and the remaining 9/10 is used for training. The 10-times 10-fold cross validation iterates
the whole procedure 10 times, evaluating the classification model 100 times in total.</p>
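The splitting procedure above can be sketched as follows; this is a plain illustration of how the 100 train/test pairs arise, not Weka's implementation:

```python
import random

def repeated_kfold(n_instances, k=10, times=10, seed=1):
    """Yield (train_idx, test_idx) pairs: `times` shuffles, each split into
    k folds; every instance appears in a test set once per repetition."""
    for t in range(times):
        idx = list(range(n_instances))
        random.Random(seed + t).shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            test = folds[f]
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, test

splits = list(repeated_kfold(100))
print(len(splits))  # -> 100 model evaluations in total
```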
        <p>
          The classification models are all built and evaluated upon the same training and testing datasets, i.e.
JDT release. It is important to be aware that parameter tuning can improve the performance of
various classifiers
          <xref ref-type="bibr" rid="ref13">[Chen et al., 2009]</xref>
          . Thus, we performed an examination of their performance with
various configurations. We used Weka’s CVParameter Selection meta classifier that performs parameter
selection by cross-validation for any classifier. For RndFor, we were configuring the number of features
per tree from 1 to 8 (8 is the default value, calculated as log2(number of features) + 1), the maximum
depth of tree from 1 to 5 (including the default unlimited depth) and the number of trees from 10 to 50
with a step of 5 (the default value is 10). For RttFor, we were configuring only the number of groups from 3 to
10 (the default value is 3) and left the number of iterations at the default value of 10 and the percentage of
removed instances at the default value of 50%. Since there were no significantly different results obtained by
any of these configurations, we left all the parameters at their default values for the main experiment.
        </p>
      </sec>
      <sec id="sec-2-7">
        <title>4.3 Performance Evaluation</title>
        <p>The evaluation of binary classification problems is usually done using the confusion matrix. The confusion
matrix consists of correctly classified instances, true positive (TP) and true negative (TN), and incorrectly
classified instances, false positive (FP) and false negative (FN). In our case, the Positive instances are the
ones that are Fault Prone (FPr), i.e. files that have at least 1 bug, and Non-Fault-Prone (NFPr) files are
the Negative ones. The evaluation metrics that we used in our experiment are the following:</p>
        <sec id="sec-2-7-1">
          <title>Accuracy (Acc) – the share of correctly classified files among all files:</title>
          <p>Acc = (TP + TN) / (TP + TN + FP + FN) (3)
True Positive Rate (TP_Rate, Recall) – the share of correctly classified FPr files in total FPr files:
TP_Rate = TP / (TP + FN) (4)
False Positive Rate (FP_Rate) – the share of incorrectly classified NFPr files in total NFPr files:
FP_Rate = FP / (FP + TN) (5)</p>
        </sec>
        <sec id="sec-2-7-2">
          <title>F-measure (F) – harmonic mean of TPR and Precision:</title>
          <p>F = 2 · (TP_Rate · Precision) / (TP_Rate + Precision) (6)</p>
          <p>Kappa statistics – accuracy that takes into account the chance of random guessing:
Kappa = (Acc − Pr_rand) / (1 − Pr_rand) (7)
where Pr_rand is equal to:
Pr_rand = [ (TP + FN) · (TP + FP) + (TN + FP) · (TN + FN) ] / (TP + TN + FP + FN)² (8)</p>
          <p>The usual output of a binary classifier is the probability that a certain instance belongs to the Positive
class. Before making the final prediction, a probability threshold value, above which instances are going
to be classified as Positive, needs to be determined. That is why all the above mentioned evaluation
metrics are calculated with a predetermined threshold value. A metric that does not depend on the
threshold value is:</p>
          <p>Area under the receiver operating characteristic curve (AUC):
AUC = ∫₀¹ ROC_curve (9)
where the ROC_curve is a graphical plot that illustrates the relation between TP_Rate and FP_Rate for all
possible probability threshold values.</p>
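The threshold-dependent metrics can be computed directly from the confusion matrix. A small sketch; the normalization of Pr_rand by the squared total follows the standard definition of the Kappa statistics, as recovered from the formulas above:

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Metrics (3)-(8) from the four confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total                                # (3) Accuracy
    tpr = tp / (tp + fn)                                   # (4) TP Rate / Recall
    fpr = fp / (fp + tn)                                   # (5) FP Rate
    precision = tp / (tp + fp)
    f_measure = 2 * tpr * precision / (tpr + precision)    # (6) F-measure
    pr_rand = ((tp + fn) * (tp + fp)
               + (tn + fp) * (tn + fn)) / total ** 2       # (8) chance agreement
    kappa = (acc - pr_rand) / (1 - pr_rand)                # (7) Kappa statistics
    return {"Acc": acc, "TPR": tpr, "FPR": fpr,
            "F": f_measure, "Kappa": kappa}

print(evaluation_metrics(tp=40, tn=40, fp=10, fn=10))
```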
          <p>All the used metrics have their values in the range [0, 1]. The performance of a classifier is better for
higher values of Accuracy, TP Rate, F-measure, Kappa statistics and AUC. Only in the case of FP Rate,
also known as the false alarm rate, is the performance better for lower values. It is important to use
several evaluation metrics because they examine predictive performance from various angles. In the
presence of severe data imbalance between the two output classes, this is even more important. For
example, Acc can then easily become very high, misleading us to believe in excellent performance. On the
other hand, the FP Rate would also be very high, indicating that the prediction is of questionable value
[Mauša et al., 2012]. After obtaining the results, we perform the paired T-test in order to discover whether
there are significant differences between the classifiers. The whole process is repeated for each of the 5
Eclipse JDT datasets and for each of the 6 evaluation metrics.</p>
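The paired T-test over matched per-fold results can be sketched as follows. Note that this is the classical paired t statistic; Weka's Experimenter typically applies a corrected variant for cross-validation results, so treat this purely as an illustration, with hypothetical accuracy scores:

```python
import math

def paired_t_statistic(scores_a, scores_b):
    """Classical paired t statistic over matched per-fold scores:
    t = mean(d) / (sd(d) / sqrt(n)) for the differences d = a - b."""
    d = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)

# Hypothetical per-fold accuracies of two classifiers on the same folds
a = [0.82, 0.84, 0.81, 0.85, 0.83]
b = [0.80, 0.81, 0.79, 0.82, 0.80]
t = paired_t_statistic(a, b)  # compared against a t distribution, df = n - 1
```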
        </sec>
      </sec>
      <sec id="sec-2-8">
        <title>5. RESULTS</title>
        <p>The results of the paired T-tests for Acc, TPR, FPR, F-measure, Kappa statistics and AUC are given in
Table II. The paired T-test compares only two classifiers at a time in one evaluation metric. The first five rows
represent the paired T-test comparison of results when the RttFor is the basis of comparison and the
second five rows provide the results when the LogReg is compared to the other two classifiers. The only
remaining combination would be to use the RndFor as the comparison basis, but that one can be deduced
from the previous two. The basis of comparison and its results are given in bold. The results are presented
with their average value and standard deviation over the 100 iterations obtained with the 10-times
10-fold cross validation process. The results that exhibit significant difference are marked with * and v. The sign
* is given for cases in which the compared classifier performs significantly worse than the basis of
comparison and v is given for significantly better performance. The results that do not exhibit statistically
significant difference have no sign adjacent to them. From the results presented in Table II, we draw the
following observations:</p>
        <p>The overall summary of results shows that RttFor achieved 29 wins and 14 losses, RndFor achieved 19 wins
and 14 losses, and LogReg achieved 15 wins and 27 losses.
RttFor outperformed RndFor and LogReg in terms of AUC and Kappa statistics in all but 1 case.
LogReg outperformed RndFor and RttFor in terms of FP Rate in all but 1 case.</p>
        <p>RttFor is outperformed only in terms of TP Rate and FP Rate, 6 times by RndFor and 8 times by
LogReg.</p>
      </sec>
      <sec id="sec-2-9">
        <title>5.1 Threats to Validity</title>
        <p>
          It is very important to be aware of the threats to validity of our case study
          <xref ref-type="bibr" rid="ref13">[Runeson and Höst, 2009]</xref>
           and
we address them in this subsection. The generalization of our results is limited by the choice of the
datasets we used. We covered only the evolution through 5 subsequent releases of only 1 project that
comes from only 1 open source community. A greater number of case studies like this one, with as many
datasets from various domains as possible, is required to reach a conclusion with a greater confidence level. The choice
of the classification algorithms used for comparison is another threat to validity. This case study included only 2
classifiers other than RttFor. However, we chose the LogReg and the RndFor due to their very good
performance in other case studies. The choice of classification model’s parameters is a potential source of
bias to our results. That is why we performed an analysis of performance when tuning the parameters.
Since we noticed no statistically significant difference between various configurations, we left them to
their default values.
        </p>
      </sec>
      <sec id="sec-2-10">
        <title>6. CONCLUSION</title>
        <p>This paper continues the search for a classification algorithm of utmost performance for the SDP research
area. We analyzed a promising novel classifier called RttFor that received very limited attention in this
field so far. We compared the performance of RttFor with the LogReg and the RndFor in terms of 6
diverse evaluation metrics in order to get as detailed a comparison as possible. The classifiers were used
upon 5 subsequent releases of the Eclipse JDT open source project. These datasets contain between 1800 and
3400 instances, i.e. Java files, 48 features describing their complexity and size, and 1 binary class variable
that indicates whether the file contains defects or not. The comparison was done using the paired T-test of
significance between results obtained by 10-times 10-fold cross-validation.</p>
        <p>The conclusion of our case study and the answer to our RQ is that RttFor is indeed a state of the art
algorithm for classification purposes. The overall ranking of the three classifiers we analyzed shows that
the RttFor is the most successful one, the RndFor is the second best and the LogReg is the least successful
classifier. This finding is consistent with other case studies that used RttFor and suggests that this
classifier should be taken into account more often when performing research in SDP. Our future work
intentions include a more complex comparison that would include a greater number of classification
algorithms. But more importantly, it would include a greater number of datasets, covering longer periods
of projects’ evolution and a greater number of projects from various contexts.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Galinac Grbac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Runeson</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Huljenic</surname>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>A second replicated quantitative analysis of fault distributions in complex software systems</article-title>
          .
          <source>IEEE Trans. Softw. Eng</source>
          .
          <volume>39</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>462</fpage>
          -
          <lpage>476</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Galinac Grbac</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Huljenić</surname>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>On the probability distributions of faults in complex software systems</article-title>
          ,
          <source>Information and Software Technology (0950-5849)</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.I.</given-names>
            <surname>Kuncheva</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.J.</given-names>
            <surname>Alonso</surname>
          </string-name>
          ,
          <year>2006</year>
          .
          <article-title>Rotation Forest: A New Classifier Ensemble Method</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , vol.
          <volume>28</volume>
          , no.
          <issue>10</issue>
          , pp.
          <fpage>1619</fpage>
          -
          <lpage>1629</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Lessmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Baesens</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Pietsch</surname>
          </string-name>
          ,
          <year>2008</year>
          .
          <article-title>Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings</article-title>
          ,
          <source>IEEE Transactions On Software Engineering</source>
          , vol.
          <volume>34</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>485</fpage>
          -
          <lpage>496</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Amasyali</surname>
          </string-name>
          and
          <string-name>
            <given-names>O. K.</given-names>
            <surname>Ersoy</surname>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>Classifier Ensembles with the Extended Space Forest</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          , Vol.
          <volume>26</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>549</fpage>
          -
          <lpage>562</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>Pardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Díez-Pastor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Garcia-Osorio</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Rotation Forest for regression</article-title>
          ,
          <source>Applied Mathematics and Computation</source>
          , vol.
          <volume>219</volume>
          (
          <issue>19</issue>
          ), pp.
          <fpage>9914</fpage>
          -
          <lpage>9924</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>L. I.</given-names>
            <surname>Kuncheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. O.</given-names>
            <surname>Plumpton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E. J.</given-names>
            <surname>Linden</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Johnston</surname>
          </string-name>
          ,
          <year>2010</year>
          .
          <article-title>Random Subspace Ensembles for fMRI Classification</article-title>
          ,
          <source>IEEE Transactions on Medical Imaging</source>
          , Vol.
          <volume>29</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>531</fpage>
          -
          <lpage>542</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Chanussot</surname>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>Hyperspectral Remote Sensing Image Classification Based on Rotation Forest</article-title>
          ,
          <source>IEEE Geoscience and Remote Sensing Letters</source>
          , Vol.
          <volume>11</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>239</fpage>
          -
          <lpage>243</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Bailing</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Reliable Classification of Vehicle Types Based on Cascade Classifier Ensembles</article-title>
          ,
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          , Vol.
          <volume>14</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>322</fpage>
          -
          <lpage>332</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>H.</given-names>
            <surname>Palivela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. K.</given-names>
            <surname>Yogish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vijaykumar</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Patil</surname>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>A study of mining algorithms for finding accurate results and marking irregularities in software fault prediction</article-title>
          ,
          <source>International Conference on Information Communication and Embedded Systems (ICICES)</source>
          , pp.
          <fpage>524</fpage>
          -
          <lpage>530</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>B. G.</given-names>
            <surname>Tabachnick</surname>
          </string-name>
          and
          <string-name>
            <given-names>L. S.</given-names>
            <surname>Fidell</surname>
          </string-name>
          ,
          <year>2007</year>
          .
          <article-title>Using Multivariate Statistics, 5th edition</article-title>
          , Pearson, ISBN 0-205-45938-2
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          ,
          <year>2001</year>
          . Random forests,
          <source>Machine Learning</source>
          , Vol.
          <volume>45</volume>
          , No.
          <issue>1</issue>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-L.</given-names>
            <surname>Shyu</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.-C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <year>2009</year>
          .
          <article-title>Supervised Multi-Class Classification with Adaptive and Automatic Parameter Tuning</article-title>
          ,
          <source>IEEE International Conference on Information Reuse &amp; Integration</source>
          , IRI '09, Las Vegas, USA, pp.
          <fpage>433</fpage>
          -
          <lpage>434</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Mauša</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Galinac</given-names>
            <surname>Grbac</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. Dalbelo</given-names>
            <surname>Bašić</surname>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>Software Defect Prediction with Bug-Code Analyzer - a Data Collection Tool Demo</article-title>
          ,
          <source>In Proceedings of SoftCOM '14</source>
          , Split, Croatia
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Mauša</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Perković</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Galinac</given-names>
            <surname>Grbac</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. Dalbelo</given-names>
            <surname>Bašić</surname>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>Techniques for Bug-Code Linking</article-title>
          ,
          <source>In Proceedings of SQAMIA '14</source>
          ,
          Lovran, Croatia, pp.
          <fpage>47</fpage>
          -
          <lpage>55</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Mauša</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Galinac</given-names>
            <surname>Grbac</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. Dalbelo</given-names>
            <surname>Bašić</surname>
          </string-name>
          ,
          <year>2012</year>
          .
          <article-title>Multivariate logistic regression prediction of fault-proneness in software modules</article-title>
          ,
          <source>In Proceedings of MIPRO '12</source>
          ,
          Opatija, Croatia, pp.
          <fpage>698</fpage>
          -
          <lpage>703</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>P.</given-names>
            <surname>Runeson</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Höst</surname>
          </string-name>
          ,
          <year>2009</year>
          .
          <article-title>Guidelines for conducting and reporting case study research in software engineering</article-title>
          ,
          <source>Empirical Software Engineering</source>
          , vol.
          <volume>14</volume>
          , pp.
          <fpage>131</fpage>
          -
          <lpage>164</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>