<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Pruning AdaBoost for Continuous Sensors Mining Applications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>M. Rastgoo</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>G. Lemaitre</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>X. Rafael Palou</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>F. Miralles</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>P. Casale</string-name>
          <email>plcasale@bdigital.org</email>
        </contrib>
      </contrib-group>
      <fpage>53</fpage>
      <lpage>57</lpage>
      <abstract>
        <p>In this work, pruning techniques for the AdaBoost classifier are evaluated, with a specific focus on continuous learning in sensor mining applications. Three pruning schemes are assessed using standard machine-learning benchmark datasets, simulated drifting datasets and real cases. Early results show that the pruned ensembles approach, and sometimes outperform, the unpruned version of the classifier, while adapting more easily to drift in the training distribution. Future work will evaluate the approach in terms of time efficiency and its extension to big-data analysis.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        As the number of sensors deployed in the real world increases every
day, mining these continuous data streams becomes a crucial part of
applications. In recent years, data mining techniques have become very
popular in sensor mining tasks, especially for learning from data streams [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These
techniques, grounded in the machine learning framework, are
designed to generate a predictive model from a well-sampled training
dataset. The model is then used to classify any future
instance of data, with no possibility of updating it if the value
distribution of the data stream changes. In other words, the paradigm
provided by the typical machine learning setting is not suitable for
continuous mining of data streams [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The AdaBoost learning
function [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] provides a suitable framework for mining continuous streams
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Being an incremental ensemble of classifiers, this learning
function grows its knowledge simply by adding new classifiers to
the previous models. Nevertheless, when many subsequent batches
of data are provided, AdaBoost tends to create large ensembles that
suffer from two main drawbacks: (i) the increasing memory needed to store
the decision model and (ii) over-fitting. Pruning techniques are well
suited to reducing the size of the ensemble by selecting only
specific models. The first attempt at pruning an AdaBoost
classifier was made by Margineantu and Dietterich [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], who
compared five different methods, namely (i) early stopping, (ii) KL
divergence, (iii) Kappa statistics, (iv) Kappa error convex hull and
(v) Reduce Error with back-fitting. Hernández-Lobato et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] used
Genetic Algorithms to prune the AdaBoost ensemble, searching in
the space of all possible subsets of classifiers created by AdaBoost.
Zhang et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] defined pruning as a quadratic integer
programming problem with the aim of finding a fixed-size subset of k classifiers
with minimum misclassification and maximum diversity.
Nevertheless, those works are not suitable solutions for pruning AdaBoost in a
continuous learning framework. In this paper, experiments on
pruning methods for continuous data-stream mining are performed. The
AdaBoost algorithm is trained on subsequent batches of incoming
data, each followed by a pruning step. The advantage of this
approach is twofold: (i) on the one hand, when new concepts are
learned, pruning keeps the memory footprint of the ensemble as
small as possible and (ii) on the other hand, pruning provides
a first attempt to retain only the significant information acquired
from previous knowledge. The remainder of this paper is organized
as follows. In Section 2, the continuous learning framework, the
AdaBoost algorithm and the pruning methods used are introduced and
explained in detail. In Section 3, validation protocols are described
and, in Section 4, results are presented. Finally, Section 5 discusses
the obtained results and concludes the paper.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Pruning AdaBoost in Continuous Learning</title>
      <p>
        In a continuous learning framework, as shown in Fig. 1, new
knowledge is acquired only when the current model no longer fits
the incoming data-stream distribution [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This decision is made
by evaluating the current classifier with the function g as
performance measure and checking the obtained performance e. When
e is not good enough, the current model h<sub>i</sub> is updated by training the
learning function f with the new incoming data D<sub>i+1</sub>.
Incremental learning functions should be preferred. In this way, only the new
incoming data are used while the previously acquired knowledge is
maintained, so historical data need not be stored. The AdaBoost
algorithm is an incremental learning function able to
meet these requirements. Nevertheless, the number of classifiers created by
AdaBoost grows linearly with the number of learning steps
performed. Here, the pruning function p keeps the model
computationally tractable. The aim of this work is to compare
different pruning functions p in terms of classifier performance. In the
following subsections, AdaBoost and the pruning methods are
presented and explained in detail.
      </p>
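      <p>As a rough illustration of the loop described above, the following Python sketch wires together a performance measure g, a learning function f and a pruning function p. The names continuous_learning, error_rate, add_mean_stump and keep_last, as well as the toy one-dimensional batches, are hypothetical stand-ins for the components of Fig. 1. As a simplifying assumption, the sketch prunes on the incoming batch itself rather than on a separate pruning set.</p>
      <preformat><![CDATA[
```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]            # (feature value, label in {-1, +1})
Classifier = Callable[[float], int]
Model = List[Classifier]              # the ensemble is kept as a plain list


def continuous_learning(batches, f, g, p, threshold: float) -> Model:
    """Update and prune the model only when its error e exceeds the threshold."""
    model: Model = []
    for batch in batches:
        e = g(model, batch)                    # evaluate current model on new data
        if e > threshold:                      # model no longer fits the stream
            model = p(f(model, batch), batch)  # learn from new data, then prune
    return model


# Hypothetical stand-ins for g, f and p (not the methods of the paper).
def error_rate(model: Model, batch: Sequence[Sample]) -> float:
    if not model:
        return 1.0                             # an empty model always triggers learning
    wrong = 0
    for x, y in batch:
        vote = sum(h(x) for h in model)        # unweighted majority vote
        wrong += (1 if vote >= 0 else -1) != y
    return wrong / len(batch)


def add_mean_stump(model: Model, batch: Sequence[Sample]) -> Model:
    cut = sum(x for x, _ in batch) / len(batch)
    return model + [lambda x, c=cut: 1 if x > c else -1]


def keep_last(model: Model, batch: Sequence[Sample], size: int = 3) -> Model:
    return model[-size:]                       # naive pruning: keep the newest classifiers


batches = [
    [(0.0, -1), (1.0, -1), (2.0, 1), (3.0, 1)],
    [(0.5, -1), (1.2, -1), (2.5, 1), (3.5, 1)],      # same concept as the first batch
    [(10.0, -1), (11.0, -1), (12.0, 1), (13.0, 1)],  # drifted concept
]
final_model = continuous_learning(batches, add_mean_stump, error_rate, keep_last, 0.2)
```
]]></preformat>
      <p>In this toy run the second batch is classified correctly by the existing model, so no update is triggered, whereas the first and the third batch each add one classifier.</p>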
    </sec>
    <sec id="sec-3">
      <title>AdaBoost</title>
      <p>AdaBoost, short for Adaptive Boosting, is an ensemble learning
algorithm that builds a high-performance classifier as a
linear combination of weak learners. Algorithm 1 shows the
pseudocode for AdaBoost. The algorithm takes as input a training set
(x<sub>i</sub>, y<sub>i</sub>), where x<sub>i</sub> is an n-dimensional feature vector and y<sub>i</sub> is
the class label. After T rounds of training, T weak classifiers h<sub>t</sub>
and T weights α<sub>t</sub> are combined to assemble the final strong
classifier. Higher weights α<sub>t</sub> are assigned to the best weak classifiers
h<sub>t</sub>.</p>
      <p>Algorithm 1 AdaBoost Algorithm</p>
      <p>Input:</p>
      <list list-type="bullet">
        <list-item><p>Training set of N samples (x<sub>k</sub>, y<sub>k</sub>), with k = 1, . . . , N, x<sub>k</sub> ∈ R<sup>n</sup>, y<sub>k</sub> ∈ Y = {−1, +1};</p></list-item>
        <list-item><p>Weak learning algorithm WeakLearn;</p></list-item>
        <list-item><p>Number of learning iterations T.</p></list-item>
      </list>
      <p>Initialize W<sub>1</sub>(k) = 1/N, k = 1, . . . , N.</p>
      <p>For t = 1, . . . , T do:</p>
      <list list-type="order">
        <list-item><p>Train WeakLearn using distribution W<sub>t</sub> and get weak hypothesis h<sub>t</sub>;</p></list-item>
        <list-item><p>Compute the classification error <inline-formula><tex-math>\epsilon_t = \Pr_{k \sim W_t}[h_t(x_k) \neq y_k]</tex-math></inline-formula>;</p></list-item>
        <list-item><p>Compute <inline-formula><tex-math>\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)</tex-math></inline-formula>;</p></list-item>
        <list-item><p>Update the distribution:
          <disp-formula><tex-math>W_{t+1}(k) = \frac{W_t(k) \exp(-\alpha_t y_k h_t(x_k))}{Z_t},</tex-math></disp-formula>
          where Z<sub>t</sub> is a normalization factor chosen so that W<sub>t+1</sub> is a proper
distribution.</p></list-item>
      </list>
      <p>End for.</p>
      <p>Output:
        <disp-formula><tex-math>H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)</tex-math></disp-formula>
      </p>
      <p>Instantiations of AdaBoost may differ in the choice of the
weak learning algorithm, defined as a learner performing slightly
better than random guessing (&gt; 50% correct classification). A variety of
weak learners, e.g., neural networks or decision trees, can be used.
Decision stumps are the most common weak classifiers used in
AdaBoost. Decision stumps are one-level decision trees equivalent to a
threshold that best splits the data. Each stump learner is
characterized by three parameters: (i) the dimension of the feature set
to which the classifier is applied, (ii) the decision level, i.e., the
threshold splitting the data in that dimension, and (iii) the decision
sign (−1 or +1) determining the inequality direction for the
thresholding. For a given batch of data with a set of features of size n, at
each iteration of AdaBoost the decision stump that minimizes the
error ε<sub>t</sub> on the training distribution, acting on one of the n dimensions, is selected. The
information provided by the final set of decision stumps selected by
AdaBoost can be used for mining which features of the data stream
are significant and, more importantly, which split of the
data is the best.</p>
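      <p>The steps of Algorithm 1 with decision stumps as weak learners can be sketched in plain Python as follows. This is a minimal illustration and not the implementation used in the experiments: the function names, the exhaustive threshold search over midpoints, the tie-breaking order and the clamping of the error before taking the logarithm are all assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature dimension, threshold, decision sign)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    dim, thr, sign = stump
    return sign if x[dim] > thr else -sign


def best_stump(X, y, w) -> Tuple[Stump, float]:
    """Exhaustive search for the stump with the lowest weighted error."""
    best, best_err = (0, 0.0, 1), float("inf")
    for dim in range(len(X[0])):
        values = sorted({x[dim] for x in X})
        # Candidate thresholds: one below the minimum, then midpoints.
        cuts = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for thr in cuts:
            for sign in (1, -1):
                err = sum(wk for xk, yk, wk in zip(X, y, w)
                          if stump_predict((dim, thr, sign), xk) != yk)
                if err < best_err:
                    best, best_err = (dim, thr, sign), err
    return best, best_err


def adaboost(X, y, T: int) -> List[Tuple[float, Stump]]:
    N = len(X)
    w = [1.0 / N] * N                                # W_1(k) = 1/N
    ensemble = []
    for _ in range(T):
        stump, eps = best_stump(X, y, w)             # steps 1 and 2
        eps = min(max(eps, 1e-10), 1 - 1e-10)        # added guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)      # step 3
        w = [wk * math.exp(-alpha * yk * stump_predict(stump, xk))
             for xk, yk, wk in zip(X, y, w)]         # step 4, before normalization
        Z = sum(w)
        w = [wk / Z for wk in w]
        ensemble.append((alpha, stump))
    return ensemble


def predict(ensemble, x) -> int:
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data that no single stump can separate.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, T=3)
```
]]></preformat>
      <p>On this toy dataset any single stump misclassifies at least two of the six points, while the combination of three weighted stumps fits all training labels.</p>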
    </sec>
    <sec id="sec-4">
      <title>Pruning methods</title>
      <p>
        Three different pruning methods have been used and compared,
namely (i) Reduce Error, (ii) Learner Weights Analysis and (iii)
Pareto Analysis. The Reduce Error algorithm was first used in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Since the original implementation is not suitable for a continuous
learning framework, an improved version is proposed in this work
in order to speed up the process. Pruning has also been performed
using the Learner Weights Analysis and Pareto Analysis methodologies, both of
which provide a set of the most discriminative learners from the
whole ensemble. To the best of our knowledge, these methodologies
have not previously been applied to the task of pruning
an AdaBoost ensemble.
      </p>
      <sec id="sec-4-1">
        <title>Reduce Error (RE)</title>
        <p>In this algorithm, the first step initializes the
pruning distribution W<sub>t</sub> and selects the weak classifier h<sub>t</sub> from the
ensemble H that minimizes the classification error ε<sub>t</sub> on the distribution
W<sub>t</sub>. This classifier is added to the pruned ensemble P, a weight α<sub>t</sub>
is assigned to it and the distribution W<sub>t+1</sub> is updated as in the AdaBoost
routine. Then, iteratively, each remaining classifier h<sub>t</sub> is individually
added to the ensemble P and the classification error ε<sub>t</sub> of this new
ensemble is evaluated on the pruning set using the distribution W<sub>t+1</sub>.
The classifier h<sub>t</sub> that, combined with
P, minimizes the classification error ε<sub>t</sub> is permanently added to P, a
weight α<sub>t</sub> is assigned to it and the distribution W<sub>t+2</sub> is updated as
in the AdaBoost routine. The routine stops when the number of
classifiers in the sub-ensemble P reaches a pre-specified size. The two
main changes with respect to the original RE algorithm are the
following. In the original version, a final back-fitting step is performed
only after the selection of each weak classifier, while in our approach
the selection is done at each step. In addition, each weak classifier is
added to the pruned ensemble P only after being re-weighted. This
procedure ensures better classification results than the original RE
formulation.</p>
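        <p>The greedy selection loop described above may be sketched as follows. The sketch rests on several assumptions that the text does not fix: each weak classifier is represented only by its vector of predictions on the pruning set, a candidate is re-weighted with the AdaBoost formula using its own weighted error, and ties are broken by classifier index. The names reduce_error_prune, weighted_error, alpha_from_error and vote are hypothetical.</p>
        <preformat><![CDATA[
```python
import math
from typing import List, Sequence, Tuple


def weighted_error(pred: Sequence[int], y: Sequence[int], w: Sequence[float]) -> float:
    return sum(wk for pk, yk, wk in zip(pred, y, w) if pk != yk)


def alpha_from_error(eps: float) -> float:
    eps = min(max(eps, 1e-10), 1 - 1e-10)      # added guard against log(0)
    return 0.5 * math.log((1 - eps) / eps)


def vote(members: List[Tuple[int, float]], preds, k: int) -> int:
    """Weighted vote of the sub-ensemble on pruning sample k."""
    score = sum(alpha * preds[j][k] for j, alpha in members)
    return 1 if score >= 0 else -1


def reduce_error_prune(preds, y, size: int) -> List[Tuple[int, float]]:
    """preds[j][k] is the prediction of weak classifier j on pruning sample k."""
    n = len(y)
    w = [1.0 / n] * n                          # initial pruning distribution
    members: List[Tuple[int, float]] = []      # pruned ensemble P as (index, alpha)
    remaining = set(range(len(preds)))
    while remaining and len(members) < size:
        best = None
        for j in sorted(remaining):
            # Re-weight the candidate on the current distribution.
            alpha = alpha_from_error(weighted_error(preds[j], y, w))
            trial = members + [(j, alpha)]
            trial_pred = [vote(trial, preds, k) for k in range(n)]
            err = weighted_error(trial_pred, y, w)
            if best is None or err < best[0]:
                best = (err, j, alpha)
        _, j, alpha = best
        members.append((j, alpha))
        remaining.remove(j)
        # Update the distribution as in the AdaBoost routine.
        w = [wk * math.exp(-alpha * yk * preds[j][k])
             for k, (yk, wk) in enumerate(zip(y, w))]
        z = sum(w)
        w = [wk / z for wk in w]
    return members


# Toy pruning set with five candidate classifiers.
y = [1, 1, -1, -1, 1, 1]
preds = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, -1, -1, -1, -1],
    [-1, -1, -1, -1, 1, 1],
    [-1, 1, -1, 1, 1, 1],
    [1, -1, 1, -1, 1, -1],
]
members = reduce_error_prune(preds, y, size=3)
```
]]></preformat>
        <p>In this toy case the first three candidates are retained, and their weighted vote reproduces all labels of the pruning set.</p>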
      </sec>
      <sec id="sec-4-2">
        <title>Learner Weights Analysis (WA)</title>
        <p>Based on the distribution of the weights α<sub>t</sub> in the ensemble, weak
learners are selected under the following assumptions: (i) weak
learners with a higher ensemble weight α<sub>t</sub> are the best weak learners of the
ensemble and (ii) an ensemble is better when the classifiers forming it
are more diverse. The technique works as follows. AdaBoost
is applied to the batch of data to obtain an ensemble of T classifiers.
Then, a matrix M is built by grouping the ensemble weights α<sub>t</sub> of
the decision stump classifiers according to their dimension parameter. M
is of size n × D, where n is the number of elements for each of the
D dimensions. In order to select the best classifiers, M is sorted
first by row and then by column, always in
descending order. M is then transformed into a vector V by concatenating all its
columns. Finally, the t classifiers corresponding to the first t weights of
V, with t &lt;&lt; T, are selected. The value of t determines the pruning
percentage.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Pareto Analysis (PA)</title>
        <p>PA is based on the assumption that a few key actions produce
significant overall effects. Applied to ensemble learning, this
implies that only a few key weak classifiers have a high impact
on the overall performance of the ensemble. PA takes a statistical
point of view in order to select these key classifiers. The technique
estimates the effectiveness of each feature dimension and
accordingly selects the classifiers from the feature dimensions with high
impact. The effectiveness can be adjusted using a threshold. First, the
features are grouped based on the number of ensemble weights
that are considered outliers in each dimension. The outliers are
found with reference to the first and third quartiles (Q1, Q3) and the
interquartile range (IQR = Q3 − Q1). Values above Q3 + 1.5 × IQR are
considered outliers in each case. The frequency distribution of these
outliers is sorted in descending order and the cumulative distribution
is computed. Then, the feature dimensions are selected based on a
threshold level corresponding to the number of classifiers to keep. All
dimensions with a cumulative percentage lower than the threshold (i.e., the
desired percentage of the maximum cumulative value) are taken into
account. From the selected feature dimensions, the maximum weights
are used to identify the learners to keep. The technique can be seen as
a principal dimension selection, in which the dimensions considered
most important are selected.</p>
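        <p>One possible reading of the selection steps above is sketched below. It assumes that the quartiles are computed over all ensemble weights at once (the text leaves open whether they are global or per dimension), that the cumulative percentage is compared strictly against the threshold, and that the learners of the selected dimensions are ranked by descending weight. The function name pareto_select, the parameter names and the toy weights are hypothetical.</p>
        <preformat><![CDATA[
```python
import statistics
from collections import Counter
from typing import List, Sequence, Tuple


def pareto_select(dims: Sequence[int], alphas: Sequence[float],
                  threshold: float = 0.8) -> Tuple[List[int], List[int]]:
    """dims[i] is the feature dimension of stump i, alphas[i] its ensemble weight."""
    q1, _, q3 = statistics.quantiles(alphas, n=4)    # quartiles of all weights
    fence = q3 + 1.5 * (q3 - q1)                     # Q3 + 1.5 x IQR
    # Frequency of outlier weights per feature dimension.
    outliers = Counter(d for d, a in zip(dims, alphas) if a > fence)
    total = sum(outliers.values())
    selected: List[int] = []
    cumulative = 0
    for d, count in outliers.most_common():          # descending frequency
        cumulative += count
        if total and cumulative / total < threshold:
            selected.append(d)
    # Learners of the selected dimensions, largest weights first.
    ranked = sorted((i for i, d in enumerate(dims) if d in selected),
                    key=lambda i: alphas[i], reverse=True)
    return selected, ranked


# Toy ensemble: 16 ordinary weights spread over 3 dimensions, plus 4 large ones.
alphas = [0.10 + 0.01 * i for i in range(16)] + [0.9, 0.8, 0.7, 0.6]
dims = [i % 3 for i in range(16)] + [0, 0, 0, 1]
selected_dims, ranked_learners = pareto_select(dims, alphas, threshold=0.8)
```
]]></preformat>
        <p>With these toy values, three of the four outlier weights fall in dimension 0, so only that dimension stays below the 80% cumulative threshold and its learners are ranked by weight.</p>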
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Validation Protocol</title>
      <p>Three types of experiments have been performed in order to
validate the effectiveness of the pruning methods on both static and
drifting distributions. A cross-validation approach has been used for
validating the methods. At each step of the cross-validation process,
the dataset has been randomly divided into three subsets: training
(50%), pruning (40%) and testing (10%) sets. In the following
sections, the validation protocols adopted for each type of
experiment are described. Under the model described in Fig. 1, a proper
threshold Th has been chosen so that the model is always trained on
the new incoming data.</p>
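      <p>The random 50/40/10 division used at each cross-validation step can be illustrated with a few lines of Python. The helper name split_indices, the fixed seed and the rounding of the subset sizes are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import random
from typing import List, Tuple


def split_indices(n: int, seed: int = 0,
                  fractions: Tuple[float, float] = (0.5, 0.4)
                  ) -> Tuple[List[int], List[int], List[int]]:
    """Randomly split n sample indices into training, pruning and testing sets."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(fractions[0] * n)
    n_prune = round(fractions[1] * n)
    train = idx[:n_train]
    prune = idx[n_train:n_train + n_prune]
    test = idx[n_train + n_prune:]        # whatever remains (about 10%)
    return train, prune, test


# Example with 270 instances, the size of the Heart dataset.
train, prune, test = split_indices(270)
```
]]></preformat>
      <p>For 270 instances this yields 135 training, 108 pruning and 27 testing samples.</p>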
    </sec>
    <sec id="sec-6">
      <title>UCI Datasets Repository</title>
      <p>
        Five datasets from the UCI repository [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] have been used for
evaluating the effectiveness of the pruning methods. In this validation
step, the KL divergence method, as originally proposed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], has
been added as a baseline for comparison. The datasets
considered are Australian, Breast, Diabetes, Heart and Pima. The datasets
contain around 700 instances on average, except Heart,
which has 270 instances. The aim of the experiment is to analyse the
results obtained by pruning 90% of an initial ensemble. The average error rate
for each technique was computed using a modified version of ten-fold
cross-validation that includes the pruning sets in the
evaluation process, with the percentages previously outlined. The AdaBoost
algorithm was used to create an ensemble of one hundred weak
classifiers. Then, each pruning method was applied in order to create a
pruned sub-ensemble containing only ten classifiers.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Simulated Drifting Datasets</title>
      <p>
        The second set of experiments has been focused on testing the
pruning methods in a continuous learning framework. These have been
performed using three sets of simulated data-streams that include
drifting. The datasets are generated using the software provided
by [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Figure 2 shows the three different settings for each
experiment. Four linear drifts have been considered for the first dataset and
three circular drifts have been created for the remaining two datasets.
The ensemble was incrementally grown using all the drifted
distributions. The experiments performed using the simulated datasets are
described in the following.
      </p>
      <p>Figure 2: (a) Linear Drift, (b) Circular Drift, (c) Circular Narrow Drift.</p>
      <p>
        Exp. 1: In the first experiment, it is assumed that the data distribution
changes due to different drifts, and the ensemble
is incrementally grown over the drifted batches of incoming data
with the main aim of classifying the current batch of information.
After training, pruning and testing are applied to
different samplings of the same drifted batch. The experiment is
repeated five times following a 5-fold cross-validation paradigm.
      </p>
      <p>
        Exp. 2: The aim of the second experiment is to evaluate the
potential of pruning in classifying both previous and current
information. With training kept as in the previous experiment, at each
step i the ensemble is pruned and tested on the pruning and testing
sets of the joint distribution C<sub>0</sub> ∪ . . . ∪ C<sub>i</sub>. The experiment has
been performed over five different runs, following a 5-fold
cross-validation paradigm.
      </p>
      <p>The third set of experiments has been performed on three real-world datasets.</p>
      <list list-type="bullet">
        <list-item>
          <p>
            The Sensor Stream (SS) dataset [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] contains sensor information
(temperature, humidity, light and sensor voltage) collected from
fifty-four sensors deployed at the Intel Berkeley Research Lab. The
whole stream contains consecutive information over two months
(2 219 803 instances). The experiment aims to infer the
illuminance state based on the measurements provided by each sensor.
Illuminance values higher than 200 lux are labelled as class 1,
and all others as class −1. Every fifteen days, a new batch
of data is collected, which leads to three drifts reflecting the
changes in the lab environment due to weather, humidity and
office work. The experiment was performed using a 4-fold
cross-validation paradigm.
          </p>
        </list-item>
        <list-item>
          <p>
            Power Supply (PS) [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] is the second dataset used. The dataset
contains the hourly power supply of the Italian
electricity company. The stream contains three years of power supply records
from 1995 to 1998 (29 928 instances). The experiment aims to
predict the state of the day (morning (1) or night (−1)) based on the raw
consumption value. The drift in this stream is mainly driven
by factors such as the season, the weather, the hour of the day and
the day of the week. The data were split into three batches,
representing one drift for each year. The experiment was performed using a
3-fold cross-validation paradigm.
          </p>
        </list-item>
        <list-item>
          <p>
            Elec 2 (E2) is the third dataset used. This dataset, containing 27 549
instances, is composed of seven drifts, each representing a day of the
week. The drifts are due to changes in power consumption over
the weekdays. The experiment was performed using a 7-fold
cross-validation paradigm.
          </p>
        </list-item>
      </list>
      <p>As in Exp. 2 on simulated data, AdaBoost is trained for each drift
on the training set of the current data. The pruning function is applied to
a pruning set that contains samples of previous and new batches of
data.</p>
      <p>Figure 3: Results obtained in Exp. 1 on (a) Linear Drift, (b) Circular Drift and (c) Circular Narrow Drift.</p>
      <p>
In this section, the results obtained in the experiments described in the
previous section are reported. Misclassification error has been
chosen as the performance measure. In particular, the pruning methods have
been evaluated using the relative error (ε<sub>rel</sub>) with respect to the error
provided by the unpruned version of AdaBoost, computed as shown
in Eq. 1. Hence, methods with negative relative errors perform
better than the reference model.
      </p>
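      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>\epsilon_{rel} = -1 \cdot \frac{\epsilon_{\text{unpruned}} - \epsilon_{\text{pruned}}}{\epsilon_{\text{unpruned}}}</tex-math>
        </disp-formula>
      </p>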
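      <p>As a small worked example of Eq. 1, the relative error can be computed as below. The function name relative_error and the two error values are illustrative assumptions and are not results taken from the figures.</p>
      <preformat><![CDATA[
```python
def relative_error(err_unpruned: float, err_pruned: float) -> float:
    """Relative error of a pruned ensemble with respect to the unpruned one (Eq. 1)."""
    return -1.0 * (err_unpruned - err_pruned) / err_unpruned


# Illustrative values: the pruned ensemble makes fewer errors than the reference.
improved = relative_error(err_unpruned=0.40, err_pruned=0.25)
# Illustrative values: the pruned ensemble makes more errors than the reference.
degraded = relative_error(err_unpruned=0.20, err_pruned=0.25)
```
]]></preformat>
      <p>A pruned ensemble with a lower error than the reference gives a negative value (about −0.375 in the first case), while a higher error gives a positive one.</p>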
    </sec>
    <sec id="sec-8">
      <title>UCI Datasets Repository</title>
      <p>In Fig. 4 the results obtained on the five UCI datasets are reported.
RE performs better than the other methods: it is better than
the reference on the Aus, Dia and Hea datasets, and slightly worse than
the reference on Pim. WA behaves similarly. All pruning methods
perform poorly on Bre, where the best result is provided by PA.</p>
    </sec>
    <sec id="sec-9">
      <title>Simulated Drifting Datasets</title>
      <p>Results obtained on the simulated drifting datasets in Exp. 1 are
reported in Fig. 3. RE is the best pruning method for the linear and circular
drifting datasets, as the previous experiments suggest. In both linear and
circular drifting, WA performs better than PA. None of the pruning
methods works better than the unpruned version for high percentages
of pruning. Nevertheless, WA works slightly better than unpruned
AdaBoost when the percentage of pruning is almost 50%. As may
be expected, the performance of the pruned ensemble generally gets
worse as the percentage of pruning increases. Nevertheless, RE is
able to keep its performance constant over the pruning
percentage on the circular dataset and almost constant on the circular narrow
dataset. For Exp. 2, the results obtained on the simulated drifting datasets
are reported in Fig. 6. In this setting, all the pruned ensembles behave
better than their corresponding unpruned classifiers. As all previous
experiments suggest, RE is the best method, followed by WA. Also
in this case, although the performance of the methods decreases as
the percentage of pruning increases, RE remains almost constant
regardless of the percentage. It should also be noted that the AdaBoost
performance in this experiment is rather poor, with a global
error of up to 40%. The pruning methods improve this performance,
bringing the error down to 25%.</p>
    </sec>
    <sec id="sec-10">
      <title>Real World Datasets</title>
      <p>Results obtained on the real-world datasets are shown in Fig. 5.
Results obtained on the PS dataset are shown in Fig. 7. RE
is confirmed to be the best pruning method, followed by WA. For the SS and
E2 datasets, WA and PA provide the same performance. It should be
noted that RE performs better than the unpruned version in all the
experiments.
      </p>
      <p>Figure 6: Results obtained in Exp. 2 on (a) Linear Drift, (b) Circular Drift and (c) Circular Narrow Drift.</p>
      <p>
In this work, experiments have been carried out in order to evaluate
the potential of different pruning methods and their performance in
the framework of continuous learning. The Reduce Error method
is the most consistent method, followed by Learner Weights Analysis.
The use of Pareto Analysis does not seem to be justified by the
experiments. Nevertheless, one important characteristic of this
method is its capability to define automatically the
number of classifiers in the pruned ensemble. PA may be automated
by thresholding the performance. Early results show that this
automatic version performs better than the original method in most
cases. The experiments on simulated datasets in Exp. 1 show that
pruning methods are more efficient on widely drifted distributions
than on narrowly drifted distributions. Due to the nature of the
narrow circular dataset, the drift stages share a larger common area and, since in
this experiment the current stage has more effect on pruning than
the previous stage, the pruning performances are slightly lower. At the
same time, Exp. 2 shows that pruning methods perform better than the
original classifier when the whole drifting distribution is presented.
Based on Fig. 6, pruning the ensemble during incremental learning
improves the final results. Finally, the results obtained in the
experiments on real datasets indicate that pruning during the continuous
learning process provides results very close to or better than those of AdaBoost.
As future work, an evaluation of the efficiency of the methods in terms of
computational complexity will be considered, since this parameter
is of great importance in a continuous learning framework. For this
reason, the Reduce Error method has been modified in our
research so that it can, in principle, run within time
efficiency guidelines, and methods based on genetic algorithms and
semi-definite programming have not been used for comparison.
Finally, a study on the extension of the proposed methods towards a
big-data approach is planned. This research shows that
pruning by selecting the weak classifiers from different pools of
subsampled data may improve the final ensemble in terms of accuracy,
diversity and ability to adapt to drift. The procedures employed
in this work can be easily adapted to large datasets and continuous
learning environments with a high quantity of incoming data.</p>
    </sec>
    <sec id="sec-11">
      <title>Acknowledgments</title>
      <p>This work is supported by the Information and Communication
Technologies Collaborative Project action BrainAble within the Seventh
Framework of the European Commission, project number
ICT-2010247447.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.J.</given-names>
            <surname>Newman</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Asuncion</surname>
          </string-name>
          .
          <article-title>UCI machine learning repository</article-title>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Freund</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Schapire</surname>
          </string-name>
          , '
          <article-title>A short introduction to boosting'</article-title>
          ,
          <source>J. Japan. Soc. for Artif. Intel.</source>
          ,
          <volume>14</volume>
          (
          <issue>5</issue>
          ),
          <fpage>771</fpage>
          -
          <lpage>780</lpage>
          , (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gama</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Gaber</surname>
          </string-name>
          (Eds),
          <source>Learning from Data Streams - Processing techniques in Sensor Networks</source>
          , Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Hernández-Lobato</surname>
          </string-name>
          , José Miguel Hernández-Lobato, Rubén Ruiz-Torrubiano, and Ángel Valle, '
          <article-title>Pruning adaptive boosting ensembles by means of a genetic algorithm'</article-title>
          , in IDEAL, pp.
          <fpage>322</fpage>
          -
          <lpage>329</lpage>
          , (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Ludmila I.</given-names>
            <surname>Kuncheva</surname>
          </string-name>
          , '
          <article-title>Classifier ensembles for changing environments'</article-title>
          , in
          <source>Multiple Classifier Systems</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          . Springer, (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Dragos D.</given-names>
            <surname>Margineantu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Thomas G.</given-names>
            <surname>Dietterich</surname>
          </string-name>
          . Pruning adaptive boosting,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Leandro L.</given-names>
            <surname>Minku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Allan P.</given-names>
            <surname>White</surname>
          </string-name>
          , and Xin Yao, '
          <article-title>The impact of diversity on online ensemble learning in the presence of concept drift'</article-title>
          ,
          <source>IEEE Trans. on Knowl. and Data Eng</source>
          .,
          <volume>22</volume>
          (
          <issue>5</issue>
          ),
          <fpage>730</fpage>
          -
          <lpage>742</lpage>
          , (May
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Scholz</surname>
          </string-name>
          and Ralf Klinkenberg, '
          <article-title>Boosting classifiers for drifting concepts'</article-title>
          ,
          <source>Intelligent Data Analysis (IDA), Special Issue on Knowledge Discovery from Data Streams</source>
          ,
          <year>2006</year>
          ,
          <year>2007</year>
          , (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jan</given-names>
            <surname>Tozicka</surname>
          </string-name>
          , Michael Rovatsos, Michal Pechoucek, and Stepan Urban, '
          <article-title>Malef: Framework for distributed machine learning and data mining'</article-title>
          ,
          <source>Int. J. Intell. Inf. Database Syst.</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>6</fpage>
          -
          <lpage>24</lpage>
          , (February
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Haixun</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wei</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Philip S.</given-names>
            <surname>Yu</surname>
          </string-name>
          , and Jiawei Han, '
          <article-title>Mining concept-drifting data streams using ensemble classifiers'</article-title>
          , in
          <source>Proceedings of the ninth ACM SIGKDD, KDD '03</source>
          , pp.
          <fpage>226</fpage>
          -
          <lpage>235</lpage>
          , New York, NY, USA, (
          <year>2003</year>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Yi</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Samuel Burer, and W. Nick Street, '
          <article-title>Ensemble pruning via semi-definite programming'</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>7</volume>
          ,
          <fpage>1315</fpage>
          -
          <lpage>1338</lpage>
          , (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <source>Stream data mining repository</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>