
Self-Adaptive Ensemble Classifier for Handling Complex Concept Drift

Imen Khamassi
Universite de Tunis, Institut Superieur de Gestion de Tunis, Tunisia
imen.khamassi@isg.rnu.tn

Moamar Sayed-Mouchaweh
Ecole des Mines Douai, France
moamar.sayed-mouchaweh@mines-douai.fr

In an increasing number of real-world applications, data are presented as streams that may evolve over time; this phenomenon is known as concept drift. Handling concept drift through ensemble classifiers has received great interest in recent decades. The success of these ensemble methods relies on their diversity. Accordingly, various diversity techniques can be used, such as block-based data, weighting-data or filtering-data. Each of these diversity techniques is efficient at handling certain characteristics of drift. However, when the drift is complex, they fail to handle it efficiently. Complex drifts may present a mixture of several characteristics (speed, severity, influence zones in the feature space, etc.) which may vary over time. In this case, drift handling is more complicated and requires new detection and updating tools. For this purpose, a new ensemble approach, namely EnsembleEDIST2, is presented. It combines the three diversity techniques in order to benefit from their advantages and overcome their limits. Additionally, it makes use of EDIST2 as a drift detection mechanism in order to monitor the ensemble's performance and detect changes. EnsembleEDIST2 was tested on different scenarios of complex drift generated from synthetic and real datasets. This diversity combination allows EnsembleEDIST2 to outperform similar ensemble approaches in terms of accuracy rate and to exhibit stable behavior in handling different scenarios of complex drift.

Learning from evolving data streams has received great attention. It addresses the non-stationarity of data over time, which is known as concept drift. The term concept refers to the data distribution, represented by the joint distribution p(x, y), where x represents the n-dimensional feature vector and y represents its class label. The term concept drift refers to a change in the underlying distribution of newly arriving data. For example, in intrusion detection applications, the behavior of an intruder may evolve in order to circumvent the system's protection rules. Hence, it is essential to take these changes into account when updating the system in order to preserve its performance.

Ensemble classifiers appear to be promising approaches for tracking evolving data streams. The success of ensemble methods, compared to a single classifier, relies on their diversity [17] [22] [21]. Diversity can be achieved according to three main strategies [15]: block-based data, weighting-data or filtering-data. In block-based ensembles [5], [16], [20], the training set is presented as blocks, or chunks, of data at a time. Generally, these blocks are of equal size and the base learners are evaluated when all instances from a new block are available. In weighting-data ensembles [3] [4] [18] [13], the instances are weighted according to some weighting process. For example, in Online Bagging [19], the weighting process is based on re-using instances for training individual learners. Finally, filtering-data ensembles [1] are based on selecting data from the training set according to a specific criterion, for example similarity in the feature space.

In many real-life applications, concept drift may be complex in the sense that it presents time-varying characteristics. For instance, a drift can present different characteristics according to its speed (abrupt or gradual), nature (continuous or probabilistic) and severity (local or global). Accordingly, a complex drift can present a mixture of all these characteristics over time. It is worth underlining that each characteristic presents its own challenges; a mixture of these different characteristics may therefore accentuate these challenges and complicate drift handling.

In this paper, the goal is to underline the complementarity of the diversity techniques (block-based data, weighting-data and filtering-data) for handling different scenarios of complex drift. For this purpose, a new ensemble approach, namely EnsembleEDIST2, is proposed. The intuition is to combine these three diversity techniques in order to efficiently handle different scenarios of complex drift. Firstly, EnsembleEDIST2 defines a data block with variable size for updating the ensemble's members; it thus avoids the problem of tuning the size of the data block. Secondly, it defines a new filtering criterion for selecting the most representative data of the new concept. Thirdly, it applies a new weighting process in order to create diversified ensemble members. Finally, it makes use of EDIST2 [14] [12] as a drift detection mechanism in order to monitor the ensemble's performance and detect changes.

EnsembleEDIST2 has been tested on different scenarios of complex drift generated from synthetic and real datasets. This diversity combination allows EnsembleEDIST2 to outperform similar ensemble approaches in terms of accuracy rate and to exhibit stable behavior in handling different scenarios of complex drift.

The remainder of the paper is organized as follows. In Section II, the challenges of complex concept drift are exposed. In Section III, the advantages and the limits of each diversity technique are studied. In Section IV, the proposed approach, namely EnsembleEDIST2, is detailed. In Section V, the experimental setup and the obtained results are presented. Finally, in Section VI, the conclusion and some future research directions are given.

Complex Concept Drift

In many real-life applications, concept drift may be complex in the sense that it presents time-varying characteristics. Consider, for example, a drift characterized by three characteristics: its speed (gradual or abrupt), nature (continuous or probabilistic) and severity (local or global). Each characteristic presents its own challenges; a mixture of these different characteristics may therefore accentuate these challenges and complicate drift handling.

For instance, the drift depicted in Fig.1 can be considered complex, as it simulates a Gradual Continuous Local Drift: the hyperplane class boundary rotates gradually during the drifting phase, and the changes occur continuously, with each instance, in local regions. In this case, the time until the drift is detected can be arbitrarily long. This is due to the scarcity of data representing the drift, which in turn makes it difficult to confirm its presence. Moreover, in some cases, this drift can be mistaken for noise, which makes the model unstable. Hence, to overcome this instability, the model has to (i) effectively differentiate between local changes and noise, and (ii) deal with the scarcity of instances that represent the drift in order to effectively update the learner.

Another interesting complex drift is the Gradual Continuous Global Drift (see Fig.2). During this drift, the concept changes gradually and continuously, with modifications at each instance. During the transition phase, the drift evolves and presents several intermediate concepts until the final concept emerges (see Fig.2.b). Hence, the challenging issue is to efficiently decide the end time of the old concept and detect the start time of the new concept. The objective is to update the learner with the data that represent the final concept (see Fig.2.c) and not with data collected during the concept evolution (see Fig.2.b). Moreover, this drift is considered global because it affects all the instances of the drifting class. Handling this complex drift is also challenging because the learner's performance decreases more sharply than under other types of drift.

Diversity [15] among the ensemble members can be achieved by applying various techniques, such as block-based data, weighting-data or filtering-data, in order to train the base learners differently (see Fig.3). Accordingly, the objective of this investigation is to highlight the advantages and drawbacks of each diversity technique in handling complex drift (see Table 1).

According to the block-based technique, the training set is presented as blocks, or chunks, of data at a time. Generally, these blocks are of equal size, and the construction, evaluation or updating of base learners is done when all instances from a new block are available. Very often, ensemble learners periodically evaluate their components and substitute the weakest one with a new (candidate) learner after each data block [20] [16] [6]. This technique preserves the adaptability of the ensemble in such a way that the learners trained on recent blocks are the most suitable for representing the current concept.
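A minimal Python sketch of the block-based update described above, assuming a toy nearest-centroid base learner and a policy that replaces the member least accurate on the newest block. `NearestCentroid`, `accuracy` and `process_block` are hypothetical names, and the scheme is a simplification rather than AUE, AWE or LearnNSE.

```python
from dataclasses import dataclass, field


@dataclass
class NearestCentroid:
    """Hypothetical toy base learner: predicts the class whose mean vector is closest."""
    centroids: dict = field(default_factory=dict)

    def fit(self, block):
        sums, counts = {}, {}
        for x, y in block:
            acc = sums.setdefault(y, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, x):
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])))


def accuracy(model, block):
    """Fraction of instances in `block` that `model` classifies correctly."""
    return sum(model.predict(x) == y for x, y in block) / len(block)


def process_block(ensemble, block, max_size):
    """Called once a full block is available: train a candidate on it and either
    add it or substitute the member that is least accurate on this block."""
    candidate = NearestCentroid().fit(block)
    if len(ensemble) < max_size:
        ensemble.append(candidate)
    else:
        weakest = min(range(len(ensemble)), key=lambda i: accuracy(ensemble[i], block))
        ensemble[weakest] = candidate
    return ensemble
```

For example, after two blocks of an old concept and one block in which the labels are swapped, the member trained on the new block replaces one of the old members, while the other old member is kept.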

Block-based ensembles are suitable for handling gradual drifts. During these drifts, the change between consecutive data blocks is generally small, so it only becomes noticeable over long periods. The interesting point in block-based ensembles is that they can enclose different learners trained in different periods of time. Hence, by aggregating the outputs of these base classifiers, the ensemble can react accurately to such gradual drifts.

In contrast, the main drawback of block-based ensembles is the difficulty of tuning the block size to offer a compromise between fast reaction to drifts and high accuracy. If the block size is too large, they may react slowly to abrupt drift, whereas a small size can damage the performance of the ensemble in stable periods.

[Fig. 3: (a) Block-based; (b) Weighting-data; (c) Filtering-data]

In this technique, the base learners are trained on weighted instances from the training set. A popular instance weighting process is the one used in the Online Bagging ensemble [19], where the weighting is based on re-using instances for training individual classifiers. Namely, if each base classifier Ci is trained on a subset Mi of the global training set, then instance i is presented k times in Mi, where the weight k is drawn from a Poisson(1) distribution.
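The Poisson(1) weighting can be sketched as follows. Python's standard `random` module has no Poisson sampler, so this sketch assumes Knuth's multiplication method; `poisson` and `online_bagging_weights` are hypothetical helpers, and the weights are stored explicitly only for illustration (an online learner would simply train k times on the instance and move on).

```python
import math
import random


def poisson(lam, rng):
    """Draw k ~ Poisson(lam) with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def online_bagging_weights(n_instances, n_classifiers, seed=0):
    """weights[i][j] = how many times instance j is presented to classifier C_i,
    i.e. the multiplicity of instance j in the training subset M_i."""
    rng = random.Random(seed)
    weights = [[0] * n_instances for _ in range(n_classifiers)]
    for j in range(n_instances):          # instances arrive one at a time
        for i in range(n_classifiers):    # each classifier draws its own weight k
            weights[i][j] = poisson(1.0, rng)
    return weights
```

Because P(k = 0) = e^{-1} ≈ 0.37, each classifier skips roughly a third of the instances while others are presented several times, which is what makes the members differ.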

Online Bagging has inspired many researchers in the field of drift tracking [3] [17] [13]. This approach can be of great interest for:
- Class imbalance: where some classes are severely underrepresented in the dataset;
- Local drift: where changes occur only in some regions of the instance space.
Generally, the weighting process intensifies the re-use of underrepresented class data and helps to deal with the scarcity of instances that represent the local drift. However, instance duplication may impair the ability of the ensemble to handle global drift. During global drift, the change affects a large amount of data; thus, when data are re-used for constructing base classifiers, the performance decrease is accentuated and the recovery from the drift may be delayed.

Filtering-data Technique

This technique is based on selecting data from the training set according to a specific criterion, for example similarity in the feature space. It allows selecting subsets of attributes that provide partitions of the training set containing maximally similar instances, i.e., instances belonging to the same regions of the feature space. Thanks to this technique, base learners are trained on different subspaces in order to benefit from different characteristics of the overall feature space.
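As an illustration, the sketch below assumes that each member is assigned one combination of `k` attributes and sees every instance projected onto those attributes, in the spirit of restricted-attribute ensembles such as LimAttClass [1]. The helpers `subspaces`, `project` and `filtered_training_sets` are hypothetical, and no similarity-based selection of instances is performed here.

```python
from itertools import combinations


def subspaces(n_features, k):
    """All attribute subsets of size k; each one defines the view of one base learner."""
    return list(combinations(range(n_features), k))


def project(x, subset):
    """Keep only the attributes of x that belong to the learner's subset."""
    return tuple(x[j] for j in subset)


def filtered_training_sets(data, n_features, k):
    """Map each attribute subset to the training data as that learner sees it."""
    return {s: [(project(x, s), y) for x, y in data] for s in subspaces(n_features, k)}
```

With n attributes this yields C(n, k) members, e.g. three members for n = 3 and k = 2.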

In contrast with conventional approaches, which detect drift in the overall distribution without specifying which feature has changed, ensemble learners based on filtered data can identify exactly which feature is drifting. This is a desirable property for detecting the emergence of a novel class or the fusion of existing classes in unlabeled data. However, these approaches may have difficulty handling local drifts if they do not define an efficient filtering criterion. During local drift, only some regions of the feature space are affected by the drift. Hence, only the base classifier trained on the changing region is accurate enough to handle the drift. However, when its decision is aggregated with those of the remaining classifiers, trained on unchanged regions, the performance recovery may be delayed.

The intuition behind EnsembleEDIST2 is to combine the three diversity techniques (block-based, weighting-data and filtering-data) in order to benefit from their advantages and avoid their drawbacks.

The contributions of EnsembleEDIST2 for efficiently handling complex concept drifts are as follows. It:
- explicitly handles drift through the drift detection method EDIST2 [14] (Subsection 4.1);
- makes use of data blocks with variable size for updating the ensemble's members (Subsection 4.2);
- defines a new filtering criterion for selecting the most representative data of the new concept (Subsection 4.3);
- applies a new weighting process in order to create diversified ensemble members (Subsection 4.4).


EnsembleEDIST2 is an ensemble classifier designed to explicitly handle drifts. It makes use of EDIST2 [14] as a drift detection mechanism in order to monitor the ensemble's performance and detect changes (see Fig.4).

EDIST2 monitors the prediction feedback provided by the ensemble. More precisely, EDIST2 studies the distance between two consecutive classification errors, where the distance is the number of instances between these two errors. When the data distribution becomes non-stationary, the ensemble commits more errors and the distance between these errors decreases.
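The error-distance signal can be computed from the sequence of prediction outcomes. In this sketch (hypothetical helper `error_distances`), the distance is taken as the difference between the positions of two consecutive errors; this is an assumption, and counting only the instances strictly in between would subtract one.

```python
def error_distances(predictions_correct):
    """Distances, in number of instances, between consecutive classification errors.

    `predictions_correct` is a sequence of booleans (True = correct prediction).
    """
    distances, last_error = [], None
    for t, ok in enumerate(predictions_correct):
        if not ok:
            if last_error is not None:
                distances.append(t - last_error)
            last_error = t
    return distances
```

With one error every ten instances the mean distance is 10; with one error every two instances it drops to 2, which is the kind of decrease EDIST2 looks for.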

In EDIST2, concept drift is tracked through two data windows, a 'global' one and a 'current' one. The global window WG is a self-adaptive window that is continuously enlarged if no drift occurs and shrunk otherwise; the current window W0 represents the batch of currently collected instances.

In EDIST2, the error-distance distributions of WG and W0 are estimated, and the averages of these distributions are compared in order to detect a difference. As stated before, a significant decrease in the error distance implies a change in the data distribution and suggests that the learning model is no longer appropriate.

EDIST2 makes use of a statistical hypothesis test in order to compare the error-distance distributions of WG and W0 and check whether their averages differ by more than a threshold. There is no a priori definition of this threshold, in the sense that it does not require any prior adjustment related to the expected speed or severity of the change; it is autonomously adapted according to the statistical hypothesis test (for more details, please refer to [14]).

The intuition behind EDIST2 is to monitor d, which represents the difference between the WG and W0 averages; accordingly, three levels are defined:
- In-Control level: d does not exceed the threshold. Within this level, we confirm that there is no change between the two distributions, so WG is enlarged by adding the instances of W0. Accordingly, all the ensemble members are incremented with the data samples in WG and W0.
- Warning level: d exceeds the threshold. Within this level, the instances are stored in a warning chunk Wwarning, and all the ensemble members are incremented with weighted data from Wwarning (the weighting process is explained in Subsection 4.4).
- Drift level: d exceeds the threshold plus an additional margin. Within this level, the drift is confirmed and WG is shrunk so that it only contains the instances stored since the warning level, i.e., those in Wwarning. Additionally, a new base classifier is created from scratch and trained on the data samples in Wwarning, and the oldest classifier is removed from the ensemble.
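The three levels and their actions can be sketched as below. The threshold and the additional drift margin are passed in as plain numbers, an assumption made because EDIST2 adapts them through a statistical test that is not reproduced here. The members are assumed to learn incrementally, so only the newly collected W0 instances are fed to them at the in-control level, and the weighting step is passed in as a function. `Level`, `detect_level`, `RecordingLearner` and `handle_level` are hypothetical names.

```python
from collections import deque
from enum import Enum


class Level(Enum):
    IN_CONTROL = 1
    WARNING = 2
    DRIFT = 3


def detect_level(d, threshold, drift_margin):
    """Map d (difference between the WG and W0 error-distance averages) to a level.
    `threshold` and `drift_margin` are inputs here; EDIST2 adapts its threshold
    with a statistical test that this sketch does not reproduce."""
    if d > threshold + drift_margin:
        return Level.DRIFT
    if d > threshold:
        return Level.WARNING
    return Level.IN_CONTROL


class RecordingLearner:
    """Hypothetical base classifier that only records its training instances."""

    def __init__(self):
        self.seen = []

    def learn(self, x, y):
        self.seen.append((x, y))


def handle_level(level, ensemble, w_global, w_warning, w0, weight_fn):
    """Apply the action of each level; `ensemble` is a deque, oldest member first."""
    if level is Level.IN_CONTROL:
        w_global.extend(w0)                  # enlarge WG with the instances of W0
        for member in ensemble:              # increment every member with the new data
            for x, y in w0:
                member.learn(x, y)
    elif level is Level.WARNING:
        w_warning.extend(w0)                 # keep the instances in W_warning
        weight_fn(ensemble, w_warning)       # weighted update from W_warning
    else:                                    # Level.DRIFT
        w_global[:] = list(w_warning)        # WG now holds only the warning-level data
        new_member = RecordingLearner()
        for x, y in w_warning:               # train a new classifier from scratch
            new_member.learn(x, y)
        ensemble.popleft()                   # remove the oldest classifier
        ensemble.append(new_member)
```

Whether Wwarning is emptied after a confirmed drift is not specified in the text, so the sketch leaves it unchanged.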

EnsembleEDIST2's diversity by variable-sized block technique

In EnsembleEDIST2, the size of the data block is not defined by a number of instances, as in conventional block-based ensembles, but by the number of errors committed during the learning process. More precisely, the data block W0 in EnsembleEDIST2 is constructed by collecting the incoming instances until N0 errors have been committed.

As depicted in Fig.5, when the drift is abrupt, the ensemble commits N0 errors within a short time. However, when the drift is gradual, the ensemble commits N0 errors over a relatively longer time. Hence, with this strategy, the block size is variable and adjusted to the drift characteristics.
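The variable block size can be illustrated by splitting a stream of prediction outcomes into blocks that each contain N0 errors. The error rates below are illustrative only (they are not taken from the experiments), and `variable_blocks` is a hypothetical helper.

```python
def variable_blocks(correct_flags, n0):
    """Split a stream of prediction outcomes (True = correct) into blocks that
    each contain n0 errors; return the block sizes. A trailing block with fewer
    than n0 errors is not reported."""
    sizes, size, errors = [], 0, 0
    for ok in correct_flags:
        size += 1
        if not ok:
            errors += 1
            if errors == n0:
                sizes.append(size)
                size, errors = 0, 0
    return sizes
```

With N0 = 30, a 10% error rate yields blocks of 300 instances, while a 50% error rate, such as might follow an abrupt drift, yields blocks of 60 instances.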

EnsembleEDIST2 can therefore offer a compromise between fast reaction to abrupt drift and stable behavior under gradual drift. This is a desirable property for handling complex drift, which may present different characteristics at the same time; accordingly, EnsembleEDIST2 avoids the problem of tuning the size of the data block that arises in most block-based approaches.

[Fig. 5: (a) Abrupt drift; (b) Gradual drift]

Unlike conventional filtering-data ensembles, which filter data according to similarity in the feature space, EnsembleEDIST2 defines a new filtering criterion: it filters the instances that trigger the warning level. More precisely, each time the ensemble reaches the warning level, the instances are gathered in a warning chunk Wwarning in order to re-use them for training the ensemble's members (see Fig.6.a). This is valuable when dealing with local drift, because drifting data are scarce and not continuously provided. A certain amount of drifting data may be found in zones (1), (2), (3) and (4) without being sufficient to reach the drift level. By using these data to update the ensemble's members, EnsembleEDIST2 can ensure a rapid recovery from local drift.

In contrast, conventional filtering-data ensembles are unable to determine in which zone the drift has occurred; thus, they may update the ensemble's members with data filtered from unchanged regions of the feature space, which in turn may delay the recovery of performance.

EnsembleEDIST2's diversity by new weighting-data process

The focus in EnsembleEDIST2 is to maximize the use of the data in Wwarning for accurately updating the ensemble. More precisely, the data in Wwarning are weighted with the same weighting process as in Online Bagging [19]: each instance i from Wwarning is re-used k times for training each base classifier Ci, where the weight k is drawn from a Poisson(1) distribution (see Appendix 7).

The weighting process in EnsembleEDIST2 offers a twofold advantage. First, it intensifies the re-use of underrepresented class data and helps to deal with the scarcity of instances that represent local drift. Second, it permits faster recovery from global drift than conventional weighting-data ensembles. During global drift, the change affects a large amount of data. Hence, unlike conventional weighting-data ensembles, which apply the weighting process to the whole data stream, EnsembleEDIST2 only weights the instances in Wwarning (see Fig.6.b). It thus avoids accentuating the decrease of the ensemble's performance during global drift and ensures a fast recovery.
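A sketch of this weighting step (the WeightingDataProcess listing in the appendix), assuming base classifiers that expose a `learn(x, y)` method. `poisson1`, `CountingLearner` and `weighting_data_process` are hypothetical, and Knuth's multiplication method stands in for a Poisson sampler because Python's `random` module does not provide one.

```python
import math
import random


def poisson1(rng):
    """Draw k ~ Poisson(1) with Knuth's multiplication method."""
    limit = math.exp(-1.0)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


class CountingLearner:
    """Hypothetical base classifier that only counts how often it is trained."""

    def __init__(self):
        self.presentations = 0

    def learn(self, x, y):
        self.presentations += 1


def weighting_data_process(ensemble, w_warning, rng):
    """Re-use every instance of W_warning k ~ Poisson(1) times for each base
    classifier; instances outside W_warning are never passed in, so they are
    not re-weighted."""
    for x, y in w_warning:
        for member in ensemble:
            for _ in range(poisson1(rng)):
                member.learn(x, y)
    return ensemble
```

On average each member sees every instance of Wwarning once, so the extra training effort is proportional to the size of Wwarning rather than to the whole stream.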

Experiments and performance analysis

Experimental evaluation

Synthetic Datasets. In this investigation, we study six different scenarios of complex concept drift, as depicted in Table 2. All synthetic datasets contain 100,000 instances and one concept drift whose start and end times are predefined. For gradual drift, the drifting phase lasts 30,000 instances (it begins at tstart = 40,000 and ends at tend = 70,000). For abrupt drift, the drift occurs at t = 50,000.

Electricity Dataset (48,312 instances, 8 attributes, 2 classes) is a real-world dataset from the Australian New South Wales Electricity Market [9]. In this electricity market, the prices are not fixed and may be affected by demand and supply. The dataset covers a period of two years, and the instances are recorded every half hour. The classification task is to predict a rise (UP) or a fall (DOWN) in the electricity price. Three numerical features are used to define the feature space: the electricity demand in the current region, the electricity demand in the adjacent regions and the scheduled electricity transfer between the two regions.

This dataset may present several scenarios of complex drift. For instance, a gradual continuous drift may occur when users progressively change their consumption habits over a long period. Likewise, an abrupt drift may occur when electricity prices suddenly increase due to unexpected events (e.g., political crises or natural disasters). Moreover, the drift can be local if it impacts only one feature (e.g., the electricity demand in the current region), or global if it impacts all the features.

Spam Dataset (9,324 instances, 500 attributes, 2 classes) is a real-world dataset containing email messages from the Spam Assassin Collection Project [11]. The classification task is to predict whether an email is spam or legitimate. The dataset contains 20% spam messages. The feature space is defined by a set of numerical features such as the number of recipients, textual attributes describing the email content, and sender characteristics.

This dataset may present several scenarios of complex drift. For instance, a gradual drift may occur when the user progressively changes his or her preferences, whereas an abrupt drift may occur when the spammer rapidly changes the email content to trick the spam filter rules. The drift can also be continuous when the spammer starts to change the spam content but the filter continues to detect these messages correctly. On the other hand, the drift can be probabilistic when the spammer starts to change the spam content and the filter fails to detect some of these messages.

Evaluation criteria. When dealing with evolving data streams, the objective is to study the evolution of EnsembleEDIST2's performance over time and how quickly it adapts to drift. According to Gama et al. [8], prequential accuracy is a suitable metric for evaluating learner performance in the presence of concept drift. It proceeds as follows: each instance is first used for testing and then for training. Hence, the accuracy is incrementally updated using the maximum available data, and the model is always tested on instances it has not yet seen (for more details, please refer to [8]).
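The basic, cumulative form of prequential (test-then-train) accuracy can be sketched as follows; `prequential_accuracy` and the toy learner `MajorityClass` (with `predict` and `learn` methods) are hypothetical. Gama et al. [8] also discuss windowed and fading-factor variants, which are not shown here.

```python
def prequential_accuracy(model, stream):
    """Test-then-train: each instance is first predicted, then used for training.
    Returns the running accuracy after every instance."""
    correct, curve = 0, []
    for n, (x, y) in enumerate(stream, start=1):
        correct += int(model.predict(x) == y)   # test on an instance not yet seen
        model.learn(x, y)                       # then train on it
        curve.append(correct / n)
    return curve


class MajorityClass:
    """Hypothetical toy learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
```

For the label sequence 1, 1, 1, 0, the running accuracy is 0, 1/2, 2/3 and finally 1/2: the first instance cannot be predicted, and the last one contradicts everything learned so far.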

Parameter Settings. All the tested approaches were implemented in the Java programming language by extending the Massive Online Analysis (MOA) software [2]. MOA is an online learning framework for evolving data streams that supports a collection of machine learning methods.

For comparison, we selected well-known ensemble approaches from each category:
- Block-based ensembles: AUE (Accuracy Updated Ensemble) [5], AWE (Accuracy Weighted Ensemble) [16] and LearnNSE [20], with a block size of 500 instances;
- Weighting-data ensembles: LeveragingBag [3] and OzaBag [19];
- Filtering-data ensemble: LimAttClass [1].
For all these approaches, the ensemble size was fixed to 10 and the Hoeffding Tree (HT) [7] was used as the base learning algorithm.

EnsembleEDIST2 makes use of two parameters: N0, the number of errors in W0, and m, the number of base classifiers in the ensemble. In this investigation, we set N0 = 30 and m = 3, respectively, according to the empirical studies reported in Subsection 5.2.

Comparative study and interpretation

Impact of N0 on EnsembleEDIST2 performance. EnsembleEDIST2 makes use of the parameter N0 in order to define the minimum number of errors committed in W0. Recall that W0 represents the batch of currently collected instances. This batch is constructed by collecting the instances spanning N0 errors.

It is interesting to study the impact of N0 on accuracy under different scenarios of complex drift. For this purpose, we ran the following experiment: for each scenario of complex drift, the accuracy of EnsembleEDIST2 is reported for varying values of N0 (see Table 3).

Based on these results, we conclude that the performance of EnsembleEDIST2 in handling different scenarios of complex drift is only weakly sensitive to N0. Hence, we decided to use N0 = 30, as it achieved the best accuracy rate in most cases.

Impact of ensemble size on EnsembleEDIST2 performance. EnsembleEDIST2 makes use of the parameter m in order to define the number of classifiers in the ensemble. Accordingly, it is interesting to study the impact of m on the ensemble's performance under different scenarios of complex drift.

According to Table 4, the size of EnsembleEDIST2 does not significantly impact its performance in handling different scenarios of complex drift. Hence, we decided to use m = 3, as it achieved the best accuracy rate in most cases and limits the computational complexity of the ensemble.

Accuracy of EnsembleEDIST2 vs. other ensembles. Table 5 summarizes the average prequential accuracy during the drifting phase. The objective of this experiment is to study the ensemble performance in the presence of different scenarios of complex drift. Firstly, EnsembleEDIST2 achieved better results than the block-based ensembles in handling the different types of abrupt drift. During abrupt drift (whether local or global), the change is rapid; thus AUE, AWE and LearnNSE have difficulty tuning the block size to offer a compromise between fast reaction to drift and high accuracy. In contrast, EnsembleEDIST2 autonomously trains the ensemble members with a variable amount of data at each step, so it can efficiently handle abrupt drift.

Secondly, EnsembleEDIST2 outperforms the weighting-data ensembles in handling the different categories of global drift. During global drift (whether continuous, probabilistic or abrupt), the change affects a large amount of data; thus, when LeveragingBag and OzaBag intensify the re-use of data for training the ensemble members, the performance decrease is accentuated. In contrast, EnsembleEDIST2 duplicates only a set of filtered instances for training the ensemble members, which is why it is more accurate in handling global drift.

Thirdly, EnsembleEDIST2 outperforms the filtering-data ensemble in handling the different categories of local drift. During local drift (whether continuous, probabilistic or abrupt), the change affects a small amount of data; thus the choice of the filtering criterion is essential for efficiently handling local drift. EnsembleEDIST2 defines a new filtering criterion, based on selecting the data that triggered the warning level. These data are the most representative of the new concept; training the ensemble's members on them therefore makes EnsembleEDIST2 more efficient at handling local drift.

EnsembleEDIST2 has also been tested on real-world datasets that represent different scenarios of drift. It is worth underlining that these datasets are relatively small compared to the synthetic ones. Despite the different features of each real dataset, encouraging results were obtained: EnsembleEDIST2 achieved the best accuracy on all the datasets (see Table 6).

To sum up, the combination of the three diversity techniques in EnsembleEDIST2 is beneficial for handling different scenarios of complex drift at the same time.

In this paper, we have presented a new study of the role of diversity among ensemble members. More precisely, we have highlighted the advantages and the limits of three widely used diversity techniques (block-based data, weighting-data and filtering-data) in handling complex drift.

Additionally, we have presented a new ensemble approach, namely EnsembleEDIST2, which combines these three diversity techniques. The intuition behind this approach is to explicitly handle drifts by using the drift detection mechanism EDIST2, through which the ensemble performance is monitored with a self-adaptive window. First, EnsembleEDIST2 thus avoids the problem of tuning the size of the data batch that arises in most block-based ensemble approaches, which is a desirable property for handling abrupt drifts. Second, it defines a new filtering criterion, based on selecting the data that trigger the warning level. Thanks to this property, EnsembleEDIST2 is more efficient at handling local drifts than conventional filtering-data ensembles, which only filter data according to similarity in the feature space. Third, unlike conventional weighting-data ensembles, which apply the weighting process to the whole data stream, EnsembleEDIST2 only intensifies the re-use of the most representative data of the new concept, which is a desirable property for handling global drifts.

EnsembleEDIST2 has been tested on different scenarios of complex drift. Encouraging results were found compared to similar approaches: EnsembleEDIST2 achieved the best accuracy rate on all datasets and presented a stable behavior in handling different scenarios of complex drift.

It is worth underlining that, in the present investigation, the ensemble size, i.e., the number of ensemble members, was fixed. Hence, an interesting direction for future work is a strategy for dynamically adapting the ensemble size: during stable periods, the ensemble size would be kept fixed, whereas during the drifting phase, the size would be autonomously adapted. This may improve performance and reduce the computational cost of the ensemble.

Acknowledgements. The second author acknowledges the support of the Regional project REPAR, funded by the French Region Hauts-de-France.

EnsembleEDIST2 pseudocode

Algorithm EnsembleEDIST2
Input: (x, y): data stream
       N0: number of errors used to construct a window
       m: number of base classifiers
Output: trained ensemble classifier E
for each base classifier Ci in E
    InitializeClassifier(Ci)
end for
WG ← CollectInstances(E, N0)
Wwarning ← ∅
repeat
    W0 ← CollectInstances(E, N0)
    Level ← DetectedLevel(WG, W0)
    switch (Level)
        case 1: In-control
            WG ← WG ∪ W0
            UpdateParameters(WG, W0)
            Increment all ensemble members of E according to instances in WG
        end case 1
        case 2: Warning
            Wwarning ← Wwarning ∪ W0
            UpdateParameters(Wwarning, W0)
            WeightingDataProcess(E, Wwarning)
        end case 2
        case 3: Drift
            Create a new base classifier Cnew trained on instances in Wwarning
            E ← E ∪ {Cnew}
            Remove the oldest classifier from E

Algorithm DetectedLevel(WG, W0)
Input: WG: global data window, characterized by:
           NG: number of errors
           μG: error distance mean
           σG: error distance standard deviation
       W0: current data window, characterized by:
           N0: number of errors
           μ0: error distance mean
           σ0: error distance standard deviation
Output: Level: detection level

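Algorithm UpdateParameters(WG, W0)
Input: WG: global data window, characterized by NG (number of errors), μG (error distance mean) and σG (error distance standard deviation); W0: current data window, characterized by N0, μ0 and σ0
Update (all right-hand sides use the values before the update):
$$\mu_G \leftarrow \frac{N_G\,\mu_G + N_0\,\mu_0}{N_G + N_0}$$
$$\sigma_G \leftarrow \sqrt{\frac{N_G\,\sigma_G^2 + N_0\,\sigma_0^2}{N_G + N_0} + \frac{N_G\,N_0}{(N_G + N_0)^2}\,(\mu_G - \mu_0)^2}$$
$$N_G \leftarrow N_G + N_0$$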
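The UpdateParameters step merges the error-distance statistics of two windows. A direct Python transcription, assuming σ denotes the population standard deviation; the hypothetical function `update_parameters` returns the new NG, μG and σG computed from the old values of both windows.

```python
import math


def update_parameters(n_g, mu_g, sigma_g, n_0, mu_0, sigma_0):
    """Merge the error-distance statistics of W0 into those of WG.

    Returns the count, mean and population standard deviation of the union
    of both windows, using only the old statistics of each window.
    """
    n = n_g + n_0
    mu = (n_g * mu_g + n_0 * mu_0) / n
    var = (n_g * sigma_g ** 2 + n_0 * sigma_0 ** 2) / n \
        + n_g * n_0 * (mu_g - mu_0) ** 2 / n ** 2
    return n, mu, math.sqrt(var)
```

For example, merging the statistics of the distance samples [3, 5, 4, 10] (mean 5.5, variance 7.25) and [1, 2, 2] (mean 5/3, variance 2/9) gives N = 7, mean 27/7 and standard deviation √384/7 ≈ 2.80, the same values as computing them directly on the seven distances.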

Algorithm WeightingDataProcess(E, Wwarning)
Input: E: ensemble classifier
       Wwarning: window of data
Output: E: updated ensemble classifier
for each instance xi in Wwarning
    for each base classifier Ci in E
        k ← Poisson(1)
        do k times
            TrainClassifier(Ci, xi)
        end do
    end for
end for

23. Schlimmer, J.C., Granger, Jr., R.H.: Incremental learning from noisy data. Mach. Learn. 1(3), 317-354 (Mar 1986)
24. Street, W.N., Kim, Y.: A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 377-382. KDD01, ACM, New York, NY, USA (2001)

1. Bifet , A. , Frank , E. , Holmes , G. , Pfahringer , B. , Sugiyama , M. , Yang , Q. : Accurate ensembles for data streams: Combining restricted hoe ding trees using stacking . In: 2nd Asian Conference on Machine Learning (ACML2010) . pp. 225 { 240 ( 2010 )

2. Bifet , A. , Holmes , G. , Kirkby , R. , Pfahringer , B. : MOA: massive online analysis . Journal of Machine Learning Research 11 , 1601 { 1604 ( 2010 )

3. Bifet , A. , Holmes , G. , Pfahringer , B. : Leveraging bagging for evolving data streams . In: Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases: Part I . pp. 135 { 150 . ECML PKDD' 10 , Springer-Verlag, Berlin, Heidelberg ( 2010 ), http://dl.acm.org/citation.cfm? id= 1888258 . 1888275

4. Bifet , A. , Holmes , G. , Pfahringer , B. , Kirkby , R. , Gavalda , R.: New ensemble methods for evolving data streams . In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . pp. 139 { 148 . KDD '09, ACM , New York, NY, USA ( 2009 ), http://doi.acm. org/10 .1145/1557019. 1557041

5. Brzezinski , D. , Stefanowski , J.: Reacting to di erent types of concept drift: The accuracy updated ensemble algorithm . Neural Networks and Learning Systems, IEEE Transactions on 25(1) , 81 {94 (Jan 2014 )

6. Brzezinski , D. , Stefanowski , J.: Accuracy updated ensemble for data streams with concept drift . In: Corchado, E. , Kurzy?ski, M. , Wo?niak, M. (eds.) Hybrid Arti cial Intelligent Systems, Lecture Notes in Computer Science , vol. 6679 , pp. 155 { 163 . Springer Berlin Heidelberg ( 2011 ), http://dx.doi.org/10.1007/ 978-3- 642 -21222-2_ 19

7. Domingos , P. , Hulten , G.: Mining high-speed data streams . In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . pp. 71 { 80 . KDD00, ACM, New York, NY, USA ( 2000 )

8. Gama , J. , Sebastio , R. , Rodrigues , P. : On evaluating stream learning algorithms . Machine Learning 90 ( 3 ), 317 { 346 ( 2013 )

9. Harries , M. : Splice-2 comparative evaluation: Electricity pricing . Tech. rep. , The University of South Wales, United Kingdom ( 1999 )

10. Hulten , G. , Spencer , L. , Domingos , P. : Mining time-changing data streams . In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining , San Francisco, CA, USA, August 26- 29 , 2001 . pp. 97 { 106 ( 2001 )

11. Katakis , I. , Tsoumakas , G. , Vlahavas , I. : Tracking recurring contexts using ensemble classi ers: an application to email ltering . Knowledge and Information Systems 22 ( 3 ), 371 { 391 ( 2010 )

12. Khamassi , I. , Sayed-Mouchaweh , M. : Drift detection and monitoring in nonstationary environments . In: Evolving and Adaptive Intelligent Systems (EAIS) , Austria . pp. 1 { 6 ( June 2014 )

13. Khamassi , I. , Sayed-Mouchaweh , M. , Hammami , M. , Ghedira , K. : Ensemble classi ers for drift detection and monitoring in dynamical environments . In: Annual Conference of the Prognostics and Health Management Society , New Orlean, USA, 2013 ( October 2013 )

14. Khamassi , I. , Sayed-Mouchaweh , M. , Hammami , M. , Ghedira , K. : Self-adaptive windowing approach for handling complex concept drift . Cognitive Computation 7 ( 6 ), 772 { 790 ( 2015 ), http://dx.doi.org/10.1007/s12559-015-9341-0

15. Khamassi , I. , Sayed-Mouchaweh , M. , Hammami , M. , Ghedira , K. : Discussion and review on evolving data streams and concept drift adapting . Evolving Systems (Oct 2016 ), http://dx.doi.org/10.1007/s12530-016-9168-2

16. Kolter , J.Z. , Maloof , M.A. : Dynamic weighted majority: An ensemble method for drifting concepts . J. Mach. Learn. Res . 8 , 2755 {2790 (Dec 2007 )

17. Minku , L. , White , A. , Yao , X. : The impact of diversity on online ensemble learning in the presence of concept drift. Knowledge and Data Engineering , IEEE Transactions on 22 ( 5 ), 730 {742 (May 2010 )

18. Minku , L. , Yao , X. : Ddd: A new ensemble approach for dealing with concept drift. Knowledge and Data Engineering , IEEE Transactions on 24 ( 4 ), 619 {633 (April 2012 )

19. Oza , N.C. , Russell , S. : Online bagging and boosting . In: In Arti cial Intelligence and Statistics 2001 . pp. 105 { 112 . Morgan Kaufmann ( 2001 )

20. Polikar , R. , Upda , L. , Upda , S. , Honavar , V. : Learn++: an incremental learning algorithm for supervised neural networks . Systems, Man, and Cybernetics , Part C: Applications and Reviews, IEEE Transactions on 31(4) , 497 {508 (Nov 2001 )

21. Ren , Y. , Zhang , L. , Suganthan , P.N. : Ensemble classi cation and regression-recent developments, applications and future directions [review article] . IEEE Computational Intelligence Magazine 11 ( 1 ), 41 { 53 ( 2016 )

22. Sayed-Mouchaweh , M. : Learning from Data Streams in Dynamic Environments, chap . Handling Concept Drift , pp. 33 { 59 . Springer International Publishing, Cham ( 2016 ), http://dx.doi.org/10.1007/978-3- 319 -25667- 2 _ 3