<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ensemble-Based Prediction of Business Processes Botlenecks With Recurrent Concept Drifts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yorick Spenrath</string-name>
          <email>y.spenrath@student.tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marwan Hassani</string-name>
          <email>m.hasssani@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics and Computer Science, Eindhoven University of Technology</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Bottleneck prediction is an important sub-task of process mining that aims at optimizing the discovered process models by avoiding such congestions. This paper discusses an ongoing work on incorporating recurrent concept drift in bottleneck prediction when applied to a real-world scenario. In the field of process mining, we develop a method of predicting whether and which bottlenecks will likely appear based on data known before a case starts. We next introduce GRAEC, a carefully-designed weighting mechanism to deal with concept drifts. The weighting decays over time and is extendable to adapt to seasonality in data. The methods are then applied to a simulation, and an invoicing process in the field of installation services in real-world settings. The results show an improvement to prediction accuracy compared to retraining a model on the most recent data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Concept drift expresses an occurrence of a shift in relations between
input and output data over time. The challenge with concept drift is
that it is dificult to asses, detect, and correct such evolutions. One
method of solving concept drift is to retrain models periodically.
This method poses two major disadvantages. First, this reduces the
quantity of available data, which is a big issue when the number
of data points per time unit is low, or when dealing with
unbalanced datasets. The combination of these two causes over-fitting of
the model on larger classes. Second, retraining a model with new
data inherently discards former data, which could remove valuable
information on seasonality.</p>
      <p>This paper discusses a technique to cover both issues and applies
it to reduce the duration of a process, by predicting if, and where,
bottlenecks take place in the process. A simple example for a paper
submission process is shown in Figure 1, where the bottlenecks are
indicated for process instances at diferent points in time. Note the
bottle activities for papers on Process Mining. The proposed
solution uses multiple machine learning models trained over diferent
intervals of previous data, and assigns weights to the predictions
of each model based on the diference in time between the model
First International Workshop on Data Science for Industry 4.0.</p>
      <p>Copyright ©2019 for the individual papers by the papers’ authors. Copying permitted
for private and academic purposes. This volume is published and copyrighted by its
editors.
and the test data, factoring in an exponential decay (giving
priority to recent models) and a periodic function (giving priority to
seasonally similar points in time). As such we develop a Gradual
and Recurrent concept drift Adapting Ensemble Classifer, or GRAEC.
The contributions of this paper are:
• A mathematical way for assessing bottlenecks in business
processes
• A novel concept drift adaptation method, GRAEC
• An extensive experimental evaluation using a simulation
study and a real-world dataset, which proves the superiority
of our method compared to existing solutions which also
proves the correctness of our concept
The rest of the paper is organised as follows. Section 2 discusses
some related literature. Section 3 defines the problem that this paper
aims to solve. Section 4 proposes a solution to adapting for concept
drift with seasonality. Section 5 describes the experimental set-up
for the simulation and the real-world dataset. Section 6 presents
and discusses the results of the experiment. Section 7 concludes the
paper and provides suggestions for future research.
2</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        Concept drift has been a topic in online data streams in recent
years. Concept drift is the phenomenon where relations between
input (features) and output (labels in the case of the experiments
in this paper) change over time, because the underlying models
and/or distributions change [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Much of existing literature is on
detecting concept drift. Techniques for detection include sudden
change in classification accuracy (such as in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), or changes in
feature importance (such as in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). Detecting concept drift
is a first step, adapting to it is the next. One way to do so is by
simply retraining the model whenever a drift in concept is detected.
A disadvantage of this technique occurs if the drift is periodic, i.e.
a drift of concept is detected, but the previous concept may still be
of use at a later point in time, just not right after the detected drift.
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] trains a model for every given number of data points, and then
creates an ensemble classifier, weighting each model according to
the accuracy assessed on the most recent data points.
      </p>
      <p>
        The challenges of seasonal concepts, or what is more known in
the literature as recurrent concepts, have been addressed recently in
[
        <xref ref-type="bibr" rid="ref13 ref15 ref16 ref4 ref9">4, 9, 13, 15, 16</xref>
        ]. The general idea of approaches handling recurring
concepts is not to lose knowledge gathered over time. Therefore,
they maintain a pool of classifiers and use one or an ensemble of
them each time. Our approach is bringing the benefits of
ensemblebased recurrent concept drifts to the domain of business process
intelligence for predicting bottlenecks.
      </p>
      <p>
        In the field of prediction process bottlenecks, [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] predicts the
duration of queue time, which can be regarded as one activity. In a
study on handling concept drift in business processes, [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
investigates diferent solutions to adapt to concept drift when predicting
process outcomes. The authors use information about the process
execution (Control-flow perspective) as well as involved data
(Dataperspective). They show the efectiveness of incremental machine
learning algorithms such as Adaptive Hoefding Option Trees [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] in
handling diferent types of concept drift, amongst which recurrent
concept drift. Diferent to this approach, our novel method
introduced in this paper implements an ensemble classifier to adapt to
recurrent concepts.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>PROBLEM DESCRIPTION</title>
      <p>The problem discussed in this paper is to adapt to concept drift
when applied in the domain of business processes analysis. The
focus of the paper is on the efectiveness of the proposed concept
drift adaption method, finding the bottlenecks is the use case.</p>
      <p>We aim to improve the throughput of the process by indicating
bottleneck activities based on information we have before the case
starts. In our toy example of paper submissions, such information
could be the author(s), the research area, the number of publications
each author has, and the number of pages in the paper, all of which
are known once a submission is made. Using this information we
then predict which of the activity will be the bottleneck activity.
This bottleneck is then considered as a label for classifiers, which
are trained using the information of the case as input.</p>
      <p>We assume that the process is stable, i.e. our research is limited
to processes whose structure does not change over time. The time
evolution we are looking at is that of the relations between case
information and bottleneck labels, i.e. the drift in the underlying
statistical models.
4</p>
    </sec>
    <sec id="sec-4">
      <title>PROPOSED SOLUTION</title>
      <p>The problem this paper addresses is predicting bottleneck activities
in business processes, and improve these predictions by accounting
for concept drifts. The goal is not to indicate or explicitly detect
concept drift, but to optimise the use of selected previously trained
models to improve prediction quality. This section describes the
proposed solution, first introducing the context of business
processes (Subsection 4.1), next assessing what activity is a bottleneck
activity (Subsection 4.2), and finally improving predictions using
GRAEC (Subsection 4.3). We discuss them in general and also in
applying them to a simulation and a case study in Section 5.
4.1</p>
    </sec>
    <sec id="sec-5">
      <title>Business Processes</title>
      <p>Process mining is field of research that bridges the gap between
the computational possibilities of data mining with the industrial
application of business processes, by learning process behaviour
from event logs. Event logs consist of events, each describing what
happened (Activity), at what time (Time), by whom (Resource), for
which process instance (Case ID). Time can alternatively be
represented by considering start and end times of events. The analysis
in this paper only considers end times (the time when the activity
is logged) but can easily be extended to include both. Resources
are not considered in the process analysis part of the solution. To
explain the algorithm used to define the bottlenecks, some notation
is required.</p>
      <p>An event log L is a set of events e, a subset of all possible events
E. An event e is a combination of the Case ID, Activity, and Time,
represented as the tuple (e .cid, e .act, e .time). A case c is a sequence
of events, c ∈ E∗. The first element of a sequence c is represented
as c(0). We further define the function act (c) : E∗ → A∗, mapping
each element of a case to the activity of the event, A being the set
of all activities. act (c) is said to be the trace of c, and two cases
c1 and c2 are said to be of the same variant if act (c1) = act (c2).
The function is extended to a set C of cases; if all c ∈ C have
the same variant, act (C) is the sequence of activities for which
act (c) = act (C) for all c ∈ C. The duration of a case c is defined
as duration(c) = c(|c | − 1).time − c(0).time, i.e. the time diference
between the first and the last event. This implies that the duration
of the first event is not considered in the duration. If only the end
times of events are logged, there is no way to asses the duration of
the first event. In the example sketched above this is no issue: the
submission merely marks the start of the case, no work is required
by the committee. The same applies to the real-world scenario
(assessed in Lines 8 and 7 respectively) in Line 15. If there are no
activities that satisfies both requirements, we take the activity with
the longest duration that satisfies the Resolvability requirement, in
Line 17.</p>
      <sec id="sec-5-1">
        <title>Algorithm 1: Selection of the bottleneck</title>
        <p>Result: Each case c in C gets a label label (c)
1 for i = 1 . . . N do
2 σ ← act (Hi )
3 for c ∈ Hi do
4 label (c) ← short
5
presented in Section 5. In other processes some notion of the start
of the process (like an automated trigger caused by a client) needs
to be identified to analyse the throughput of the first activity.</p>
        <p>In the example of Figure 1, each case is one submission. The
Case ID could be a DOI, the activity set A would be {Submit,
Assign reviewers, Review1, Review2, Review3, Combine reviews, Send
notification }. Events would relate these two together, providing
information on when an activity for the given paper was completed.
The process consists of only one variant, i.e. there is only one way
to follow the process.
Let C be a set of cases. We want to find bottlenecks in cases that
have a long duration. We partition the set of cases into Hi and
Li , each representing cases that take up to a given time d
(Highthroughput) and cases that take more than d (Low-throughput)
respectively, for variant i for i ∈ 1...N , N being the total number of
variants in C. For activities to be considered bottlenecks, we have
two requirements: Relevance and Resolvability.</p>
        <p>
          If an activity has a long duration compared to other cases, but
a short duration compared to other activities in the same case,
it will be less efective to resolve a long duration of that activity
than activities that take up a larger part of the case duration. We
formalise this Relevance requirement by requiring that the fraction
of the total case duration of the activity is at least α , where α ∈ [
          <xref ref-type="bibr" rid="ref1">0, 1</xref>
          ]
is a parameter chosen on domain expert’s recommendation and
process characteristics (like the number of activities). Note that
setting α to 0 drops the relevance requirement entirely. Further
note that if α is set too high, no activities will meet this requirement.
It will hence be important to find an appropriate balance or to have
a fall-through mechanism to label cases.
        </p>
        <p>An activity is resolvable if it has a longer duration than a
benchmark average for that activity. We only address bottlenecks in Li , as
such we use the average duration of the activity in Hi as benchmark.
This is the Resolvability requirement. We know that there must be
at least one activity that satisfies this requirement, since the sum
of all durations (i.e. the duration of the case) of any case in Li is
larger than the duration of any case in Hi , so at least one activity
must take longer than the average of that activity in Hi .</p>
        <p>Finally, of all activities that pass both requirements, we choose
the activity that deviates the most (in terms of the standard
deviation in Hi from the benchmark average). This way ensures that the
chosen bottleneck is least explained by the intrinsic variability of
real-world processes. This procedure is described in Algorithm 1.</p>
        <p>Line 1 loops over each Variant of C. First we derive the Trace of
the Variant in Line 2. Note that this is the same as act (Li ). In Lines
3 and 4 we label each case in Hi as short. Lines 7 and 8 calculate the
sample mean and standard deviation1. We next consider each case
with a long duration in Lines 10 - 19. First we take all indices of
the Trace in Line 11. Next we apply the Resolvability requirement
in Line 12, and the Relevance requirement in Line 13. We then
check if there is any activity that meets both requirements in Line
14. If there is, we apply as the label the activity whose duration
deviates the most standard deviations from the mean throughput
1We could also have taken the population standard deviation, it does not make a
diference since the sample size is the same for each activity.
4.3</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Incorporating the Efect of Concept Drift</title>
      <p>This subsection proposes a method of dealing with concept drift.
The general idea is to split available training data into smaller sets,
and give weights to the models created by each set. The method is
general, and we will demonstrate its efectiveness in our use case
of predicting bottlenecks.</p>
      <p>Let D be our dataset, and the points in D be di . Each datapoint
has input features di .x , a timestamp di .time, and a label di .y. In our
use case of business process bottleneck prediction, di is a case, di .x
is information about a case, di .time is the timestamp of the first
event of a case, and di .y is the bottleneck as labelled by Algorithm 1.
In the light of Figure 1, di .x would be information on the author(s),
the research area, the number of publications each author has, and
the number of pages in the paper. di .time would be the time of
the submission, and di .y would be the bottleneck, as indicated in
Figure 1.</p>
      <p>D contains data over a time period of length T = maxdi ∈D di .time−
mindi ∈D di .time. This length is divided into smaller data sets Dj ,
each with length S, such that di ∈ Dj ⇔ ⌊ di .tSime ⌋ = j. In this
paper S is regarded as a constant in time, and diferent values of
end
for k ∈ 1 . . . |σ | − 1 do
x¯k ← |H1i | Íc ∈Hi c(k).time − c(k − 1).time
sk ← r Íc∈Hi (c(k).t ime|H−ci (|k−1).t ime−x¯k )2
end
for C ∈ Li do</p>
      <p>K0 ← 1 . . . |σ | − 1
K1 ← {k |k ∈ K0 : c(k).time − c(k − 1).time &gt; x¯k }
K2 ← nk |k ∈ K1 : c(k).t dimurea−tci(okn−(1c)).t ime ≥ α o
if |K2 | , 0 then
label (c) ←
c arg maxk ∈K2 h c(k).t ime−c(skk−1).t ime−x¯k i .act
end
else
label (c) ←
c(arg maxk ∈K1 c(k).time − c(k − 1).time).act</p>
      <p>S are explored. Picking the right value of S is important, having
too short time-frames will mean too few datapoints, having too
long time-frames might fail to capture short-term concept drift. The
splitting of data is visualised in Figure 2.</p>
      <p>We then train a machine learning model on each Dj , resulting in
multiple classifiers Mj . This creates an ensemble classifier, and we
use a weighting function explained below to decide the importance
of each classifier in the ensemble. In the running example, one could
decide to train a classifier monthly ( S = 1 month), creating twelve
models per year, each consisting of all papers that were submitted
in a specific month.</p>
      <p>
        When predicting a test data point, all models Mj receive a weight
based on how old the model is. We propose a combination of two
weights. The first weight is related to exponential decay, and is
inspired by for example [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Let d be a test data point, which has a
timestamp d .time. As such, d belongs to period j = ⌊ d .tSime ⌋. We
assign weight 1 to Mj−1, 10−β to Mj−2, 10−2β to Mj−3 and so on;
with β ∈ IR+. In general we assign 10−(j−k+1)β to model Mk , for
k &lt; j (models Mk , k ≥ j are not available for testing d, since they
are learned from data that is not available at d .time).
      </p>
      <p>The second weight relates to giving extra importance to models
that are seasonally similar. Say we want to reduce the efects of
recurrent concept drift that occurs every p ∈ IR+ time. We hence add
weight τ ∈ IR+ to models that belong to d .time − p, d .time − 2p . . ..
These additions are given on top of the exponential decay weight;
though in most situations we will have that p &gt;&gt; S (we train models
far more often than the recurrent concept drift occurs), and hence
the exponential weight will be insignificant for models around time
d .time − p.2 The determination of the weights is also presented in
Algorithm 2.</p>
      <p>We combine the models as follows: for datapoints in Dj we let
each of the j existing trained prediction models (M0, M1, ..., Mj−1)
make a prediction, resulting in j vectors (p1,k , p2,k , ..., pn,k ), where
pl ,k is the probability of label l as predicted by model k (where we
have a total of n labels in all of D). We determine the combined
prediction for the test datapoint d by taking the weighted sum:
2In particular, the implementation only assigns exponential weights of at least 10−6.</p>
      <sec id="sec-6-1">
        <title>Algorithm 2: Determination of weights</title>
        <p>3
4 end</p>
        <p>Result: For the test prediction d, each model Mk gets a
weight wk
1 j ← ⌊ d .tSime ⌋
2 for k = 0 . . . j − 1 do</p>
        <p>wk ← 10−(j−k+1)β
5 t ← d .time − p
6 while t &gt; 0 do
7 k ← ⌊ St ⌋
8 wk ← wk + τ</p>
        <p>t ← t − p
9
10 end
pl =
Õ</p>
        <p>wk · pl ,k
0≤k ≤j−1
where wk is the weight of the described above. The predicted
label is then arg maxl pl . The combination of β , τ and p uniquely
identifies a weighting assignment, together with S we form a
complete concept drift adaption technique. Since this technique adapts
to gradual and recurrent drift, we will refer to it as Gradual and
Recurrent concept drift Adapting Ensemble Classifier, or GRAEC.
Finding the best combination of β , τ , p and S is part of the training
process. In this paper we will restrict to a single value for p.
5</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>SIMULATION AND CASE STUDY SETUP</title>
      <p>This section describes the simulated (Subsection 5.1) and real-world
data set used (Subsection 5.2), the experimental setup (Subsection
5.3), and additional information about how the models are trained
(Subsection 5.4). The goal of the case study is to test the efectiveness
of our proposed method. The actual predicted bottlenecks is of
interest to the company, but falls outside the scope of this paper.</p>
      <p>The code to run the full Simulation experiment including the
implementations of Algorithms 1 and 2 is available at https://github.
com/yorick-spenrath/GRAEC.
5.1</p>
    </sec>
    <sec id="sec-8">
      <title>Simulation Setup</title>
      <p>
        We simulate an event log based on the running example. We use
three features: the topic, the number of pages used in the paper,
and the number of publications the author has. The topic is used
to determine where a bottleneck will be, and the number of pages
is used to determine if there will be a bottleneck. As such, the
number of publications will not influence the simulation. The topic
is one of the following: ’Machine Learning’ (ML), ’Data Mining’
(DM), ’Process Mining’ (PM), and ’Operations Management’ (OM).
The number of pages and the number of publications are both
integers in the interval [
        <xref ref-type="bibr" rid="ref1">1,100</xref>
        ], drawn at random. We simulate
for 5 years of data, where we receive 200 papers of each topic
each month (for a total of 800 cases per month, or 48000 over the
simulation). Time is measured in months, and months consist of
30 days each, i.e. the start of all cases in the first month have a
time between 0 and 1, and a day takes 310 units of simulation time.
The start of the case coincides with the timestamp of the submit
activity, the timestamp the other events is the timestamp of the
preceeding event, increased by a random variable X . X is in days,
with X ∼ Gamma(k = 5, θ = 0.7) for a non-bottleneck activity (see
below), and X ∼ Gamma(k = 25, θ = 0.7) for a bottleneck activity.
This way we will have cases without a bottleneck to take 14 days
on average, and over 97% of the cases without bottlenecks will take
less than 21 days. Cases with bottlenecks will on average take 28
days, and at least 97% of them take more than 21 days.
      </p>
      <p>
        We add gradual concept drift by selecting an interval of size 20 in
[
        <xref ref-type="bibr" rid="ref1">1, 100</xref>
        ]; if the number of pages is in this interval, we will not add a
bottleneck to the case, if the number of pages is outside the interval
we will. The interval changes over time; at time t , the interval starts
at 80 · 6t0 and ends at 80 · 6t0 + 20. This is visualised in Figure 3.
The activity which will be the bottleneck is based on the topic and
quartile as indicated in Table 1, resulting in recurrent concept drift.
We test our approach for values of S ∈ {1, 2, 3, 4}.
5.2
      </p>
    </sec>
    <sec id="sec-9">
      <title>Real-World Dataset</title>
      <p>We analyze part of the repair process at a Dutch installation services
company. Their repair process is roughly divided into three steps:
preparation, repair, and invoicing. We have an event log of about
two and a half years, containing about 200 000 cases and about 2 000
000 events, with information on the resource and the time stamp
of when an activity ended. We are interested in the invoicing part,
which, in the light of Subsection 4.1, is considered as the full process
starting at the last repair activity3. We define the throughput of
the invoice process as the time diference between the last repair
activity occurrence and the last invoice activity occurrence. The
invoice consists of three activities and one optional activity, but the
event log contains cases that still have repair activities after the first
invoice activity, and duplicate or missing invoice activities. In the
case of repair activities after an invoice activity, the throughput is
measured from the last non-invoice activity before the first invoice
activity. We further require that the three main invoice activities
3This also justifies not taking the first event of the case into account when determining
the bottleneck
(repair assessment, data check, and invoice creation) occur in the
case, and that the case is completed (there is a final batching activity
in the process that happens once a month). Again, each case from
the dataset results in one datapoint. Cases have feature values, such
as client, responsible resource, type of defect, client location, all of
which are categorical.</p>
      <p>The time in the dataset is measured in days, we use S ∈ {7, 14, 21, 28}.
This is because the dataset is afected by working days, so taking
any other value than a multiple of 7 will result with biased subsets.
5.3</p>
    </sec>
    <sec id="sec-10">
      <title>Experimental Setup</title>
      <p>This section describes how we use events logs, to which we refer
as L, to assess diferent methods of predicting bottlenecks. Unless
otherwise stated, the explanation holds for the simulation study as
well as the real-world dataset.</p>
      <p>From L we create four diferent datasets DS , one for each value of
S. For each S, L is split into smaller subsets LSj , each of duration S, as
discussed in Section 4.3. This creates subsets DSj , where information
about cases (topic, pages, and publications for the simulation, client
information for the Real-world dataset) are related to labels, for
period j. DSj contains data about cases that started in [S · j, S · (j + 1)).
Note that cases could have diferent bottleneck labels for diferent
values of S, as a result of being compared within a diferent set in
Algorithm 1. d and α are set to 21 days and 0.2 respectively. For
the simulation dataset, this is set by the simulation design, for the
real-world dataset this is per recommendation of the company’s
domain expert. After labelling each set, the number of datapoints
in each subset is reduced using k-medoids with Jaccard distance,
to ensure that the largest class in a set is at most twice as big as
the smallest class, to reduce overfitting. This is not required for
the simulation dataset, as it is already balanced by design. For each
subset DSj a model MjS is trained using the method described in the
next subsection.</p>
      <p>We then compare three diferent approaches. The first approach
uses only the first year of data to create a single model, which
is then used for all testing. This will be referred to as the Naive
method. The second approach uses the model created from the
dataset that was most recently completed; if a datapoint of DSj is
tested, model MS</p>
      <p>j−1 is used. This approach will be referred to as the
Recent method. Finally, we use our approach, which will be referred
to as GRAEC. Each method is tested on each value of S. Note that
this also creates four configurations for the Naive method, since a
diferent S results may result in diferent labelling.</p>
      <p>We next use the full last year of data for evaluation. This part is
split in a stratified 80% - 20% train-test split. The train split is used
to optimise the parameters of GRAEC, using a complete grid search
with β ∈ {0, 1, 2}, and τ ∈ {0, 0.01, 0.1, 1, 10, 100} for each value of
S. We also set p on one year (12 in the simulation, 365.25 in the
realworld dataset), as per design and domain expert recommendation.
The test split is used to assess all approaches.
5.4</p>
    </sec>
    <sec id="sec-11">
      <title>Model Training</title>
      <p>
        The model MjS is trained by optimising the hyperparameters of
Logistic Regression, kNN, Decision Tree, and Random Forest
classifiers using a stratified 5-fold cross validation on stratified 80%
of a dataset, all using the implementation of the scikit learn library
in Python [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The remaining 20% is then used to pick the best
model out of the four optimised algorithms. Because of this, we
require that DSj has at least 10 entries in each of its labels and that
it has at least 2 diferent classes. For the real-world dataset, some
DSj ’s might fail to meet one or both of these requirements. This
means that those DSj ’s will not be used for training, and as such the
respective model MjS may not exist. GRAEC is not limited by this,
but having too few MjS ’s will have a negative impact on the
efectiveness of GRAEC. If we cannot make a prediction for a datapoint in
DSj because of a missing model MjS−1, we will use the most recent
model instead (MjS−2, or even earlier models).
6
      </p>
    </sec>
    <sec id="sec-12">
      <title>RESULTS AND DISCUSSION</title>
      <p>In this section we show the results of applying our approach to
the simulation dataset and the real-world dataset described in the
previous section. Since the proposed solution aims to adapt to
overcome the efects of concept drift and not to detect it, we are
only interested in the score, but not in the drift in the process
or derived models themselves. In Subsection 6.1 we present and
discuss the results of the simulation dataset, in Subsection 6.2 we
do the same for the real-world dataset.
6.1</p>
    </sec>
    <sec id="sec-13">
      <title>Simulation Study</title>
      <p>The results of the simulation study are presented in Figure 4 and
Table 2. Figure 4 shows the scores of each of the twelve methods
applied. The best scores for GRAEC are underlined in Table 2 to
indicate the optimal values of β and τ . Table 2 further shows the
F1 and accuracy scores for all configurations of β , τ and S.
6.1.1 The efectiveness of GRAEC. The foremost result is the
efectiveness of GRAEC. When comparing the scores for each method for
a given value of S, GRAEC outperforms the other two methods. The
best GRAEC score, for S = 1 outperforms every other option. This
is expected, the design of the simulation was to make optimal use
of the adaptions GRAEC could give for S = 1. The optimal
parameters for GRAEC indicate the adaption for both gradual and recurrent
concept drift.
6.1.2 The inefectiveness of Naive. The Naive method is completely
inefective for any value of S. This makes sense, since one year of
data contains four diferent recurrent concepts. As a result the Naive
model will not even be able to properly explain the dataset it was
trained on, let alone datasets that only contain a single recurrent
concept, and datasets on which the gradual drift has had more
efect.
6.1.3 The efect of S. We shortly discuss the efect of each value of
S below. We will refer to the way topic the related to the bottleneck
activity as a concept, and refer to this concept in quartile 1 as ‘A’,
in quartile 2 as ‘B’, 3 as ‘C’, and 4 as ‘D’. A DSj can contain one or
more concepts. For example, D01 contains only concept ‘A’, as do
D11 and D21, where as D31 has concept ‘B’. On the other hand D12 has
concepts ‘A’ and ‘B’ in an equal number of cases. We hence refer to
the concept of D12 as ‘AB’, and in line, the concept of D02 is ‘AA’, the
concept of D23 is ‘CCC’, and D14 has concept ‘BBCC’.</p>
      <p>S = 1 For the Recent method, we have that in two out of the
three periods, the previous period had the same concept. As a
result, about one third of the predictions are made based on data
from a diferent recurrent concept, decreasing the overall score of
the Recent method.</p>
      <p>S = 2 and S = 4 For S = 2 we will have 6 D2j’s per year, with
concepts ‘AA’, ‘AB’, ‘BB’, ‘CC’, ‘CD’, ‘DD’. As a result it will be dificult
to train models on the second and fifth D2j each year (with concepts
‘AB’ and ‘CD’ respectively). For the Recent method, predictions are
suboptimal because:
• in the third and sixth period each year predictions will be
made with a bad model
• in the second and fifth period each year, only half of the
cases are the same as the previous concept
• in the first and fourth period each year, none of the cases
are the same as the previous concept
As a result, the Recent method has a lower score than for S =
1. GRAEC only has the disadvantage of the poor models during
predictions in the third and sixth periods, because of the optimal
value for τ . The first, second, fourth, and fifth period belong to a
single concept (‘A’ through ‘D’ respectively), and models of previous
years are re-used. As a result, the negative efect on the score for
GRAEC is weaker. The score is still weaker than for S = 1, since
there is less adaption to gradual drift, due to the optimal value of
τ . For S=4, the arguments follow the same reasoning, though their
efect is more present (resulting in a lower score). This is because
all consecutive periods have diferent recurrent concepts.</p>
      <p>S = 3 The efect of the recurrent concept drift, and how Recent
fails to adapt to it while GRAEC does, is most evident for S = 3. Each
D3j will be based of a completely diferent concept compared to
D3j−1. This has a devastating efect on the Recent method, which can
at best tell something about short cases. The GRAEC on the other
hand scores almost as well as for S = 1. The slightly decreased
score comes from the inability to capture the gradual concept drift.
The optimal β = 1, combined with τ = 100 causes the influence of
models trained in diferent periods to be insignificant, and GRAEC
only adapting to the recurrent concept drift.
6.2</p>
    </sec>
    <sec id="sec-14">
      <title>Real-World Dataset</title>
      <p>The results of the real-world study are presented in Figure 5 and
Table 3. Figure 5 shows the overall scores of each of the twelve
methods applied, and Table 3 presents the values of β and τ that
belong to each value of S for GRAEC.</p>
      <sec id="sec-14-1">
        <title>GRAEC Recent Naive</title>
        <p>S β τ ACC F1 ACC F1 ACC F1
7 1 0 0.417 0.418 0.405 0.405 0.251 0.245
14 1 0.1 0.418 0.417 0.412 0.413 0.253 0.245
21 1 0 0.412 0.410 0.405 0.406 0.264 0.258
28 1 1 0.419 0.416 0.415 0.418 0.267 0.259
Table 3: Optimal values for S, β , and τ , and the Accuracy and
F1 scores for the real-world dataset.</p>
        <p>The key result to take away from Figure 5 is that the
improvement of GRAEC over Recent is present, albeit small. The most likely
explanation is that there is no strong efect of recurrent concept
drift present in the real-world dataset. This also matches the results
for τ ; only S = 28 has a value for τ that puts significant weight to
periodically similar models, though for S = 28, there will be two
such models at best. Please note that the method: Recent does not
exist, and that we used it as an intuitive method for comparison.
The results do not disprove the efectiveness of GRAEC. The goal
of GRAEC was to allow adaption to both gradual and recurrent
concept drift. As is evident from the results for τ , GRAEC is in essence
self-correcting the lack of recurrent concept drift.
7</p>
      </sec>
    </sec>
    <sec id="sec-15">
      <title>CONCLUSION AND FUTURE WORK</title>
      <p>The results from the Simulation study show the potential of GRAEC.
We have developed a method that can adapt to both gradual as
well as recurrent concept drift. The case study on the real-world
dataset shows improvement, albeit it small, over retraining the data
periodically. The Simulation study stresses the importance of a
correctly chosen S.</p>
      <p>Like the simulation study, the real-world scenario was analysed
in a post-mortem setting; there was data on completed cases
available during analysis. A future research direction is to apply the
simulation study as an online setting and include online
optimisation of β , τ and possibly S, to see if the adaptions to gradual and
recurrent concept drift are still optimal. In such an online setting,
care should be taken that supervised information is available with
a delay; cases that start in a certain period might complete (long)
after the period ends. In such a scenario, the most recent model
available may not be from the preceding period. This is because all
cases that started in the preceding period might not have ended
once the next period starts. As a result both the Recent and GRAEC
method will deteriorate, though we expect the Recent method to
be more afected by this.</p>
      <p>
        The advantage of the use of a weighted ensemble, as GRAEC,
is the availability of diferent machine learning models. Since we
build one model on each period, we can freely choose the type
of model each time. A diferent method, such as described in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
is to use incremental learners, i.e. we do not create new models
from scratch, but keep updating an existing model. As they show,
especially the Adaptive Hoefding Option Tree from [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] adapts well
to recurring concept drift. The next step in this study is to compare
the efectiveness of our proposed concept drift adaption to theirs,
on their artificial data, our simulation study and our real-world
data.
      </p>
      <p>The weighting functions are based on related work and expert
knowledge. In this work we have limited ourselves to one value
for p. In future work, diferent values or a superposition of values
could be created. Another update to GRAEC would be not to add
weight to the single model from the period at d .time − p, but also
(some) additional weight to the period before and after d .time − p.</p>
      <p>
        One interesting further future direction is to borrow concepts
from the online process discovery of process models [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] in order
to extract process models on the fly and apply advanced stream
mining techniques over those [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to extract the best classifiers
for each period of time in an eficient manner.
      </p>
      <p>
        Additionally, we would like to address the relatively diferent
problem of drift detection in the more complicated and realistic
scenario where interval-based events are considered [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In the case of
overlapping of such events, more relations are considered between
these temporal events appear. Also, we would like to investigate the
implementation of available online drift detection techniques for
observing deviations in streaming conformance checking applications
[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and anytime stream classification [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Manuel</given-names>
            <surname>Baena-García</surname>
          </string-name>
          , José Campo-Ávila, Raúl Fidalgo-Merino,
          <string-name>
            <surname>Albert Bifet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ricard</given-names>
            <surname>Gavald</surname>
          </string-name>
          , and
          <string-name>
            <surname>Rafael</surname>
          </string-name>
          Morales-Bueno.
          <year>2006</year>
          .
          <article-title>Early Drift Detection Method</article-title>
          .
          <source>Fourth intl. Workshop on Knowledge Discovery from Data Streams (01</source>
          <year>2006</year>
          ),
          <fpage>77</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Albert</surname>
            <given-names>Bifet</given-names>
          </string-name>
          , Geof Holmes, Bernhard Pfahringer, Richard Kirkby, and
          <string-name>
            <given-names>Ricard</given-names>
            <surname>Gavaldà</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>New Ensemble Methods for Evolving Data Streams</article-title>
          .
          <source>In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09)</source>
          . ACM, New York, NY, USA,
          <fpage>139</fpage>
          -
          <lpage>148</lpage>
          . https://doi.org/10. 1145/1557019.1557041
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Avrim</given-names>
            <surname>Blum</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Empirical Support for Winnow and Weighted-Majority Algorithms: Results on a Calendar Scheduling Domain</article-title>
          .
          <source>Machine Learning</source>
          <volume>26</volume>
          ,
          <volume>1</volume>
          (
          <issue>01</issue>
          01
          <year>1997</year>
          ),
          <fpage>5</fpage>
          -
          <lpage>23</lpage>
          . https://doi.org/10.1023/A:1007335615132
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>João</given-names>
            <surname>Gama</surname>
          </string-name>
          and
          <string-name>
            <given-names>Petr</given-names>
            <surname>Kosina</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Recurrent concepts in data streams classification</article-title>
          .
          <source>Knowledge and Information Systems</source>
          <volume>40</volume>
          ,
          <issue>3</issue>
          (
          <issue>01</issue>
          <year>Sep 2014</year>
          ),
          <fpage>489</fpage>
          -
          <lpage>507</lpage>
          . https://doi.org/10.1007/s10115-013-0654-6
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>João</given-names>
            <surname>Gama</surname>
          </string-name>
          , Indre˙ Žliobaite˙,
          <string-name>
            <surname>Albert</surname>
            <given-names>Bifet</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Mykola</given-names>
            <surname>Pechenizkiy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Abdelhamid</given-names>
            <surname>Bouchachia</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>A Survey on Concept Drift Adaptation</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>46</volume>
          ,
          <issue>4</issue>
          ,
          <string-name>
            <surname>Article 44</surname>
          </string-name>
          <source>(March</source>
          <year>2014</year>
          ),
          <volume>44</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>44</lpage>
          :37 pages.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Marwan</given-names>
            <surname>Hassani</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Eficient Clustering of Big Data Streams</article-title>
          .
          <source>Ph.D. Dissertation</source>
          . RWTH Aachen University.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Marwan</given-names>
            <surname>Hassani</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Overview of Eficient Clustering Methods for HighDimensional Big Data Streams</article-title>
          . Springer International Publishing, Cham,
          <fpage>25</fpage>
          -
          <lpage>42</lpage>
          . https://doi.org/10.1007/978-3-
          <fpage>319</fpage>
          -97864-
          <issue>2</issue>
          _
          <fpage>2</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hassani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Siccha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Richter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Seidl</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Eficient Process Discovery From Event Streams Using Sequential Pattern Mining</article-title>
          .
          <source>In 2015 IEEE Symposium Series on Computational Intelligence</source>
          .
          <fpage>1366</fpage>
          -
          <lpage>1373</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Paulo</given-names>
            <surname>Mauricio Gonçalves</surname>
          </string-name>
          Jr and Roberto Souto Maior de Barros.
          <year>2013</year>
          .
          <article-title>RCD: A recurring concept drift framework</article-title>
          .
          <source>Pattern Recognition Letters</source>
          <volume>34</volume>
          ,
          <issue>9</issue>
          (
          <year>2013</year>
          ),
          <fpage>1018</fpage>
          -
          <lpage>1025</lpage>
          . https://doi.org/10.1016/j.patrec.
          <year>2013</year>
          .
          <volume>02</volume>
          .005
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Philipp</surname>
            <given-names>Kranen</given-names>
          </string-name>
          , Marwan Hassani, and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Seidl</surname>
          </string-name>
          .
          <year>2012</year>
          . BT*
          <article-title>- An Advanced Algorithm for Anytime Classification</article-title>
          .
          <source>In Scientific and Statistical Database Management - 24th International Conference, SSDBM</source>
          <year>2012</year>
          , Chania, Crete, Greece, June 25-27,
          <year>2012</year>
          . Proceedings.
          <volume>298</volume>
          -
          <fpage>315</fpage>
          . https://doi.org/10.1007/978-3-
          <fpage>642</fpage>
          -31235-9_
          <fpage>20</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Yifeng</surname>
            <given-names>Lu</given-names>
          </string-name>
          , Marwan Hassani, and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Seidl</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Incremental Temporal Pattern Mining Using Eficient Batch-Free Stream Clustering</article-title>
          .
          <source>In Proceedings of the 29th International Conference on Scientific and Statistical Database Management</source>
          , Chicago, IL, USA, June 27-29,
          <year>2017</year>
          .
          <volume>7</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          :
          <fpage>12</fpage>
          . https://doi.org/10.1145/3085504. 3085511
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Marco</given-names>
            <surname>Maisenbacher</surname>
          </string-name>
          and
          <string-name>
            <given-names>Matthias</given-names>
            <surname>Weidlich</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Handling Concept Drift in Predictive Process Monitoring</article-title>
          .
          <source>In 2017 IEEE International Conference on Services Computing (SCC)</source>
          , Vol.
          <volume>00</volume>
          . 1-
          <fpage>8</fpage>
          . https://doi.org/10.1109/SCC.
          <year>2017</year>
          .10
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>João</given-names>
            <surname>Bártolo Gomes ; Mohamed Medhat Gaber ; Pedro A. C. Sousa</surname>
          </string-name>
          ; Ernestina Menasalvas.
          <year>2014</year>
          .
          <article-title>Mining Recurring Concepts in a Dynamic Feature Space</article-title>
          .
          <source>IEEE Transactions on Neural Networks and Learning Systems 25, 1 (Jan</source>
          <year>2014</year>
          ),
          <fpage>95</fpage>
          -
          <lpage>110</lpage>
          . https://doi.org/10.1109/TNNLS.
          <year>2013</year>
          .2271915
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Fabian</surname>
            <given-names>Pedregosa</given-names>
          </string-name>
          , Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          , Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau,
          <string-name>
            <given-names>Matthieu</given-names>
            <surname>Brucher</surname>
          </string-name>
          , Matthieu Perrot, and
          <string-name>
            <given-names>Édouard</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          .
          <source>J. Mach. Learn. Res</source>
          .
          <volume>12</volume>
          (
          <issue>Nov</issue>
          .
          <year>2011</year>
          ),
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          . http://dl.acm.org/citation.cfm?id=
          <volume>1953048</volume>
          .
          <fpage>2078195</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Sasthakumar</given-names>
            <surname>Ramamurthy</surname>
          </string-name>
          and
          <string-name>
            <given-names>Raj</given-names>
            <surname>Bhatnagar</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Tracking recurrent concept drift in streaming data using ensemble classifiers</article-title>
          .
          <source>In Sixth International Conference on Machine Learning and Applications (ICMLA</source>
          <year>2007</year>
          ).
          <fpage>404</fpage>
          -
          <lpage>409</lpage>
          . https://doi.org/10. 1109/ICMLA.
          <year>2007</year>
          .80
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Sripirakas</surname>
            <given-names>Sakthithasan</given-names>
          </string-name>
          , Russel Pears,
          <string-name>
            <given-names>Albert</given-names>
            <surname>Bifet</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Use of ensembles of Fourier spectra in capturing recurrent concepts in data streams</article-title>
          .
          <source>In 2015 International Joint Conference on Neural Networks, IJCNN</source>
          <year>2015</year>
          , Killarney, Ireland,
          <source>July 12-17</source>
          ,
          <year>2015</year>
          . 1-
          <fpage>8</fpage>
          . https://doi.org/10.1109/IJCNN.
          <year>2015</year>
          . 7280583
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Erik</surname>
            <given-names>Scharwächter</given-names>
          </string-name>
          , Emmanuel Müller, Jonathan F. Donges, Marwan Hassani, and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Seidl</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Detecting Change Processes in Dynamic Networks by Frequent Graph Evolution Rule Mining</article-title>
          . In ICDM'
          <volume>16</volume>
          .
          <fpage>1191</fpage>
          -
          <lpage>1196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Arik</surname>
            <given-names>Senderovich</given-names>
          </string-name>
          , Matthias Weidlich, Avigdor Gal, and
          <string-name>
            <given-names>Avishai</given-names>
            <surname>Mandelbaum</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Queue Mining - Predicting Delays in Service Processes</article-title>
          .
          <source>In Advanced Information Systems Engineering</source>
          , Matthias Jarke, John Mylopoulos, Christoph Quix, Colette Rolland, Yannis Manolopoulos, Haralambos Mouratidis, and Jennifer Horkof (Eds.). Springer International Publishing, Cham,
          <fpage>42</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Sebastiaan</surname>
            <given-names>J. van Zelst</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alfredo Bolt</surname>
          </string-name>
          , Marwan Hassani, Boudewijn F. van
          <string-name>
            <surname>Dongen</surname>
          </string-name>
          , and
          <string-name>
            <surname>Wil M. P. van der Aalst</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Online conformance checking: relating event streams to process models using prefix-alignments</article-title>
          .
          <source>International Journal of Data Science and Analytics (27 Oct</source>
          <year>2017</year>
          ). https://doi.org/10.1007/s41060-017-0078-6
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Haixun</surname>
            <given-names>Wang</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            <given-names>Fan</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Philip S. Yu</surname>
          </string-name>
          , and Jiawei Han.
          <year>2003</year>
          .
          <article-title>Mining Conceptdrifting Data Streams Using Ensemble Classifiers</article-title>
          .
          <source>In KDD '03. ACM</source>
          ,
          <volume>226</volume>
          -
          <fpage>235</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>