<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Applying Neural Networks for Concept Drift Detection in Financial Markets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bruno Silva</string-name>
          <email>bruno.silva@estsetubal.ips.pt</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CITI and Departamento de Informática, FCT, Universidade Nova de Lisboa</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ISEGI, Instituto Superior de Estatística e Gestão de Informação</institution>
          ,
          <addr-line>Universi-</addr-line>
        </aff>
      </contrib-group>
      <fpage>43</fpage>
      <lpage>47</lpage>
      <abstract>
        <p>Traditional stock market analysis is based on the assumption of stationary market behavior. The recent financial crisis was an example of the inappropriateness of such an assumption, namely through the presence of much higher variations than would normally be expected by traditional models. Data stream methods present an alternative for modeling the vast amounts of data arriving each day to a financial analyst. This paper discusses the use of a framework, based on an artificial neural network that continuously monitors itself, that allows the implementation of a multivariate non-stationary model of market behavior. An initial study is performed over ten years of the Dow Jones Industrial Average index (DJI) and shows empirical evidence of concept drift in the multivariate financial statistics used to describe the index data stream.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Data streams are generated naturally within several domains.
Network monitoring, web mining, telecommunications data
management, stock-market analysis and sensor data processing are
applications that have vast amounts of data arriving continuously. In such
applications, the process may not be strictly stationary, i.e., the target
concept may change over time. Concept drift means that the concept
about which data is being collected may shift from time to time, each
time after some minimum permanence [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        In this paper we address the detection and analysis of concept drift
in financial markets by employing a methodology based on Artificial
Neural Networks (ANN). ANN are a set of biologically inspired
algorithms and well-established data mining methods popular for
technical market analysis and price predictions. We are currently
undertaking wider research on using ANN in Ubiquitous Data Mining.
This work, in essence, is a real-world application of a mechanism to
detect concept drift while processing data streams. The motivation
for this approach in the financial field can be easily explained.
Mathematical finance has made wide use of normal distributions in stock
market analysis to maximize return rates, i.e., they assume
stationary distributions, which are easier to understand and work well most
of the time. However, this traditional approach neglects heavy
tails, i.e., huge asset losses, in the distributions and their potential risk
evaluation [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. This is where the detection of drifting from this
normal behavior is of critical importance to reduce investment risk in
the presence of non-normal distribution of market events.
The main contributions of this work are: (i) a drift detection
method based on the output of Adaptive Resonance Theory (ART)
networks [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] which produce aggregations (or data synopsis in some
literature) of d-dimensional data streams. These fast aggregations
compress a possibly high-rate stream while maintaining the intrinsic
relations within the data. A fixed sequence of consecutive
aggregations is then analyzed to infer concept drift in the underlying
distribution (Section 2); (ii) an application of the previous scheme to the
stock market, namely to the Dow Jones Industrial index (DJI), using
a stream composed of a chosen set of statistical and technical indicators.
The detection of concept drift is performed over an incoming stream
of these observations (Section 3).
      </p>
      <p>
        These contributions adhere to the impositions of data stream
models in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], namely: the data points can only be accessed in the order
in which they arrive; random access to data is not allowed; memory
is assumed to be small relative to the number of data points, thus
only allowing a limited amount of information to be stored. Therefore,
all of the additional indicators are computed using sliding windows,
thus only needing a small subset of data kept in memory. This is also
true for the number of aggregations needed to compute the concept
drift.
      </p>
      <p>At the end of the paper, in Section 4, the results are discussed
together with the final conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>METHODOLOGY</title>
      <p>The presented methodology for drift detection comprises two
modules. The first module uses an ART network that receives the
incoming stream and produces aggregations, or data synopsis, compressing
the data and retaining the intrinsic relationships within the
distribution (Section 2.1). This module feeds a second module that takes a
fixed set of these aggregations and through simple computations
produces an output that can be used to detect concept drift.</p>
    </sec>
    <sec id="sec-3">
      <title>Online Data Aggregation</title>
      <p>
One should point out that algorithms operating on data streams are
expected to produce “only” approximate models [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], since the data
cannot be revisited to refine the generated models. The aggregation
module is responsible for the online summarization of the
incoming stream and processes the stream in blocks of size S. For each S
observations, q representative prototypes of the data are created, where
q ≪ S. This can be related to an incremental clustering process
that is performed by an ART network. Each prototype is included
in a tuple that stores other relevant information, such as the number
of observations described by a particular prototype and the point in
time that a particular prototype was last updated. These data
structures were popularized in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and called micro-clusters.
      </p>
      <p>Hence, we create q “weighted” prototypes of data stored in tuples
Q = {M1, ..., Mj , ..., Mq}, each containing: a prototype of data
Pj ; the number of input patterns Nj assigned to that prototype; and
a timestamp Tj holding the point in time that prototype was
last accessed; hence Mj = {Pj , Nj , Tj }. The prototype together
with the number of inputs assigned to it (weighting) is important to
preserve the input space density if one is interested in creating
offline models of the distribution. The timestamp allows the creation
of models from specific intervals in time.</p>
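As a concrete sketch, the micro-cluster tuple Mj = {Pj, Nj, Tj} described above might be represented as follows (field names are illustrative, not from the paper):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MicroCluster:
    prototype: List[float]  # P_j: the prototype vector
    count: int              # N_j: number of input patterns assigned (weighting)
    timestamp: int          # T_j: point in time the prototype was last updated
```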
      <p>
        ART [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is a family of neural networks that develop stable
recognition categories (clusters) by self-organization in response to
arbitrary sequences of input patterns. Its fast commitment mechanism
and capability of learning at moderate speed guarantee high
efficiency. The common algorithm used for clustering in any kind of
ART network is closely related to the k-means algorithm. Both use
single prototypes to internally represent and dynamically adapt
clusters. The k-means algorithm clusters a given set of input patterns into
k groups. The parameter k thus specifies the coarseness of the
partition. In contrast, ART uses a minimum required similarity between
patterns that are grouped within one cluster. The resulting number
k of clusters then depends on the distances (in terms of the applied
metric) between all input patterns, presented to the network during
training. This similarity parameter is called vigilance ρ. K-means is
a popular algorithm in clustering data streams, e.g., [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], but suffers
from the problem that the initial k clusters have to be set either
randomly or through other methods. This has a strong impact on the
quality of the clustering process. On the other hand, ART networks
do not suffer from this problem.
      </p>
      <p>
        More formally, a data stream is a sequence of data items
(observations) x1, ..., xi, ..., xn such that the items are read once in
increasing order of the indexes i. If each observation contains a
set of d-dimensional features, then a data stream is a sequence of
d-dimensional vectors X1, ..., Xi, ..., Xn. We employ an ART2-A [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] network
specially geared towards fast one-shot training, with an important
modification given our goals: constrain the network on a maximum of q
prototypes. It shares the basic processing of all ART networks, which
is based on competitive learning. ART requires the same input
pattern size for all patterns, i.e., the dimension d of the input space where
the cluster regions shall be placed. Starting with an empty set of
prototypes P1, ..., Pj , ..., Pq, each input pattern Xi is compared to the
j stored prototypes in a search stage, in a winner-takes-all fashion.
If the degree of similarity between current input pattern and best
fitting prototype WJ is at least as high as vigilance ρ, this prototype is
chosen to represent the micro-cluster containing the input. Similarity
between the input pattern i and a prototype j is given by Equation 1,
where the distance is subtracted from one to get SXi,Pj = 1 if input
and prototype are identical. The distance is normalized with the
dimension d of an input vector. This keeps measurements of similarity
independent of the number of features.
      </p>
      <p>
The degree of similarity is limited to the range [0, 1]. If similarity
between the input pattern and the best matching prototype does not
fit into the vigilance interval [ρ, 1], i.e., SXi,Pj &lt; ρ, a new
micro-cluster has to be created, where the current input is used as the
prototype initialization. Otherwise, if one of the previously committed
prototypes (micro-clusters) matches the input pattern well enough, it
is adapted by shifting the prototype’s values towards the values of the
input by the update rule in Equation 2.
      </p>
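A minimal sketch of this search-and-commit step, assuming Equation 1 takes the form S = 1 − ‖X − P‖2/√d (consistent with the radius r = (1 − ρ)·√d used later) and Equation 2 is a convex-combination update; function names and the constant η are illustrative:

```python
import math

def similarity(x, p):
    """Assumed Eq. 1: one minus the Euclidean distance normalized by sqrt(d),
    so identical vectors yield similarity 1.0."""
    dist = math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, p)))
    return 1.0 - dist / math.sqrt(len(x))

def present_pattern(x, prototypes, counts, rho, eta=0.5):
    """Winner-takes-all search stage: adapt the best-matching prototype if it
    passes the vigilance test, otherwise commit a new micro-cluster."""
    if prototypes:
        J = max(range(len(prototypes)), key=lambda j: similarity(x, prototypes[j]))
        if similarity(x, prototypes[J]) >= rho:
            # Assumed Eq. 2: shift the prototype towards the input pattern
            prototypes[J] = [eta * pj + (1 - eta) * xi
                             for pj, xi in zip(prototypes[J], x)]
            counts[J] += 1
            return J
    prototypes.append(list(x))  # the current input initializes the new prototype
    counts.append(1)
    return len(prototypes) - 1
```

Here η is a constant; the dynamic choice discussed next (Equation 3) would replace it.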
      <p>
The constant learning rate η ∈ [0, 1] is chosen to prevent
prototype PJ from moving too fast and therefore destabilizing the learning
process. However, given our goals, i.e., to perform an adaptive vector
quantization, we define η dynamically in such a way that the mean
quantization error of inputs represented by a prototype is minimized.
Equation 3 establishes the dynamic value of η, where NJ is the
current number of assigned input patterns for prototype J . This way, it
is expected that the prototypes converge to the mean of the assigned
input patterns.
      </p>
      <p>η = NJ / (NJ + 1) (3)</p>
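With the update rule assumed as P ← ηP + (1 − η)X, this choice of η makes each prototype the exact running mean of its assigned inputs; a small sketch (helper name is illustrative):

```python
def update_prototype(p, n, x):
    """Apply the assumed Eq. 2 update with the dynamic rate of Eq. 3,
    eta = n / (n + 1), where n is the count of inputs already assigned."""
    eta = n / (n + 1)
    return [eta * pi + (1 - eta) * xi for pi, xi in zip(p, x)]

# Feeding the values 1..5 one by one reproduces their arithmetic mean (3.0):
p, n = [1.0], 1
for v in [2.0, 3.0, 4.0, 5.0]:
    p = update_prototype(p, n, [v])
    n += 1
```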
      <p>
        This does not guarantee the convergence to local minimum,
however, according to the adaptive vector quantization (AVQ)
convergence theorem [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], AVQ can be viewed as a way to learn prototype
vector patterns of real numbers; it can guarantee that average
synaptic vectors converge to centroids exponentially quickly.
      </p>
      <p>Another needed modification arises from the fact that ART
networks, by design, form as many prototypes as needed based on the
vigilance value. At the extremes, ρ = 1 causes each unique input to
be encoded by a separate prototype, whereas ρ = 0 causes all inputs
to be represented by a single prototype. Therefore, for decreasing
values of ρ, coarser prototypes are formed. However, achieving
exactly q prototypes based solely on a manually tuned value of ρ is a very
hard task, mainly because the input space density can change over
time and also differs from application to application.</p>
      <p>To overcome this, we make a modification to the ART2-A
algorithm to impose a restriction on creating a maximum of q
prototypes and dynamically adjusting the vigilance parameter. We start
with ρ = 1 so that a new micro-cluster is assigned to each arriving
input vector. After learning an input vector, a verification is made to
check whether j = q + 1, where j is the current number of stored
micro-clusters. If this condition is met, then to keep only q we merge
the nearest pair of micro-clusters. Let Tr,s = min{‖Pr − Ps‖2 :
r, s = 1, ..., q, r ≠ s} be the minimum Euclidean distance between
prototypes stored in micro-clusters Mr and Ms. We merge the two
micro-clusters into one:</p>
      <p>Mmerge = {Pmerge, Nr + Ns, max{Tr, Ts}} (4)</p>
      <p>with the new prototype being a “weighted” average between the
previous two:</p>
      <p>Pmerge = (Nr / (Nr + Ns)) Pr + (Ns / (Nr + Ns)) Ps (5)</p>
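A sketch of this merge step (Equations 4 and 5), with micro-clusters held as (prototype, count, timestamp) tuples; the helper name is illustrative:

```python
import math

def merge_nearest(clusters):
    """Find the closest pair of prototypes, merge them into one micro-cluster
    (weighted-average prototype, summed counts, latest timestamp) and return
    the pair distance T_{r,s}, later used to relax the vigilance (Eq. 6)."""
    best = None
    for r in range(len(clusters)):
        for s in range(r + 1, len(clusters)):
            d = math.dist(clusters[r][0], clusters[s][0])
            if best is None or d < best[0]:
                best = (d, r, s)
    t_rs, r, s = best
    (Pr, Nr, Tr), (Ps, Ns, Ts) = clusters[r], clusters[s]
    Pm = [(Nr * a + Ns * b) / (Nr + Ns) for a, b in zip(Pr, Ps)]
    clusters[r] = (Pm, Nr + Ns, max(Tr, Ts))
    del clusters[s]
    return t_rs
```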
      <p>With d-dimensional input vectors, Equation 1 defines a
hypersphere around any stored prototype with radius r = (1 − ρ) · √d.
By solving this equation in respect to ρ, we update the vigilance
parameter dynamically with Equation 6, hence ρ(new) &lt; ρ(old) and the
radius, consequently, increases.</p>
      <p>ρ(new) = 1 − Tr,s / √d (6)</p>
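The vigilance update of Equation 6 is then a one-liner (hypothetical helper name):

```python
import math

def relax_vigilance(t_rs, d):
    """Eq. 6: the new, lower vigilance after a merge, chosen so that the
    hypersphere radius r = (1 - rho) * sqrt(d) covers the merged pair."""
    return 1.0 - t_rs / math.sqrt(d)
```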
      <p>Our experimental results show that this approach is effective in
providing a summarization of the underlying distribution within the
data streams. The inclusion of these results is out of the scope of this
paper.</p>
      <p>We must point out that the aggregation module produces more
information than is actually necessary for the concept drift detection,
namely the weighting of the prototypes and the timestamps. This
module is an integral part of a larger framework that also
generates offline models of the incoming stream for specific points in
time.</p>
    </sec>
    <sec id="sec-4">
      <title>Detecting Concept Drift</title>
      <p>
Our method assumes that, if the underlying distribution is stationary,
the error rate of the learning algorithm will decrease as the
number of samples increases [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Hence, we compute the quantization
error at each aggregation phase of the ART network and track the
changes of these errors over time.
      </p>
      <p>We use a queue B of b aggregation results, such that B =
{Ql, Ql−1, ..., Ql−b+1}, where Ql is the last aggregation obtained.
For each Ql that arrives, we compute the average Euclidean distance
between each prototype Pi in Ql and the closest one in Bl−1 =
{Ql−1, ..., Ql−b+1}. Equation 7 formalizes this Average
Quantization Error (AQE) computation for the lth aggregation, where k · k2
is the Euclidean distance and q is the number of prototypes in Ql by
definition. This computes the error of the last aggregation in
“quantizing” previous aggregations at a particular point in time.</p>
      <p>AQE(l) = (1/q) Σ_{i=1..q} min{ ‖Pi − Pj‖2 : Pj ∈ Bl−1 } (7)</p>
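A direct transcription of Equation 7, with each aggregation given as a list of prototype vectors (the function name is illustrative):

```python
import math

def aqe(latest, previous_aggs):
    """Average Quantization Error: mean distance from each prototype of the
    latest aggregation Q_l to its nearest prototype in B_{l-1}."""
    pool = [p for agg in previous_aggs for p in agg]
    return sum(min(math.dist(pi, pj) for pj in pool) for pi in latest) / len(latest)
```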
      <p>By repeating this procedure over time, we obtain a series of errors
that stabilizes and/or decreases when the underlying distribution is
stationary, and that increases when the underlying
distribution is changing, i.e., when concept drift is occurring. This series of
errors is the drift curve.</p>
      <p>Larger values of b are used to detect abrupt changes in the
underlying distribution, whereas to detect gradual concept drift a lower
value should be adopted. We exemplify the automatic concept drift
detection in this drift curve using a moving average in Section 3.2.</p>
    </sec>
    <sec id="sec-5">
      <title>APPLICATION TO DOW JONES INDUSTRIAL</title>
      <p>
        We present an application of the previous methodology to the stock
market, namely to the Dow Jones Industrial index (DJI). Instead of
using daily prices of several stocks that compose the DJI, our
approach to this problem uses the DJI daily index values themselves
and other computed statistical and technical indicators, which are
explained in Section 3.1. We make extensive use of moving averages,
as they reduce the short term volatility of time series and retain
information from previous market events; another statistical indicator
is the Hurst index [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], defined as a function to uncover changes in the
direction of the trend of a set of values in time. We believe that these
indicators, together with the index value, can provide a multi-variate
insight to hidden and subtle changes in the normality of financial
events and be used to assess the risk of investment at any point in
time, thus lowering exposure to risk.
      </p>
      <p>This application makes use of data gathered from the period
between the 1st of January of 2001 and the 31st of December
of 2011, in a total of 2767 observations.</p>
    </sec>
    <sec id="sec-7">
      <title>Variable Selection and Generated Data Stream</title>
      <p>The data gathered was composed of a set of technical variables
including different index values for one trading day, like Open, Close,
High and Low values. From these we chose the lowest daily price
(PX LOW) because it provides better insight into the risk of a fall.
Another available technical indicator was the trading Volume.</p>
      <p>In terms of statistical indicators, we initially considered a large
number of them, like moving averages (MA) from 20 to 180 trading
days, relative numbers, i.e., the DJI index value divided by moving
averages (AVG), price fluctuation and Hurst index. However, it was
important to reduce the number of variables because redundant
variables can reduce the model efficiency. For this purpose we performed
an analysis with the VARCLUS procedure (SAS/STAT). The
VARCLUS procedure can be used as a variable-reduction method: it
divides a set of numeric variables into
disjoint or hierarchical clusters through principal component analysis.
All variables were treated as equally important. VARCLUS created
an output that was used by the TREE procedure to draw a tree diagram
of hierarchical clusters (SAS/STAT 9.1 User’s Guide, p. 4797). The
tree diagram is depicted in Figure 1. We can observe in the
hierarchical clustering that the price variables and moving averages are
correlated, so only PX LOW was chosen from Cluster 1. In Cluster
2 all variables were selected because, although they are correlated,
they measure different characteristics. In the case of relative
numbers, different averages were selected because it is interesting to see
the differences between short-, medium- and long-term analyses.
Finally, in Cluster 3 and Cluster 4 only the Hurst index and price
fluctuation appeared; since they are not correlated with any other
variable, these variables were included in the final data set.</p>
      <p>Hence, the complete set of features in the data stream is the
following:</p>
      <sec id="sec-7-1">
        <title>PX LOW: Minimum daily price;</title>
        <p>PX VOLUME: Volume of daily business;
IX HURST: Hurst index computed for 30 days;
IX CAP FLUTUATION: PX LOW(t) / PX LOW(t−1). This
variable represents price fluctuation for a one-day interval;
AVG 20: PX LOW / 20-day moving average. This variable
represents the relative number of current price divided by the 20-day
Moving Average. This shows whether the current price is cheap,
average value, expensive or really expensive. The same applies to
the next indicators but within other time frames;</p>
      </sec>
      <sec id="sec-7-2">
        <title>AVG 30: PX LOW / 30-day moving average;</title>
        <p>AVG 60: PX LOW / 60-day moving average;
AVG 100: PX LOW / 100-day moving average;
AVG 120: PX LOW / 120-day moving average;
AVG 180: PX LOW / 180-day moving average;</p>
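The AVG indicators above are plain relative moving averages; a sketch of how one such feature might be computed (the function name is illustrative):

```python
def relative_ma(prices, window):
    """AVG_<window>: each price divided by its trailing <window>-day moving
    average; defined only from the window-th observation onwards."""
    out = []
    for i in range(window - 1, len(prices)):
        ma = sum(prices[i - window + 1: i + 1]) / window
        out.append(prices[i] / ma)
    return out
```

Values above 1 mean the current price is above its recent average ("expensive" in the paper's terms), values below 1 mean it is below.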
        <p>
          The dataset is depicted in Figure 2, where the behavior of all
variables can be seen. This data is our data stream. The stream comprises
10 features, i.e., it is a 10-dimensional stream.
The methodology presented in Section 2 was applied to the above
data. It is converted into a data stream by taking the data input
order as the order of the streaming. All features were previously
normalized to the range [0, 1] so they have equal importance in the
Euclidean distances used to process them. The largest moving
average indicator computed was over 180 days. Therefore, only after
the 180th observation can the stream be presented to the algorithm.
        </p>
        <p>However, since we are dealing with financial time-series, it is
important to retain the time dependency of the sequence of
observations. Therefore, in this application, we use a sliding-window of 100
trading days, i.e., approximately a quarter of trading, as input to
each aggregation phase. Note that a year of trading has
approximately 260 days. This means that the stream is processed in blocks
of 100 observations that are kept in a queue. For each new
observation that arrives the oldest in the queue is discarded and the new one
added. The parameterization used was the following:
Block size: S = 100;
Number of micro-clusters: q = 10;
Concept drift buffer size: b = 15</p>
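The sliding-window processing described above can be sketched with a bounded queue; each yielded window would feed one aggregation phase (the generator name is illustrative):

```python
from collections import deque

def sliding_windows(stream, size=100):
    """Keep the last `size` observations; once full, yield the window for each
    newly arrived observation (the oldest is discarded automatically)."""
    window = deque(maxlen=size)
    for x in stream:
        window.append(x)
        if len(window) == size:
            yield list(window)
```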
        <p>The result of the procedure of Section 2.2 applied to the data
stream is presented in Figure 3. Each point of the series corresponds
to the error of the model for a particular trading day, thus
providing possible indications of drifting. An overall curve shape can be
seen, indicating the drift over time. Since this drift is being
computed for every trading day, the “noise” around the curve is
considered normal since it is affected by the daily volatility of the index
values.</p>
        <p>To obtain a “clean” curve we apply a convolution filter along this
drift series of the same size as b, i.e., 15 days. An alarm scheme is
created through the generation of an empirical moving average of 60
days performed over the drift series. The cleaned drift curve and its
moving average are depicted in Figure 4a).</p>
        <p>We then compare the differences between the drift series and its
moving average obtaining a line that oscillates around zero. We call
this line the drift trend, shown in Figure 4b). Whenever the drift
series has values lower than its moving average we are in a descending
trend. This is reflected in the drift trend with values lower than zero.
Whenever the moving average is crossed by the drift series it signals
a shift in the trend and the drift trend crosses zero. This reasoning to
detect trends is also very popular in financial technical analysis. In
this context, the 60 trading days moving average reflects the intuitive
notion of long-term “decreasing” or “increasing” trend of the drift.</p>
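The drift-trend construction described above (drift series minus its trailing moving average, with zero crossings marking trend shifts) might be sketched as follows; names are illustrative:

```python
def drift_trend(drift, window=60):
    """Subtract a trailing moving average from the drift series; negative
    values mean a descending drift trend, sign changes mark trend shifts."""
    trend = []
    for i in range(len(drift)):
        w = drift[max(0, i - window + 1): i + 1]
        trend.append(drift[i] - sum(w) / len(w))
    return trend

def zero_crossings(trend):
    """Indices where the drift trend crosses zero (strict sign change)."""
    return [i for i in range(1, len(trend)) if trend[i - 1] * trend[i] < 0]
```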
        <p>All plots in Figure 4 are aligned in time for easy comparison.
Figure 4c) shows the time series of PX LOW, i.e., the DJI index, that we
compare to the detection of drift performed.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>DISCUSSION AND CONCLUSIONS</title>
      <p>Based on experiments, we found that a number of prototypes equal
to a tenth of the number of observations is sufficient in most applications
to represent them adequately; hence, q = 10. Usage of higher values
of q did not improve the results with the additional problem of
increased computational time. Additionally, since we are
interested in both abrupt and gradual drift detection, we used a moderately
sized buffer of aggregations (b = 15) to compute the series of
quantization errors. During our experiments we found that this value was
appropriate for the established goals.</p>
      <p>By inspecting Figure 4 and comparing the drift trend with the
behavior of the DJI index we can make two important observations: (i)
the drift trend crossed zero before the market crash of 2008 (around
day 1500). It appears that the concept that was being learned changed
sometime before the crash occurred. (ii) it may be reasonable to
assume that in periods of normality the long-term tendency of these
indexes is upwards. One such period is after the recovery from the
2002 market crash, i.e., the dot-com bubble, until the crash of
2008 (approximately between days 300 and 1300). During this
period it is interesting to see that the drift trend was always below zero.</p>
      <p>In the present work we have shown a methodology to detect
concept drift in financial markets. We intend to apply this same
methodology to intra-day trading as soon as possible, thus reinforcing
the need for efficient processing of large volumes of data. The
proposed methodology applied over a data stream comprised of
carefully chosen technical and statistical indicators seems promising in
detecting changes in markets events ahead of time that can reduce
the exposure to risk.</p>
      <p>
        The characterization of the drifts, i.e., trying to understand what is
really changing in the markets through inspection of hidden changes
in the indicators is reserved for future work. Work is under way in
this subject and we are using Self-Organizing Maps [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to produce
different mappings of the variables for particular segments in time,
namely ones where the market seems to exhibit a stable behavior and
comparing them with others where it does not. These segments are obtained
by segmenting time with the concept drift detection. As another
immediate future work we will apply this methodology to other indexes
and perform the same study.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          , J. Han,
          <string-name>
            <given-names>J</given-names>
            .
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.S.</given-names>
            <surname>Yu</surname>
          </string-name>
          , '
          <article-title>A framework for clustering evolving data streams'</article-title>
          ,
          <source>in Proceedings of the 29th International Conference on Very Large Databases</source>
          , volume
          <volume>29</volume>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>92</lpage>
          . Morgan Kaufmann Publishers Inc., (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kosko</surname>
          </string-name>
          ,
          <article-title>Neural networks and fuzzy systems: A dynamical systems approach to machine intelligence</article-title>
          ,
          <source>Prentice-Hall of India</source>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.A.</given-names>
            <surname>Carpenter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Grossberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.B.</given-names>
            <surname>Rosen</surname>
          </string-name>
          , '
          <article-title>ART 2-A: An adaptive resonance algorithm for rapid category learning and recognition'</article-title>
          ,
          <source>Neural networks</source>
          ,
          <volume>4</volume>
          (
          <issue>4</issue>
          ),
          <fpage>493</fpage>
          -
          <lpage>504</lpage>
          , (
          <year>1991</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Farnstrom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Elkan</surname>
          </string-name>
          , '
          <article-title>Scalability for clustering algorithms revisited'</article-title>
          ,
          <source>in ACM SIGKDD Explorations Newsletter</source>
          , volume
          <volume>2</volume>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>57</lpage>
          . ACM, (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Medas</surname>
          </string-name>
          , G. Castillo, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Rodrigues</surname>
          </string-name>
          , '
          <article-title>Learning with drift detection'</article-title>
          ,
          <source>Advances in Artificial Intelligence-SBIA</source>
          <year>2004</year>
          ,
          <volume>66</volume>
          -
          <fpage>112</fpage>
          , (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Joao</given-names>
            <surname>Gama</surname>
          </string-name>
          ,
          <article-title>Knowledge discovery from data streams</article-title>
          ,
          <source>Chapman &amp; Hall/CRC Data Mining and Knowledge Discovery Series</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Grossberg</surname>
          </string-name>
          , '
          <article-title>Adaptive pattern classification and universal recoding: II. Feedback, expectation, olfaction, illusions'</article-title>
          ,
          <source>Biological Cybernetics</source>
          ,
          <volume>23</volume>
          , (
          <year>1976</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Monika R.</given-names>
            <surname>Henzinger</surname>
          </string-name>
          , Prabhakar Raghavan, and Sridhar Rajagopalan, '
          <article-title>External memory algorithms'</article-title>
          ,
          <source>chapter Computing on data streams</source>
          ,
          <fpage>107</fpage>
          -
          <lpage>118</lpage>
          , American Mathematical Society, Boston, MA, USA, (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.E.</given-names>
            <surname>Hurst</surname>
          </string-name>
          , R.P. Black, and Y.M. Simaika,
          <article-title>Long-term storage: An experimental study</article-title>
          ,
          <source>Constable</source>
          ,
          <year>1965</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kohonen</surname>
          </string-name>
          , '
          <article-title>Self-organized formation of topologically correct feature maps'</article-title>
          ,
          <source>Biological cybernetics</source>
          ,
          <volume>43</volume>
          (
          <issue>1</issue>
          ),
          <fpage>59</fpage>
          -
          <lpage>69</lpage>
          , (
          <year>1982</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mandelbrot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.L.</given-names>
            <surname>Hudson</surname>
          </string-name>
          , and E. Grunwald, '
          <article-title>The (mis) behaviour of markets'</article-title>
          ,
          <source>The Mathematical Intelligencer</source>
          ,
          <volume>27</volume>
          (
          <issue>3</issue>
          ),
          <fpage>77</fpage>
          -
          <lpage>79</lpage>
          , (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.N.</given-names>
            <surname>Taleb</surname>
          </string-name>
          , '
          <article-title>Errors, robustness, and the fourth quadrant'</article-title>
          ,
          <source>International Journal of Forecasting</source>
          ,
          <volume>25</volume>
          (
          <issue>4</issue>
          ),
          <fpage>744</fpage>
          -
          <lpage>759</lpage>
          , (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>