=Paper= {{Paper |id=Vol-2400/paper-12 |storemode=property |title=Evolution of Financial Time Series Clusters |pdfUrl=https://ceur-ws.org/Vol-2400/paper-12.pdf |volume=Vol-2400 |authors=Davide Azzalini,Fabio Azzalini,Mirjana Mazuran,Letizia Tanca |dblpUrl=https://dblp.org/rec/conf/sebd/AzzaliniAMT19 }} ==Evolution of Financial Time Series Clusters== https://ceur-ws.org/Vol-2400/paper-12.pdf
      Evolution of Financial Time Series Clusters
                               (Discussion Paper)

    Davide Azzalini1 , Fabio Azzalini1 , Mirjana Mazuran2 , and Letizia Tanca1
               1
                   Politecnico di Milano, Italy {name.surname}@polimi.it
                        2
                          Inria, France mirjana.mazuran@inria.fr

        Abstract. Nowadays, many applications natively adopt a data-streaming
        model to represent highly dynamic phenomena. A challenging application
        is the stock market, where stock prices are naturally modeled as data
        streams that fluctuate considerably and remain meaningful only for short
        amounts of time.
        In this paper we present a technique to track evolving clusters of finan-
        cial time series, with the aim of constructing reliable models for this
        highly dynamic application. In our technique the clustering over a set of
        time series is iterated over time through sliding windows and, at each
        iteration, the differences between the current clustering and the previ-
        ous one are studied to determine those changes that are “significant”
        with respect to the application. For example, in the financial domain, if
        a company that has belonged to the same cluster for a certain amount
        of time moves to another cluster, this may be a signal of a significant
        change in its economic or financial situation.



1     Introduction
A data stream is an ordered sequence of data elements that are made available
over time, and is potentially unlimited. Very interesting data stream applications
can be found in finance, where the life of a company evolves with time and, at
each time instant, is characterized by different streams of data reporting its stock
price, exchange volumes, balance data, and so on. Companies may even appear
and disappear from the market, or change their financial behavior.
    Several techniques for analyzing streaming data have been studied: many rely
on the use of a sliding window, i.e. a model of the data based on a window that
moves over time, continuously taking into account the newest data. Also, in the
financial context, a variety of clustering methods have been proposed to cluster
companies that have the same behavior, w.r.t. prices or other measures.
    However, where stock prices are concerned, clusterings computed on different windows – even though these might overlap greatly – tend to change considerably, due to the high dynamics of the market and to the scarce permanence of relationships between companies over time. In such a scenario the model tends to fluctuate and remains meaningful only for short amounts of time.

    Copyright © 2019 for the individual papers by the papers’ authors. Copying per-
    mitted for private and academic purposes. This volume is published and copyrighted
    by its editors. SEBD 2019, June 16-19, 2019, Castiglione della Pescaia, Italy.
    We present PETRA, a Progressive clustEring TRAcker where the clustering
analysis is iterated over time in order to constantly provide a reliable, stable,
up-to-date and consistent model. PETRA also focuses on identifying “signifi-
cant” changes in the clusterings constructed over time by means of parameters
that allow the system to adapt to specific application domains. When applying
PETRA to the case of stock prices, we can capture the intrinsic dynamics of the
stocks’ (and companies’) relationships, through the recognition of points in time
where their co-movements change, that is, we identify changes in the behavior of a company relative to the behavior of the other companies in the same cluster.
A company changing cluster is a signal to present to investors as well as a build-
ing block for more accurate prediction tasks. For example, well-known companies communicate more than small ones; however, small companies are often more interesting for investors since they can bring a higher profit. After clustering we
can use the rich information available for the well-known companies in a cluster
to obtain some guidelines for the small ones in the same cluster.
    PETRA is part of a broader project, Mercurio [6], whose aim is to support
financial investors in their decision process. In particular, PETRA helps investors
understand how companies influence each other, for a deeper understanding of
the market structure. The system has been designed and optimized for the finan-
cial application domain; however, it can be applied to any set of time series with
rapidly evolving behaviors, provided that the clustering algorithm, the similarity
metrics and some other parameters are appropriately tuned. Figure 1 depicts an
example of stock-based companies’ clustering.
    Most previous works on clustering stock-market firms showed how custom-made clusters outperform traditional industrial grouping criteria [13, 14]. Our work differs from them in that we move from a static to a progressive clustering approach, which allows us to overcome the big problem found when dealing with financial information: stock markets move fast, and financial insights expire and get old very quickly. Our progressive clustering provides fresh suggestions daily.

Fig. 1. A cluster of similarly-behaving companies
    Cluster tracking in evolving data streams is only partially covered by the scientific
community. A lot has been done on visually tracking groups of objects [7], e.g., of
moving people [8] or facial features [9], and on wireless channel modeling to track
multi-path components [10]. Our work is different because we are not interested
in tracking the shift in position of a cluster at consecutive time instants; rather,
we focus on tracking the membership of each object.
     General-purpose techniques [11, 12] exist that face the cluster-tracking prob-
lem in terms of outliers, i.e. the clustering is considered valid until the number of
outliers crosses a predefined threshold, after which re-clustering is needed. In PETRA, instead, we re-cluster at each iteration, since we are not just interested in having a consistent model but also in detecting as soon as possible the entities that change cluster, and promptly reporting them.
     MONIC [15] also models and monitors cluster transitions. However, PETRA
tracks the movements of the companies between clusters in order to generate
signals about each company, while MONIC is interested in the transition of the entities between clusters with the aim of drawing conclusions on the lifetime and stability of the clusters. Moreover, MONIC detects the transitions by keeping
track of the entity trajectories on spatiotemporal coordinates, while PETRA
detects movements by analyzing the contents of the clusters.
     Our objective is to dynamically track the evolution of a set of entities E = (e_1, e_2, ..., e_j, ..., e_N) by iteratively clustering their data streams over a sliding window W_i = [t_{i−w}, t_i] of width w. Figure 1 depicts one such cluster and shows how related companies appear inside it. At each time t_i, the sliding window shifts one time point ahead, and its content is clustered into K clusters C^i = (C^i_1, C^i_2, ..., C^i_K). Then, C^i is compared with C^{i−1} by associating each new cluster with its "previous version". At the end of the procedure PETRA identifies those entities that have made a "persistent" change of cluster³. In particular, our method is aimed at clustering time series that: (i) represent entities belonging to the same domain (e.g. stock-market prices of different companies) and are numerous enough to be grouped into clusters; (ii) are sampled at regular intervals, with the same sampling frequency; (iii) contain a number of time points sufficient to allow a sliding window to move along a significant number of positions⁴. Summarizing, our main contributions are: (i) the identification of an approach to generate clusters that are well-balanced w.r.t. the problem under consideration; (ii) the design of a method to trace clusters' behaviors and spot entities' changes of membership; (iii) the definition of a set of rules needed to identify relevant entity switches. Section 2 provides the details of our approach, while in Section 3 we show the experimental results.

2    The PETRA Method
We first discuss our design choices and some domain-dependent parameters that
need to be set in PETRA, then concentrate on the clustering procedure, and
finally show how we track the clusters along time.
    Time series clustering algorithms rely on similarity measures for quantifying
how homogeneous two observations are. PETRA uses similarity in time, which
is based on comparing different time series at the same time points, as it is the
most suitable one to identify sudden changes in the streams’ behaviors.
³ Note that some changes might be less interesting because they are temporary, in the sense that one stream might return to a previous cluster after some time.
⁴ Note that the term significant is strongly domain-dependent and should be judged by the designer.
The description follows the steps of Algorithm 1.

    Input: K, the desired number of clusters
    i ← 0
    repeat
        CORR = computeCorrelationMatrix(W_i)
        D = computeDistanceMatrix(CORR)
        C^i = clusterData(D, K)
        forall change ∈ trackClusters(C^i, C^{i−1}) do
            if change is relevant then
                trigger warning
            end
        end
        i ← i + 1
    until new data keeps coming
                Algorithm 1: The PETRA procedure

2.1   Design choices and parametrizations

The similarity measures that can be adopted for clustering time series [1] range from simple ones, such as the Euclidean distance, to more complex ones like Dynamic Time Warping (DTW). DTW is extremely powerful, as it finds a relationship between two time series S1 and S2 not only when S2 is obtained by shifting S1 in time, but also when S2 is a contraction or an expansion of S1; e.g., DTW can associate the last week of one company with the last month's behavior of another company. However, this feature is not needed in the financial case, where we need to compare companies on the same time interval and at the same time points; thus, simpler approaches, such as the Euclidean metric and metrics based on correlation, are sufficiently expressive.
    As for the clustering method, PETRA employs hierarchical clustering, since it does not require setting the number of clusters a-priori and, at the same time, is well suited for easy visualization. Using the Euclidean distance and hierarchical clustering allows us to use Ward's method for linkage [2], which, as we will see, provides a big advantage in the tracking phase. At each iteration, the Pearson correlation corr_ij [16] between each pair of time series i and j is computed, and a global similarity matrix CORR is obtained. Then, to obtain a suitable input for a clustering algorithm, a transformation from correlation coefficients to distances is needed. According to Gower and Legendre [3], if CORR is a positive semidefinite similarity matrix, the dissimilarity matrix D where d_ij = sqrt(1 − corr_ij) is a matrix of Euclidean distances. Finally, the width w of the sliding window and the number of desired clusters K are left as parameters in PETRA, since the best choice strictly depends upon the application goal; in Section 3 we present a possible way to perform an educated guess. We chose a window of 105 days and a number of clusters equal to 4.
    At each new iteration, the distance matrix D is computed for the portions of time series inside the window. Then, the hierarchical clustering using Ward's
linkage method is performed, a hierarchy of clusters – represented by means of a
dendrogram like in Figure 2 – is obtained, and the dendrogram is cut in order to
obtain K clusters. Ward’s method promotes the merges of small clusters, which
is crucial to the tracking phase since well-balanced clusters are easier to track.
    However, depending on the level of the cut, we obtain different numbers of clusters, and a constant-height cut is not suitable in this case: cutting a dendrogram to obtain a fixed number of clusters irrespective of its general structure is likely to result in highly unstable clusters. Langfelder et al. [4] solved the problem of sub-optimal performance on complicated dendrograms by presenting the Dynamic Tree Cut (DTC) algorithm, which provides a dynamic branch-cutting
the standard dendrogram creation procedure, clusters are merged in a bottom-
up fashion, but this merging depends on a set of parametric criteria such as the
minimum size of each cluster and the minimum gap between the joining heights.
By using DTC we can obtain much more balanced clusters, both in their size and
composition, avoiding the downsides of the standard approach and facilitating
the tracking phase. Figure 2 shows how, although both cuts identify three clusters, those obtained with the fixed-height cut are highly imbalanced (i.e. there are two clusters of just one element), while the DTC delivers very balanced clusters.
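As a concrete illustration, the correlation-to-distance transformation and the Ward clustering step can be sketched in Python with SciPy. Note that, for simplicity, this sketch cuts the dendrogram into a fixed number of clusters with `fcluster` rather than with the Dynamic Tree Cut used by PETRA, and the data and function names are illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_window(window, k):
    """Cluster the series of one sliding window (rows = entities)."""
    corr = np.corrcoef(window)                        # similarity matrix CORR
    dist = np.sqrt(np.clip(1.0 - corr, 0.0, None))    # d_ij = sqrt(1 - corr_ij)
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)        # condensed form for linkage
    tree = linkage(condensed, method="ward")          # Ward's linkage [2]
    return fcluster(tree, t=k, criterion="maxclust")  # cut into k flat clusters

# Toy data: two anti-correlated groups of three series each.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 6.0, 105)
group_a = np.sin(t) + 0.1 * rng.standard_normal((3, t.size))
group_b = -np.sin(t) + 0.1 * rng.standard_normal((3, t.size))
labels = cluster_window(np.vstack([group_a, group_b]), k=2)
```

The clipping guards against floating-point values of 1 − corr_ij slightly below zero; by construction the two toy groups are strongly anti-correlated, so the cut separates them cleanly.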
      The goal of the tracking phase is to observe how clusters evolve and how the clustered streams change their membership. To track cluster dynamics, we have to recognize the same clusters at different times without relying on any external reference. We address this problem by using the Jaccard similarity index [16] and Linear Programming (LP).

Fig. 2. Fixed-height versus Dynamic Tree Cut

Given two clusterings C^i and C^{i−1}, obtained from the series over two consecutive windows W_i and W_{i−1}, we compute the Jaccard score of each pair of clusters in the two
clusterings. The pairs with highest Jaccard score are the most similar clusters,
likely representing the same cluster at consecutive time points. Of course, once a
cluster is ascribed to one predecessor it cannot be ascribed to any other one: each
cluster is linked to one and only one predecessor. This can easily be formalized in
LP. Given two clusterings C^i = (C^i_1, C^i_2, ..., C^i_h, ..., C^i_K) and C^{i−1} = (C^{i−1}_1, C^{i−1}_2, ..., C^{i−1}_j, ..., C^{i−1}_K), we define the square matrix M of dimension K as the matrix of the Jaccard scores of all the possible pairs (C^i_h, C^{i−1}_j) of clusters. Then we define the square matrix X as a decision-variable matrix of dimension K, such that x_hj is 1 if Jac(C^i_h, C^{i−1}_j) in M is chosen by the optimization process, and 0 otherwise. The LP model aims to select from M the K values having the highest sum, with the strong constraint that each selected value can share neither the row nor the column with the other selected values. This is equivalent to selecting the K cluster pairs with the highest Jaccard scores, without selecting a cluster more than once. The objective function is to maximize Σ_{h,j} (M ∘ X)_{hj} (where ∘ represents the element-wise matrix product), subject to the constraints: Σ_j x_hj = 1 ∀h ∈ [1, K]; Σ_h x_hj = 1 ∀j ∈ [1, K]; x_hj ∈ {0, 1} ∀h, j ∈ [1, K].
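The matching step can be sketched as follows. Since selecting K one-to-one pairs with maximum total Jaccard score is an assignment problem, the sketch uses SciPy's Hungarian-method solver in place of an explicit LP model; the function names and toy clusterings are our own:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def jaccard(a, b):
    """Jaccard score of two clusters given as sets of entity ids."""
    return len(a & b) / len(a | b)

def match_clusters(curr, prev):
    """Associate each cluster of C^i with one cluster of C^{i-1}."""
    # M: Jaccard scores of all pairs (C^i_h, C^{i-1}_j).
    m = np.array([[jaccard(c, p) for p in prev] for c in curr])
    # One-to-one selection maximizing the total score is an assignment
    # problem; negate the matrix to maximize with the Hungarian method.
    rows, cols = linear_sum_assignment(-m)
    return {int(h): int(j) for h, j in zip(rows, cols)}

curr = [{"A", "B", "C"}, {"D", "E"}, {"F"}]
prev = [{"D", "E", "F"}, {"A", "B"}, {"C"}]
print(match_clusters(curr, prev))  # {0: 1, 1: 0, 2: 2}
```

Here cluster {"A", "B", "C"} is linked to its predecessor {"A", "B"} even though entity C moved, because their Jaccard score is the highest available, which is exactly the behavior the tracking phase relies on.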
      At each new iteration PETRA moves its analysis forward by one time point, performs a new clustering and links each new cluster to its "previous version". By doing so, we detect when an entity moves to another cluster. Three types of change are possible:
 – exit: a company exits the Reference Cluster, moving into a different cluster;
 – re-entry: an entity re-enters the Reference Cluster;
 – change outside the reference cluster: an entity moves between two "non-Reference Clusters".
Note that the Reference Clusters strictly depend on the domain under assess-
ment; in our use case the reference clusters turn out to be the industrial sectors.
Reference clusters are re-computed every time we move the time window; each cluster is assigned the industrial sector of the majority of the companies inside it.
Thus, a cluster that contains mostly banks will be assigned the label “Bank”,
one that contains mostly automotive companies will be assigned “Automotive”,
and so on. For example, the industrial sector of Unicredit, a famous Italian bank,
is “Bank”, thus its reference cluster is the one that contains mostly banks. After
an iteration, if most of the companies inside Unicredit’s cluster are banks, we
say that Unicredit is in its reference cluster, and outside its reference cluster
otherwise. This allows us to give more importance to those cluster changes that involve the Reference Cluster.
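The majority-vote labeling can be sketched as follows; the companies and sector assignments below are illustrative only:

```python
from collections import Counter

def cluster_label(cluster, sector):
    """Label a cluster with the industrial sector of the majority of its members."""
    return Counter(sector[c] for c in cluster).most_common(1)[0][0]

# Hypothetical companies and sectors, for illustration only.
sector = {"Unicredit": "Bank", "Intesa": "Bank", "Fiat": "Automotive", "Eni": "Energy"}
cluster = {"Unicredit", "Intesa", "Eni"}
label = cluster_label(cluster, sector)       # majority of members are banks
in_reference = sector["Unicredit"] == label  # Unicredit is in its reference cluster
```

With two banks out of three members, the cluster is labeled "Bank", so Unicredit is considered to be inside its reference cluster.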
    Note that, when using a sliding window, the change in the new configuration
might be caused by the new point added or by the one that has been removed.
Our goal is to detect changes that are caused by current events, thus, PETRA
triggers warnings only when a change is caused by the new point added. To do so
we use the intersection window W_i^∩ = [t_{i−w}, t_{i−1}] = W_i ∩ W_{i−1}, and the union window W_i^∪ = [t_{i−w−1}, t_i] = W_i ∪ W_{i−1}: by computing the clustering also on the
intersection window and comparing it with that of Wi we can say, in case they
are different, that the new time point ti has actually influenced the clustering
results. Moreover, by computing the clustering also on Wi∪ and comparing it
with that of Wi , if no change has happened, we can also say that the removal
of the last point ti−w−1 has not contributed to the change, meaning that it is
entirely due to the new point added.
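A minimal sketch of this attribution rule, under the assumption that clusterings are represented as partitions (sets of frozensets) and that the clustering itself is computed elsewhere:

```python
def change_due_to_new_point(clu_w, clu_inter, clu_union):
    """Trigger a warning only when the change is caused by the new point t_i.

    clu_w, clu_inter and clu_union are the clusterings computed on the
    current window W_i, on the intersection window W_i ∩ W_{i-1}, and on
    the union window W_i ∪ W_{i-1}, each given as a set of frozensets.
    """
    new_point_matters = clu_w != clu_inter  # t_i actually influenced the result
    removal_harmless = clu_w == clu_union   # dropping t_{i-w-1} changed nothing
    return new_point_matters and removal_harmless

# Toy partitions over three entities.
c_w = {frozenset({"A", "B"}), frozenset({"C"})}
c_inter = {frozenset({"A"}), frozenset({"B", "C"})}
c_union = {frozenset({"A", "B"}), frozenset({"C"})}
print(change_due_to_new_point(c_w, c_inter, c_union))  # True
```

In the toy case the clustering on W_i differs from the one on the intersection window but agrees with the one on the union window, so the change is attributed to the newly added point and a warning is triggered.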
3     Experimental Results
To test and validate PETRA in a real application, we perform a market simulation consisting in the purchase and sale of companies' stocks. We do this because cluster switches are a latent feature, so there is no general ground truth to test against and prove their actual relevance directly. However, as far as the
financial domain is concerned, we can use the market gain as an indicator for
the goodness of the clustering performed by PETRA.
    We consider the stock market prices of the 40 most highly capitalized companies in the Italian Stock Exchange⁵ between 2008 and 2017. There are two values for each trading day: the opening price (open) and the closing price (close); we cluster based on the close-open indicator. Among the 40 companies we randomly
select some of them as leaders. Leaders are companies whose behavior we know
in advance (i.e. when their price trend changes): for each of them we identify
some companies that are “close to the leader” (friends) for which we don’t know
the future behavior. The simulation consists in buying and selling stocks of the
friends depending on the behavior of the leaders. If the clustering is correct, the
friends’ behavior is similar to the leader’s one, hence the simulation will buy and
sell the stocks at the right time, resulting overall in a profit.
    For the validation, since no ground truth for the trend changes of a company’s
stock price is available, and since the state-of-the-art techniques for segmenting
⁵ https://en.wikipedia.org/wiki/FTSE_MIB
time series proved to be unsuitable, as they do not lend themselves to a unique parametrization valid for all the companies, we designed an ad-hoc procedure
for identifying the major changes in the price trend of a leader a-posteriori. The
procedure divides the price timeline of a company into intervals of up-trend, down-trend and congestion⁶, according to the position of the company's price w.r.t. three moving averages over the price [6]. For each friend, its closeness to the leader is evaluated using PETRA; thus, we first need to appropriately tune the window length and the number of clusters.
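Since the exact rule from [6] is not spelled out here, the sketch below assumes a common convention: a day is labeled up-trend when the price lies above all three moving averages, down-trend when below all three, and congestion otherwise. The window lengths are illustrative:

```python
import numpy as np

def moving_average(prices, n):
    """Simple trailing moving average; the first n-1 entries are NaN."""
    out = np.full(prices.shape, np.nan)
    c = np.cumsum(prices, dtype=float)
    out[n - 1:] = (c[n - 1:] - np.concatenate(([0.0], c[:-n]))) / n
    return out

def label_trend(prices, windows=(20, 50, 100)):
    """Label each day as 'up', 'down' or 'congestion'.

    Assumption: 'above all three moving averages' counts as up-trend and
    'below all three' as down-trend; the paper only states that the
    position of the price w.r.t. three moving averages is used.
    """
    mas = np.vstack([moving_average(prices, n) for n in windows])
    up = (prices > mas).all(axis=0)       # above every moving average
    down = (prices < mas).all(axis=0)     # below every moving average
    return np.where(up, "up", np.where(down, "down", "congestion"))

# Toy price series: a decline followed by a recovery.
prices = np.concatenate([np.linspace(100, 90, 120), np.linspace(90, 130, 120)])
labels = label_trend(prices)
```

Early days fall into congestion because the longer averages are still undefined (NaN comparisons are false), the end of the decline is labeled as down-trend, and the end of the recovery as up-trend.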
     The width of the sliding window is a crucial parameter: a too narrow window would result in excessive sensitivity and poor generalization potential, while with a too wide one we risk missing important events. To identify the best
width we analyze each possible width value, that is: (i) for each value n between
2 (the smallest possible window) and 1260 (half of the 2008-2017 period) we
divide the companies’ time series into segments of width n; (ii) for each n, for
each time segment T of length n, and for each pair of companies, we compute
the correlation between their segments, and arrange them all into a matrix AT,n ;
(iii) we compute the difference between each correlation matrix and the one at
the previous time slot; (iv) we compute their variance and then (v) we average all
variances together, obtaining a single score for each window width n. We chose the value associated with the smallest score, which corresponds to 105 days. We estimated
the number of clusters through knee/elbow analysis, which suggested an optimal
number of 4 clusters. After parameter tuning, the simulation picks 3 leaders and
then runs PETRA which, at each iteration, adopts a conservative policy:
 – If a leader enters an up-trend phase, pick the 4 closest companies in the same
   cluster as friends and buy them.
 – If a leader enters a down-trend phase, sell all its friends.
 – If a friend changes cluster, sell it.
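The window-width scoring procedure, steps (i)-(v) above, can be sketched as follows; the function names and candidate widths are illustrative, and consecutive non-overlapping segments are assumed:

```python
import numpy as np

def width_score(series, n):
    """Score one candidate width n (steps ii-v): lower means more stable.

    For consecutive non-overlapping segments of length n, compute the
    pairwise correlation matrices A_{T,n}, the difference between each
    matrix and the previous one, the variance of each difference, and
    finally average those variances into a single score.
    """
    t = series.shape[1]
    corrs = [np.corrcoef(series[:, s:s + n]) for s in range(0, t - n + 1, n)]
    if len(corrs) < 2:
        return float("inf")  # not enough segments to compare
    diffs = [b - a for a, b in zip(corrs, corrs[1:])]
    return float(np.mean([d.var() for d in diffs]))

def best_width(series, candidates):
    """Step (i): scan the candidate widths and keep the smallest score."""
    return min(candidates, key=lambda n: width_score(series, n))

rng = np.random.default_rng(1)
data = rng.standard_normal((5, 400))  # 5 hypothetical price-indicator series
chosen = best_width(data, [50, 100, 200])
```

In PETRA the scan covers every width from 2 up to half the observation period; the short candidate list here only keeps the example fast.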
To gather robust evidence, we ran the simulation ten thousand times over different portfolios of leaders and then averaged the profits. Comparing the results to a stock market index is common practice when it comes to market simulation; thus, we use the FTSE MIB index as a baseline. Figure 3 reports the aggregated results for each year by showing the average profits obtained with PETRA and the FTSE MIB profits for the same year. The profits yielded by our procedure are always positive: even in years like 2008, when the market underwent a severe crisis, our procedure generated profits. Although there are certain years in which the FTSE MIB outperforms our approach, our results remain very comparable, especially when considered over the long term.
     The traditional approach [5] to validating Stock Market Clustering is to check
if the resulting clusters are consistent w.r.t. the business sectors. Since industrial sectors are static entities, we decided not to use this approach for the validation, as the aim of our procedure is to dynamically spot anomalies and changes. Nevertheless, our clusters have proved to be consistent also from this point of
view. The four clusters identified by our procedure coincide with the sectors:
banking, energy, consumer products and industrial products.
⁶ A period in which the price undergoes oscillations without a clear direction.
                             Year   PETRA (%)   FTSE MIB (%)
                             2008     3.37        −49.53
                             2009    13.06         16.52
                             2010     4.85        −14.32
                             2011     9.65        −26.16
                             2012     9.34          5.30
                             2013     9.18         12.30
                             2014     5.04          0.44
                             2015     9.15         11.96
                             2016    11.66        −10.20
                             2017     4.14         11.69
4    Conclusions and Thanks
We presented a technique to cluster time series where the clustering is iterated over time and, at each iteration, the differences between the current clustering and the previous one are studied to spot significant changes.

Fig. 3. Average profit (in %) for each year between 2008 and 2017

    We are grateful to U. P. De Vos and to L. Raimondi for taking part in the development of the algorithms and the analysis of the results.
References
1. S. Aghabozorgi, A. S. Shirkhorshidi, T. Y. Wah: Time-series clustering-A decade
   review. Information Systems (2015)
2. J. H. Ward Jr.: Hierarchical grouping to optimize an objective function. Journal of
   the American statistical association (1963)
3. J. C. Gower, P. Legendre: Metric and Euclidean properties of dissimilarity coeffi-
   cients. Journal of classification (1986)
4. P. Langfelder, B. Zhang, S. Horvath: Defining clusters from a hierarchical cluster
   tree: the Dynamic Tree Cut package for R. Bioinformatics (2008)
5. M. Gavrilov, D. Anguelov, P. Indyk, R. Motwani: Mining the stock market: Which
   measure is best?. ACM SIGKDD (2000)
6. D. Azzalini, F. Azzalini, D. Greco, M. Mazuran, L. Tanca: Event Recognition Strate-
   gies applied in the Mercurio Project. MIDAS (2017)
7. K. Granstrom, M. Baum, S. Reuter: Extended Object Tracking: Introduction,
   Overview and Applications. Journal of Advances in Information Fusion (2017)
8. S. J. McKenna, S. Jabri, Z. Duric, A. Rosenfeld, H. Wechsler: Tracking groups of
   people. Computer vision and image understanding (2000)
9. M. N. Islam, M. Seera, C. K. Loo: A robust incremental clustering-based facial
   feature tracking. Applied Soft Computing (2017)
10. C. Huang, R. He, Z. Zhong, B. Ai, Z. Zhong: Comparison of Automatic Track-
   ing and Clustering Algorithms for Time-Variant Multipath Components. Globecom
   Workshops (2017)
11. D. Barbará: Requirements for clustering data streams. ACM SIGKDD Explo-
   rations Newsletter (2002)
12. D. Barbará, P. Chen: Tracking Clusters in Evolving Data Sets. FLAIRS Conference
   (2001)
13. E. J. Elton, M. J. Gruber: Improved forecasting through the design of homoge-
   neous groups. The Journal of Business (1971)
14. S. Aghabozorgi, T. Y. Wah: Stock market co-movement assessment using a three-
   phase clustering method. Expert Systems with Applications (2014)
15. M. Spiliopoulou, I. Ntoutsi, Y. Theodoridis, R. Schult: Monic: modeling and mon-
   itoring cluster transitions. ACM SIGKDD (2006)
16. J. Han, J. Pei, M. Kamber: Data mining: concepts and techniques (2011)