Using Multi-Objective Optimization for the Selection of Ensemble Members — CEUR-WS Vol-1422, https://ceur-ws.org/Vol-1422/108.pdf
J. Yaghob (Ed.): ITAT 2015 pp. 108–114
Charles University in Prague, Prague, 2015



    Using Multi-Objective Optimization for the Selection of Ensemble Members

Tomáš Bartoň¹,², Pavel Kordík¹

¹ Faculty of Information Technology, Czech Technical University in Prague, Thakurova 9, Prague 6, Czech Republic
http://www.fit.cvut.cz
tomas.barton@fit.cvut.cz

² Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20 Prague 4, Czech Republic

Abstract: In this paper we propose a clustering process which uses a multi-objective evolution to select a set of diverse clusterings. The selected clusterings are then combined using a consensus method. This approach is compared to a clustering process where no selection is applied. We show that careful selection of input ensemble members can improve the overall quality of the final clustering. Our algorithm provides more stable clustering results and in many cases overcomes the limitations of base algorithms.

1 Introduction

Clustering is a popular technique that can be used to reveal patterns in data or as a preprocessing step in data analysis. Clustering groups similar objects into groups called clusters and places dissimilar objects into different clusters. Its application domains include data mining, information retrieval, bioinformatics, image processing and many others.

Many clustering algorithms have been introduced so far; an extensive overview of clustering techniques can be found in [1]. Since the No Free Lunch Theorem [36] states that there is no single, supreme algorithm that runs optimally on all datasets, it is complicated for a user to decide which algorithm to choose.

Ensemble methods have been successfully applied in supervised learning [25], and similar attempts have appeared in the unsupervised learning area [32], [14], [34]. Cluster ensemble methods should eliminate the drawbacks of individual methods by combining multiple solutions into one final clustering.

2 Cluster Ensembles

The main objective of clustering ensembles is to combine multiple clusterings into one, preferably high-quality solution.

Let X = {x1, x2, ..., xn} be a set of n data points, where each xi ∈ X is represented by a vector of d attributes. A cluster ensemble is defined as Π = {π1, π2, ..., πm} with m clusterings. Each base clustering πi consists of a set of clusters πi = {C1^i, C2^i, ..., C_ki^i}, such that ∪_{j=1}^{ki} C_j^i = X, where ki is the number of clusters in a given ensemble member (it does not have to be the same for all members).

The problem is how to obtain a final clustering π* = {C1*, C2*, ..., CK*}, where K is the number of clusters in the result and π* summarizes the information from the ensemble Π.

The process consists of two major steps. Firstly, we need to generate a set of clustering solutions, and secondly we need to combine the information from these solutions. A typical result of the ensemble process is a single clustering. It has been shown in supervised ensembles that the best results are achieved when using a set of predictors whose errors are dissimilar [25]. Thus it is desirable to introduce diversity between ensemble members.

2.1 Ensemble Generation Strategies

Several approaches have been used to initialize clustering solutions in order to create an ensemble.

  • Homogeneous ensembles: Base clusterings are created using repeated runs of a single clustering algorithm. This is quite a popular approach; repeated runs of k-means with random center initialization have been used in [14]. When using k-means, the number of clusters is typically fixed to ⌈√n⌉, where n is the size of the dataset [22].

  • Varying k: Repeated runs of k-means with random initialization and random k [15]; a golden standard is using k in the range from 2 to √n.

  • Random subspacing: An ensemble is created from base clusterings that use different initial data. This could be achieved by projecting data onto different subspaces [13], [16], choosing different subsets of features [32], [37], or using data sampling techniques [10].

  • Heterogeneous ensembles: Diversity of solutions is introduced by applying different algorithms to the same dataset.

2.2 Consensus Functions

Having multiple clusterings in an ensemble, many functions have been proposed to derive a final clustering. When only one solution is considered as the result, it is
usually referred to as a consensus function, unlike meta clustering, where the output is a set of multiple clusterings [6].

There are several approaches to representing the information contained in base clusterings; some use matrices while others use a graph representation.

  • Pairwise similarities: A pairwise similarity matrix is created and afterwards a clustering algorithm (e.g. hierarchical agglomerative clustering) is applied to group together items that were most frequently together in the same cluster in all the base clusterings [14]. The Cluster-based Similarity Partitioning Algorithm (CSPA) from Strehl and Ghosh [32] uses METIS [24] for partitioning a similarity matrix into k components.

  • Feature-based approach: The ensemble problem is formulated as categorical data clustering. For each data point an m-dimensional vector containing its labels in the base clusterings is created. The goal is to find a partition π* which summarizes the information gathered from the partitions Π [28], [33], [4].

  • Graph based: Many methods use a graph representation for capturing relationships between base clusterings. Strehl and Ghosh [32] also proposed the HyperGraph-Partitioning Algorithm (HGPA), where vertices correspond to data points and a hyperedge represents a cluster. Another approach, COMUSA [27], increases the weight of an edge for each occurrence of a data pair in the same cluster. Afterwards the nodes are sorted by the attachment score, which is defined as the ratio between the sum of the node's weights and its number of incident edges. The nodes with the highest attachment score are then used as a foundation for new clusters. This approach is relatively fast to compute; however, it might fail to capture complex relationships between very diverse clusterings.

3 Multi-Objective Clustering

Multi-objective clustering usually optimizes two objective functions. Using more than a few objectives is not usual because the whole process of optimization becomes less effective.

The first multi-objective evolutionary clustering algorithm was introduced in 2004 by Handl and Knowles [18] and is called VIENNA (the Voronoi Initialized Evolutionary Nearest-Neighbour Algorithm).

Subsequently, in 2007 Handl and Knowles published a Pareto-based multi-objective evolutionary clustering algorithm called MOCK [19] (Multi-Objective Clustering with automatic K-determination). Each individual in MOCK is represented as a directed graph which is then translated into a clustering. The genotype is encoded as an array of integers whose length is the same as the number of instances in the dataset. Each number is a pointer to another instance (an edge in the graph), since it is connected to the instance at a given index. This easily enables the application of mutation and crossover operations.

As a Multi-Objective Evolutionary Algorithm (MOEA), MOCK employs the Pareto Envelope-based Selection Algorithm version 2 (PESA-II) [7], which keeps two populations: an internal population of fixed size and a larger external population which is exploited to explore good solutions. Two complementary objectives, deviation and connectivity, are used in the evolutionary process.

A clear disadvantage of MOCK is its computational complexity, which is a typical characteristic of evolutionary algorithms. Nevertheless, the computation time spent on MOCK should result in high-quality solutions. Faceli et al. [12] reported that for some high-dimensional data it is not guaranteed that the algorithm will complete, unless the control front distribution has been adjusted for the given data set.

Faceli et al. [12] combined a multi-objective approach to clustering with ensemble methods, and the resulting algorithm is called MOCLE (Multi-Objective Clustering Ensemble Algorithm). The objectives used in the MOCLE algorithm are the same as those used in MOCK [19]: deviation and connectivity. Unlike MOCK, in this case the evolutionary algorithm used is NSGA-II [9].

4 Our Approach

There are many ways to produce a final consensus; nonetheless, in this work we focus on the selection of high-quality and diverse clusterings.

In order to optimize ensemble member selection, we apply a multi-objective optimization. The whole process is shown in Fig. 1. In our previous work [3] we have shown that introducing multiple objectives into the clustering process can improve the quality of clustering; specifically, using the Akaike information criterion (AIC) [2] (or BIC [31]) together with another criterion leads to better results. The second criterion is typically based on computing a ratio between cluster compactness and the sum of distances between cluster centres.

AIC is typically used in supervised learning when trying to estimate model error. Essentially it attempts to estimate the optimism of the model and then add it to the error [20]:

    fAIC = −2 · log(Ln(k)) / n + 2 · k / n        (1)

where Ln(k) is the maximum likelihood of a model with k parameters based on a sample of size n.

The SD index was introduced in 2000 by Halkidi et al. [17] and is based on the concepts of average cluster scattering and total separation of clusters, which were previously used by Rezaee et al. [29] for the evaluation of fuzzy clusterings.
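The AIC objective of Eq. (1) is easy to sketch in code. The paper does not fix which likelihood is plugged into Ln(k) for a k-means partition, so the sketch below assumes a spherical Gaussian per cluster with a shared maximum-likelihood variance; both the helper name `kmeans_log_likelihood` and that modelling choice are assumptions of this sketch, not the authors' implementation.

```python
import math

def f_aic(log_likelihood, k, n):
    """Eq. (1): f_AIC = -2 * log(L_n(k)) / n + 2 * k / n (lower is better)."""
    return -2.0 * log_likelihood / n + 2.0 * k / n

def kmeans_log_likelihood(points, labels, k):
    """Crude log(L_n(k)) for a k-means partition, assuming a spherical
    Gaussian per cluster with one shared variance (a modelling assumption
    made for this sketch only)."""
    n, d = len(points), len(points[0])
    centers = {}
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if members:
            centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    # total within-cluster sum of squared distances
    sse = sum(sum((a - b) ** 2 for a, b in zip(p, centers[labels[i]]))
              for i, p in enumerate(points))
    var = max(sse / (n * d), 1e-12)  # ML estimate of the shared variance, floored
    # sum_i log N(x_i | center(x_i), var * I) = -n*d/2 * (log(2*pi*var) + 1)
    return -0.5 * n * d * (math.log(2.0 * math.pi * var) + 1.0)
```

Under this model, a tighter partition yields a higher likelihood and therefore a lower (better) fAIC for the same k and n.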
Figure 1: Cluster Ensemble Process: Firstly, using the same input dataset we generate multiple clusterings, then select the m best clusterings for the ensemble and combine them into a single solution.

The average scattering is defined as:

    Scat(k) = (1/k) · Σ_{i=1}^{k} ‖σ(c̄i)‖ / ‖σ(X)‖        (2)

where ‖x‖ is the norm of a vector, c̄i is the centroid of the i-th cluster and σ(X) is the variance of the input dataset. σ(X) ∈ R^m, with m being the number of dataset dimensions. The variance for a dimension d is defined as:

    σ^d = (1/n) · Σ_{i=1}^{n} (x_i^d − x̄^d)^2

    ‖σ(X)‖ = sqrt( Σ_{d=1}^{m} (σ^d)^2 )

The total separation is given by:

    Dis(k) = (Dmax / Dmin) · Σ_{i=1}^{k} ( Σ_{j=1}^{k} ‖c̄i − c̄j‖ )^{−1}        (3)

where Dmax is the maximum distance and Dmin is the minimum distance between cluster centers (c̄i) and k is the number of clusters:

    Dmax = max_{i,j ∈ {1,...,k}, i≠j} ‖c̄i − c̄j‖        (4)

    Dmin = min_{i,j ∈ {1,...,k}, i≠j} ‖c̄i − c̄j‖        (5)

Then we can define the SD validity index as follows:

    fSD = α · Scat(k) + Dis(k)        (6)

where α should be a weighting factor equal to Dis(cmax), with cmax being the maximum number of clusters [17]. This makes perfect sense for fuzzy clustering (as was proposed in [29]); however, it is rather unclear how to compute cmax in the case of crisp clustering, when cmax ≫ k, without running another clustering with cmax as the requested number of clusters. Nonetheless, [17] mentions that "SD proposes an optimal number of clusters almost irrespective of cmax, the maximum number of clusters", thus we consider the special case where cmax = k:

    fSD = Dis(k) · Scat(k) + Dis(k)        (7)
        = Dis(k) · (Scat(k) + 1)          (8)

The idea of minimizing inner cluster distances and maximizing distances between cluster centres is not really new; in fact, most clustering objective functions use this idea. Instead of the SD index we could have used the C-index [21], the Calinski-Harabasz index [5], the Davies-Bouldin index [8] or Dunn's index [11], just to name a few. Each of these indexes might perform better on some datasets; however, it is out of the scope of this paper to compare which combination would be better.

As the evolutionary algorithm we use NSGA-II [9], using only the mutation operator and a relatively small population (see Table 2). Each individual in the population contains a configuration of a clustering algorithm. For this experiment we used only the basic k-means [26] algorithm with random initialization. The number of clusters k is the only parameter that we change during the evolutionary process. For k we used a constraint in order to keep the number of clusters within reasonable boundaries. For all test datasets we used the interval ⟨2, √n⟩.

In order to evaluate the effect of Pareto optimal solution selection on the ensemble bagging process, we compare two strategies for generating base clusterings. The first one generates 10 k-means clusterings with random initialization with k within the interval ⟨2, √n⟩. The second method uses a multi-objective evolution of clusterings where each individual represents a k-means clustering.

As a consensus function we use either the graph-based COMUSA algorithm or a hierarchical agglomerative clustering of the co-association matrix. Both approaches have their drawbacks: the COMUSA algorithm is locally consistent but fails to integrate the information contained in different base clusterings, thus the final clustering does not overcome the limitations of the base clustering algorithms. The latter could be very slow on larger datasets (complexity O(n^3)) and might produce noisy clusterings in the case of very diverse base clusterings.
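The SD computation of Eqs. (2)–(8), in the cmax = k special case, can be sketched in a few lines of pure Python. The helper names (`sd_index`, `_var_vec`, etc.) are invented for this sketch; lower values of the index indicate better clusterings.

```python
import math

def _var_vec(points):
    """Per-dimension variance vector sigma(X) of a point set."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    return [sum((p[j] - mean[j]) ** 2 for p in points) / n for j in range(d)]

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sd_index(points, labels, k):
    """SD validity index in the cmax = k special case, Eq. (8):
    f_SD = Dis(k) * (Scat(k) + 1)."""
    clusters = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
    centers = [tuple(sum(xs) / len(c) for xs in zip(*c)) for c in clusters]
    # Scat(k), Eq. (2): average within-cluster variance relative to dataset variance
    scat = sum(_norm(_var_vec(c)) for c in clusters) / (k * _norm(_var_vec(points)))
    # Dmax and Dmin, Eqs. (4) and (5): extreme distances between cluster centres
    dists = [_dist(centers[i], centers[j])
             for i in range(k) for j in range(k) if i != j]
    dmax, dmin = max(dists), min(dists)
    # Dis(k), Eq. (3): total separation of cluster centres
    dis = (dmax / dmin) * sum(
        1.0 / sum(_dist(centers[i], centers[j]) for j in range(k))
        for i in range(k))
    return dis * (scat + 1.0)
```

On a dataset with two well-separated groups, the correct partition yields a much lower SD value than a partition that mixes the groups, since both Scat and Dis grow when cluster centres collapse toward each other.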
    Dataset       d      n    classes   source
    aggregation   2    788       7      [15]
    atom          3    800       2      [35]
    chainlink     3   1000       2      [35]
    complex8      2   2551       8      [30]
    complex9      2   3031       9      [30]
    diamond9      2   3000       9      [30]
    jain          2    373       2      [23]
    long1         2   1000       2      [18]
    longsquare    2    900       6      [18]
    lsun          2    400       3      [35]
    target        2    770       6      [35]
    tetra         3    400       4      [35]
    triangle1     2   1000       4      [18]

Table 1: Datasets used for experiments.

    Parameter                Setting
    Number of generations    5
    Population size          10
    Mutation probability     30%
    Mutation                 Polynomial mutation

Table 2: Parameter settings for the NSGA-II evolution.

To evaluate clustering quality we use NMI¹ (Normalized Mutual Information) [32]. NMI has proved to be a better criterion than the Adjusted Rand Index (ARI) for the evaluation of datasets with a priori known labels. However, it does not capture noise in clusters (which does not occur in the case of traditional k-means), which might be introduced by some consensus methods. Another limitation of these evaluation metrics is apparent from Figure 6, where a clustering with an obvious 50% assignment error gets the lowest possible score. Neither index correctly penalizes a higher number of clusters; thus algorithms producing many small clusters will be preferred by these objectives.

5 Results

Most of the datasets used in these experiments come from the Fundamental Clustering Problems Suite [35]. We have intentionally chosen low-dimensional data in order to be able to visually evaluate the results. An overview of the datasets used can be found in Table 1.

As objectives during the multi-objective evolution we used AIC and the SD index in all cases. The advantage is the combination of two measures that are based on very different grounds. We tried using deviation and connectivity, as proposed by Handl [19] (and later used by [12]), but for all tested datasets our approach was better than, or at least comparable with, these objectives.

    Dataset       C-RAND   C-MO   k-means
    aggregation     0.75    0.84    0.84
    atom            0.39    0.61    0.28
    chainlink       0.07    0.50    0.07
    complex8        0.59    0.64    0.59
    complex9        0.64    0.66    0.65
    diamond9        1.00    0.87    0.97
    jain            0.36    0.51    0.37
    long1           0.00    0.49    0.03
    longsquare      0.83    0.86    0.80
    lsun            0.54    0.69    0.54
    target          0.67    0.57    0.69
    tetra           1.00    0.86    0.99
    triangle1       0.95    0.78    0.94

Table 3: NMIsqrt scores from various artificial dataset clusterings. C-RAND uses bagging of 10 k-means runs with a fixed value of k that corresponds to the number of classes in the given dataset. The same approach is used in the case of k-means, but without bagging. C-MO is an NSGA-II based algorithm with the criteria AIC and SD index; the number of clusters is randomized during evolution within the interval 2 to √n. Both C-RAND and C-MO use the COMUSA algorithm to form the final (consensus) partitioning. The NMI values are averages over 30 independent runs.

In the case of the COMUSA approach, the multi-objective selection either improved the final NMI score or was comparable with the random selection. There were only a few exceptions, especially in the case of compact datasets like the diamond9 dataset, which contains a specific regular pattern that is unlikely to appear in any real-world dataset (see Table 3). The number of clusters produced by the COMUSA approach was usually higher than the correct number of clusters, and the clustering contained many nonsense clusters. Despite these facts, the NMI score does not penalize these properties appropriately.

Agglomerative clustering of the co-association matrix, despite its high computational requirements, provides quite good results. In the case of the chainlink dataset there is over 50% improvement in NMI. It is important to note that for HAC-RAND and HAC-MO we did not provide the information about the correct number of clusters. Still, these methods manage to estimate the number of clusters, or at least provide a result that is very close to the number of supervised classes.

¹ There are several versions of this evaluation method; it is sometimes referred to as NMIsqrt.
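The co-association consensus behind HAC-RAND and HAC-MO can be sketched roughly as follows. This is a simplified average-linkage variant that merges until a user-supplied number of clusters remains, whereas the methods above estimate the number of clusters themselves; the function names are invented for this sketch and it is not the authors' implementation.

```python
def coassociation(ensemble, n):
    """Co-association matrix: M[i][j] = fraction of base clusterings that
    put points i and j into the same cluster."""
    m = len(ensemble)
    M = [[0.0] * n for _ in range(n)]
    for labels in ensemble:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    M[i][j] += 1.0 / m
    return M

def hac_consensus(M, k):
    """Agglomerative consensus: start from singletons and repeatedly merge
    the pair of clusters with the highest average co-association (average
    linkage) until k clusters remain. A naive O(n^3)-ish sketch."""
    n = len(M)
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = sum(M[i][j] for i in clusters[a] for j in clusters[b])
                link /= len(clusters[a]) * len(clusters[b])
                if link > best:
                    best, pair = link, (a, b)
        a, b = pair
        clusters[a].extend(clusters[b])  # merge b into a
        del clusters[b]
    labels = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels
```

For example, if two points end up in the same cluster in every base clustering, their co-association is 1.0 and they are merged first; pairs that never co-occur keep a value of 0.0 and stay separated.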
    Dataset       HAC-RAND   HAC-MO
    aggregation      0.74      0.71
    atom             0.44      0.68
    chainlink        0.47      0.99
    complex8         0.72      0.71
    complex9         0.71      0.70
    diamond9         0.83      0.75
    jain             0.61      0.60
    long1            0.73      0.81
    longsquare       0.73      0.89
    lsun             0.66      0.71
    target           0.40      0.40
    tetra            0.75      0.99
    triangle1        0.78      0.89

Table 4: NMIsqrt scores from the clustering of various artificial datasets. As the consensus method, hierarchical agglomerative clustering (with complete linkage) of the co-association matrix is used. In the case of HAC-RAND we run 10 independent k-means clusterings with a random number of clusters (between 2 and √n), then form a co-association matrix and finally run agglomerative clustering of the matrix. HAC-MO works in a very similar manner, but instead of the first step a multi-objective evolution of k-means is performed; 10 dominating solutions are selected and the rest of the algorithm is the same.

Figure 2: Typical clustering of the longsquare dataset using k-means (k = 6). The k-means algorithm fails to reveal non-spherical clusters (Adjusted Rand Index = 0.94, NMI = 0.85).

Figure 3: Consensus of 10 independent k-means runs on the longsquare dataset using k = 6. The co-association matrix obtained was clustered using hierarchical clustering (average linkage). The resulting clustering is qualitatively better than a single k-means run; however, it contains noise in all clusters (Adjusted Rand Index = 0.99, NMI = 0.95).

Figure 4: Consensus of 10 independent k-means runs on the long1 dataset using k = 6. On many datasets setting the correct k does not improve the quality of the resulting clustering. The co-association matrix obtained was clustered using hierarchical clustering (average linkage).

6 Conclusion

During our experiments we have shown that careful selection of clusterings for the ensemble process can significantly improve the overall clustering quality on non-trivial datasets (measured by NMI).

It is interesting that non-spherical clusters can be discovered by the consensus function when agglomerative hierarchical clustering is used (compare Fig. 2 and 3).

Using a multi-objective optimization for clustering selection improved the overall quality of clustering results; however, ensemble methods might produce noisy clusterings with a higher evaluation score. Noisy clusterings are hard to measure with current evaluation metrics, therefore it might be beneficial to include an unsupervised score in the results. In further research we would like to examine the process of selecting optimal objectives for each dataset.

Acknowledgement

The research reported in this paper was supported by the Czech Science Foundation (GAČR) grant 13-17187S.
Using Multi-Objective Optimization for the Selection of Ensemble Members                                                                 113
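The behaviour of the supervised indexes noted in the caption of Figure 6 (a labeling with 50% assignment error scoring at or near zero) is easy to reproduce with the standard implementations in scikit-learn. The toy labels below are a hypothetical stand-in, not the long1 data:

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Ground truth: two clusters of 50 points each.
true = [0] * 50 + [1] * 50
# A labeling with 50% assignment error: each true cluster is split
# evenly between the two predicted clusters.
pred = [0, 1] * 50

ari = adjusted_rand_score(true, pred)
nmi = normalized_mutual_info_score(true, pred)
# Both supervised indexes score this partition at (approximately) zero:
# the predicted labels carry no information about the true clusters.
```

Because both indexes are corrected for (ARI) or normalized against (NMI) chance agreement, a partition statistically independent of the ground truth scores near zero even though half of the points are labeled correctly.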


Figure 5: Consensus of 10 independent k-means runs on the triangle1 dataset using k = 6. In this case the consensus produces a nonsensical cluster that mixes items from three other clusters.

Figure 6: A single k-means run on the long1 dataset using k = 2. The k-means algorithm fails to reveal non-spherical clusters. Both supervised indexes, Adjusted Rand Index and NMI, assign this clustering a score of 0.0, even though the assignment error is only 50%.

References

 [1] Aggarwal, C. C., Reddy, C. K., (Eds.): Data clustering: algorithms and applications. CRC Press, 2014
 [2] Akaike, H.: A new look at the statistical model identification. IEEE Transactions on Automatic Control 19 (6) (1974), 716–723
 [3] Bartoň, T., Kordík, P.: Evaluation of relative indexes for multi-objective clustering. In: Hybrid Artificial Intelligent Systems, E. Onieva, I. Santos, E. Osaba, H. Quintián, and E. Corchado, (Eds.), vol. 9121 of Lecture Notes in Computer Science. Springer International Publishing, 2015, 465–476
 [4] Boulis, C., Ostendorf, M.: Combining multiple clustering systems. In: PKDD (2004), J.-F. Boulicaut, F. Esposito, F. Giannotti, and D. Pedreschi, (Eds.), vol. 3202 of Lecture Notes in Computer Science, Springer, 63–74
 [5] Caliński, T., Harabasz, J.: A dendrite method for cluster analysis. Communications in Statistics – Theory and Methods 3 (1) (1974), 1–27
 [6] Caruana, R., Elhawary, M., Nguyen, N., Smith, C.: Meta clustering. In: Proceedings of the Sixth International Conference on Data Mining (Washington, DC, USA, 2006), ICDM'06, IEEE Computer Society, 107–118
 [7] Corne, D., Jerram, N., Knowles, J., Oates, M.: PESA-II: region-based selection in evolutionary multiobjective optimization. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001) (2001)
 [8] Davies, D. L., Bouldin, D. W.: A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1 (2) (1979), 224–227
 [9] Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6 (2) (2002), 182–197
[10] Dudoit, S., Fridlyand, J.: Bagging to improve the accuracy of a clustering procedure. Bioinformatics 19 (9) (2003), 1090–1099
[11] Dunn, J. C.: Well-separated clusters and optimal fuzzy partitions. Journal of Cybernetics 4 (1) (1974), 95–104
[12] Faceli, K., de Souto, M. C. P., de Araujo, D. S. A., de Carvalho, A. C. P. L. F.: Multi-objective clustering ensemble for gene expression data analysis. Neurocomputing 72 (13-15) (2009), 2763–2774
[13] Fern, X. Z., Brodley, C. E.: Random projection for high dimensional data clustering: a cluster ensemble approach. In: ICML (2003), T. Fawcett and N. Mishra, (Eds.), AAAI Press, 186–193
[14] Fred, A. L. N., Jain, A. K.: Combining multiple clusterings using evidence accumulation. IEEE Trans. Pattern Anal. Mach. Intell. 27 (6) (2005), 835–850
[15] Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. TKDD 1 (1) (2007)
[16] Gullo, F., Domeniconi, C., Tagarelli, A.: Projective clustering ensembles. Data Min. Knowl. Discov. 26 (3) (2013), 452–511
[17] Halkidi, M., Vazirgiannis, M., Batistakis, Y.: Quality scheme assessment in the clustering process. In: PKDD (2000), D. A. Zighed, H. J. Komorowski, and J. M. Zytkow, (Eds.), vol. 1910 of Lecture Notes in Computer Science, Springer, 265–276
[18] Handl, J., Knowles, J.: Evolutionary multiobjective clustering. Parallel Problem Solving from Nature – PPSN VIII (2004), 1081–1091
[19] Handl, J., Knowles, J.: An evolutionary approach to multiobjective clustering. IEEE Transactions on Evolutionary Computation 11 (1) (2007), 56–76
[20] Hastie, T., Tibshirani, R., Friedman, J.: The elements of statistical learning. Springer, Dordrecht, 2009
[21] Hubert, L., Levin, J.: A general statistical framework for assessing categorical clustering in free recall. Psychological Bulletin 83 (6) (1976), 1072
[22] Iam-on, N., Boongoen, T.: Improved link-based cluster ensembles. In: IJCNN (2012), IEEE, 1–8
[23] Jain, A. K., Law, M. H. C.: Data clustering: a user's dilemma. In: PReMI (2005), S. K. Pal, S. Bandyopadhyay, and S. Biswas, (Eds.), vol. 3776 of Lecture Notes in Computer Science, Springer, 1–10
[24] Karypis, G., Kumar, V.: A fast and high quality multi-
     level scheme for partitioning irregular graphs. SIAM J. Sci.
     Comput. 20 (December 1998), 359–392
[25] Kittler, J., Hatef, M., Duin, R. P. W.: Combining classi-
     fiers. In: Proceedings of the Sixth International Conference
     on Pattern Recognition (Silver Spring, MD, 1996), IEEE
     Computer Society Press, 897–901
[26] MacQueen, J. B.: Some methods for classification and analysis of multivariate observations. In: Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (1967), L. M. Le Cam and J. Neyman, (Eds.), vol. 1, University of California Press, 281–297
[27] Mimaroglu, S., Erdil, E.: Obtaining better quality final
     clustering by merging a collection of clusterings. Bioinfor-
     matics 26 (20) (2010), 2645–2646
[28] Nguyen, N., Caruana, R.: Consensus clusterings. In:
     ICDM (2007), IEEE Computer Society, 607–612
[29] Rezaee, M. R., Lelieveldt, B., Reiber, J.: A new cluster
     validity index for the fuzzy c-mean. Pattern Recognition
     Letters 19 (3–4) (1998), 237–246
[30] Salvador, S., Chan, P.: Determining the number of clus-
     ters/segments in hierarchical clustering/segmentation algo-
     rithms. In: ICTAI (2004), IEEE Computer Society, 576–
     584
[31] Schwarz, G.: Estimating the dimension of a model. Annals
     of Statistics 6 (1978), 461–464
[32] Strehl, A., Ghosh, J.: Cluster ensembles – a knowledge
     reuse framework for combining multiple partitions. Jour-
     nal on Machine Learning Research (JMLR) 3 (December
     2002), 583–617
[33] Topchy, A. P., Jain, A. K., Punch, W. F.: Clustering en-
     sembles: models of consensus and weak partitions. IEEE
     Trans. Pattern Anal. Mach. Intell. 27 (12) (2005), 1866–
     1881
[34] Tumer, K., Agogino, A. K.: Ensemble clustering with vot-
     ing active clusters. Pattern Recognition Letters 29 (14)
     (2008), 1947–1953
[35] Ultsch, A., Mörchen, F.: ESOM-Maps: tools for cluster-
     ing, visualization, and classification with Emergent SOM.
     Technical Report No. 46, Dept. of Mathematics and Com-
     puter Science, University of Marburg, Germany (2005)
[36] Wolpert, D. H., Macready, W. G.: No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1 (1) (1997), 67–82
[37] Yu, Z., Wong, H. -S., Wang, H. -Q.: Graph-based con-
     sensus clustering for class discovery from gene expression
     data. Bioinformatics 23 (21) (2007), 2888–2896