<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Multi-Objective Optimization for the Selection of Ensemble Members</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tomáš Bartoň</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Kordík</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Information Technology Czech Technical University in Prague</institution>
          ,
          <addr-line>Thakurova 9, Prague 6</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Molecular Genetics of the ASCR</institution>
          ,
          <addr-line>v. v. i., Vídeňská 1083, 142 20 Prague 4</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>108</fpage>
      <lpage>114</lpage>
      <abstract>
        <p>In this paper we propose a clustering process which uses multi-objective evolution to select a set of diverse clusterings. The selected clusterings are then combined using a consensus method. This approach is compared to a clustering process where no selection is applied. We show that careful selection of input ensemble members can improve the overall quality of the final clustering. Our algorithm provides more stable clustering results and in many cases overcomes the limitations of the base algorithms.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Clustering is a popular technique that can be used to reveal
patterns in data or as a preprocessing step in data analysis.
Clustering aims to group similar objects into groups that
are called clusters and to place dissimilar objects into
different clusters. Its application domains include data
mining, information retrieval, bioinformatics, image
processing and many others.</p>
      <p>
        Many clustering algorithms have been introduced so
far; an extensive overview of clustering techniques can be
found in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Since the No Free Lunch
Theorem [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] states that there is no single, supreme
algorithm that performs optimally on all datasets, it is
difficult for a user to decide which algorithm to choose.
      </p>
      <p>
        Ensemble methods have been successfully applied in
supervised learning [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] and similar attempts appeared in
the unsupervised learning area [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. Cluster
ensemble methods should eliminate the drawbacks of
individual methods by combining multiple solutions into one
final clustering.
      </p>
      <p>The problem is how to obtain a final clustering π∗ =
{C1∗,C2∗, . . . ,CK∗ }, where K is the number of clusters in the
result and π∗ summarizes the information from
ensemble Π.</p>
      <p>
        The process consists of two major steps. Firstly, we
need to generate a set of clustering solutions and secondly
we need to combine the information from these solutions.
A typical result of the ensemble process is a single
clustering. It has been shown in supervised ensembles that the
best results are achieved when using a set of predictors
whose errors are dissimilar [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Thus it is desirable to
introduce diversity between ensemble members.
      </p>
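      <p>To make the two steps concrete, the following minimal sketch
(in Python) wires them together; generate_ensemble and
coassociation_consensus are illustrative helpers sketched later in
the text, not code from the paper.</p>
      <preformat>
# End-to-end sketch of the two-step ensemble process:
# (1) generate base clusterings, (2) combine them with a
# consensus function.
def cluster_ensemble(X, runs=10, k_final=3):
    ensemble = generate_ensemble(X, runs=runs)         # step 1
    return coassociation_consensus(ensemble, k_final)  # step 2
      </preformat>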
    </sec>
    <sec id="sec-2">
      <title>Ensemble Generation Strategies</title>
      <p>Several approaches have been used to initialize clustering
solutions in order to create an ensemble.</p>
      <p>
        • Homogeneous ensembles: Base clusterings are
created using repeated runs of a single clustering
algorithm. This is quite a popular approach, repeated runs
of k-means with random center initialization have
been used in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. When using k-means, the number
of clusters is typically fixed to ⌈√n⌉, where n is the
size of the dataset [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
• Varying k: Repeated runs of k-means with random
initialization and varying k [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], a gold standard is to use k
in the range from 2 to √n.
• Random subspacing: An ensemble is created from
base clusterings that use different initial data. This
could be achieved by projecting data onto
different subspaces [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], choosing different subsets
of features [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ], or using data sampling
techniques [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
• Heterogeneous ensembles: Diversity of solutions is
introduced by applying different algorithms on the
same dataset.
      </p>
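      <p>As an illustration of the homogeneous strategy with varying
k, the sketch below repeatedly runs k-means with random
initialization and a randomly drawn k; scikit-learn's KMeans is
assumed here purely for convenience.</p>
      <preformat>
import math
import random

from sklearn.cluster import KMeans

def generate_ensemble(X, runs=10, seed=42):
    """Base clusterings from repeated k-means, k drawn from 2..floor(sqrt(n))."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(runs):
        k = rng.randint(2, max(2, math.isqrt(n)))
        km = KMeans(n_clusters=k, init="random", n_init=1,
                    random_state=rng.randint(0, 10**6))
        ensemble.append(km.fit_predict(X))
    return ensemble
      </preformat>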
    </sec>
    <sec id="sec-3">
      <title>Consensus Functions</title>
      <p>
        Having multiple clusterings in an ensemble, many
functions have been proposed to derive a final clustering.
When only one solution is considered as the result, it is
usually referred to as a consensus function, unlike meta
clustering where the output is a set of multiple clusterings [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>There are several approaches to representing the
information contained in base clusterings: some use matrices,
while others use a graph representation.</p>
      <p>
        • Pairwise similarities: A pairwise similarity matrix
is created and afterwards a clustering algorithm (e.g.
hierarchical agglomerative clustering) is applied to
group together items that were most frequently
together in the same cluster in all the base
clusterings [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The Cluster-based Similarity Partitioning
Algorithm (CSPA) from Strehl and Ghosh [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] uses
METIS [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] for partitioning a similarity matrix into k
components.
• Feature-based approach: The ensemble problem is
formulated as categorical data clustering. For each
data point an m-dimensional vector containing labels
in base clusterings is created. The goal is to find a
partition π ∗ which summarizes the information
gathered from the partitions Π [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
• Graph based: Many methods use graph
representation for capturing relationships between base
clusterings. Strehl and Ghosh [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] also proposed the
HyperGraph-Partitioning Algorithm (HGPA), where
vertices correspond to data points and each hyperedge
represents a cluster. Another approach is
COMUSA [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] which increases the weight of the edge
for each occurrence of data pairs in the same
cluster. Afterwards the nodes are sorted by the
attachment score, which is defined as the ratio between the
sum of the node’s weights and its number of
incident edges. The nodes with the highest attachment
score are then used as a foundation for new clusters.
This approach is relatively fast to compute; however,
it might fail to capture complex relationships between
very diverse clusterings.
      </p>
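      <p>A minimal sketch of the pairwise-similarity approach: the
co-association matrix counts how often two items share a cluster
across the ensemble, and average-linkage hierarchical clustering of
the complementary distances yields the consensus (NumPy and SciPy
are assumed; this is not the CSPA/METIS variant).</p>
      <preformat>
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def coassociation_consensus(ensemble, k):
    """Consensus clustering from the co-association matrix."""
    n = len(ensemble[0])
    coassoc = np.zeros((n, n))
    for labels in ensemble:
        labels = np.asarray(labels)
        coassoc += (labels[:, None] == labels[None, :])
    coassoc /= len(ensemble)
    dist = 1.0 - coassoc              # similarity -> distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=k, criterion="maxclust")
      </preformat>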
      <sec id="sec-3-1">
        <title>Multi-Objective Clustering</title>
        <p>Multi-objective clustering usually optimizes two objective
functions. Using more than a few objectives is uncommon
because the whole optimization process becomes less
effective.</p>
        <p>
          The first multi-objective evolutionary clustering
algorithm was introduced in 2004 by Handl and Knowles [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]
and is called VIENNA (the Voronoi Initialized
Evolutionary Nearest-Neighbour Algorithm).
        </p>
        <p>
          Subsequently, in 2007 Handl and Knowles published a
Pareto-based multi-objective evolutionary clustering
algorithm called MOCK [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] (Multi-Objective Clustering with
automatic K-determination). Each individual in MOCK is
represented as a directed graph which is then translated
into a clustering. The genotype is encoded as an array of
integers whose length is the same as the number of instances
in the dataset. Each number is a pointer to another instance
(an edge in the graph), since it is connected to the instance
at a given index. This easily enables the application of
mutation and crossover operations.
        </p>
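        <p>The locus-based encoding can be decoded by treating each
pointer as an undirected edge and extracting connected components;
the sketch below (a hypothetical helper, not MOCK's code) shows the
idea with a union-find structure.</p>
        <preformat>
def decode_genotype(genotype):
    """Clusters = connected components of the pointer graph."""
    n = len(genotype)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in enumerate(genotype):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    roots = sorted({find(i) for i in range(n)})
    label = {r: c for c, r in enumerate(roots)}
    return [label[find(i)] for i in range(n)]

# decode_genotype([1, 0, 3, 2]) -> [0, 0, 1, 1]
        </preformat>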
        <p>
          As a Multi-Objective Evolutionary Algorithm (MOEA),
MOCK employs the Pareto Envelope-based Selection
Algorithm version 2 (PESA-II) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], which keeps two
populations, an internal population of fixed size and a larger
external population which is exploited to explore good
solutions. Two complementary objectives, deviation and
connectivity, are used as objectives in the evolutionary
process.
        </p>
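        <p>The two objectives can be sketched as follows (one common
reading of deviation and connectivity; the neighbourhood size L and
the 1/j penalty follow Handl and Knowles, and both objectives are
minimized):</p>
        <preformat>
import numpy as np

def deviation(X, labels, centers):
    """Overall compactness: summed distances to the own cluster centre."""
    X, labels = np.asarray(X), np.asarray(labels)
    return sum(np.linalg.norm(X[labels == i] - centers[i], axis=1).sum()
               for i in range(len(centers)))

def connectivity(X, labels, L=10):
    """Penalty 1/j when the j-th nearest neighbour lies in another cluster."""
    X, labels = np.asarray(X), np.asarray(labels)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, 1:L + 1]   # skip self at position 0
    penalty = 1.0 / np.arange(1, L + 1)
    foreign = labels[nn] != labels[:, None]
    return float((foreign * penalty).sum())
        </preformat>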
        <p>
          A clear disadvantage of MOCK is its computational
complexity, which is a typical characteristic of evolutionary
algorithms. Nevertheless, the computation time spent on
MOCK should result in high-quality solutions. Faceli et
al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] reported that for some high-dimensional data it is
not guaranteed that the algorithm will complete, unless the
control front distribution has been adjusted for the given
data set.
        </p>
        <p>
          Faceli et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] combined a multi-objective approach
to clustering with ensemble methods and the resulting
algorithm is called MOCLE (Multi-Objective Clustering
Ensemble Algorithm). The objectives used in the MOCLE
algorithm are the same as those used in MOCK [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]:
deviation and connectivity. Unlike MOCK, in this case the
evolutionary algorithm used is NSGA-II [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Our Approach</title>
        <p>There are many ways to produce a final consensus;
nonetheless, in this work we focus on the selection of
high-quality and diverse clusterings.</p>
        <p>
          In order to optimize ensemble member selection, we
apply a multi-objective optimization. The whole process
is shown in Fig. 1. In our previous work [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] we have
shown that introducing multiple objectives into the
clustering process can improve the quality of clustering;
specifically, combining the Akaike information criterion (AIC) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] (or
BIC [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]) with another criterion leads to better results. The
second criterion is typically based on computing a ratio
between cluster compactness and the sum of distances
between cluster centres.
        </p>
        <p>
          AIC is typically used in supervised learning when trying
to estimate model error. Essentially it attempts to estimate
the optimism of the model and then add it to the error [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]:
fAIC = −(2/n) · log(Ln(k)) + (2/n) · k    (1)
        </p>
        <p>where Ln(k) is the maximum likelihood of a model
with k parameters based on a sample of size n.</p>
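        <p>A hedged sketch of Eq. (1) for a k-means model, assuming a
spherical Gaussian log-likelihood with unit variance (the paper does
not fix this modelling detail):</p>
        <preformat>
import numpy as np

def aic_score(X, labels, centers):
    """fAIC = (-2 * log L + 2k) / n under a unit-variance Gaussian."""
    X, labels = np.asarray(X), np.asarray(labels)
    n, k = len(X), len(centers)
    sq = sum(np.sum((X[labels == c] - centers[c]) ** 2)
             for c in range(k))
    log_lik = -0.5 * sq        # up to an additive constant
    return (-2.0 * log_lik + 2.0 * k) / n
        </preformat>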
        <p>
          The SD index was introduced in 2000 by Halkidi
et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and it is based on concepts of average cluster
scattering and total separation of clusters which were
previously used by Rezaee et al. [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] for evaluation of fuzzy
clusterings.
The minimum distance between cluster centres is
Dmin = min_{i,j∈{1, ..., k}, i≠j} ‖c̄_i − c̄_j‖    (2)
        </p>
        <p>
          Then we can define the SD validity index as follows:
fSD = α · Scat(k) + Dis(k)    (3)
where α should be a weighting factor equal to Dis(cmax),
with cmax being the maximum number of clusters [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
This makes perfect sense for fuzzy clustering (as was
proposed in [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]); however, it is rather unclear how to compute
cmax in the case of crisp clustering, when cmax ≠ k,
without running another clustering with cmax as the requested
number of clusters. Nonetheless, [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] mentions that “SD
proposes an optimal number of clusters almost
irrespective of cmax, the maximum number of clusters”, thus we
consider the special case where cmax = k:
fSD = Dis(k) · Scat(k) + Dis(k) = Dis(k) · (Scat(k) + 1)    (4)
        </p>
        <p>
          The idea of minimizing inner cluster distances and
maximizing distances between cluster centres is not really
new. In fact, most clustering objective functions
use this idea. Instead of the SD index we could have
used the C-index [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], Calinski-Harabasz Index [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ],
Davies-Bouldin [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] or Dunn’s index [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], just to name a few. Each
of these indexes might perform better on some datasets;
however, it is out of the scope of this paper to compare which
combination would be better.
        </p>
        <p>
          As an evolutionary algorithm we use NSGA-II [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] while
using only the mutation operator and a relatively small
population (see Table 2). Each individual in the
population contains a configuration of a clustering algorithm. For
this experiment we used only the basic k-means [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]
algorithm with random initialization. The number of clusters k
is the only parameter that we change during the
evolutionary process. For k we used a constraint in order to keep
the number of clusters within reasonable boundaries. For
all test datasets we used the interval ⟨2, √n⟩.
        </p>
        <p>In order to evaluate the effect of Pareto optimal
solutions selection for the ensemble bagging process, we
compare two strategies for generating base clusterings. The
first one generates 10 k-means clusterings with random
initialization with k within the interval ⟨2, √n⟩. The
second method uses multi-objective evolution of clusterings
where each individual represents a k-means clustering.</p>
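        <p>The selection step itself can be illustrated by a plain
non-dominated filter over two minimized objectives (AIC and the SD
index); the actual search in our setup is driven by NSGA-II, which
this sketch does not reproduce:</p>
        <preformat>
def pareto_front(clusterings, objectives):
    """Keep clusterings whose (f1, f2) pair is not dominated."""
    front = []
    for i, fi in enumerate(objectives):
        dominated = any(
            all(g <= f for g, f in zip(gj, fi)) and tuple(gj) != tuple(fi)
            for j, gj in enumerate(objectives) if j != i
        )
        if not dominated:
            front.append(clusterings[i])
    return front

# pareto_front(["a", "b", "c"], [(1.0, 2.0), (2.0, 1.0), (2.0, 2.0)])
# -> ["a", "b"]   ("c" is dominated by both "a" and "b")
        </preformat>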
        <p>As a consensus function we use either the graph-based
COMUSA algorithm or hierarchical agglomerative
clustering of the co-association matrix. Both approaches have
their drawbacks: the COMUSA algorithm is locally
consistent but fails to integrate the information
contained in the different base clusterings, thus the final clustering
does not overcome the limitations of the base clustering
algorithms. The latter can be very slow on larger datasets
(complexity O(n³)) and might produce a noisy clustering in
the case of very diverse base clusterings.</p>
        <p>The quantities used in fSD are defined as follows. The
scattering is computed from the variance vector σ(X) ∈ Rᵐ,
with m being the number of dataset dimensions. The variance
for a dimension d is defined as:
σ^d = (1/n) ∑_{i=1}^{n} (x_i^d − x̄^d)²    (5)
and its norm is:
‖σ(X)‖ = ( ∑_{d=1}^{m} (σ^d)² )^{1/2}    (6)
The total separation is given by:
Dis(k) = (Dmax / Dmin) ∑_{i=1}^{k} ( ∑_{j=1}^{k} ‖c̄_i − c̄_j‖ )^{−1}    (7)
where Dmax is the maximum distance and Dmin the minimum
distance between cluster centres c̄_i, and k is the number of
clusters:
Dmax = max_{i,j∈{1, ..., k}, i≠j} ‖c̄_i − c̄_j‖    (8)
        </p>
        <p>
To evaluate clustering quality we use NMI
(Normalized Mutual Information) [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. (Several versions of this measure exist; it is
sometimes referred to as NMIsqrt.) NMI has proved to be
a better criterion than the Adjusted Rand Index (ARI) for
the evaluation of datasets with a priori known labels.
However, it does not capture noise in clusters (which does not
occur in the case of traditional k-means but might be
introduced by some consensus methods). Another
limitation of these evaluation metrics is apparent from Figure 6,
where a clustering with an obvious 50% assignment error gets
the lowest possible score. Neither index correctly
penalizes a higher number of clusters, thus algorithms
producing many small clusters will be preferred by these
objectives.
        </p>
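        <p>Putting Eqs. (2)-(8) together, a compact sketch of the SD
index in the special case α = Dis(k) (assuming k ≥ 2 and the
average-scattering definition of Halkidi et al.):</p>
        <preformat>
import numpy as np

def sd_index(X, labels, centers):
    """fSD = Dis(k) * (Scat(k) + 1), cf. Eqs. (2)-(8)."""
    X, labels = np.asarray(X), np.asarray(labels)
    centers = np.asarray(centers)
    k = len(centers)
    sigma_X = np.linalg.norm(X.var(axis=0))              # Eq. (6)
    scat = np.mean([np.linalg.norm(X[labels == i].var(axis=0))
                    for i in range(k)]) / sigma_X        # average scattering
    d = np.linalg.norm(centers[:, None] - centers[None, :], axis=2)
    off = d[~np.eye(k, dtype=bool)]
    dmax, dmin = off.max(), off.min()                    # Eqs. (8), (2)
    dis = (dmax / dmin) * np.sum(1.0 / d.sum(axis=1))    # Eq. (7)
    return dis * (scat + 1.0)                            # Eq. (4)
        </preformat>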
      </sec>
      <sec id="sec-3-3">
        <title>Results</title>
        <p>
          Most of the datasets used in these experiments come from
the Fundamental Clustering Problems Suite [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ]. We have
intentionally chosen low dimensional data in order to be
able to visually evaluate the results. An overview of the
datasets used can be found in Table 1.
        </p>
      <p>As objectives during the multi-objective evolution we
used AIC and the SD index in all cases. The advantage is
the combination of two measures that are based on very different
principles.</p>
      <p>In the case of the COMUSA approach, the
multi-objective selection either improved the final NMI score
or was comparable with the random selection. There
were only a few exceptions, especially in the case of compact
datasets like the 9 diamond dataset, which contains a
specific regular pattern that is unlikely to appear in any
real-world dataset (see Table 3). The number of clusters
produced by the COMUSA approach was usually higher than the
correct number of clusters, and the clustering contains many
nonsense clusters. Despite these facts, the NMI score does
not penalize these properties appropriately.</p>
      <p>Agglomerative clustering of the co-association matrix,
despite its high computational requirements, provides quite
good results. Notably, in the case of the chainlink
dataset there is an over 50% improvement in NMI. It is
important to note that for HAC-RAND and HAC-MO we
did not provide the information about the correct number
of clusters. Still, these methods manage to estimate the
number of clusters, or at least provide a result that is very close
to the number of supervised classes.
During our experiments we have shown that careful
selection of clusterings for the ensemble process can
significantly improve overall clustering quality for non-trivial
datasets (measured by NMI).</p>
      <p>It is interesting that non-spherical clusters can be
discovered by the consensus function when agglomerative
hierarchical clustering is used (compare Figs. 2 and 3).</p>
      <p>Using multi-objective optimization for clustering
selection improved the overall quality of the clustering results;
however, ensemble methods might produce noisy
clusterings with a higher evaluation score. Noisy clusterings are
hard to measure with current evaluation metrics; therefore,
it might be beneficial to include an unsupervised score in
the results. In further research we would like to
examine the process of selecting optimal objectives for each
dataset.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgement</title>
      <p>The research reported in this paper was supported by the
Czech Science Foundation (GAČR) grant 13-17187S.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Aggarwal</surname>
            ,
            <given-names>C. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reddy</surname>
            ,
            <given-names>C. K.</given-names>
          </string-name>
          , (Eds.):
          <article-title>Data clustering: algorithms and applications</article-title>
          . CRC Press,
          <year>2014</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Akaike</surname>
            ,
            <given-names>H.:</given-names>
          </string-name>
          <article-title>A new look at the statistical model identification</article-title>
          .
          <source>IEEE Transactions on Automatic Control</source>
          <volume>19</volume>
          (
          <issue>6 December 1974</issue>
          ),
          <fpage>716</fpage>
          -
          <lpage>723</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Bartoň</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kordík</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Evaluation of relative indexes for multi-objective clustering</article-title>
          .
          <source>In: Hybrid Artificial Intelligent Systems</source>
          , E. Onieva,
          <string-name>
            <given-names>I.</given-names>
            <surname>Santos</surname>
          </string-name>
          , E. Osaba,
          <string-name>
            <given-names>H.</given-names>
            <surname>Quintián</surname>
          </string-name>
          , and E. Corchado, (Eds.), vol.
          <volume>9121</volume>
          of Lecture Notes in Computer Science. Springer International Publishing,
          <year>2015</year>
          ,
          <fpage>465</fpage>
          -
          <lpage>476</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Boulis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ostendorf</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>Combining multiple clustering systems</article-title>
          . In: PKDD (
          <year>2004</year>
          ),
          <string-name>
            <given-names>J.-F.</given-names>
            <surname>Boulicaut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Esposito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          , (Eds.), vol.
          <volume>3202</volume>
          of Lecture Notes in Computer Science, Springer,
          <fpage>63</fpage>
          -
          <lpage>74</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] Caliński, T.,
          <string-name>
            <surname>Harabasz</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A dendrite method for cluster analysis</article-title>
          .
          <source>Communications in Statistics-theory and Methods</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ) (
          <year>1974</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Caruana</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elhawary</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Meta clustering</article-title>
          .
          <source>In: Proceedings of the Sixth International Conference on Data Mining</source>
          (Washington, DC, USA,
          <year>2006</year>
          ), ICDM'06, IEEE Computer Society,
          <fpage>107</fpage>
          -
          <lpage>118</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Corne</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jerram</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knowles</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oates</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>PESA-II: region-based selection in evolutionary multiobjective optimization</article-title>
          .
          <source>In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001)</source>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Davies</surname>
            ,
            <given-names>D. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bouldin</surname>
            ,
            <given-names>D. W.:</given-names>
          </string-name>
          <article-title>A cluster separation measure</article-title>
          .
          <source>IEEE Trans. Pattern Anal. Mach. Intell</source>
          .
          <volume>1</volume>
          (
          <issue>2</issue>
          ) (
          <year>1979</year>
          ),
          <fpage>224</fpage>
          -
          <lpage>227</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Deb</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pratap</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meyarivan</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A fast and elitist multiobjective genetic algorithm: NSGA-II</article-title>
          .
          <source>IEEE Transactions on Evolutionary Computation</source>
          <volume>6</volume>
          (
          <issue>2</issue>
          ) (
          <year>2002</year>
          ),
          <fpage>182</fpage>
          -
          <lpage>197</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Dudoit</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fridlyand</surname>
          </string-name>
          , J.:
          <article-title>Bagging to improve the accuracy of a clustering procedure</article-title>
          .
          <source>Bioinformatics</source>
          <volume>19</volume>
          (
          <issue>9</issue>
          ) (
          <year>2003</year>
          ),
          <fpage>1090</fpage>
          -
          <lpage>1099</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Dunn</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          :
          <article-title>Well-separated clusters and optimal fuzzy partitions</article-title>
          .
          <source>Journal of Cybernetics</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ) (
          <year>1974</year>
          ),
          <fpage>95</fpage>
          -
          <lpage>104</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Faceli</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Souto</surname>
          </string-name>
          , M. C. P.,
          <string-name>
            <surname>de Araujo</surname>
          </string-name>
          , D. S. A.,
          <string-name>
            <surname>de Carvalho</surname>
          </string-name>
          , A.
          <string-name>
            <surname>C. P. L. F.</surname>
          </string-name>
          <article-title>: Multi-objective clustering ensemble for gene expression data analysis</article-title>
          .
          <source>Neurocomputing</source>
          <volume>72</volume>
          (
          <fpage>13</fpage>
          -
          <lpage>15</lpage>
          ) (
          <year>2009</year>
          ),
          <fpage>2763</fpage>
          -
          <lpage>2774</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Fern</surname>
            ,
            <given-names>X. Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brodley</surname>
            ,
            <given-names>C. E.</given-names>
          </string-name>
          :
          <article-title>Random projection for high dimensional data clustering: a cluster ensemble approach</article-title>
          . In: ICML (
          <year>2003</year>
          ),
          <string-name>
            <given-names>T.</given-names>
            <surname>Fawcett</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Mishra</surname>
          </string-name>
          , (Eds.), AAAI Press,
          <fpage>186</fpage>
          -
          <lpage>193</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Fred</surname>
            ,
            <given-names>A. L. N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          :
          <article-title>Combining multiple clusterings using evidence accumulation</article-title>
          .
          <source>IEEE Trans. Pattern Anal. Mach. Intell</source>
          .
          <volume>27</volume>
          (
          <issue>6</issue>
          ) (
          <year>2005</year>
          ),
          <fpage>835</fpage>
          -
          <lpage>850</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Gionis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannila</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsaparas</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Clustering aggregation</article-title>
          .
          <source>TKDD</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ) (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Gullo</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domeniconi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tagarelli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Projective clustering ensembles</article-title>
          .
          <source>Data Min. Knowl. Discov</source>
          .
          <volume>26</volume>
          (
          <issue>3</issue>
          ) (
          <year>2013</year>
          ),
          <fpage>452</fpage>
          -
          <lpage>511</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Halkidi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vazirgiannis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Batistakis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Quality scheme assessment in the clustering process</article-title>
          .
          <source>In: PKDD</source>
          (
          <year>2000</year>
          ),
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Zighed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Komorowski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Zytkow</surname>
          </string-name>
          , (Eds.), vol.
          <source>1910 of Lecture Notes in Computer Science</source>
          , Springer,
          <fpage>265</fpage>
          -
          <lpage>276</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Handl</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knowles</surname>
          </string-name>
          , J.:
          <article-title>Evolutionary multiobjective clustering. Parallel Problem Solving from Nature -</article-title>
          PPSN
          <string-name>
            <surname>VIII</surname>
          </string-name>
          (
          <year>2004</year>
          ),
          <fpage>1081</fpage>
          -
          <lpage>1091</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Handl</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knowles</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>An evolutionary approach to multiobjective clustering</article-title>
          .
          <source>Evolutionary Computation, IEEE Transactions on 11 (1)</source>
          (
          <year>2007</year>
          ),
          <fpage>56</fpage>
          -
          <lpage>76</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Hastie</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tibshirani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corporation</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>The elements of statistical learning</article-title>
          . Springer, Dordrecht,
          <year>2009</year>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Hubert</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levin</surname>
          </string-name>
          , J.:
          <article-title>A general statistical framework for assessing categorical clustering in free recall</article-title>
          .
          <source>Psychological Bulletin</source>
          <volume>83</volume>
          (
          <issue>6</issue>
          ) (
          <year>1976</year>
          ),
          <fpage>1072</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Iam-on</surname>
          </string-name>
          , N.,
          <string-name>
            <surname>Boongoen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Improved link-based cluster ensembles</article-title>
          .
          <source>In: IJCNN</source>
          (
          <year>2012</year>
          ), IEEE,
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Law</surname>
            ,
            <given-names>M. H. C.</given-names>
          </string-name>
          :
          <article-title>Data clustering: a user's dilemma</article-title>
          . In: PReMI (
          <year>2005</year>
          ),
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          , and
          <string-name>
            <surname>S. Biswas</surname>
          </string-name>
          , (Eds.), vol.
          <volume>3776</volume>
          of Lecture Notes in Computer Science, Springer,
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Karypis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
          </string-name>
          , V.:
          <article-title>A fast and high quality multilevel scheme for partitioning irregular graphs</article-title>
          .
          <source>SIAM J. Sci. Comput</source>
          .
          <volume>20</volume>
          (
          <year>December 1998</year>
          ),
          <fpage>359</fpage>
          -
          <lpage>392</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Kittler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hatef</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duin</surname>
            ,
            <given-names>R. P. W.:</given-names>
          </string-name>
          <article-title>Combining classifiers</article-title>
          .
          <source>In: Proceedings of the Sixth International Conference on Pattern Recognition (Silver Spring</source>
          , MD,
          <year>1996</year>
          ), IEEE Computer Society Press,
          <fpage>897</fpage>
          -
          <lpage>901</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>MacQueen</surname>
            ,
            <given-names>J. B.:</given-names>
          </string-name>
          <article-title>Some methods for classification and analysis of multivariate observations</article-title>
          .
          <source>In: Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability</source>
          (
          <year>1967</year>
          ),
          <string-name>
            <given-names>L. M. L.</given-names>
            <surname>Cam</surname>
          </string-name>
          and
          <string-name>
            <surname>J. Neyman</surname>
          </string-name>
          , (Eds.), vol.
          <volume>1</volume>
          , University of California Press,
          <fpage>281</fpage>
          -
          <lpage>297</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Mimaroglu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erdil</surname>
          </string-name>
          , E.:
          <article-title>Obtaining better quality final clustering by merging a collection of clusterings</article-title>
          .
          <source>Bioinformatics</source>
          <volume>26</volume>
          (
          <issue>20</issue>
          ) (
          <year>2010</year>
          ),
          <fpage>2645</fpage>
          -
          <lpage>2646</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caruana</surname>
          </string-name>
          , R.:
          <article-title>Consensus clusterings</article-title>
          . In: ICDM (
          <year>2007</year>
          ), IEEE Computer Society,
          <fpage>607</fpage>
          -
          <lpage>612</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Rezaee</surname>
            ,
            <given-names>M. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lelieveldt</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reiber</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A new cluster validity index for the fuzzy c-mean</article-title>
          .
          <source>Pattern Recognition Letters</source>
          <volume>19</volume>
          (
          <issue>3-4</issue>
          ) (
          <year>1998</year>
          ),
          <fpage>237</fpage>
          -
          <lpage>246</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <surname>Salvador</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms</article-title>
          . In: ICTAI (
          <year>2004</year>
          ), IEEE Computer Society,
          <fpage>576</fpage>
          -
          <lpage>584</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <surname>Schwarz</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Estimating the dimension of a model</article-title>
          .
          <source>Annals of Statistics</source>
          <volume>6</volume>
          (
          <year>1978</year>
          ),
          <fpage>461</fpage>
          -
          <lpage>464</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <surname>Strehl</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghosh</surname>
          </string-name>
          , J.:
          <article-title>Cluster ensembles - a knowledge reuse framework for combining multiple partitions</article-title>
          .
          <source>Journal on Machine Learning Research (JMLR) 3 (December</source>
          <year>2002</year>
          ),
          <fpage>583</fpage>
          -
          <lpage>617</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <surname>Topchy</surname>
            ,
            <given-names>A. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Punch</surname>
            ,
            <given-names>W. F.</given-names>
          </string-name>
          :
          <article-title>Clustering ensembles: models of consensus and weak partitions</article-title>
          .
          <source>IEEE Trans. Pattern Anal. Mach. Intell</source>
          .
          <volume>27</volume>
          (
          <issue>12</issue>
          ) (
          <year>2005</year>
          ),
          <fpage>1866</fpage>
          -
          <lpage>1881</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <surname>Tumer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agogino</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          :
          <article-title>Ensemble clustering with voting active clusters</article-title>
          .
          <source>Pattern Recognition Letters</source>
          <volume>29</volume>
          (
          <issue>14</issue>
          ) (
          <year>2008</year>
          ),
          <fpage>1947</fpage>
          -
          <lpage>1953</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <surname>Ultsch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mörchen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM</article-title>
          .
          <source>Technical Report No. 46</source>
          , Dept. of Mathematics and Computer Science, University of Marburg, Germany (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <surname>Wolpert</surname>
            ,
            <given-names>D. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macready</surname>
          </string-name>
          , W. G.:
          <article-title>No free lunch theorem for optimization</article-title>
          .
          <source>IEEE Transactions on Evolutionary Computation</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ) (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
          </string-name>
          , H. -S.,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H. -Q.</given-names>
          </string-name>
          :
          <article-title>Graph-based consensus clustering for class discovery from gene expression data</article-title>
          .
          <source>Bioinformatics</source>
          <volume>23</volume>
          (
          <issue>21</issue>
          ) (
          <year>2007</year>
          ),
          <fpage>2888</fpage>
          -
          <lpage>2896</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>