<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fuzzy Cluster Validity with Generalized Silhouettes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohammad Rawashdeh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anca Ralescu</string-name>
          <email>Anca.Ralescu@uc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computing Sciences and Informatics University of Cincinnati Cincinnati</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A review of some popular fuzzy cluster validity indices is given. An index that is based on the generalization of silhouettes to fuzzy partitions is compared with the reviewed indices in conjunction with fuzzy c-means clustering.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Prevalent in many applications,
        <xref ref-type="bibr" rid="ref13">(Jain, Murty, &amp; Flynn
1999)</xref>
        , the problem of clustering involves design decisions
about representation (i.e. set of features), similarity
measure, criterion and mechanism of a clustering
algorithm. The clustering literature is very rich in various
schemes that address these ingredients
        <xref ref-type="bibr" rid="ref13 ref21">(Jain, Murty, &amp;
Flynn 1999; Xu &amp; Wunsch 2005)</xref>
        . However, the problem
itself is centered on the intuitive and easily stated goal of
partitioning a set of objects into groups such that objects
within one group are similar to each other and dissimilar
to objects in other groups, which has become a common
description of clustering
        <xref ref-type="bibr" rid="ref12 ref13 ref2 ref21">(Jain, Murty, &amp; Flynn 1999;
Berkhin 2002; Kleinberg 2002; Xu &amp; Wunsch 2005)</xref>
        . As
opposed to classification, only few of the existing
clustering algorithms are widely used. Indeed, clustering is
less appreciated among practitioners of data analysis due
to the lack of class labels. Labels are used in the evaluation
of loss functions, formal assessments of the goal. This has
encouraged researchers to treat clustering in a
semisupervised manner by incorporating as much information
as available such as in must-link and cannot-link
constraints
        <xref ref-type="bibr" rid="ref7">(Blienko, Basu, &amp; Mooney 2004)</xref>
        in order to
achieve satisfactory results. Usually, there is an end-goal
to clustering of a dataset or an end-use of the final
clustering. For example, clustering of documents by topic,
clustering of images by common content and clustering of
proteins by function have as respective end goal a better
understanding of a corpus of documents, or of one or more
proteins. This suggests that a better treatment for clustering
should be in the context of end-use rather than in an
application-independent mathematical manner
        <xref ref-type="bibr" rid="ref11">(Guyon,
von Luxburg, &amp; Williamson 2009)</xref>
        . Accordingly, the
unknown desired clustering is the only ground truth
assumed about the problem. The properties of the
similarity measure sufficient to cluster well, that is, to
achieve low error with respect to the ground-truth
clustering, are given in
        <xref ref-type="bibr" rid="ref1">(Balcan, Blum, &amp; Vempala 2008)</xref>
        .
The features, the measure and the algorithm all should be
chosen in the context of the end-use. For example, it would
be unwise to employ a measure that pairs two images
because they show the same person while a clustering by
facial expression is desired. This applies to the set of
features as well; the features should accommodate for the
different possible expressions. In the absence of end-use,
clustering becomes an exploratory approach to data
analysis, looking for the right ingredients to get the best
structure.
      </p>
      <p>
        The c-means, alternatively k-means
        <xref ref-type="bibr" rid="ref14">(MacQueen 1967)</xref>
        , is
one popular clustering algorithm that partitions a set of data
points { } into disjoint subsets
{ }. The exclusive cluster assignment
characterizes hard clustering and hence it is also referred
by hard c-means (HCM). Fuzzy c-means (FCM) family of
algorithms imposes relaxed constraints on cluster
assignment by allowing nonexclusive but partial
memberships, thereby, modeling cluster overlapping. The
first FCM algorithm was proposed in
        <xref ref-type="bibr" rid="ref9">(Dunn 1973)</xref>
        . Its
convergence was later improved in
        <xref ref-type="bibr" rid="ref4">(Bezdek 1981)</xref>
        . Both
schemes, crisp and fuzzy, optimize a variance-criterion
with respect to cluster center and point membership for the
specified cluster number. The final clustering is given by a
membership matrix, [ ]; is the membership of
in . When assumes values in { } or [ ], the
matrix characterizes crisp or fuzzy partitions respectively.
      </p>
      <p>It is common to define the pairwise
similaritiesdissimilarities in terms of distances which, in turn, give a
structure i.e. the dataset underlying structure. The
clustering algorithm, by processing the pairwise distances
implicitly or explicitly, produces a structure, a partition. Its
success is determined by the extent to which the produced
partition aligns with the underlying structure, or more
precisely, agrees with the pairwise distances. Supplying
inconsistent values for , forces the algorithm either to
separate similar points or to group dissimilar points in the
same cluster. Hence the issue of cluster number is crucial
and largely affects clustering quality. Even by choosing
features and a measure consistent with the end-use, the
inherent number of clusters might be unknown. For
example, in a topic-driven clustering application, terms that
are significant to each possible topic or common theme
might be universally known, but the number of topics
represented by documents in a particular dataset is
unknown. Even if the cluster number is guessed correctly,
there is the unfortunate possibility of obtaining a
suboptimal clustering due to local optimum convergence.
Besides, clustering of different types, crisp versus fuzzy,
can be obtained on the same dataset. For “this and that
kind” of reasons, cluster analysis is incomplete without the
assessment of clustering quality. The issues of cluster
number and quality are the main concerns of cluster
validity.</p>
      <p>This study reviews some fuzzy cluster validity indices
then presents a generalization of silhouettes to fuzzy
partitions. The performance of all reviewed indices is
compared, with discussion, using two different datasets.</p>
    </sec>
    <sec id="sec-2">
      <title>Fuzzy c-Means Algorithm</title>
      <p>
        FCM, described in
        <xref ref-type="bibr" rid="ref6">(Bezdek, Ehrlich, &amp; Full 1984)</xref>
        ,
incorporates fuzzy membership values in its
variancebased criterion as
where is the center of cluster . The clustering
mechanism is carried as a Picard iterative process that
alternates the application of
and
The update rules are derived from the necessary conditions
of (1) constrained by
FCM output ranges from crisp partitions, as produced by
HCM, to the fuzziest possible partition for the specified
number of clusters i.e. [ ]. Informally
speaking, there are two sources for fuzziness in a produced
partition. First is the amount of overlapping in the
underlying structure; equation (3) assigns each point
(1)
(2)
(3)
(4)
(5)
almost the same membership to overlapping clusters whose
centers are within small proximity. Second is the exponent;
the ratios in (3) become compressed around the value 1
when is too high, thereby, weakening the role of
‘geometry’ as a key factor in shaping membership values.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Cluster Validity</title>
      <p>
        Modeling the pairwise similarities-dissimilarities by a
distance measure restates the goal of clustering as the
search for optimally compact and separated clusters. One
cluster is compact only if its member points are within
small proximity from each other. Two clusters are well
separated only if their member points are distant from each
other. Accordingly, the variance-based criterion found in
cmeans can be thought of as a measure of compactness,
which was shown to be equivalent to a measure of
separation for the same number of clusters
        <xref ref-type="bibr" rid="ref23">(Zhao &amp;
Karypis 2001)</xref>
        . Hence, c-means is capable of producing
partitions that are optimally compact and well separated,
for the specified number of clusters. Note that better
clustering might still be achieved by specifying different
cluster numbers. Since clustering algorithms are supposed
to optimize their output in compactness and separation,
both should be assessed to find clustering quality.
One might need to distinguish between the desired
structure, the underlying structure, and candidate
structures produced by clustering algorithms. The desired
structure is the ground truth clustering, mentioned earlier.
What is known about this clustering might be vague or
incomplete but it should drive the problem design. The
underlying structure is the one shaped by the pairwise
distances which suggests unique clustering (Fig. 1a),
multiple clusterings (Fig. 1b) or no clustering due to the
lack of any structure (Fig. 1c). A clustering algorithm
produces different partitions for different configurations i.e.
distance measure, parameters, etc. The best case scenario is
when the pairwise distances structure the points into the
desired grouping and an algorithm successfully produces a
clustering that aligns with the underlying structure.
Validating a produced clustering with respect to the
underlying structure is possible by means of cluster validity
indices.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Partition Coefficient</title>
      <p>
        The partition coefficient, PC, is defined as the Frobenius
norm of the membership matrix, divided by the number of
points, as in
∑ ∑
(6)
The coefficient is bounded by and 1, if applied to FCM
output. Its use as a measure of partition fuzziness was first
investigated by Bezdek in his Ph.D. dissertation
        <xref ref-type="bibr" rid="ref3">(Bezdek
1973)</xref>
        . Although it can be used as a validity index with
some success, it has been shown to be irrelevant to the
problem of cluster validity
        <xref ref-type="bibr" rid="ref19">(Trauwaert 1988)</xref>
        . Clearly, the
coefficient does not incorporate the pairwise distances that
are necessary to the assessment of compactness and
separation. Therefore, it is not reliable for the validation of
any given partition, for example, one produced by random
cluster assignment. Also, the coefficient assumes its upper
value on any crisp partition, regardless of its clustering
quality. Nevertheless, the coefficient does what it knows
best, measuring fuzziness.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Xie-Beni Index</title>
      <p>
        An index that really measures compactness and separation
was proposed by Xie and Beni, XB index
        <xref ref-type="bibr" rid="ref20">(Xie &amp; Beni
1991)</xref>
        . XB takes the form of a ratio; the minimum
centerto-center distance appears in its denominator and , as
exactly as in FCM, is in its numerator but divided by .
Hence, XB is a measure of compactness divided by a
measure of separation, given by
∑ ∑
{
}
(7)
In (7), { } denotes the set of cluster
centers. The authors define the variation of each cluster as
the sum of point fuzzy deviations, squared. With respect to
a cluster, a point deviation is its distance from the cluster
center weighted by its membership. The total variation is
the sum of all cluster variations that gives the compactness
of the partition, when divided by . This description
explains why memberships and distances in (7) are
squared. However, they suggest substituting in place of
where is the same as the value used in FCM, justified
by making the index ‘compatible’ with FCM. The final
value of (1) can be directly plugged in (7), provided it is
still available, or else recomputed. It is unclear how being
compatible with FCM or raising membership values to
powers different than 2 relates to the assessment of
compactness and separation, or to the ‘geometry’
underlying the data.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Fuzzy Hypervolume</title>
      <p>
        The development of the index started as part of work that
formulates clustering as a problem of maximum likelihood
estimation (MLE) of a mixture of multivariate densities
        <xref ref-type="bibr" rid="ref22">(Wolfe 1970)</xref>
        ; the dataset is assumed to be drawn from
such a mixture. Bezdek and Dunn, in
        <xref ref-type="bibr" rid="ref5">(Bezdek &amp; Dunn
1975)</xref>
        , give the MLE algorithm and FCM as well. The
MLE algorithm solves for a composite parameter vector of
densities’ means, covariance matrices and the a priori
probabilities. They describe the use of FCM to approximate
the ML parameters. Substituting FCM-generated
membership values for posterior probabilities computes the
remaining ML parameters. A justification is given by
comparing the behavior of two update rules in both
algorithms. They produce small values when evaluated on
data points that are distant from some density-cluster center
relative to their distance from nearest center. However,
they point out the fact that both algorithms compute
different centers and distances.
      </p>
      <p>
        Gath and Geva in
        <xref ref-type="bibr" rid="ref10">(Gath &amp; Geva 1989)</xref>
        , first, give a similar
description of FCM, and fuzzy MLE that is derived from
FCM, as opposed to the separate treatment given by
Bezdek and Dunn. Then, they suggest a 2-stage clustering
scheme, FCM followed by MLE, justified by the unstable
behavior of MLE as a standalone clustering scheme. As
part of their work, some validity measures were proposed;
among them is the fuzzy hypervolume measure (FHV). FHV
is defined in terms of the covariance matrix determinants.
The covariance matrix of cluster can be constructed
using
∑
(
∑
) (
)
The hypervolume is then computed by
∑
[
]
The determinants are functions of cluster spreads and point
memberships. A clustering that is evaluated the smallest is
assumed to be optimal. However, the following
observations can be made:
 According to the authors, the index is sensitive to
substantial overlapping in the dataset.
(8)
(9)
 It is unclear how the measure accounts for compactness
and separation.
 Assuming that an MLE mixture has been successfully
found, is it the best clustering in compactness and
separation?
 Is the measure applicable to crisp partitions?
 The use of FCM as MLE requires setting m=2; how
does the measure performs on partitions obtained using
m different than 2?
      </p>
    </sec>
    <sec id="sec-7">
      <title>Pakhira-Bandyopadhyay-Maulik Index</title>
      <p>
        Pakhira et el. proposed an index, referred here as PBM,
that targets both fuzzy and crisp partitions
        <xref ref-type="bibr" rid="ref15">(Pakhira,
Bandyopadhyay, &amp; Maulik 2004)</xref>
        . Its fuzzy version is
defined as
(
∑
(
)
      </p>
      <p>{
∑ ∑
(
}
)
)
In (10), is the center of the whole dataset. The index can
be factorized into a measure of compactness
∑
∑
(
)
(10)
(11)
(12)
(13)
is a measure of the set clustering quality. To illustrate
silhouette construction, consider for the data point , the
cluster to which has been assigned, , and let be any
cluster different than . The silhouette is
defined in terms of a measure of compactness and a
measure of separation . The average distance of to
points in computes while is the minimum average
distance from to all other clusters. Let denotes the
cluster corresponding to (see Fig. 2).
Clearly, (14) evaluates to values in [ ]. The average
silhouette over a cluster or the whole dataset are given
respectively by
∑
{</p>
      <p>}
∑
∑
∑
a measure of separation
and an artificial factor, irrelevant to compactness and
separation,
{</p>
      <p>}
∑
(
)
The whole term in (10) is raised to power two that is also
irrelevant to the assessment of clustering quality. Larger
values of the index are assumed to indicate better
clustering. It can be noticed though that the quantity in
(12) does not necessarily capture the separation between
all pairs of clusters; an authentic separation measure
should account for the poor separation found in partitions
into large number of clusters. .</p>
    </sec>
    <sec id="sec-8">
      <title>Average Silhouette Index</title>
      <p>
        Rousseeuw proposed an index for the validation of crisp
partitions
        <xref ref-type="bibr" rid="ref17">(Rousseeuw 1987)</xref>
        . It is based on the notion of
silhouette. A silhouette, constructed for each data point,
measures the clustering quality for that point. The average
over members of the whole dataset or an individual cluster
Note that the membership values in (15) are crisp and
denotes the cluster, as a set.
      </p>
      <p>From the data point perspective, the measure assumes
positive values if the separation distance is larger than the
compactness distance and negative values if vice versa. A
value near zero indicates that the point is at clusters
boundary region, of course in the context of clustering on
hands. At the coarser cluster level, the average silhouette
indicates weak structure if near zero, strong if near +1 and
misclustering if near -1. Since a clustering algorithm
cannot do any better than the underlying structure, an
average close to +1 is attainable only in the presence of a
strong structure.</p>
      <p>The following appealing properties recommend the
silhouette index:
(14)
(15)
(16)
 As opposed to other indices, it validates a given
clustering at point level, providing thus the finest
granularity.
 It is algorithm-independent.
 It takes as input only the pairwise
similaritiesdissimilarities and the membership matrix.
 As explained in the original work of Rousseeuw, it can
be used to ‘visualize’ the clustering quality of a given
partition.
 Its assessment of compactness and separation conforms
literally to the stated goal of clustering; a relatively
small compared to means that has been
successfully grouped with its similar points in the same
cluster in a way that separates from its dissimilar points.</p>
    </sec>
    <sec id="sec-9">
      <title>Extended Average Silhouette Index</title>
      <p>
        The above construction of silhouettes is not directly
applicable to fuzzy partitions since it requires crisp cluster
boundaries, necessary to the computation of cluster
average distances. Nevertheless, a fuzzy partition might be
validated by silhouettes after being defuzzified, for
example, by setting the maximum membership degree of
each point to one and nullifying the rest. However, this
discards cluster overlapping, defeating the reason of using
FCM not HCM. An extension that integrates fuzzy values
with silhouettes, computed from the defuzzified partition,
into an average silhouette-based index was proposed in
        <xref ref-type="bibr" rid="ref8">(Campello &amp; Hruschka 2006)</xref>
        . They suggest computing a
weighted mean in which each silhouette is weighted by the
difference in the two highest fuzzy membership values of
the associated point. More precisely, if and
denote cluster indices with the two highest membership
values associated with then the index is given by
∑ (
∑ (
)
)
(17)
Therefore, points around cluster centers become significant
to the computation of the index since they have higher
weights, as opposed to the insignificant points found in
overlapping regions. Clearly, such an assessment is not
thorough since it tends to ignore the clustering of points in
overlapping regions.
      </p>
    </sec>
    <sec id="sec-10">
      <title>Generalized Intra-Inter Silhouette Index</title>
      <p>A generalization of silhouettes to fuzzy partitions is given
in (Rawashdeh &amp; Ralescu), based on the following central
observations:
 A partition of a set of points into any number of clusters
is essentially a clustering of the associated pairwise
distances into intra-distances (within-cluster) and
interdistances (between-cluster).
 A strong structure, a good clustering, has small
intradistances and large inter-distances i.e. similar points are
grouped together and dissimilar points are separated.
 In the context of a crisp partition, each distance is either
intra-distance or inter-distance. This is modeled by
intrainter scores associated to a distance that assume the
values 0 and 1, indicating distance membership.
 In the context of a fuzzy partition, two points belong to
each cluster simultaneously and separately with some
degree, intuitively suggesting the assignment of fuzzy
intra-inter scores to the pairwise distances,
The original construction of silhouettes, which already
incorporates the pairwise distances, is reformulated to
employ intra-inter scores. The following is applicable to
both crisp and fuzzy partitions, and it carries similar
computation as in the original construction, provided that
the partition is crisp. As input, the construction requires the
pairwise distances and the membership matrix .
Step 1. Given a partition into clusters, each distance ,
associated with and , is intra-distance with respect to
either cluster and inter-distance with respect to any of the
2-combinations of clusters. The following constructs all
of the intra-inter matrices:
,
(18)
(19)
}
(20)
}
(21)
Step 2. With respect to each point , weighted means
over the associated distances are computed, using
intrainter scores as weights; from which the compactness
distance and the separation distances are selected.
That is
Step 3. The silhouette of each point is found using (14).
Similar to the original average index, the average
intrainter silhouette, gSil, over members of the whole dataset is
an assessment of its clustering quality. For each fuzzy
cluster, a weighted mean using point membership values as
weights, is a measure of its clustering quality.</p>
    </sec>
    <sec id="sec-11">
      <title>Experiments and Discussion</title>
      <p>For further evaluation of the validity indices presented
above, a few concrete examples are considered as follows:</p>
    </sec>
    <sec id="sec-12">
      <title>Example 1.</title>
      <p>Clustering algorithms rely on pairwise distances to form
their output and this should be taken into consideration
when testing any proposed validity index. Consider the
dataset given in Fig. 3. It is tempting to claim that is
the optimal number of clusters, however, this requires a
similarity measure better to the task, than the Euclidean
distance.</p>
      <p>Although HCM, using the Euclidean distance, successfully
detects the two clusters, XB, PBM, Sil, eSil and gSil all
score better than due to better compactness
(Fig. 4). Only FHV gives a better score, since it is
based on fitting the data with a mixture of Gaussians. A
single bi-Gaussian fits the overlapping clusters in Fig. 4b
better than two Gaussians, assuming the crisp
probabilitymembership values produced by HCM.</p>
    </sec>
    <sec id="sec-13">
      <title>Example 2.</title>
      <p>Different FCM partitions, using , were obtained
on a dataset shown in Fig. 5.</p>
      <p>Fig. 6 shows the performance of PC, Sil, eSil and gSil.
PBM, XB and FHV are shown in Figs. 7, 8 and 9
respectively.</p>
      <p>The extended index, eSil, scores the partition with
clusters (Fig. 5a) higher than the one with clusters
(Fig. 5b). A different ranking is suggested by the original
index, Sil, and the generalized index, gSil. Both Sil and eSil
incorporate the same silhouettes that are computed from the
defuzzified membership matrix; clearly, the disagreement
is caused by the weights in eSil. The points that occupy the
undetected middle cluster (Fig. 5a) are not assigned high
memberships to any of the three detected clusters; hence,
they have low weights. The index eSil just ignores these
points that are of insignificant weights and of
approximately zero silhouettes. For the same reason, eSil
always appears above the curve of Sil. The generalized
index gSil can be naturally applied to both crisp and fuzzy
partitions. It accounts for changes in the parameter and
does not require any defuzzification of partitions. It scores
highest the partition with clusters (Fig. 5c).
The PBM index evaluates the clustering in Fig. 5d as the
best. The separation measure, maximum center-to-center
distance, does not decrease with even after reaching the
cluster number that is sufficient for a good clustering of the
dataset. In addition, the compactness measure decreases
monotonically with , provided a reasonable clustering
algorithm is used. Therefore, PBM has a nondecreasing
behavior with that can be easily exposed using small toy
datasets. Moreover, it is not recommended to use any factor
that is irrelevant to the assessment of compactness and
separation, as part of a validity index.</p>
      <p>The XB index also fails in its ranking; it scores better
than . The separation measure, minimum
center-tocenter distance, does not account for the spread of detected
clusters: in Fig. 5a, the centers are well separated but there
is overlapping among the clusters in the middle region. The
separation measure is not thorough in its assessment as
opposed to silhouette-based indices that make assessments
at point level. Therefore, XB is not reliable to detect good
cluster numbers, and to compare between any two
partitions, in general; it is in disagreement with the
silhouette-based indices in its scoring of the partitions with
and .</p>
      <p>Distance-based similarities and dissimilarities are
inferred from how distance values compare with each
other, not from distance magnitudes. Hence, the strength of
the underlying structure is determined by distance values
relative to each other. Since the quotient in (14) is just a
difference in compactness and separation relative to their
maximum, an average over the whole dataset measures the
strength of a given clustering. Values close to +1, obtained
from the average silhouette index, indicate good clustering
and a strong underlying structure as well. It is worth noting
that, silhouette-based indices are also scale-invariant that
is, scaling the dataset by some factor, multiplying by 100
for example, does not affect their values since the structure
is still the same. This is not the case for FHV and PBM.
Hence, silhouette-based indices are easier to interpret.</p>
    </sec>
    <sec id="sec-14">
      <title>Conclusion</title>
      <p>A satisfactory or useful clustering requires careful
selection of the features and the measure which, combined
together, define the pairwise similarities-dissimilarities.
Clustering algorithms, probably of different models, by
varying model parameters, as in cluster number, produce
partitions, candidate structures. The job of a validity index
is to find the candidate that is best supported by the
pairwise similarities-dissimilarities, in other words, the
clustering that best aligns with the underlying structure.
FCM is used mainly to model cluster overlapping in
datasets, facilitated by partial cluster memberships
assigned to the points, which also results in points in the
same cluster taking different membership values. The
generalized silhouette index is applicable to both
approaches, crisp and fuzzy, of structure modeling to guide
the search for the best structure in the dataset.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Balcan</surname>
          </string-name>
          , M.-F.;
          <string-name>
            <surname>Blum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vempala</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>A Discriminative Framework for Clustering via Similarity Functions</article-title>
          .
          <source>Proceedings of the 40th annual ACM symposium on Theory of Computing (STOC).</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Berkhin</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2002</year>
          .
          <article-title>Survey of Clustering Data Mining Techniques</article-title>
          .
          <source>Technical report, Accrue Software</source>
          , San Jose, CA.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Bezdek</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          <year>1973</year>
          . Fuzzy Mathematics in Pattern Classification. Ph.
          <string-name>
            <given-names>D.</given-names>
            <surname>Diss</surname>
          </string-name>
          ., Cornell University.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Bezdek</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          <year>1981</year>
          .
          <article-title>Pattern Recognition with Fuzzy Objective Function Algorithms</article-title>
          . Plenum Press.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Bezdek</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dunn</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          <year>1975</year>
          .
          <article-title>Optimal Fuzzy Partitions: A Heuristic for Estimating the Parameters in a Mixture of Normal Distributions</article-title>
          .
          <source>IEEE Transactions on Computers</source>
          <volume>24</volume>
          (
          <issue>4</issue>
          ):
          <fpage>835</fpage>
          -
          <lpage>838</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Bezdek</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ehrlich</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; Full,
          <string-name>
            <surname>W.</surname>
          </string-name>
          <year>1984</year>
          .
          <article-title>FCM: the Fuzzy c-Means Clustering Algorithm</article-title>
          .
          <source>Computers and Geosciences</source>
          <volume>10</volume>
          :
          <fpage>191</fpage>
          -
          <lpage>203</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Blienko</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Basu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Mooney,
          <string-name>
            <surname>R.</surname>
          </string-name>
          <year>2004</year>
          .
          <article-title>Integrating Constraints and Metric Learning in Semisupervised Clustering</article-title>
          .
          <source>Proceedings of the 21st International Conference on Machine Learning</source>
          . Banff, Canada.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Campello</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hruschka</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>A Fuzzy Extension of the Silhouette Width Criterion for Cluster Analysis</article-title>
          .
          <source>Fuzzy Sets and Systems</source>
          ,
          <volume>157</volume>
          :
          <fpage>2858</fpage>
          -
          <lpage>2875</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Dunn</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          <year>1973</year>
          .
          <article-title>A Fuzzy Relative of the ISODATA Process and its Use in Detecting Compact Well-Separated Clusters</article-title>
          .
          <source>J. Cybernet</source>
          <volume>3</volume>
          :
          <fpage>32</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Gath</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Geva</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>1989</year>
          .
          <article-title>Unsupervised Optimal Fuzzy Clustering</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>11</volume>
          (
          <issue>7</issue>
          ):
          <fpage>773</fpage>
          -
          <lpage>781</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Guyon</surname>
          </string-name>
          , I.; von Luxburg, U.;
          <string-name>
            <surname>Williamson</surname>
            ,
            <given-names>R. C.</given-names>
          </string-name>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Kleinberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2002</year>
          .
          <article-title>An Impossibility Theorem for Clustering</article-title>
          .
          <source>Proceedings of Advances in Neural Information Processing Systems</source>
          <volume>15</volume>
          :
          <fpage>463</fpage>
          -
          <lpage>470</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Murty</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Flynn</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>1999</year>
          .
          <article-title>Data Clustering: A Review</article-title>
          .
          <source>ACM Computing Surveys</source>
          ,
          <volume>31</volume>
          (
          <issue>3</issue>
          ),
          <fpage>264</fpage>
          -
          <lpage>323</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>MacQueen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>1967</year>
          .
          <article-title>Some Methods for Classification and Analysis of Multivariate Observations</article-title>
          .
          <source>Proceedings of the 5th Berkeley Symposium on Mathematics. Statistics and Probability</source>
          <volume>2</volume>
          :
          <fpage>281</fpage>
          -
          <lpage>297</lpage>
          . Berkeley, CA.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Pakhira</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bandyopadhyay</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Maulik,
          <string-name>
            <surname>U.</surname>
          </string-name>
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <article-title>Validity Index for Crisp and Fuzzy Clusters</article-title>
          .
          <source>Pattern Recognition</source>
          <volume>37</volume>
          :
          <fpage>481</fpage>
          -
          <lpage>501</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Rousseeuw</surname>
            ,
            <given-names>P. J.</given-names>
          </string-name>
          <year>1987</year>
          .
          <article-title>Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>Computational and Applied Mathematics</source>
          <volume>20</volume>
          :
          <fpage>53</fpage>
          -
          <lpage>65</lpage>
          . NorthHolland.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Trauwaert</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>1988</year>
          .
          <article-title>On the Meaning of Dunn's Partition Coefficient for Fuzzy Clusters</article-title>
          .
          <source>Fuzzy Sets and Systems</source>
          <volume>25</volume>
          :
          <fpage>217</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Beni</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>1991</year>
          .
          <article-title>A Validity Measure for Fuzzy Clustering</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>3</volume>
          (
          <issue>8</issue>
          ):
          <fpage>841</fpage>
          -
          <lpage>846</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; Wunsch,
          <string-name>
            <surname>D. I. I.</surname>
          </string-name>
          <year>2005</year>
          .
          <article-title>Survey of Clustering Algorithms</article-title>
          .
          <source>IEEE Transactions on Neural Networks</source>
          <volume>16</volume>
          (
          <issue>3</issue>
          ):
          <fpage>645</fpage>
          -
          <lpage>678</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Wolfe</surname>
            ,
            <given-names>J. H.</given-names>
          </string-name>
          <year>1970</year>
          .
          <article-title>Pattern Clustering by Multivariate Mixture Analysis</article-title>
          .
          <source>Multivariable Behavioral Research</source>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Karypis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>2001</year>
          .
          <article-title>Criterion Functions for Document Clustering: Experiments and Analysis</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>