<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Fast k-Nearest-Neighbor-Consistent Clustering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lars Lenssen</string-name>
          <email>lars.lenssen@tu-dortmund.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Niklas Strahmann</string-name>
          <email>niklas.strahmann@tu-dortmund.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erich Schubert</string-name>
          <email>erich.schubert@tu-dortmund.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>k-Nearest-Neighbor Consistency, Cluster Analysis, Clustering-Quality Measure</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LWDA'23: Lernen</institution>
          ,
          <addr-line>Wissen, Daten, Analysen</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TU Dortmund University</institution>
          ,
          <addr-line>Informatik VIII, Dortmund, 44221</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>There are many ways to measure the quality of a clustering, both extrinsic (when labels are known) and intrinsic (when no labels are available). In this article, we focus on the k-Nearest-Neighbor Consistency measure, which considers a clustering to be good if each object is within the same cluster as its nearest neighbors, and hence does not need labels. We propose a variant of the K-means clustering algorithm that uses the k-Nearest-Neighbor Consistency as a constraint while optimizing the sum of squares as in regular K-means, resulting in K-means clustering where the nearest neighbors are guaranteed to be in the same cluster. The new version provably yields the same results as the original consistency-preserving K-means algorithm of Ding and He, but needs fewer computations.</p>
      </abstract>
      <kwd-group>
        <kwd>k-Nearest-Neighbor Consistency</kwd>
        <kwd>Cluster Analysis</kwd>
        <kwd>Clustering-Quality Measure</kwd>
      </kwd-group>
      <conference>
        <conf-name>LWDA'23: Lernen, Wissen, Daten, Analysen</conf-name>
      </conference>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Many clustering algorithms directly optimize a quality criterion, but they can get stuck in local optima [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We can also optimize the Medoid Silhouette in a similar
fashion, as an example where an evaluation criterion was turned into an efficient algorithm [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
But an undesired trivial clustering may score well for surprisingly many quality measures. For
example, the within-cluster sum of squares (WCSS) criterion used by K-means clustering is
optimal when every point is its own cluster. Hence we may need to add additional constraints,
such as keeping the number of clusters fixed. In other cases, it is common to use one criterion
to find candidate solutions (e.g., optimizing WCSS with K-means), and then use a different
criterion for model selection.
      </p>
      <p>
        In this article, we are interested in a relatively unknown quality criterion called the
k-Nearest-Neighbor Consistency [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which measures if the nearest neighbors of each point belong to the
same cluster. This is closely related to the central idea of clustering that similar objects should
be in the same cluster, but also to nearest-neighbor classification, as we would then predict
the correct class. Ding and He [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] also proposed two clustering algorithms to optimize for
this criterion. In this article, we propose a fast variant of the consistency-preserving K-means
algorithm that provably yields the same results.
      </p>
      <p>We first introduce nearest-neighbor consistency in Section 2 and clustering with consistency
in Section 3. We then propose a faster algorithm in Section 4. We show experimental evidence
of the performance benefits in Section 5 and conclude the paper in Section 6.</p>
    </sec>
    <sec id="sec-3">
      <title>2. k-Nearest-Neighbor Consistency</title>
      <p>
        The k-Nearest-Neighbor Consistency is motivated by k-Nearest-Neighbor classification, but
it can also be used to evaluate clustering validity, as proposed by Ding and He [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In the
following, we only consider clusterings that form a total partitioning of the data set, encoded
as a cluster label $l_i$ for each sample $x_i$. Overlapping, hierarchical, or incomplete clustering (e.g.,
noise in DBSCAN) solutions are not considered. For the given samples $X = \{x_1, \ldots, x_N\}$ and the
cluster labels $L = \{l_1, \ldots, l_N\}$, $k\mathrm{NN}(x_i)$ are the $k$ nearest neighbors of $x_i$. One single sample $x_i$
is considered to be $k$-nearest-neighbor consistent if all neighbors $x_j \in k\mathrm{NN}(x_i)$ have the same
cluster label as the sample $x_i$:
$$c_k(x_i, L) = \begin{cases} 1, \text{ if } \forall x_j \in k\mathrm{NN}(x_i) \colon l_j = l_i \\ 0, \text{ otherwise} \end{cases}$$
      <p>
        To measure the $k$NN consistency of a clustering where not all points are consistent, Ding
and He [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] also use a fractional $k$NN consistency
$$C_k(X, L) = \tfrac{1}{N} \sum_{i=1}^{N} c_k(x_i, L).$$
      </p>
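      <p>For illustration, the fractional $k$NN consistency $C_k$ is straightforward to compute from a neighbor table. The following is a minimal sketch (not from the paper), assuming Python with numpy and scikit-learn; the function name knn_consistency is our own:</p>
      <preformat>
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_consistency(X, labels, k):
    """Fraction of samples whose k nearest neighbors all share the sample's label."""
    labels = np.asarray(labels)
    # Query k+1 neighbors, because each point is returned as its own nearest neighbor.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    neigh = idx[:, 1:]  # drop the self-neighbor in column 0
    # c_k(x_i, L) = 1 iff every neighbor carries the same label as x_i.
    consistent = (labels[neigh] == labels[:, None]).all(axis=1)
    return consistent.mean()  # C_k(X, L)
      </preformat>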
      <p>
        To use the consistency as a quality measure, Handl and Knowles [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] developed a connectivity-based
measure out of the $k$NN consistency, which gives more emphasis to the nearest neighbors.
The definition of the $k$NN consistency can also be applied to other neighborhood concepts, for
example, k-Mutual-Nearest-Neighbors [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which symmetrizes k-Nearest-Neighbors and is
denoted by $k\mathrm{MN}(\cdot)$:
      </p>
      <p>$$x_j \in k\mathrm{MN}(x_i) \iff x_j \in k\mathrm{NN}(x_i) \wedge x_i \in k\mathrm{NN}(x_j).$$</p>
      <p>[Algorithm 1: ENFORCE pseudocode: assign each closed neighborhood to its majority cluster label.]</p>
      <p>
Mutual nearest neighbors have the benefit of being a symmetric relation, but also being less
sensitive to outliers and more likely to contain multiple components. Intuitively, we remove all
unidirectional edges from the k-nearest-neighbor graph. Consider a data set where we have
one dense cluster and a far away outlier. The outlier’s neighbors will be points of the cluster,
but it will not be in the nearest neighbors of the cluster points. Both the $k$NN and the $k$MN are
often used in the context of spectral clustering, which in turn is related to DBSCAN [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ].
      </p>
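      <p>A sketch of this symmetrization on a precomputed neighbor table (our own illustration, not from the paper):</p>
      <preformat>
def mutual_knn(neigh):
    """Keep only mutual edges: x_j in kMN(x_i) iff each is in the other's kNN list.

    neigh: neigh[i] lists the indices of the k nearest neighbors of sample i
    (excluding i itself). Returns the pruned, symmetric neighbor lists.
    """
    sets = [set(row) for row in neigh]
    return [[j for j in row if i in sets[j]] for i, row in enumerate(neigh)]
      </preformat>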
      <p>
        Ding and He [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] aim at enforcing $k$NN consistency in their algorithms. We can interpret
this as a form of constrained clustering, where we add must-link constraints for neighbors.
Such constraints have been previously used, e.g., for model selection [14] and for integrating
external constraints into K-means [15, 16].
      </p>
    </sec>
    <sec id="sec-3a">
      <title>3. Clustering with Nearest Neighbor Consistency</title>
      <p>
The k-Nearest-Neighbor consistency was originally proposed by Ding and He [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] along with
two algorithms to create k-Nearest-Neighbor-consistent clusterings, ENFORCE and consistency-preserving
K-means (K-means-CP). The ENFORCE algorithm modifies an existing clustering,
which can be created by any cluster analysis method, to improve its consistency by reassigning
neighborhoods to the most common cluster. To achieve this, ENFORCE uses closed
neighborhoods. A set $S \subset X$ is a closed neighborhood if for all $x \in S$ holds $k\mathrm{NN}(x) \subset S$, and for every
two elements $x_i, x_j \in S$ there is a path of elements from $k\mathrm{NN}(x_i)$ or $k\mathrm{NN}(x_j)$. The definition
can easily be extended to any neighborhood function such as $k$MN. It is easy to see that all
points in a closed neighborhood must be in the same cluster for the clustering to be consistent,
and that every point can only belong to one closed neighborhood. Figure 1 shows the closed
neighbor sets on an example data set using mutual nearest neighbors with $k = 3$. To enforce a
$k$NN-consistent clustering, ENFORCE iterates through all sets of closed neighborhoods and
counts the respective cluster labels. Then it assigns the entire set to the label that occurs most
often. There is also an interesting parallel to DBSCAN clustering here, which computes a similar
transitive closure, but with a density-based notion of the neighborhood. DBSCAN clusters are
closed neighborhoods with respect to density reachability.
      </p>
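      <p>A compact sketch of the two ENFORCE ingredients described above (our own illustration, assuming Python with numpy and scipy; the function names are ours). Closed neighborhoods are exactly the weakly connected components of the neighbor graph:</p>
      <preformat>
import numpy as np
from collections import Counter
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def closed_neighborhoods(neigh):
    """Map each sample to the id of its closed neighborhood set."""
    n = len(neigh)
    rows = np.repeat(np.arange(n), [len(r) for r in neigh])
    cols = np.concatenate([np.asarray(r, dtype=int) for r in neigh])
    graph = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    # Path-connectivity over neighbor edges in either direction: weak components.
    _, comp = connected_components(graph, directed=False)
    return comp

def enforce(labels, comp):
    """Reassign every closed neighborhood to its most common cluster label."""
    labels = np.asarray(labels).copy()
    for c in np.unique(comp):
        members = np.where(comp == c)[0]
        labels[members] = Counter(labels[members].tolist()).most_common(1)[0][0]
    return labels
      </preformat>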
      <p>
        In contrast to ENFORCE, consistency-preserving K-means (K-means-CP) does not work on an
existing clustering, but integrates consistency into the standard K-means procedure. K-means
uses the squared Euclidean distance; other approaches, such as spherical
K-means [17, 18], will not be considered here. First, initial cluster centers $M = \{\mu_1, \ldots, \mu_k\}$
are chosen with any of the standard heuristics. Then an alternating optimization as in the
standard algorithm is performed. But in order to obtain a $k$NN-consistent result, K-means-CP,
like ENFORCE, always assigns entire closed neighborhoods to the same cluster. Thus, Ding and
He [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] define the nearest cluster center $\mathrm{nearest}(D_j)$ of a closed neighborhood $D_j$ by the sum of
squared Euclidean distances
$$\mathrm{nearest}(D_j) = \arg\min_c \sum_{x \in D_j} \|x - \mu_c\|^2.$$
      </p>
      <p>All points in the closed neighborhood $D_j$ are then assigned to this cluster. The assignment step
is alternated with a recalculation of the cluster centers. A new cluster center $\mu_c$ of the cluster $C_c$
is determined as usual in K-means using the arithmetic average of the assigned data:
$$\mu_c = \tfrac{1}{|C_c|} \sum_{x \in C_c} x.$$</p>
      <p>These steps are repeated until no point is reassigned (and hence the cluster centers do not
change anymore). Convergence guarantees of the standard algorithm still apply. This algorithm
computes exactly $n \cdot k$ distances in each step to determine the nearest clusters, and these
distance computations make up a major part of the algorithm's run time. We can also optimize
the recomputation of the cluster centers by using an incremental computation, but this has
much less effect. In the following, we discuss why many of the distance computations performed
above are unnecessary, and we can hence improve the run time of this algorithm.</p>
      <p>Algorithm 2: K-means-CP</p>
      <preformat>
M ← initialize k cluster centers
D ← calculate sets of closed neighborhoods
L ← dummy assignments to a non-existent cluster
repeat
    foreach D_j ∈ D = {D_1, …, D_h} do
        s_i ← 0 for all i
        foreach μ_i ∈ M = {μ_1, …, μ_k} do
            foreach x ∈ D_j do
                s_i ← s_i + ‖x − μ_i‖²
        m ← arg min_i s_i                  // determine nearest cluster
        foreach x ∈ D_j do l_x ← m         // assign closed neighborhoods
    if L is unchanged then break
    M ← calculate cluster centers for L
return L
      </preformat>
    </sec>
    <sec id="sec-3b">
      <title>4. Fast k-Nearest-Neighbor-Consistent Clustering</title>
      <p>The K-means-CP algorithm determines the nearest cluster of a closed neighborhood set by the
sum of the squared Euclidean distances of all elements of a closed neighborhood set to the
different cluster centers. Using the parallel axis theorem of König, Huygens, and Steiner, we
can prove that instead of considering the distances of all samples in a closed neighborhood set
$D_j$, it is sufficient to consider only the mean vector $\bar{x}_j = \tfrac{1}{|D_j|} \sum_{x \in D_j} x$. We can also trivially use
the mean vector for updating the cluster centers, if we weight it by the number of points in
the closed neighborhood set. The same property has been previously used by Lee et al. [19] to
reduce uncertain UK-means clustering to regular K-means clustering. Our improved
Neighbor-consistent K-means (NCK-means) hence uses the mean vectors of the closed neighborhood sets.
This yields a faster variant of consistency-preserving K-means with a significantly lower run
time and the guaranteed same result.</p>
      <p>We will reference the set which contains all closed neighborhood sets by $D$. Because all
elements of a closed neighborhood set will be assigned to the same cluster, we will reference
the set which contains all closed neighborhood sets whose elements have cluster label $c$ by $D_c$.</p>
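      <p>The resulting algorithm only needs the per-set means and weights. Below is a minimal NCK-means sketch under this reduction (our own illustration, not the ELKI implementation), assuming Python with numpy and a comp array as produced by the closed_neighborhoods sketch above:</p>
      <preformat>
import numpy as np

def nck_means(X, comp, k, n_iter=100, seed=0):
    """Weighted K-means on closed-neighborhood means; labels stay consistent."""
    n_sets = comp.max() + 1
    w = np.bincount(comp).astype(float)          # |D_j| for each set
    sums = np.zeros((n_sets, X.shape[1]))
    np.add.at(sums, comp, X)
    means = sums / w[:, None]                    # mean vector of each D_j
    rng = np.random.default_rng(seed)
    centers = means[rng.choice(n_sets, size=k, replace=False)]
    assign = np.full(n_sets, -1)
    for _ in range(n_iter):
        # Nearest center per set mean; equivalent to summing over all members
        # of the set by the parallel axis theorem (Section 4.1).
        dist = ((means[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        new = dist.argmin(axis=1)
        if (new == assign).all():
            break
        assign = new
        for c in range(k):
            mask = assign == c
            if mask.any():  # weighted mean restores the exact K-means center
                centers[c] = (w[mask, None] * means[mask]).sum(0) / w[mask].sum()
    return assign[comp], centers  # per-sample labels, final centers
      </preformat>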
      <sec id="sec-3-1">
        <title>4.1. Proof of correctness</title>
        <p>To prove the equivalence of the results of NCK-means and K-means-CP, we show that the
substeps of the algorithm always produce the same results. We first show that for given cluster
centers, NCK-means and K-means-CP assign the same cluster labels to a closed neighborhood
set. We start by considering how the original K-means-CP assigns cluster labels:
$$\mathrm{nearest}(D_j) = \arg\min_c \sum_{x \in D_j} \|x - \mu_c\|^2 = \arg\min_c \Big( \sum_{x \in D_j} \|x - \bar{x}_j\|^2 + |D_j| \, \|\mu_c - \bar{x}_j\|^2 \Big) = \arg\min_c \|\mu_c - \bar{x}_j\|^2,$$
where the second equality is the parallel axis theorem proven below, and the last step holds because the first summand does not depend on the cluster center $\mu_c$. Hence the cluster center nearest to the set mean $\bar{x}_j$ is also nearest to the entire set.
We can prove the above version of the parallel axis theorem as follows:
$$\sum_{x \in D} \|x - \mu\|^2 = \sum_{x \in D} \|(x - \bar{x}) - (\mu - \bar{x})\|^2 = \sum_{x \in D} \big( (x - \bar{x})^2 - 2 (x - \bar{x}) \cdot (\mu - \bar{x}) + (\mu - \bar{x})^2 \big)$$
$$= \sum_{x \in D} (x - \bar{x})^2 - 2 (\mu - \bar{x}) \cdot \underbrace{\textstyle\sum_{x \in D} (x - \bar{x})}_{=0} + |D| \, (\mu - \bar{x})^2 = \sum_{x \in D} \|x - \bar{x}\|^2 + |D| \, \|\mu - \bar{x}\|^2.$$
The classic parallel axis theorem is obtained for $\mu = 0$.</p>
        <p>We now consider the second step, updating the cluster centers. We start with the original
calculation of the cluster centers $\mu_c$. K-means-CP calculates this via the mean of the elements
of the cluster $C_c$,
$$\mu_c = \tfrac{1}{|C_c|} \sum_{x \in C_c} x = \tfrac{1}{\sum_{D_j \in D_c} |D_j|} \sum_{D_j \in D_c} \sum_{x \in D_j} x,$$
and by using $\bar{x}_j = \tfrac{1}{|D_j|} \sum_{x \in D_j} x$, we obtain
$$\mu_c = \tfrac{1}{\sum_{D_j \in D_c} |D_j|} \sum_{D_j \in D_c} |D_j| \, \bar{x}_j.$$</p>
        <p>This shows that for closed neighborhood sets with given cluster labels, NCK-means creates
the same cluster centers as K-means-CP. Because both steps of the algorithms produce the same
results, we have proven that the final result of the algorithms is the same.</p>
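        <p>A quick numerical check of this reduction on random data (our own sketch, assuming numpy):</p>
        <preformat>
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(size=(8, 3))           # one closed neighborhood set
centers = rng.normal(size=(5, 3))     # candidate cluster centers
# Sum of squared distances over all members vs. distance of the set mean only.
full = ((D[:, None, :] - centers[None, :, :]) ** 2).sum(-1).sum(0)
reduced = ((D.mean(0) - centers) ** 2).sum(-1)
assert full.argmin() == reduced.argmin()   # same nearest center
# full = |D| * reduced + constant, as the parallel axis theorem states.
assert np.allclose(full - len(D) * reduced, (full - len(D) * reduced)[0])
        </preformat>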
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Expected run time improvement</title>
        <p>
          The run time improvement depends on the data reduction using closed neighbor sets. The
standard K-means algorithm has a run time of $O(nkdi)$, where $n$ is the number of points, $k$ is
the number of clusters, $d$ is the dimensionality, and $i$ is the number of iterations. Finding the
closed neighbor sets additionally takes $O(n^2 d)$ time (although indexes for similarity search may
yield a considerable speedup), and K-means-CP of Ding and He [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] hence has a complexity of
$O(n^2 d + nkdi)$. NCK-means reduces this to $O(n^2 d + |D| kdi)$, where $|D|$ is the number of closed
neighbor sets, and we must assume $|D| \in O(n)$. In a worst-case asymptotic analysis, the
improvements hence are likely negligible because the cost of finding the closed neighbor sets
dominates, but in practice it usually offers a decent run time improvement, on the order of
$|D|/n$ in the clustering step, at only the cost of little additional memory to store the means of
each closed neighbor set. Hence, there is no reason not to use it. As it is a best practice to run
K-means several times and keep the best outcome [20], the cost to find the closed neighbor sets
can also be amortized over multiple restarts.
        </p>
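        <p>As a rough illustration with assumed (not measured) numbers: for $n = 10000$ points, $k = 10$ clusters, and $|D| = 2000$ closed neighbor sets, the assignment step computes $|D| \cdot k = 20000$ distances per iteration instead of $n \cdot k = 100000$, i.e., it needs only $|D|/n = 20\%$ of the distance computations.</p>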
        <p>The choice of the neighborhood has an important effect on the run time. Finding the nearest
neighbors becomes more expensive with a larger neighborhood size, but at the same time
increasing the number of edges decreases the number of closed neighbor sets. For NCK-means,
a lower number is beneficial, and both algorithms also benefit from an often lower number
of iterations because reassignments are less likely to happen with the additional consistency
constraints.</p>
      </sec>
      <sec id="sec-3-3">
        <title>4.3. Further run-time improvements</title>
        <p>Because our proof shows that it is sufficient to use the mean of each closed neighbor set, it is
trivially possible to combine this with acceleration techniques of K-means such as the algorithms
of Elkan [21], Hamerly [22], or the more recent Exponion [23] and Shallot [24] algorithms. The
weight can also be used to build a BETULA tree [25]. We have not experimentally verified these
additional speedups, as this would distract from the main objective of this paper. Furthermore,
the differences between these algorithms will often be small compared to the cost of finding the
nearest neighbors of all points to construct the closed neighborhoods in the beginning.</p>
        <p>[Figure: (a) number of iterations required by K-means-CP and (b) number of closed neighborhood sets (CNS), for different numbers of samples, for k-nearest-neighbor with k = 1, 2, 3; x-axis: number of samples.]</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Experiments</title>
      <p>To empirically verify the run time improvements, we implemented both versions in Java in the
ELKI 0.8.0 data mining framework [26]. All implementations are within the same codebase to
reduce confounding factors and improve comparability [27]. We perform 10 restarts on an Intel
i5-4690K processor using a single thread, and evaluate the average values. We analyze the run time
for the MNIST 784 data set. MNIST 784 contains grayscale pictures of handwritten digits with a
picture size of 28 × 28 pixels, which is vectorized to a vector of length 784. We compared the
algorithms with subsets of sizes $n = 1000, 2000, \ldots, 10000$ for the $k$NN and $k$MN neighborhoods
for $k = 1, 2, 3$ neighbors. The initial cluster centers were chosen with K-means++ [28]. The
number of clusters is set to $k = 10$ because there are ten digits in the data set.</p>
      <p>[Figure: run time over the number of samples for 1MN/2MN/3MN, comparing K-means-CP and NCK-means.]</p>
      <p>The time measured is the time required for calculating the closed neighborhood sets and until
the algorithm converges. The time to calculate the $k$NN of the data set is omitted because it is
required for both algorithms and is the most time-consuming part. Therefore, time variances
in the calculation of the $k$NNs would overshadow the actual time difference of the algorithms.
The ELKI framework automatically uses a k-d-tree or a vantage-point tree to accelerate nearest
neighbor search [26], depending on data set characteristics and the distance function used.
Because of the dimensionality, ELKI chose a VP-tree automatically for this data set. In Figure 6,
we plot the run time of the 1-mutual-nearest-neighbor computation and NCK-means for different
numbers of samples of MNIST data. For MNIST with 784 dimensions, the kNN/kMN computation
takes a large share of the total time compared to the clustering computation. We expect the
kNN/kMN computation to be much less dominant for lower-dimensional data.</p>
      <p>In Figure 2, we plot the results for the asymmetric $k$NN. While the run time improved in all
cases, there are only noticeable differences for $k = 1$, which get larger at $n = 8000$. The small
improvements for $k = 2, 3$ are caused by $k$NN inducing only very few closed neighborhood sets.
For $k = 2$, the maximum of six closed neighborhood sets is reached at $n = 5000$. For
$k = 3$, all data sets are covered by a single closed neighborhood set. This leads to the algorithms
needing only one or two iterations until convergence. The spike for $k = 1$ can also be explained
by the number of iterations needed for the algorithms to converge, which reaches its clear maximum at
$n = 8000$. In this case, the run time improves by over 34%.</p>
      <p>In Figure 4, we plot the results for the symmetric $k$MN. There are significant differences for
all values of $k$ used. This is because the number of induced closed neighborhood sets seems
to scale approximately linearly for $k$MN. The spikes in the run time differences can again be
explained by the varying number of iterations required for convergence. The greatest relative
improvement is reached by $k = 3$ at $n = 8000$, reducing the run time by over 56%. Across all $k$
and $n$, we observe an average run time reduction of 30%.</p>
      <p>[Figure: (a) number of iterations required by K-means-CP and (b) number of closed neighborhood sets, for different numbers of samples, for k-mutual-nearest-neighbor with k = 1, 2, 3; x-axis: number of samples.]</p>
      <p>The run time improvement per iteration would be the largest if there are very few closed
neighborhood sets, because in this case we save a lot of time by using the representative. But
if there are too few closed neighborhood sets, the algorithm converges very fast, e.g., in the
case of 3NN, where the initial clustering is already converged. The clustering quality for very
few closed neighborhood sets is also questionable but sometimes surprisingly good. This is
probably because of the relationship to DBSCAN, which also builds closed neighborhoods, and
spectral clustering, which partitions the nearest-neighbor graph.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <p>We improve the K-means-CP algorithm for clustering with k-Nearest-Neighbor consistency by
avoiding unnecessary distance computations. Using the parallel axis theorem, we prove that
we can reduce all closed neighborhoods to a single representative, and use these in clustering
instead of considering all samples individually. This allows clustering with nearest-neighbor
consistency on larger data sets than before.</p>
      <p>The technique could be integrated into further algorithms. In COP-K-Means [15] and
PCK-means [16], for example, this allows us to reduce must-link constraints into using weighted
means, too. But the speedup is only noticeable if we have many such constraints, and the typical
scenario for constrained clustering is with only a few constraints given by the user to guide the
clustering in a semi-supervised way.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Bonner</surname>
          </string-name>
          ,
          <article-title>On some clustering techniques</article-title>
          ,
          <source>IBM Journal of Research and Development</source>
          <volume>8</volume>
          (
          <year>1964</year>
          )
          <fpage>22</fpage>
          -
          <lpage>32</lpage>
          . doi:
          <volume>10</volume>
          .1147/rd.81.0022.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Estivill-Castro</surname>
          </string-name>
          ,
          <article-title>Why so many clustering algorithms - a position paper</article-title>
          ,
          <source>SIGKDD Explorations 4</source>
          (
          <year>2002</year>
          )
          <fpage>65</fpage>
          -
          <lpage>75</lpage>
          . doi:
          <volume>10</volume>
          .1145/568574.568575.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ackerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ben-David</surname>
          </string-name>
          ,
          <article-title>Measures of clustering quality: A working set of axioms for clustering</article-title>
          ,
          <source>in: NIPS</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>121</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Rousseeuw</surname>
          </string-name>
          ,
          <article-title>Silhouettes: A graphical aid to the interpretation and validation of cluster analysis</article-title>
          ,
          <source>J. Comput. Appl</source>
          . Math.
          <volume>20</volume>
          (
          <year>1987</year>
          )
          <fpage>53</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Davies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Bouldin</surname>
          </string-name>
          ,
          <article-title>A cluster separation measure</article-title>
          ,
          <source>IEEE Trans. Pattern Anal. Mach</source>
          . Intell. (
          <year>1979</year>
          )
          <fpage>224</fpage>
          -
          <lpage>227</lpage>
          . doi:
          <volume>10</volume>
          .1109/TPAMI.
          <year>1979</year>
          .
          <volume>4766909</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Calinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Harabasz</surname>
          </string-name>
          ,
          <article-title>A dendrite method for cluster analysis</article-title>
          ,
          <source>Communications in Statistics 3</source>
          (
          <year>1974</year>
          )
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          . doi:
          <volume>10</volume>
          .1080/03610927408827101.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Dunn</surname>
          </string-name>
          ,
          <article-title>Well-separated clusters and optimal fuzzy partitions</article-title>
          ,
          <source>Journal of Cybernetics</source>
          <volume>4</volume>
          (
          <year>1974</year>
          )
          <fpage>95</fpage>
          -
          <lpage>104</lpage>
          . doi:
          <volume>10</volume>
          .1080/01969727408546059.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Schubert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Rousseeuw</surname>
          </string-name>
          ,
          <article-title>Fast and eager k-medoids clustering: O(k) runtime improvement of the PAM, CLARA, and CLARANS algorithms</article-title>
          ,
          <source>Information Systems</source>
          <volume>101</volume>
          (
          <year>2021</year>
          )
          <article-title>101804</article-title>
          . doi:
          <volume>10</volume>
          .1016/j.is.
          <year>2021</year>
          .
          <volume>101804</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lenssen</surname>
          </string-name>
          , E. Schubert,
          <article-title>Clustering by direct optimization of the medoid silhouette</article-title>
          ,
          <source>in: Similarity Search and Applications</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>190</fpage>
          -
          <lpage>204</lpage>
          . doi:10.1007/978-3-031-17849-8_15.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>K-nearest-neighbor consistency in data clustering: Incorporating local information into global optimization</article-title>
          ,
          <source>in: Symp. Applied Computing</source>
          ,
          <string-name>
            <surname>SAC</surname>
          </string-name>
          ,
          <year>2004</year>
          , p.
          <fpage>584</fpage>
          -
          <lpage>589</lpage>
          . doi:
          <volume>10</volume>
          .1145/967900.968021.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Handl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Knowles</surname>
          </string-name>
          ,
          <article-title>An evolutionary approach to multiobjective clustering</article-title>
          ,
          <source>IEEE Trans. Evol. Comput</source>
          .
          <volume>11</volume>
          (
          <year>2007</year>
          )
          <fpage>56</fpage>
          -
          <lpage>76</lpage>
          . doi:
          <volume>10</volume>
          .1109/TEVC.
          <year>2006</year>
          .
          <volume>877146</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Schubert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Morik</surname>
          </string-name>
          ,
          <article-title>The relationship of DBSCAN to matrix factorization and spectral clustering</article-title>
          , in: Lernen, Wissen, Daten, Analysen,
          <year>2018</year>
          , pp.
          <fpage>330</fpage>
          -
          <lpage>334</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Beer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Draganov</surname>
          </string-name>
          , E. Hohma,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Frey</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Assent</surname>
          </string-name>
          , Connecting the dots – density-connectivity distance unifies DBSCAN, k-center and spectral clustering, in: Knowledge Discovery and Data Mining (KDD), 2023. doi:10.1145/3580305.3599283, to appear.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] M. Pourrajabi, D. Moulavi, R. J. G. B. Campello, A. Zimek, J. Sander, R. Goebel, Model selection for semi-supervised clustering, in: Extending Database Technology (EDBT), 2014, pp. 331–342. doi:10.5441/002/edbt.2014.31.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] K. Wagstaff, C. Cardie, S. Rogers, S. Schrödl, Constrained k-means clustering with background knowledge, in: Int. Conf. Machine Learning (ICML), 2001, pp. 577–584.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] S. Basu, A. Banerjee, R. J. Mooney, Active semi-supervision for pairwise constrained clustering, in: SIAM Data Mining (SDM), 2004, pp. 333–344. doi:10.1137/1.9781611972740.31.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] I. S. Dhillon, D. S. Modha, Concept decompositions for large sparse text data using clustering, Mach. Learn. 42 (2001) 143–175. doi:10.1023/A:1007612920971.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] E. Schubert, A. Lang, G. Feher, Accelerating spherical k-means, in: Similarity Search and Applications (SISAP), 2021, pp. 217–231. doi:10.1007/978-3-030-89657-7_17.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] S. D. Lee, B. Kao, R. Cheng, Reducing UK-means to K-means, in: Workshops Int. Conf. Data Mining (ICDM), 2007, pp. 483–488. doi:10.1109/ICDMW.2007.40.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] E. Schubert, Stop using the elbow criterion for k-means and how to choose the number of clusters instead, SIGKDD Explorations 25 (2023) 36–42. doi:10.1145/3606274.3606278.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] C. Elkan, Using the triangle inequality to accelerate k-means, in: Int. Conf. Machine Learning (ICML), 2003, pp. 147–153.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] G. Hamerly, Making k-means even faster, in: SIAM Data Mining (SDM), 2010, pp. 130–140. doi:10.1137/1.9781611972801.12.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] J. Newling, F. Fleuret, Fast k-means with accurate bounds, in: Int. Conf. Machine Learning (ICML), 2016, pp. 936–944.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] C. Borgelt, Even faster exact k-means clustering, in: Symp. Intelligent Data Analysis (IDA), 2020, pp. 93–105. doi:10.1007/978-3-030-44584-3_8.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] A. Lang, E. Schubert, BETULA: fast clustering of large data with improved BIRCH CF-trees, Inf. Syst. 108 (2022) 101918. doi:10.1016/j.is.2021.101918.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] E. Schubert, Automatic indexing for similarity search in ELKI, in: Similarity Search and Applications (SISAP), 2022. doi:10.1007/978-3-031-17849-8_16.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] H. Kriegel, E. Schubert, A. Zimek, The (black) art of runtime evaluation: Are we comparing algorithms or implementations?, Knowl. Inf. Syst. 52 (2017) 341–378. doi:10.1007/s10115-016-1004-2.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] D. Arthur, S. Vassilvitskii, k-means++: the advantages of careful seeding, in: Symp. Discrete Algorithms (SODA), 2007, pp. 1027–1035.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>