<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Fast k-Nearest-Neighbor-Consistent Clustering</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Lars</forename><surname>Lenssen</surname></persName>
							<email>lars.lenssen@tu-dortmund.de</email>
							<affiliation key="aff0">
								<orgName type="department">Informatik VIII</orgName>
								<orgName type="institution">TU Dortmund University</orgName>
								<address>
									<postCode>44221</postCode>
									<settlement>Dortmund</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Niklas</forename><surname>Strahmann</surname></persName>
							<email>niklas.strahmann@tu-dortmund.de</email>
							<affiliation key="aff0">
								<orgName type="department">Informatik VIII</orgName>
								<orgName type="institution">TU Dortmund University</orgName>
								<address>
									<postCode>44221</postCode>
									<settlement>Dortmund</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Erich</forename><surname>Schubert</surname></persName>
							<email>erich.schubert@tu-dortmund.de</email>
							<affiliation key="aff0">
								<orgName type="department">Informatik VIII</orgName>
								<orgName type="institution">TU Dortmund University</orgName>
								<address>
									<postCode>44221</postCode>
									<settlement>Dortmund</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Fast k-Nearest-Neighbor-Consistent Clustering</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">6EB5D374E9199BFE157EF000619E53B4</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:20+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>k-Nearest-Neighbor Consistency</term>
					<term>Cluster Analysis</term>
					<term>Clustering-Quality Measure</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>There are many ways to measure the quality of a clustering, both extrinsic (when labels are known) and intrinsic (when no labels are available). In this article, we focus on the k-Nearest-Neighbor Consistency measure, which considers a clustering as good if each object is within the same cluster as its nearest neighbors, and hence does not need labels. We propose a variant of the K-means clustering algorithm that uses the k-Nearest-Neighbor Consistency as a constraint while optimizing the sum-of-squares as in regular K-means, resulting in a K-means clustering where the nearest neighbors are guaranteed to be in the same cluster. The new version provably yields the same results as the original consistency-preserving K-means algorithm of Ding and He, but requires fewer computations.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Evaluating the quality of a result in unsupervised learning is difficult, as we do not have labels. Many different clustering objectives have been proposed, and we have just as many different evaluation measures. Bonner <ref type="bibr" target="#b0">[1]</ref> noted that "none of the many specific definitions that might be given seems 'best' in any general sense", and Estivill-Castro <ref type="bibr" target="#b1">[2]</ref> wrote that clustering is "in the eye of the beholder"; every researcher and user may have different beliefs about what constitutes a cluster. Common principles for clustering include that nearby objects should be in the same cluster, whereas far-away objects should belong to different clusters. In agglomerative hierarchical clustering, for example, we always merge the closest two clusters, K-means assigns points to the nearest cluster center, DBSCAN connects neighboring points in dense areas, and spectral clustering attempts to find a minimum cut of the nearest-neighbor graph. Similar principles can be found in clustering quality measures (CQM) <ref type="bibr" target="#b2">[3]</ref> such as the Silhouette <ref type="bibr" target="#b3">[4]</ref>, the Davies-Bouldin index <ref type="bibr" target="#b4">[5]</ref>, the Variance-Ratio criterion <ref type="bibr" target="#b5">[6]</ref>, the Dunn index <ref type="bibr" target="#b6">[7]</ref>, and many more. The Silhouette, for example, compares the average distance to points of the same cluster with the average distance to points of the nearest other cluster. Any unsupervised clustering quality measure implies an "optimal" clustering, which could also be found by exhaustive enumeration; clustering algorithms, in contrast, often find only an approximation to this "optimal" clustering, but in an acceptable run time. For example, K-means may find different solutions when run multiple times because it gets stuck in local optima. Similarly, K-medoid algorithms based on swapping can get stuck in local optima <ref type="bibr" target="#b7">[8]</ref>. The Medoid Silhouette can be optimized in a similar fashion, as an example where an evaluation criterion was turned into an efficient algorithm <ref type="bibr" target="#b8">[9]</ref>. But an undesired trivial clustering may score well for surprisingly many quality measures. For example, the within-cluster sum of squares (WCSS) criterion used by K-means clustering is optimal when every point is its own cluster. Hence, we may need to add additional constraints, such as keeping the number of clusters fixed. In other cases, it is common to use one criterion to find candidate solutions (e.g., optimizing WCSS with K-means), and then use a different criterion for model selection.</p><p>In this article, we are interested in a relatively unknown quality criterion called the k-Nearest-Neighbor Consistency <ref type="bibr" target="#b9">[10]</ref>, which measures whether the nearest neighbors of each point belong to the same cluster. This is closely related to the central idea of clustering that similar objects should be in the same cluster, but also to nearest-neighbor classification, as we would then predict the correct class. Ding and He <ref type="bibr" target="#b9">[10]</ref> also proposed two clustering algorithms to optimize for this criterion. In this article, we propose a fast variant of the consistency-preserving K-means algorithm that provably yields the same results.</p><p>We first introduce nearest-neighbor consistency in Section 2 and clustering with consistency in Section 3. We then propose a faster algorithm in Section 4. We show experimental evidence of the performance benefits in Section 5 and conclude the paper in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">k-Nearest-Neighbor Consistency</head><p>The k-Nearest-Neighbor Consistency is motivated by k-Nearest-Neighbor classification, but it can also be used to evaluate clustering validity, as proposed by Ding and He <ref type="bibr" target="#b9">[10]</ref>. In the following, we only consider clusterings that form a total partitioning of the data set, encoded as a cluster label 𝑙 𝑖 for each sample 𝑖. Overlapping, hierarchical, or incomplete clustering solutions (e.g., with noise as in DBSCAN) are not considered. For the given samples 𝑋 = {𝑥 1 , … , 𝑥 𝑛 } and the cluster labels 𝐿 = {𝑙 1 , … , 𝑙 𝑛 }, 𝑘 NN(𝑥 𝑖 ) denotes the 𝑘 nearest neighbors of 𝑥 𝑖 . A single sample 𝑖 is considered to be 𝑘-nearest-neighbor consistent if all neighbors 𝑥 𝑗 ∈ 𝑘 NN(𝑥 𝑖 ) have the same cluster label as the sample 𝑖:</p><formula xml:id="formula_0">𝑘𝑐 𝑖 (𝑋 , 𝐿) = { 1 if ∀𝑥 𝑗 ∈ 𝑘 NN(𝑥 𝑖 ) ∶ 𝑙 𝑗 = 𝑙 𝑖 0 otherwise</formula><p>To measure the 𝑘 NN consistency of a clustering where not all points are consistent, Ding and He <ref type="bibr" target="#b9">[10]</ref> also use a fractional 𝑘 NN consistency</p><formula>𝐾 𝐶(𝑋 , 𝐿) = 1 𝑛 ∑ 𝑛 𝑖=1 𝑘𝑐 𝑖 (𝑋 , 𝐿) .</formula><p>To use the consistency as a quality measure, Handl and Knowles <ref type="bibr" target="#b10">[11]</ref> developed a connectivity-based measure out of the 𝑘 NN consistency, which gives more emphasis to the nearest neighbors. The definition of the 𝑘 NN consistency can also be applied to other neighborhood concepts, for example, k-Mutual-Nearest-Neighbors <ref type="bibr" target="#b9">[10]</ref>, which symmetrizes k-Nearest-Neighbors and is denoted by 𝑘 MN(𝑥):</p><formula>𝑥 𝑗 ∈ 𝑘 MN(𝑥 𝑖 ) ⟺ 𝑥 𝑗 ∈ 𝑘 NN(𝑥 𝑖 ) ∧ 𝑥 𝑖 ∈ 𝑘 NN(𝑥 𝑗 ) .</formula><p>Mutual nearest neighbors have the benefit of being a symmetric relation, of being less sensitive to outliers, and of more likely splitting the data into multiple components. Intuitively, we remove all unidirectional edges from the k-nearest-neighbor graph. Consider a data set with one dense cluster and a far-away outlier: the outlier's neighbors will be points of the cluster, but it will not be among the nearest neighbors of the cluster points. Both the 𝑘 NN and the 𝑘 MN are often used in the context of spectral clustering, which in turn is related to DBSCAN <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13]</ref>.</p><p>Ding and He <ref type="bibr" target="#b9">[10]</ref> aim at enforcing 𝑘 NN consistency in their algorithms. We can interpret this as a form of constrained clustering, where we add must-link constraints for neighbors. Such constraints have previously been used, e.g., for model selection <ref type="bibr" target="#b13">[14]</ref> and for integrating external constraints into K-means <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref>.</p></div>
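<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the definitions concrete, the following minimal Java sketch computes the per-sample consistency 𝑘𝑐 𝑖 and the fractional consistency 𝐾 𝐶(𝑋 , 𝐿) using a brute-force neighbor search, and symmetrizes 𝑘 NN to 𝑘 MN. This is an illustration only, not the ELKI implementation used later; all class and method names are ours.</p>
<code xml:space="preserve">
import java.util.*;

/** Illustrative sketch (not the authors' ELKI code): kNN consistency of a labeled data set. */
public class KnnConsistency {

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDist(double[] a, double[] b) {
        double sum = 0;
        for (int v = 0; v < a.length; v++) {
            double diff = a[v] - b[v];
            sum += diff * diff;
        }
        return sum;
    }

    /** Brute-force k nearest neighbors of every point (O(n^2 log n); an index would be faster). */
    static int[][] kNearestNeighbors(double[][] x, int k) {
        int n = x.length;
        int[][] knn = new int[n][k];
        for (int i = 0; i < n; i++) {
            final int q = i; // effectively final copy for the lambda below
            Integer[] order = new Integer[n];
            for (int j = 0; j < n; j++) order[j] = j;
            Arrays.sort(order, Comparator.comparingDouble(j -> squaredDist(x[q], x[j])));
            int c = 0;
            for (int j = 0; j < n && c < k; j++)
                if (order[j] != q) knn[i][c++] = order[j]; // skip the query point itself
        }
        return knn;
    }

    /** Fractional kNN consistency KC(X, L): share of points whose k neighbors all share their label. */
    static double fractionalConsistency(int[][] knn, int[] label) {
        int consistent = 0;
        outer:
        for (int i = 0; i < knn.length; i++) {
            for (int j : knn[i])
                if (label[j] != label[i]) continue outer; // kc_i = 0: some neighbor disagrees
            consistent++; // kc_i = 1: all neighbors agree
        }
        return consistent / (double) knn.length;
    }

    /** k-mutual-nearest neighbors kMN: keep only kNN edges that exist in both directions. */
    static int[][] mutualNeighbors(int[][] knn) {
        int[][] kmn = new int[knn.length][];
        for (int i = 0; i < knn.length; i++) {
            List<Integer> keep = new ArrayList<>();
            for (int j : knn[i])
                for (int b : knn[j])
                    if (b == i) { keep.add(j); break; } // edge i -> j is mutual
            kmn[i] = keep.stream().mapToInt(Integer::intValue).toArray();
        }
        return kmn;
    }
}
</code></div>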
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Clustering with Nearest Neighbor Consistency</head><p>The k-Nearest-Neighbor consistency was originally proposed by Ding and He <ref type="bibr" target="#b9">[10]</ref> along with two algorithms to create k-Nearest-Neighbor-consistent clusterings, ENFORCE and consistency-preserving K-means (K-means-CP). The ENFORCE algorithm modifies an existing clustering, which can be created by any cluster analysis method, to improve its consistency by reassigning neighborhoods to the most common cluster. To achieve this, ENFORCE uses closed neighborhoods. A set 𝑆 ⊂ 𝑋 is a closed neighborhood if 𝑘 NN(𝑥) ⊂ 𝑆 holds for all 𝑥 ∈ 𝑆, and for every two elements 𝑥 𝑖 , 𝑥 𝑗 ∈ 𝑆 there is a path between them along the 𝑘 NN relation. The definition can easily be extended to any neighborhood function such as 𝑘 MN. It is easy to see that all points in a closed neighborhood must be in the same cluster for the clustering to be consistent, and that every point can only belong to one closed neighborhood. Figure <ref type="figure" target="#fig_0">1</ref> shows the closed neighbor sets on an example data set using mutual nearest neighbors with 𝑘 = 3. To enforce a 𝑘 NN-consistent clustering, ENFORCE iterates through all sets of closed neighborhoods and counts the respective cluster labels. Then it assigns the entire set to the label that occurs most often. There is also an interesting parallel to DBSCAN clustering here, which computes a similar transitive closure, but with a density-based notion of the neighborhood: DBSCAN clusters are closed neighborhoods with respect to density reachability.</p><p>In contrast to ENFORCE, consistency-preserving K-means (K-means-CP) does not work on an existing clustering, but integrates consistency into the standard K-means procedure. K-means uses the squared Euclidean distance; other approaches, such as spherical K-means <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b17">18]</ref>, will not be considered here. First, initial cluster centers 𝑀 = {𝜇 1 , … , 𝜇 𝑘 } are chosen with any of the standard heuristics. Then an alternating optimization as in the standard algorithm is performed. But in order to obtain a 𝑘 NN-consistent result, K-means-CP, like ENFORCE, always assigns entire closed neighborhoods to the same cluster. Thus, Ding and He <ref type="bibr" target="#b9">[10]</ref> define the nearest cluster center nearest(𝑆 𝑖 ) of a closed neighborhood 𝑆 𝑖 by the sum of squared Euclidean distances</p><formula xml:id="formula_1">nearest(𝑆 𝑖 ) = arg min 𝑗 ∑ 𝑥∈𝑆 𝑖 ‖𝑥 − 𝜇 𝑗 ‖ 2 .</formula><p>All points in the closed neighborhood 𝑆 𝑖 are then assigned to this cluster. The assignment step is alternated with a recalculation of the cluster centers. A new cluster center 𝜇 𝑗 of the cluster 𝐶 𝑗 is determined as usual in K-means using the arithmetic mean of the assigned data</p><formula xml:id="formula_2">𝜇 𝑗 = 1 |𝐶 𝑗 | ∑ 𝑥∈𝐶 𝑗 𝑥 .</formula><p>These steps are repeated until no point is reassigned (and hence the cluster centers do not change anymore). The convergence guarantees of the standard algorithm still apply. This algorithm computes exactly 𝑁 ⋅ 𝐾 distances in each assignment step to determine the nearest clusters, and these distance computations make up a major part of the algorithm's run time. We can also optimize the recomputation of the cluster centers by using an incremental computation, but this has much less effect. In the following, we discuss why many of the distance computations performed above are unnecessary, and how we can hence improve the run time of this algorithm.</p></div>
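<div xmlns="http://www.tei-c.org/ns/1.0"><p>Both algorithms first need the closed neighborhood sets. Because a closed set must contain the neighbors of all of its members, the closed neighborhood sets are exactly the (weakly) connected components of the neighbor graph when its edges are treated as undirected. The following Java sketch (our illustration, continuing the sketch class above, not the original implementation) computes a component id per point with a union-find structure; it works unchanged for 𝑘 NN and 𝑘 MN neighbor lists.</p>
<code xml:space="preserve">
/** Sketch: closed neighborhood sets as connected components of the neighbor graph.
 *  neighbors[i] may be the kNN or the kMN list of point i (see the sketch above). */
static int[] closedNeighborhoods(int[][] neighbors) {
    int n = neighbors.length;
    int[] parent = new int[n];
    for (int i = 0; i < n; i++) parent[i] = i; // every point starts as its own set
    // Union each point with all of its neighbors; directed kNN edges are treated as
    // undirected, which yields the transitive closure used by ENFORCE and K-means-CP.
    for (int i = 0; i < n; i++)
        for (int j : neighbors[i]) union(parent, i, j);
    int[] setId = new int[n];
    for (int i = 0; i < n; i++) setId[i] = find(parent, i);
    return setId; // setId[i] identifies the closed neighborhood set of point i
}

/** Union-find: find with path halving. */
static int find(int[] parent, int i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]]; // shortcut one level
        i = parent[i];
    }
    return i;
}

/** Union-find: link the root of i under the root of j. */
static void union(int[] parent, int i, int j) {
    parent[find(parent, i)] = find(parent, j);
}
</code></div>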
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Fast k-Nearest-Neighbor-Consistent Clustering</head><p>The K-means-CP algorithm determines the nearest cluster of a closed neighborhood set by the sum of the squared Euclidean distances of all elements of the set to the different cluster centers. Using the parallel axis theorem of König, Huygens, and Steiner, we can prove that instead of considering the distances of all samples in a closed neighborhood set 𝑆 𝑖 , it is sufficient to consider only the mean vector s̄ 𝑖 = 1 |𝑆 𝑖 | ∑ 𝑥∈𝑆 𝑖 𝑥. We can also trivially use the mean vector for updating the cluster centers, if we weight it by the number of points in the closed neighborhood set. The same property has previously been used by Lee et al. <ref type="bibr" target="#b18">[19]</ref> to reduce UK-means clustering of uncertain data to regular K-means clustering. Our improved Neighbor-consistent K-means (NCK-means) hence uses the mean vectors of the closed neighborhood sets. This yields a variant of consistency-preserving K-means with a significantly lower run time and the guaranteed same result.</p><p>We denote the set that contains all closed neighborhood sets by 𝑆. Because all elements of a closed neighborhood set are assigned to the same cluster, we denote the set of all closed neighborhood sets whose elements have cluster label 𝑖 by 𝐶 𝑖 .</p></div>
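<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of the resulting procedure is given below: K-means-style alternating optimization, but over the size-weighted mean vectors of the closed neighborhood sets instead of the individual points. The sketch is ours (not the authors' pseudocode), continues the illustrative class above, and assumes setMeans, setSizes, and initial centers (e.g., from K-means++) are given.</p>
<code xml:space="preserve">
/** Sketch of NCK-means: weighted K-means over the means of the closed neighborhood sets.
 *  Returns one cluster label per closed neighborhood set; all members inherit that label. */
static int[] nckMeans(double[][] setMeans, int[] setSizes, double[][] centers, int maxIter) {
    int h = setMeans.length, k = centers.length, d = setMeans[0].length;
    int[] label = new int[h];
    Arrays.fill(label, -1); // dummy assignment to a non-existent cluster
    for (int iter = 0; iter < maxIter; iter++) {
        boolean changed = false;
        for (int i = 0; i < h; i++) { // assign each set to its nearest center
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int j = 0; j < k; j++) {
                double dist = squaredDist(setMeans[i], centers[j]);
                if (dist < bestDist) { bestDist = dist; best = j; }
            }
            if (label[i] != best) { label[i] = best; changed = true; }
        }
        if (!changed) break; // converged: no set was reassigned
        // update step: each center becomes the size-weighted mean of its sets' means
        double[][] sum = new double[k][d];
        double[] weight = new double[k];
        for (int i = 0; i < h; i++) {
            weight[label[i]] += setSizes[i];
            for (int v = 0; v < d; v++) sum[label[i]][v] += setSizes[i] * setMeans[i][v];
        }
        for (int j = 0; j < k; j++)
            if (weight[j] > 0)
                for (int v = 0; v < d; v++) centers[j][v] = sum[j][v] / weight[j];
    }
    return label;
}
</code></div>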
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Proof of correctness</head><p>To prove the equivalence of the results of NCK-means and K-means-CP, we show that the sub-steps of the algorithms always produce the same results. We first show that for given cluster centers, NCK-means and K-means-CP assign the same cluster labels to a closed neighborhood set. We start by considering how the original K-means-CP assigns cluster labels:</p><formula xml:id="formula_3">𝑆 𝑗 ∈ 𝐶 𝑖 ⟺ 𝑖 = arg min 𝑖 ∑ 𝑥∈𝑆 𝑗 ‖𝑥 − 𝜇 𝑖 ‖ 2 .</formula><p>NCK-means assigns its cluster labels by the equivalent optimization:</p><formula xml:id="formula_4">𝑆 𝑗 ∈ 𝐶 𝑖 ⟺ 𝑖 = arg min 𝑖 ‖ s̄ 𝑗 − 𝜇 𝑖 ‖ 2 .</formula><p>This equivalence follows easily from the following version of the parallel axis theorem:</p><formula xml:id="formula_5">∑ 𝑥∈𝑋 |𝑥 − 𝑎| 2 = ∑ 𝑥∈𝑋 |𝑥 − X̄ | 2 + 𝑁 |𝑎 − X̄ | 2 ,</formula><p>where X̄ = 1 𝑁 ∑ 𝑥∈𝑋 𝑥 denotes the mean of the 𝑁 elements of 𝑋, by using the vector form (a sum over all components) and then 𝑎 = 𝜇 𝑖 and 𝑋 = 𝑆 𝑗 :</p><formula xml:id="formula_6">∑ 𝑥∈𝑆 𝑗 ‖𝑥 − 𝜇 𝑖 ‖ 2 = ∑ 𝑥∈𝑆 𝑗 ‖𝑥 − s̄ 𝑗 ‖ 2 + |𝑆 𝑗 | ⋅ ‖𝜇 𝑖 − s̄ 𝑗 ‖ 2 .</formula><p>Because only the last term depends on 𝑖, we can omit the others for the minimization over 𝑖:</p><formula xml:id="formula_7">arg min 𝑖 ∑ 𝑥∈𝑆 𝑗 ‖𝑥 − 𝜇 𝑖 ‖ 2 = arg min 𝑖 ‖ s̄ 𝑗 − 𝜇 𝑖 ‖ 2 .</formula><p>We can prove the above version of the parallel axis theorem as follows:</p><formula xml:id="formula_8">∑ 𝑥∈𝑋 |𝑥 − 𝑎| 2 = ∑ 𝑥∈𝑋 |(𝑥 − X̄ ) − (𝑎 − X̄ )| 2 = ∑ 𝑥∈𝑋 ( |𝑥 − X̄ | 2 − 2(𝑥 − X̄ )(𝑎 − X̄ ) + |𝑎 − X̄ | 2 ) = ∑ 𝑥∈𝑋 |𝑥 − X̄ | 2 − 2(𝑎 − X̄ ) ∑ 𝑥∈𝑋 (𝑥 − X̄ ) + 𝑁 |𝑎 − X̄ | 2 = ∑ 𝑥∈𝑋 |𝑥 − X̄ | 2 + 𝑁 |𝑎 − X̄ | 2 ,</formula><p>where the middle term vanishes because ∑ 𝑥∈𝑋 (𝑥 − X̄ ) = 0. The classic parallel axis theorem is obtained for 𝑎 = 0.</p><p>We now consider the second step, updating the cluster centers. We start with the original calculation of the cluster centers 𝜇 𝑖 : K-means-CP calculates them as the mean of the assigned elements,</p><formula xml:id="formula_9">𝜇 𝑖 = 1 |{𝑥 𝑗 ∣ 𝑙 𝑗 = 𝑖}| ∑ 𝑥 𝑗 ∈𝑋 , 𝑙 𝑗 =𝑖 𝑥 𝑗 .</formula><p>Because the labels within each closed set are consistent, we can rewrite this to</p><formula xml:id="formula_10">𝜇 𝑖 = 1 ∑ 𝑆 𝑗 ∈𝐶 𝑖 |𝑆 𝑗 | ∑ 𝑆 𝑗 ∈𝐶 𝑖 ∑ 𝑥 𝑘 ∈𝑆 𝑗 𝑥 𝑘</formula><p>and by using s̄ 𝑗 = 1 |𝑆 𝑗 | ∑ 𝑥 𝑘 ∈𝑆 𝑗 𝑥 𝑘 , we obtain</p><formula xml:id="formula_11">𝜇 𝑖 = 1 ∑ 𝑆 𝑗 ∈𝐶 𝑖 |𝑆 𝑗 | ∑ 𝑆 𝑗 ∈𝐶 𝑖 |𝑆 𝑗 | s̄ 𝑗 .</formula><p>This shows that for closed neighborhood sets with given cluster labels, NCK-means computes the same cluster centers as K-means-CP. Because both steps of the algorithms produce the same results, we have proven that the final result of the algorithms is the same.</p></div>
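<div xmlns="http://www.tei-c.org/ns/1.0"><p>The identity at the heart of the proof is easy to verify numerically. The following snippet (ours, continuing the illustrative class above) compares both sides of the parallel axis identity for an arbitrary point set and reference vector 𝑎; the difference should be zero up to floating-point rounding.</p>
<code xml:space="preserve">
/** Numeric sanity check of the parallel axis identity:
 *  sum_x ||x - a||^2 == sum_x ||x - mean||^2 + N * ||a - mean||^2. */
static void checkParallelAxis(double[][] points, double[] a) {
    int n = points.length, d = points[0].length;
    double[] mean = new double[d];
    for (double[] x : points)
        for (int v = 0; v < d; v++) mean[v] += x[v] / n;
    double lhs = 0, scatter = 0;
    for (double[] x : points) {
        lhs += squaredDist(x, a);        // left-hand side: squared distances to a
        scatter += squaredDist(x, mean); // within-set scatter around the mean
    }
    double rhs = scatter + n * squaredDist(a, mean);
    System.out.printf("lhs = %f, rhs = %f, difference = %e%n", lhs, rhs, lhs - rhs);
}
</code></div>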
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Expected run time improvement</head><p>The run time improvement depends on the data reduction achieved by the closed neighbor sets. The standard K-means algorithm has a run time of 𝑂(𝑁 𝐾 𝑑𝑖), where 𝑁 is the number of points, 𝐾 is the number of clusters, 𝑑 is the dimensionality, and 𝑖 is the number of iterations. Finding the closed neighbor sets additionally takes 𝑂(𝑁 2 ) time (although indexes for similarity search may yield a considerable speedup), and K-means-CP of Ding and He <ref type="bibr" target="#b9">[10]</ref> hence has a complexity of 𝑂(𝑁 2 + 𝑁 𝐾 𝑑𝑖). NCK-means reduces this to 𝑂(𝑁 2 + |𝑆|𝐾 𝑑𝑖), where |𝑆| is the number of closed neighbor sets, and we must assume |𝑆| ∈ 𝑂(𝑁 ). In a worst-case asymptotic analysis, the improvements hence are likely negligible because the cost of finding the closed neighbor sets dominates; in practice, however, the method usually offers a decent run time improvement on the order of |𝑆|/𝑁 in the clustering step, at the cost of only a little additional memory to store the means of the closed neighbor sets. Hence, there is no reason not to use it. As it is best practice to run K-means several times and keep the best outcome <ref type="bibr" target="#b19">[20]</ref>, the cost of finding the closed neighbor sets can also be amortized over multiple restarts.</p><p>The choice of the neighborhood has an important effect on the run time. Finding the nearest neighbors becomes more expensive with a larger neighborhood size, but at the same time, increasing the number of edges decreases the number of closed neighbor sets. For NCK-means, a lower number of closed neighbor sets is beneficial, and both algorithms also benefit from an often lower number of iterations, because reassignments are less likely to happen with the additional consistency constraints.</p></div>
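<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a worked example with hypothetical numbers (not taken from the experiments in Section 5): if the neighbor graph reduces 𝑁 = 10000 points to |𝑆| = 3000 closed neighbor sets, then for 𝐾 = 10 each assignment step computes |𝑆| ⋅ 𝐾 = 30000 instead of 𝑁 ⋅ 𝐾 = 100000 distances, i.e., the clustering step needs only |𝑆|/𝑁 = 30% of the distance computations.</p></div>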
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Further run-time improvements</head><p>Because our proof shows that it is sufficient to use the mean of each closed neighbor set, it is trivially possible to combine this with acceleration techniques for K-means such as the algorithms of Elkan <ref type="bibr" target="#b20">[21]</ref> and Hamerly <ref type="bibr" target="#b21">[22]</ref>, or the more recent Exponion <ref type="bibr" target="#b22">[23]</ref> and Shallot <ref type="bibr" target="#b23">[24]</ref> algorithms. The weights can also be used to build a BETULA tree <ref type="bibr" target="#b24">[25]</ref>. We have not experimentally verified these additional speedups, as this would distract from the main objective of this paper. Furthermore, the differences between these algorithms will often be small compared to the cost of finding the nearest neighbors of all points to construct the closed neighborhoods in the beginning.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experiments</head><p>To empirically verify the run time improvements, we implemented both versions in Java in the ELKI 0.8.0 data mining framework <ref type="bibr" target="#b25">[26]</ref>. All implementations are within the same codebase to reduce confounding factors and improve comparability <ref type="bibr" target="#b26">[27]</ref>. We perform 10 restarts on an Intel i5-4690K processor using a single thread, and report average values. We analyze the run time on the MNIST 784 data set, which contains grayscale pictures of handwritten digits of size 28 × 28 pixels, each vectorized to a vector of length 784. We compared the algorithms on subsets of sizes 𝑛 = 1000, 2000, … , 10000 for the 𝑘 NN and 𝑘 MN neighborhoods with 𝑘 = 1, 2, 3 neighbors. The initial cluster centers were chosen with K-means++ <ref type="bibr" target="#b27">[28]</ref>. The number of clusters is set to 𝐾 = 10 because there are ten digits in the data set. The time measured covers the calculation of the closed neighborhood sets and the iterations until the algorithm converges. The time to calculate the 𝑘 NN of the data set is omitted because it is required for both algorithms and is the most time-consuming part; time variances in the calculation of the 𝑘 NNs would otherwise overshadow the actual time difference between the algorithms. The ELKI framework automatically uses a k-d-tree or a vantage-point tree to accelerate nearest neighbor search <ref type="bibr" target="#b25">[26]</ref>, depending on data set characteristics and the distance function used; because of the high dimensionality, ELKI chose a VP-tree for this data set. In Figure <ref type="figure">6</ref>, we plot the run time of the 1-mutual-nearest-neighbor computation and NCK-means for different numbers of samples of MNIST data. For MNIST with 784 dimensions, the kNN/kMN computation takes a large share of the total time compared to the clustering computation; we expect it to account for a much smaller share in lower dimensions.</p><p>In Figure <ref type="figure" target="#fig_1">2</ref>, we plot the results for the asymmetric 𝑘 NN. While the run time improved in all cases, there are only noticeable differences for 𝑘 = 1, which get larger at 𝑛 = 8000. The small improvements for 𝑘 = 2, 3 are caused by 𝑘 NN inducing only very few closed neighborhood sets. For 𝑘 = 2, the maximum number of six closed neighborhood sets is reached at 𝑛 = 5000. For 𝑘 = 3, all data sets are covered by a single closed neighborhood set. This leads to the algorithms needing only one or two iterations until convergence. The spike for 𝑘 = 1 can also be explained by the number of iterations needed for the algorithms to converge, which reaches its clear maximum at 𝑛 = 8000. In this case, the run time improves by over 34%.</p><p>In Figure <ref type="figure" target="#fig_3">4</ref>, we plot the results for the symmetric 𝑘 MN. There are significant differences for all values of 𝑘 used. This is because the number of induced closed neighborhood sets appears to scale approximately linearly for 𝑘 MN. The spikes in the run time differences can again be explained by the varying number of iterations required for convergence. The greatest relative improvement is reached for 𝑘 = 3 at 𝑛 = 8000, reducing the run time by over 56%. Across all 𝑘 and 𝑛, we observe an average run time reduction of 30%. The run time improvement per iteration is largest if there are very few closed neighborhood sets, because in this case we save a lot of time by using the representative. But if there are too few closed neighborhood sets, the algorithm converges very fast, e.g., in the case of 3NN, where the initial clustering already converged. The clustering quality for very few closed neighborhood sets is also questionable, but sometimes surprisingly good. This is probably because of the relationship to DBSCAN, which also builds closed neighborhoods, and to spectral clustering, which partitions the nearest-neighbor graph.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>We improve the K-means-CP algorithm for clustering with k-Nearest-Neighbor consistency by avoiding unnecessary distance computations. Using the parallel axis theorem, we prove that we can reduce each closed neighborhood to a single representative, and use these representatives in clustering instead of considering all samples individually. This allows clustering with nearest-neighbor consistency on larger data sets than before.</p><p>The technique could be integrated into further algorithms. In COP-K-means <ref type="bibr" target="#b14">[15]</ref> and PCK-means <ref type="bibr" target="#b15">[16]</ref>, for example, this allows us to reduce must-link constraints to weighted means, too. But the speedup is only noticeable if we have many such constraints, and the typical scenario for constrained clustering is with only a few constraints given by the user to guide the clustering in a semi-supervised way.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Closed neighbor sets for mutual nearest neighbors with 𝑘 = 3. For neighbor-consistent clustering, these sets must be assigned to the same cluster.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Run time of K-means-CP and NCK-means for different numbers of samples of MNIST, for the k-nearest-neighbor neighborhood with 𝑘 = 1, 2, 3.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Number of iterations and closed neighborhood sets on MNIST data</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Run time of K-means-CP and NCK-means for different numbers of samples for k-mutual-nearest neighbor with 𝑘 = 1, 2, 3.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head></head><label></label><figDesc>Number of iterations required by K-means-CP for different numbers of samples for k-mutual-nearest neighbor with 𝑘 = 1, 2, 3. Number of closed neighborhood sets for different numbers of samples for k-mutual-nearest neighbor with 𝑘 = 1, 2, 3.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 : Figure 6 :</head><label>5, 6</label><figDesc>Figure 5: Number of iterations and closed neighborhood sets on MNIST data. Figure 6: Run time of the 1-mutual-nearest-neighbor computation and NCK-means for different numbers of samples of MNIST data.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Algorithm 1: ENFORCE</head><label></label><figDesc>ENFORCE reassigns each closed neighborhood set to its most frequent cluster label.</figDesc><note>
1 𝑙 ← cluster assignments by input clustering algorithm;
2 𝑆 ← calculate sets of closed neighborhoods;
3 𝑁 ← null;
4 foreach 𝑆 𝑖 ∈ 𝑆 = {𝑆 1 , … , 𝑆 ℎ } do // closed neighborhoods
5     𝑁 ← null;
6     foreach 𝑥 𝑗 ∈ 𝑆 𝑖 do // count cluster assignments
7         𝑁 𝑙 𝑗 ++;
8     𝑜 ← arg max 𝑁;
9     foreach 𝑥 𝑗 ∈ 𝑆 𝑖 do 𝑙 𝑗 ← 𝑜; // assign closed neighborhood
10 return 𝐿;
</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Algorithm 2: K-means-CP</head><label></label><figDesc>Consistency-preserving K-means: entire closed neighborhood sets are assigned to the nearest cluster center.</figDesc><note>
1 𝐶 ← initialize 𝑘 clusters;
2 𝑆 ← calculate sets of closed neighborhoods;
3 𝑙 ← dummy assignments to non-existent cluster;
4 repeat
5     foreach 𝑆 𝑖 ∈ 𝑆 = {𝑆 1 , … , 𝑆 ℎ } do // closed neighborhoods
6         𝑁 ← null;
7         foreach 𝜇 𝑗 ∈ 𝑀 = {𝜇 1 , … , 𝜇 𝑘 } do // determine nearest cluster
8             foreach 𝑥 𝑢 ∈ 𝑆 𝑖 do
9                 𝑁 𝑗 ← 𝑁 𝑗 + ‖𝑥 𝑢 − 𝜇 𝑗 ‖²;
10        𝑜 ← arg min 𝑁;
11        foreach 𝑥 𝑗 ∈ 𝑆 𝑖 do 𝑙 𝑗 ← 𝑜; // assign closed neighborhoods
12    if 𝑙 is unchanged then break;
13    𝑀 ← calculate cluster centers for 𝐶;
14 return 𝐿;
</note></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">On some clustering techniques</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Bonner</surname></persName>
		</author>
		<idno type="DOI">10.1147/rd.81.0022</idno>
	</analytic>
	<monogr>
		<title level="j">IBM Journal of Research and Development</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="22" to="32" />
			<date type="published" when="1964">1964</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Why so many clustering algorithms -a position paper</title>
		<author>
			<persName><forename type="first">V</forename><surname>Estivill-Castro</surname></persName>
		</author>
		<idno type="DOI">10.1145/568574.568575</idno>
	</analytic>
	<monogr>
		<title level="j">SIGKDD Explorations</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="65" to="75" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Measures of clustering quality: A working set of axioms for clustering</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ackerman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ben-David</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">NIPS</title>
		<imprint>
			<biblScope unit="page" from="121" to="128" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Silhouettes: A graphical aid to the interpretation and validation of cluster analysis</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Rousseeuw</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Comput. Appl. Math</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="53" to="65" />
			<date type="published" when="1987">1987</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A cluster separation measure</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Davies</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">W</forename><surname>Bouldin</surname></persName>
		</author>
		<idno type="DOI">10.1109/TPAMI.1979.4766909</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Pattern Anal. Mach. Intell</title>
		<imprint>
			<biblScope unit="page" from="224" to="227" />
			<date type="published" when="1979">1979</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A dendrite method for cluster analysis</title>
		<author>
			<persName><forename type="first">T</forename><surname>Calinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Harabasz</surname></persName>
		</author>
		<idno type="DOI">10.1080/03610927408827101</idno>
	</analytic>
	<monogr>
		<title level="j">Communications in Statistics</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="1" to="27" />
			<date type="published" when="1974">1974</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Well-separated clusters and optimal fuzzy partitions</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Dunn</surname></persName>
		</author>
		<idno type="DOI">10.1080/01969727408546059</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Cybernetics</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="95" to="104" />
			<date type="published" when="1974">1974</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Fast and eager k-medoids clustering: O(k) runtime improvement of the PAM, CLARA, and CLARANS algorithms</title>
		<author>
			<persName><forename type="first">E</forename><surname>Schubert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Rousseeuw</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.is.2021.101804</idno>
	</analytic>
	<monogr>
		<title level="j">Information Systems</title>
		<imprint>
			<biblScope unit="volume">101</biblScope>
			<biblScope unit="page">101804</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Clustering by direct optimization of the medoid silhouette</title>
		<author>
			<persName><forename type="first">L</forename><surname>Lenssen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Schubert</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-17849-8_15</idno>
	</analytic>
	<monogr>
		<title level="m">Similarity Search and Applications</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="190" to="204" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">K-nearest-neighbor consistency in data clustering: Incorporating local information into global optimization</title>
		<author>
			<persName><forename type="first">C</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>He</surname></persName>
		</author>
		<idno type="DOI">10.1145/967900.968021</idno>
	</analytic>
	<monogr>
		<title level="m">Symp. Applied Computing, SAC</title>
				<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="584" to="589" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">An evolutionary approach to multiobjective clustering</title>
		<author>
			<persName><forename type="first">J</forename><surname>Handl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Knowles</surname></persName>
		</author>
		<idno type="DOI">10.1109/TEVC.2006.877146</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Evol. Comput</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="56" to="76" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">The relationship of DBSCAN to matrix factorization and spectral clustering</title>
		<author>
			<persName><forename type="first">E</forename><surname>Schubert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hess</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Morik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Lernen, Wissen, Daten, Analysen</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="330" to="334" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Connecting the dots -density-connectivity distance unifies DBSCAN, k-center and spectral clustering</title>
		<author>
			<persName><forename type="first">A</forename><surname>Beer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Draganov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hohma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Jahn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Frey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Assent</surname></persName>
		</author>
		<idno type="DOI">10.1145/3580305.3599283</idno>
	</analytic>
	<monogr>
		<title level="m">Knowledge Discovery and Data Mining (KDD)</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>to appear</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Model selection for semi-supervised clustering</title>
		<author>
			<persName><forename type="first">M</forename><surname>Pourrajabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Moulavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J G B</forename><surname>Campello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zimek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sander</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Goebel</surname></persName>
		</author>
		<idno type="DOI">10.5441/002/edbt.2014.31</idno>
	</analytic>
	<monogr>
		<title level="m">Extending Database Technology, EDBT</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="331" to="342" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Constrained k-means clustering with background knowledge</title>
		<author>
			<persName><forename type="first">K</forename><surname>Wagstaff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cardie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rogers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schrödl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Int. Conf. Machine Learning (ICML)</title>
				<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="577" to="584" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Active semi-supervision for pairwise constrained clustering</title>
		<author>
			<persName><forename type="first">S</forename><surname>Basu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Banerjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Mooney</surname></persName>
		</author>
		<idno type="DOI">10.1137/1.9781611972740.31</idno>
	</analytic>
	<monogr>
		<title level="j">SIAM Data Mining (SDM)</title>
		<imprint>
			<biblScope unit="page" from="333" to="344" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Concept decompositions for large sparse text data using clustering</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">S</forename><surname>Dhillon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Modha</surname></persName>
		</author>
		<idno type="DOI">10.1023/A:1007612920971</idno>
	</analytic>
	<monogr>
		<title level="j">Mach. Learn</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="page" from="143" to="175" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Accelerating spherical k-means</title>
		<author>
			<persName><forename type="first">E</forename><surname>Schubert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Feher</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-89657-7_17</idno>
	</analytic>
	<monogr>
		<title level="m">Similarity Search and Applications</title>
				<imprint>
			<publisher>SISAP</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="217" to="231" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Reducing UK-means to K-means</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">D</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cheng</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICDMW.2007.40</idno>
	</analytic>
	<monogr>
		<title level="m">Workshops Int. Conf. Data Mining (ICDM)</title>
				<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="483" to="488" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Stop using the elbow criterion for k-means and how to choose the number of clusters instead</title>
		<author>
			<persName><forename type="first">E</forename><surname>Schubert</surname></persName>
		</author>
		<idno type="DOI">10.1145/3606274.3606278</idno>
	</analytic>
	<monogr>
		<title level="j">SIGKDD Explorations</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="36" to="42" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Using the triangle inequality to accelerate k-means</title>
		<author>
			<persName><forename type="first">C</forename><surname>Elkan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Int. Conf. Machine Learning (ICML)</title>
				<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="147" to="153" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Making k-means even faster</title>
		<author>
			<persName><forename type="first">G</forename><surname>Hamerly</surname></persName>
		</author>
		<idno type="DOI">10.1137/1.9781611972801.12</idno>
	</analytic>
	<monogr>
		<title level="m">SIAM Data Mining (SDM)</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="130" to="140" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Fast k-means with accurate bounds</title>
		<author>
			<persName><forename type="first">J</forename><surname>Newling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Fleuret</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Int. Conf. Machine Learning (ICML)</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="936" to="944" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Even faster exact k-means clustering</title>
		<author>
			<persName><forename type="first">C</forename><surname>Borgelt</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-44584-3_8</idno>
	</analytic>
	<monogr>
		<title level="m">Symp. Intelligent Data Analysis (IDA)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="93" to="105" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">BETULA: fast clustering of large data with improved BIRCH CF-trees</title>
		<author>
			<persName><forename type="first">A</forename><surname>Lang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Schubert</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.is.2021.101918</idno>
	</analytic>
	<monogr>
		<title level="j">Inf. Syst</title>
		<imprint>
			<biblScope unit="volume">108</biblScope>
			<biblScope unit="page">101918</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Automatic indexing for similarity search in ELKI</title>
		<author>
			<persName><forename type="first">E</forename><surname>Schubert</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-17849-8_16</idno>
	</analytic>
	<monogr>
		<title level="m">Similarity Search and Applications</title>
				<meeting><address><addrLine>SISAP</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">The (black) art of runtime evaluation: Are we comparing algorithms or implementations?</title>
		<author>
			<persName><forename type="first">H</forename><surname>Kriegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Schubert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zimek</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10115-016-1004-2</idno>
	</analytic>
	<monogr>
		<title level="j">Knowl. Inf. Syst</title>
		<imprint>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="page" from="341" to="378" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">k-means++: the advantages of careful seeding</title>
		<author>
			<persName><forename type="first">D</forename><surname>Arthur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Vassilvitskii</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Symp. Discrete Algorithms, SODA</title>
				<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="1027" to="1035" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
