<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparison of the efficiency of autoencoders for solving clustering problems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Volodymyr Lytvynenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serge Olszewski</string-name>
          <email>Olszewski.Serge@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volodymyr Osypenko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Violeta Demchenko</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daria Dontsova</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kherson National Technical University</institution>
          ,
          <addr-line>st. Instytutska, 11 Khmelnytskyi 29016</addr-line>
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kyiv National University of Technology and Design</institution>
          ,
          <addr-line>Mala Shyianovska str., 2,01011 Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Saksaganskogo Str.</institution>
          ,
          <addr-line>75, 01033 Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>State Institution «Kundiiev Institute of Occupational Health of the National Academy of Medical Sciences of Ukraine»</institution>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>64/13, Volodymyrska Street, 01601 Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This study compares the effectiveness of autoencoders and variational autoencoders for clustering tasks, using the Iris dataset with k-means, spectral clustering, Affinity Propagation, and the Gaussian Mixture Model. Clustering quality was assessed with metrics such as the Silhouette Index, Davies-Bouldin Index, Adjusted Rand Index, and Mutual Information. Results showed that classical autoencoders performed more reliably and effectively, particularly with k-means, while variational autoencoders excelled with Affinity Propagation. The Gaussian Mixture Model was the least effective for both types. The study underscores the importance of choosing the right autoencoder and clustering algorithm based on the task and data structure, paving the way for future research on more complex datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>autoencoders</kwd>
        <kwd>variational autoencoders</kwd>
        <kwd>k-means</kwd>
        <kwd>affinity propagation</kwd>
        <kwd>spectral clustering</kwd>
        <kwd>Gaussian mixture model</kwd>
        <kwd>silhouette score</kwd>
        <kwd>Davies-Bouldin index</kwd>
        <kwd>adjusted rand index</kwd>
        <kwd>mutual information</kwd>
        <kwd>latent space</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In recent decades, data volume has grown exponentially across various fields, becoming
complex, multidimensional, and irregular, necessitating advanced analysis methods. Classical
clustering methods like k-means often struggle with such data, especially with nonlinear
dependencies or noise. Classical autoencoders, a type of neural network for dimensionality
reduction, address this by compressing input data into a lower-dimensional latent space and
reconstructing it, enabling more accurate clustering. Autoencoders automate feature selection
by representing data in a low-dimensional space where clusters are more defined. They are used
in fields such as image and text analysis, time series, and biomedical data, improving the
extraction of valuable insights.</p>
      <p>Autoencoders can integrate with other methods like variational autoencoders and GANs,
enhancing adaptability for complex tasks. They are especially effective for high-dimensional
data, addressing the "curse of dimensionality" by compressing data for better clustering. The
growth in GPU and computing power has made training these models more accessible, boosting
their relevance in modern data analysis. The need to analyze complex data, extract features, and
integrate deep learning methods highlights the importance of autoencoders for achieving
accurate results.</p>
      <p>The main contributions of this paper are as follows: a) A comprehensive comparative analysis
of the effectiveness of classical (AE) and variational (VAE) autoencoders for clustering tasks,
applying various algorithmic approaches; b) Empirical evidence of the superiority of classical
autoencoders in terms of stability and clustering quality, especially in combination with the
k-means algorithm; c) The high efficiency of the Affinity Propagation algorithm in combination
with VAE, indicating its potential for specific types of data; d) The implementation of a
multicriteria approach to clustering quality assessment, integrating several metrics, which provided a
deep understanding of the effectiveness of the studied methods.</p>
      <p>The rest of the paper is structured as follows. Section 2 provides a literature review, Section 3
presents the problem statement. Section 4 describes the materials and methods used in this
study. Section 5 presents the results of testing the proposed clustering methods. Finally, Section
6 concludes the study.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Review of the literature</title>
      <p>Autoencoders are effectively used for clustering by leveraging their latent representations.
Clustering groups objects based on similarities, and autoencoders help extract latent data
features and reduce dimensionality before clustering. Hinton and Salakhutdinov [1] introduced
a neural network methodology for dimensionality reduction, showing that autoencoders create
compact latent representations suitable for clustering. This foundational work advanced the use
of autoencoders in data clustering. The Deep Embedded Clustering (DEC) model [2] uses an
autoencoder to learn latent data representations and cluster them, jointly optimizing data
reconstruction and cluster distribution for more accurate and stable results. Xie and colleagues'
work was a significant advancement in autoencoder-based clustering methods. Variational
autoencoders (VAEs) [3] are key in data analysis for modeling complex distributions and
uncertainty. They provide a theoretical foundation for creating latent representations, enabling
effective data clustering and making them popular for generative modeling and clustering tasks.
In [4], the authors proposed a method combining convolutional autoencoders with deep
learning for clustering, enhancing feature extraction and clustering quality, especially for
complex data like images. In [5], Adversarial Autoencoders (AAEs) were introduced, combining
autoencoders with GANs to create more structured latent representations, enhancing clustering
and generative capabilities for data analysis. The paper [6] introduces the Deep Embedded
Clustering (DEC) method, noting its limitations in preserving local data structure. An improved
method, IDEC, addresses this by combining clustering with local structure preservation through
a new objective function. Results demonstrate IDEC's superiority across metrics, with
visualizations highlighting enhanced local structure retention and clustering performance. This
work underscores the importance of local structure in deep clustering and offers a practical
enhancement. The article [7] introduces the Deep Clustering Network (DCN), which combines
deep learning with the K-means algorithm to perform nonlinear data mapping and clustering
simultaneously. DCN outperforms classical and modern methods by transforming data into a
more "K-means-friendly" space, improving clustering quality and advancing data analysis
through the integration of deep learning with traditional clustering. The article [8] introduces a
clustering approach combining autoencoders and traditional algorithms. It employs
autoencoders for nonlinear dimensionality reduction, applies K-means to the hidden
representation, and iteratively optimizes parameters. The method excels in handling complex,
nonlinear data, automatically extracting relevant features and outperforming traditional
clustering techniques. The study highlights the potential of integrating autoencoders into
unsupervised analysis and discusses variational autoencoders (VAEs), which model
probabilistic distributions and create latent representations for clustering and generative tasks.
Variational Autoencoders (VAEs), introduced by Kingma and Welling [9], model the latent
space as a probabilistic distribution, enabling data generation and effective organization in the
latent space. This makes VAEs ideal for clustering complex, high-dimensional data. The
Variational Deep Embedding (VaDE) method [10] combines VAEs with clustering by modeling
latent variables as a Gaussian mixture, enabling direct clustering in the latent space. It
outperforms traditional methods like K-means, particularly on complex datasets. The Gaussian
Mixture Variational Autoencoder (GMVAE) [11] combines variational autoencoders with
Gaussian mixtures, modeling the latent space as a Gaussian mixture to enhance clustering and
handle complex data structures. In [12], the authors enhance VaDE by combining variational
autoencoders with deep embedding, improving clustering for high-dimensional data like images
and texts. Autoencoders effectively extract and utilize latent representations for clustering. This
study compares classical and variational autoencoders for clustering, analyzing their
integration with algorithms like K-means, spectral clustering, GMM, and Affinity Propagation.
It aims to identify their strengths, limitations, and suitability for different data types and
requirements. Clustering quality will be evaluated using metrics such as the Silhouette Score,
Davies-Bouldin Index, Adjusted Rand Index, and Mutual Information, providing objective insights and
practical recommendations.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Problem statement</title>
      <p>A formal statement of the problem can be formulated as follows:</p>
      <p>Given: 1. Data set D = {x₁, x₂, ..., xₙ}, where xᵢ ∈ ℝᵐ is an m-dimensional vector of features.
2. Multiple clustering methods based on autoencoders M = {M₁, M₂, ..., Mₖ}, where each
method Mᵢ can be a classical or variational autoencoder.</p>
      <p>3. Set of clustering algorithms A = {K-means, Spectral Clustering, Gaussian Mixture Model,
Affinity Propagation}.</p>
      <p>4. Set of clustering quality metrics Q = {Silhouette Score, Davies-Bouldin Index, Adjusted
Rand Index, Mutual Information, Adjusted Mutual Information}.</p>
      <p>Required: 1) For each method Mᵢ ∈ M and algorithm Aⱼ ∈ A: a) Learn the model Mᵢ on the data
D; b) Apply the algorithm Aⱼ to the latent representation of the data obtained by the Mᵢ; c)
Calculate the values of all quality metrics q ∈ Q for the resulting clustering; 2) Carry out a
comparative analysis of the results obtained: a) Evaluate the effectiveness of each combination
(Mᵢ, Aⱼ) across all metrics q ∈ Q; b) Identify the advantages and limitations of each method Mᵢ ∈
M; c) Identify the most effective combinations of methods and algorithms for different types of
data and clustering tasks; 3) To formulate recommendations for selecting the optimal
autoencoder based clustering method and clustering algorithm depending on data
characteristics and clustering quality requirements.</p>
      <p>Limitations and assumptions: 1. The true cluster labels for the dataset D are assumed to be
unknown (unsupervised learning problem); 2. The number of clusters K is assumed to be given
or determined automatically depending on the clustering algorithm used; 3. Computational
resources and model training time are not considered as limiting factors in this problem
formulation. This formal problem statement covers all aspects mentioned in the research
objective and provides a clear structure for a comparative analysis of autoencoder based
clustering methods.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Materials and Methods</title>
      <p>4.1. Data
The IRIS dataset (https://www.kaggle.com/uciml/iris) was chosen for comparing classical (AE)
and variational (VAE) autoencoders in clustering due to its known structure, interpretability,
and manageable size, enabling efficient training and testing. Its explicit clustering structure
(three classes) allows assessment of the models' ability to extract informative latent
representations, making it an ideal baseline for such analyses.</p>
      <sec id="sec-4-1">
        <title>4.2. Autoencoders</title>
        <p>A classical autoencoder encodes data into a compact representation and reconstructs it, extracting features for clustering. Below is its mathematical description and application in clustering.</p>
        <p>Autoencoder architecture. The autoencoder consists of two main parts:</p>
        <p>Encoder: the function f_θ converts the input data x into the hidden representation z: z = f_θ(x), where θ are the encoder parameters. (1)</p>
        <p>Decoder: the function g_φ reconstructs the original data from the hidden representation z: x̂ = g_φ(z), where x̂ is the reconstructed input and φ are the decoder parameters. (2)</p>
        <p>Loss function. The autoencoder is trained using a loss function that measures the difference between the original data x and the recovered data x̂. Typically, the mean squared error (MSE) is used: L(θ, φ) = (1/N) Σᵢ ||xᵢ − x̂ᵢ||², where N is the number of examples in the batch. (3)</p>
        <p>Application for clustering. After training the autoencoder and obtaining hidden representations for all the data, one can use these representations for clustering. The process can be described as follows:</p>
        <p>a) Obtaining latent representations: pass the entire dataset through the encoder to get the hidden representations zᵢ = f_θ(xᵢ). (4)</p>
        <p>b) Clustering: apply a clustering algorithm such as k-means or another method to the hidden representations to obtain cluster labels yᵢ = C(zᵢ; ψ), where ψ are the parameters of the clustering algorithm. (5)</p>
        <p>Thus, a classical autoencoder can be described as a model that first encodes data into a hidden representation and then reconstructs data from that representation. The hidden representations obtained from the encoder are then clustered using a clustering algorithm.</p>
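        <p>As a concrete illustration of this pipeline, the following is a minimal numpy sketch of a one-hidden-layer autoencoder trained by gradient descent on synthetic two-blob toy data (an illustrative stand-in, not the architecture or data used in the experiments). It computes z = f_θ(x), x̂ = g_φ(z) and the MSE loss, and extracts the latent codes that a clustering algorithm would then receive.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs in 4-D (a stand-in for the Iris features).
X = np.vstack([rng.normal(0.0, 0.3, (50, 4)),
               rng.normal(3.0, 0.3, (50, 4))])

# Single-hidden-layer autoencoder: z = tanh(x W1 + b1), x_hat = z W2 + b2.
d_in, d_lat = X.shape[1], 2
W1 = rng.normal(0, 0.1, (d_in, d_lat)); b1 = np.zeros(d_lat)
W2 = rng.normal(0, 0.1, (d_lat, d_in)); b2 = np.zeros(d_in)

def forward(X):
    Z = np.tanh(X @ W1 + b1)      # encoder f_theta
    X_hat = Z @ W2 + b2           # decoder g_phi
    return Z, X_hat

lr, losses = 0.01, []
for _ in range(500):
    Z, X_hat = forward(X)
    err = X_hat - X
    losses.append(np.mean(np.sum(err ** 2, axis=1)))   # MSE loss
    # Backpropagation: gradients of the MSE w.r.t. all parameters.
    n = X.shape[0]
    dXhat = 2 * err / n
    dW2 = Z.T @ dXhat; db2 = dXhat.sum(0)
    dZ = (dXhat @ W2.T) * (1 - Z ** 2)                 # tanh derivative
    dW1 = X.T @ dZ; db1 = dZ.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

Z, _ = forward(X)   # latent representations, ready for any clustering algorithm
```

        <p>After training, the rows of Z are the low-dimensional representations to which k-means or any other algorithm from Section 4.4 can be applied.</p>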
      </sec>
      <sec id="sec-4-2">
        <title>4.3. Variational Autoencoder</title>
        <p>A Variational Autoencoder (VAE) is a probabilistic model that encodes data as a probability distribution rather than a fixed representation. In clustering, a VAE generates compact, informative data representations. Like the classical autoencoder it has an encoder and a decoder, but instead of a pointwise latent representation a distribution q_φ(z|x) over the latent space is used.</p>
        <p>Encoder. The encoder converts the input data x into distribution parameters, usually a mean μ_φ(x) and a standard deviation σ_φ(x): q_φ(z|x) = N(z; μ_φ(x), σ²_φ(x)), where μ_φ(x) and σ_φ(x) are parameters that depend on the input data and the network parameters φ. (6)</p>
        <p>Decoder. The decoder recovers the data from a hidden representation z sampled from the distribution q_φ(z|x): x̂ ~ p_θ(x|z), where p_θ(x|z) is a distribution (e.g., Gaussian) defined by the decoder. (7)</p>
        <p>The variational autoencoder is trained by minimizing the two components of the loss function:</p>
        <p>1. Reconstruction error (mean squared error or cross-entropy) between the original data x and the reconstructed data x̂: L_rec = ||x − x̂||². (8)</p>
        <p>2. Kullback-Leibler divergence (KL-divergence) between the posterior distribution q_φ(z|x) and the prior distribution p(z), which is usually assumed to be the standard normal distribution N(0, I): D_KL(q_φ(z|x) ‖ p(z)). (9)</p>
        <p>Complete loss function: L(θ, φ) = L_rec + β · D_KL(q_φ(z|x) ‖ p(z)), where β is a weighting factor that can be adjusted to control the effect of the KL-divergence (e.g., in the β-VAE model). (10)</p>
        <p>After training, the VAE produces probabilistic representations of the data, and these representations can be used for clustering:</p>
        <p>a) Obtaining latent representations: pass the entire dataset through the encoder to get the distribution parameters μ_φ(xᵢ) and σ_φ(xᵢ), and then sample the latent representations zᵢ ~ q_φ(z|xᵢ). (11)</p>
        <p>b) Clustering: apply the clustering algorithm to the sampled hidden representations to obtain cluster labels yᵢ = C(zᵢ; ψ), where ψ are the parameters of the clustering algorithm (e.g., k-means). (12)</p>
        <p>A variational autoencoder (VAE) is a probabilistic model that uses latent variables to
represent data. After training, it generates latent representations for clustering and effectively
handles complex, multi-level data structures.</p>
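        <p>The two VAE-specific ingredients, sampling z via the reparameterization trick and the closed-form KL term for a diagonal Gaussian posterior against the standard normal prior, can be sketched numerically as follows (the μ and log σ² values here are illustrative stand-ins for encoder outputs, not a trained model):</p>

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in encoder outputs for a batch of 8 points with a 2-D latent space:
# mean mu_phi(x) and log-variance log sigma_phi(x)^2 of q(z|x).
mu = rng.normal(0.0, 1.0, (8, 2))
log_var = rng.normal(0.0, 0.5, (8, 2))

# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I),
# so gradients can flow through the sampling step.
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# Closed-form KL divergence of N(mu, sigma^2 I) from the prior N(0, I),
# summed over latent dimensions and averaged over the batch.
kl = np.mean(-0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=1))

# The total loss would be L = L_rec + beta * kl, with L_rec the MSE (or
# cross-entropy) between x and the decoder output, and beta = 1 for a plain VAE.
```

        <p>The sampled z plays the role of the latent representation that is subsequently clustered.</p>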
      </sec>
      <sec id="sec-4-3">
        <title>4.4. Clustering Algorithms Applied to Hidden Representations</title>
        <p>The following algorithms were used to cluster the hidden representations: k-means, spectral
clustering, Affinity Propagation and Gaussian Mixture Model.</p>
        <p>k-means [14]. K-means clustering is a vector quantization method that divides n
observations into k clusters by assigning each to the nearest centroid, which represents the
cluster. This partitions the data space into Voronoi cells. The algorithm minimizes intra-cluster
variance (sum of squared Euclidean distances), unlike the more complex Weber problem, which
minimizes Euclidean distances. For greater accuracy in Euclidean distance minimization,
k-median or k-medoid methods can be used.</p>
        <sec id="sec-4-3-1">
          <title>Explanation:</title>
          <p>Step 1: Initialises the centroids by randomly selecting k points from the data. While loop:
Iteratively performs the following steps until convergence or until the maximum number of
iterations is reached: Step 2: Assigns each data point to the nearest centroid, forming clusters.</p>
          <p>Step 3: Updates the centroids by computing the average of all points assigned to each cluster.
Step 4: Checks convergence by evaluating whether the change in centroids is less than a given
tolerance ε. If the change is small enough or the maximum number of iterations is reached, the
loop terminates. Return: Outputs the final clusters and their corresponding centroids.</p>
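          <p>The four steps above can be sketched directly in numpy (an illustrative implementation on toy data, not the library routine used in the experiments):</p>

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialise centroids with k randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        # Step 4: stop when the centroids move less than the tolerance.
        if np.linalg.norm(new - centroids) < tol:
            centroids = new
            break
        centroids = new
    return labels, centroids

# Two well-separated blobs; k-means should recover them.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (30, 2)), rng.normal(5, 0.2, (30, 2))])
labels, centroids = kmeans(X, k=2)
```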
          <p>Spectral Clustering [15]. Spectral clustering transforms data into a lower-dimensional
space using eigenvectors of the Laplace matrix derived from a similarity graph, then applies
standard clustering methods like k-means.</p>
          <p>Step 1: Constructs the Laplacian matrix from the similarity matrix.</p>
          <p>Step 2: Computes the first k eigenvectors of the Laplacian matrix.</p>
          <p>Step 3: Forms a matrix U from these eigenvectors.</p>
          <p>Step 4: Normalizes the rows of U.</p>
          <p>Step 5: Applies the k-means algorithm to cluster the rows of U.</p>
          <p>While loop: Iteratively refines the clustering by recalculating centroids and reassigning
points until clusters stabilize.</p>
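          <p>The steps above can be sketched as follows, here using the normalised Laplacian, a common variant, with a Gaussian similarity kernel and a simple deterministic k-means initialisation (an illustrative toy example, with parameter choices such as sigma assumed for the sketch):</p>

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0):
    # Similarity matrix with a Gaussian (RBF) kernel.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Step 1: normalised Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Steps 2-3: the k eigenvectors with smallest eigenvalues form U.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    # Step 4: normalise the rows of U.
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Step 5: k-means on the rows of U (Lloyd iterations,
    # deterministic spread-out initialisation for this sketch).
    C = U[np.linspace(0, len(U) - 1, k).astype(int)]
    for _ in range(100):
        labels = np.linalg.norm(U[:, None] - C[None], axis=2).argmin(1)
        C = np.array([U[labels == j].mean(0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (25, 2)), rng.normal(6, 0.3, (25, 2))])
labels = spectral_clustering(X, k=2)
```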
          <p>Gaussian Mixture Model [16]. A mixture model is a probabilistic approach for
representing subpopulations in data without identifying their membership. In model-based
clustering, such as Gaussian Mixture Models (GMMs), data are modeled as a mixture of
parametric distributions, with each cluster represented by a separate Gaussian distribution.</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>Explanation:</title>
          <p>Step 1: Initializes the parameters of the Gaussian components: the mixing coefficients πₖ
determine the proportion of each component; the means μₖ represent the center of each Gaussian
component; the covariance matrices Σₖ describe the spread of each component.</p>
          <p>While loop: Iteratively performs the Expectation-Maximization (EM) steps until
convergence or the maximum number of iterations is reached.</p>
          <p>E-Step: Calculates the responsibilities γᵢₖ, which represent the probability that data point xᵢ
belongs to component k.</p>
          <p>M-Step: Updates the parameters based on the current responsibilities.</p>
          <p>Check Convergence: The algorithm checks whether the change in log-likelihood is below a
certain tolerance ε.</p>
          <p>Return: After convergence, returns the final parameters of the Gaussian components.</p>
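          <p>A minimal sketch of the EM loop for a one-dimensional two-component mixture follows (illustrative toy data and initialisation, not the experimental setup):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
# 1-D data drawn from two Gaussians with means 0 and 5.
x = np.concatenate([rng.normal(0, 0.5, 100), rng.normal(5, 0.5, 100)])

# Step 1: initialise mixing coefficients, means and variances.
pi = np.array([0.5, 0.5])
mu = np.array([x.min(), x.max()])
var = np.array([1.0, 1.0])

def log_likelihood(x, pi, mu, var):
    pdf = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return np.log(pdf.sum(axis=1)).sum()

ll_old = -np.inf
for _ in range(100):
    # E-step: responsibilities gamma_ik = P(component k | x_i).
    pdf = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    gamma = pdf / pdf.sum(axis=1, keepdims=True)
    # M-step: update pi, mu and var from the responsibilities.
    Nk = gamma.sum(axis=0)
    pi = Nk / len(x)
    mu = (gamma * x[:, None]).sum(axis=0) / Nk
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    # Convergence check on the log-likelihood.
    ll = log_likelihood(x, pi, mu, var)
    if ll - ll_old < 1e-8:
        break
    ll_old = ll
```

          <p>EM guarantees that the log-likelihood never decreases, which is why the convergence check on its change is a valid stopping rule.</p>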
          <p>The Affinity Propagation (AP) clustering algorithm is a method that identifies cluster
centers (exemplars) through message passing between data points. Unlike traditional methods
such as K-means, AP does not require the number of clusters to be specified in advance.</p>
          <p>AP uses a similarity matrix S, where S(i, j) measures how similar points i and j are. The
diagonal values S(i, i) indicate the preference of each point to be chosen as an exemplar.</p>
          <p>The process involves updating two matrices: the "responsibility" matrix R, which shows how
suitable point j is as a center for point i, and the "availability" matrix A, which reflects how
appropriate point j is as a center considering all other points. These matrices are updated
iteratively until convergence. The number of clusters is determined automatically based on the
data, making AP useful for tasks where the number of clusters is not known in advance.</p>
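          <p>The R and A updates described above can be sketched in numpy with damping (a compact illustrative implementation on toy data; the preference is set to the median off-diagonal similarity, a common default, and the damping and iteration count are assumptions of the sketch):</p>

```python
import numpy as np

def affinity_propagation(X, damping=0.7, iters=200):
    n = len(X)
    # Similarity: negative squared Euclidean distance between points.
    S = -((X[:, None] - X[None]) ** 2).sum(-1)
    # Diagonal "preference": median off-diagonal similarity.
    S[np.diag_indices(n)] = np.median(S[~np.eye(n, dtype=bool)])
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    for _ in range(iters):
        # Responsibility: r(i,k) = s(i,k) - max_{k'!=k} [a(i,k') + s(i,k')].
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[np.arange(n), idx].copy()
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * R_new
        # Availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # and a(k,k) = sum_{i'!=k} max(0, r(i',k)).
        Rp = np.maximum(R, 0)
        Rp[np.diag_indices(n)] = R.diagonal()
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        A_new[np.diag_indices(n)] = diag
        A = damping * A + (1 - damping) * A_new
    # Exemplars are points k with r(k,k) + a(k,k) > 0.
    exemplars = np.where((R + A).diagonal() > 0)[0]
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(len(exemplars))
    return labels, exemplars

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(4, 0.2, (20, 2))])
labels, exemplars = affinity_propagation(X)
```

          <p>With a low (median) preference on two well-separated blobs, the message passing settles on one exemplar per blob, illustrating how AP determines the number of clusters automatically.</p>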
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>4.5. Clustering Quality Assessment</title>
        <p>Silhouette Score. The silhouette index evaluates cluster compactness and separation, ranging from -1 (misclassified) to 1 (well-separated), with 0 indicating overlap.</p>
        <p>(1) Average intra-cluster distance a(i): the average distance from object i to all other objects in the same cluster, a(i) = (1/(|Cᵢ| − 1)) Σ_{j∈Cᵢ, j≠i} d(i, j), where d(i, j) is the distance between objects i and j (for example, the Euclidean distance). (13)</p>
        <sec id="sec-4-4-1">
          <title>Silhouette index</title>
          <p>(2) Average inter-cluster distance b(i): the average distance from object i to all objects in the nearest cluster to which the object does not belong, b(i) = min_{C≠Cᵢ} (1/|C|) Σ_{j∈C} d(i, j). (14)</p>
          <p>For each object i its silhouette is defined as s(i) = (b(i) − a(i)) / max(a(i), b(i)). (15)</p>
          <p>If s(i) is close to 1, the object is well classified; if s(i) is close to 0, the object lies on the boundary between clusters; if s(i) is close to -1, the object was probably misclassified.</p>
          <p>(3) Average silhouette index for the entire dataset: the overall silhouette for the entire clustering is calculated as the average of s(i) across all objects, S = (1/N) Σᵢ s(i), where N is the total number of objects in the dataset. (16)</p>
          <p>The Silhouette Plot evaluates clustering quality by showing silhouette values for objects
within clusters. Positive values (close to 1) indicate well-separated clusters, while negative
values suggest misclassified objects.</p>
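          <p>The silhouette computation can be sketched directly from its definition, with a(i), b(i) and s(i) computed from the pairwise distance matrix (an illustrative toy example):</p>

```python
import numpy as np

def silhouette_scores(X, labels):
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None], axis=2)   # pairwise distances
    s = np.zeros(n)
    for i in range(n):
        same = labels == labels[i]
        # a(i): mean distance to the other members of i's own cluster.
        a = D[i, same & (np.arange(n) != i)].mean()
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(5, 0.2, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
s = silhouette_scores(X, labels)
```

          <p>For these two tight, well-separated blobs the mean silhouette is close to 1, the pattern the silhouette plot makes visible per cluster.</p>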
          <p>Davies-Bouldin Index (DBI) is a metric for assessing the quality of clustering [18]. It measures the average "similarity" of each cluster to its most similar cluster; the smaller the DBI value, the better separated the clusters are. The index is based on two key concepts: the scatter within a cluster and the distance between clusters.</p>
          <p>(1) Intra-cluster scatter: Sᵢ = (1/|Cᵢ|) Σ_{x∈Cᵢ} d(x, μᵢ), where d(x, μᵢ) is the distance (e.g., Euclidean) between the point x and the centroid μᵢ. It measures the average distance between all points within a cluster and its centroid. (17)</p>
          <p>(2) Inter-cluster distance: Mᵢⱼ = d(μᵢ, μⱼ) is the distance between the centroids of the two clusters Cᵢ and Cⱼ. (18)</p>
          <p>(3) Similarity index between two clusters: Rᵢⱼ = (Sᵢ + Sⱼ) / Mᵢⱼ. It is defined as the ratio of the sum of the scatters of two clusters to the distance between them. (19)</p>
          <p>(4) Davies-Bouldin index for cluster i: Dᵢ = max_{j≠i} Rᵢⱼ. This is the maximum of the similarity index of cluster i with any other cluster. (20)</p>
          <p>(5) Davies-Bouldin Index (DBI) for the entire clustering: DBI = (1/K) Σᵢ Dᵢ, the average of Dᵢ across all clusters, where K is the total number of clusters. (21)</p>
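          <p>Steps (1)-(5) translate line by line into code (an illustrative toy example of the metric, computed from scatters, centroid distances and the max similarity index):</p>

```python
import numpy as np

def davies_bouldin(X, labels):
    ks = np.unique(labels)
    # Centroids mu_i and intra-cluster scatters S_i.
    mus = np.array([X[labels == k].mean(0) for k in ks])
    Ss = np.array([np.linalg.norm(X[labels == k] - mus[i], axis=1).mean()
                   for i, k in enumerate(ks)])
    # For each cluster, the worst-case similarity R_ij = (S_i + S_j) / M_ij,
    # with M_ij the distance between centroids; DBI is the mean of these maxima.
    K = len(ks)
    total = 0.0
    for i in range(K):
        total += max((Ss[i] + Ss[j]) / np.linalg.norm(mus[i] - mus[j])
                     for j in range(K) if j != i)
    return total / K

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(5, 0.2, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
dbi = davies_bouldin(X, labels)
```

          <p>For compact, well-separated blobs the scatters are small relative to the centroid distance, so the DBI is close to 0, consistent with "smaller is better".</p>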
          <p>Adjusted Rand Index (ARI). The Rand Index (RI) evaluates clustering quality by
comparing predicted clusters to true labels. Its chance-adjusted version, ARI, ranges from -1 to 1,
with 1 indicating perfect clustering [19].</p>
          <p>The Adjusted Rand Index (ARI) corrects for this shortcoming by providing a normalized
value that accounts for random matches.</p>
          <p>Formal definition of the Adjusted Rand Index. 1. Matching matrix: suppose we have two
partitions of a dataset: U, the true cluster labels (ground truth), and V, the predicted cluster labels.
Construct a matching (contingency) matrix whose element nᵢⱼ is the number of points falling
simultaneously into cluster i of partition U and cluster j of partition V.</p>
          <p>2. Rand Index (RI): the Rand index measures the proportion of point pairs that either belong
to the same clusters or to different clusters in both partitions [21].</p>
        </sec>
        <sec id="sec-4-4-2">
          <title>Adjusted Rand Index (ARI)</title>
          <p>ARI accounts for the probability of random matches by normalizing the Rand index: ARI = (Σᵢⱼ C(nᵢⱼ, 2) − [Σᵢ C(aᵢ, 2) · Σⱼ C(bⱼ, 2)] / C(n, 2)) / (½[Σᵢ C(aᵢ, 2) + Σⱼ C(bⱼ, 2)] − [Σᵢ C(aᵢ, 2) · Σⱼ C(bⱼ, 2)] / C(n, 2)), where n is the total number of data points; aᵢ is the i-th row sum of the matching matrix (the number of points in cluster i of partition U); bⱼ is the j-th column sum of the matching matrix (the number of points in cluster j of partition V); and C(n, 2) is the total number of object pairs. (22)-(24)</p>
          <p>ARI = 1: Complete correspondence between predicted and true partitioning (perfect clustering);
ARI = 0: Clustering is no better than random partitioning; ARI &lt; 0: The result is worse than
random clustering.</p>
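          <p>The ARI formula above can be computed directly from the matching matrix (an illustrative implementation; C(n, 2) is the binomial coefficient "n choose 2"):</p>

```python
import numpy as np
from math import comb

def adjusted_rand_index(true, pred):
    tu, pu = np.unique(true), np.unique(pred)
    # Matching (contingency) matrix n_ij.
    C = np.array([[np.sum((true == t) & (pred == p)) for p in pu] for t in tu])
    sum_ij = sum(comb(int(n), 2) for n in C.ravel())
    a = sum(comb(int(n), 2) for n in C.sum(axis=1))   # row sums a_i
    b = sum(comb(int(n), 2) for n in C.sum(axis=0))   # column sums b_j
    total = comb(int(C.sum()), 2)                     # C(n, 2)
    expected = a * b / total
    max_index = (a + b) / 2
    return (sum_ij - expected) / (max_index - expected)

true = np.array([0, 0, 0, 1, 1, 1])
perfect = np.array([1, 1, 1, 0, 0, 0])   # same partition, labels renamed
ari = adjusted_rand_index(true, perfect)
```

          <p>Note that ARI is invariant to label renaming: the "perfect" partition above scores 1 even though the cluster names are swapped, while lumping everything into one cluster scores 0.</p>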
          <p>Mutual Information (MI) and its corrected version, Adjusted Mutual Information
(AMI), are metrics used to assess the quality of clustering. They measure how well one
partitioning of data (clustering) agrees with another, taking into account information common
to both partitions [16].</p>
          <p>Mutual Information between two partitions U and V (e.g., the true partitioning and the predicted partitioning) measures the amount of information common to both partitions.</p>
          <p>Definition: mutual information measures the extent to which knowledge of partitioning U reduces the uncertainty about partitioning V: MI(U, V) = Σᵢ Σⱼ P(i, j) · log(P(i, j) / (P(i) · P′(j))), where P(i, j) is the probability that a randomly selected object belongs simultaneously to cluster i of U and cluster j of V, and P(i) and P′(j) are the marginal probabilities that the object belongs to cluster i of U and cluster j of V, respectively.</p>
          <p>MI takes values from 0 to min(H(U), H(V)), where H denotes the partition entropy. The value 0 means that the partitions U and V are independent (no common information). Higher MI values mean more dependence between partitions, i.e. better cluster matching.</p>
          <p>Adjusted Mutual Information (AMI)</p>
          <p>AMI is a corrected version of MI that accounts for the probability of random matches
between partitions. It is a normalized metric that removes the positive bias of MI: AMI(U, V) = (MI(U, V) − E[MI]) / (mean(H(U), H(V)) − E[MI]), where E[MI] is the mathematical expectation of mutual information between random partitions, and H(U) and H(V) are the entropies of partitions U and V, respectively.</p>
          <p>AMI takes values from -1 to 1.</p>
          <p>AMI = 1: Full correspondence between the partitions.</p>
          <p>AMI = 0: Agreement is no better than chance agreement.</p>
          <p>AMI &lt; 0: Agreement is worse than random (which is extremely rare).</p>
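          <p>The MI definition can be evaluated directly from joint and marginal label frequencies (an illustrative sketch; the expectation term E[MI] needed for AMI is omitted, as its combinatorial formula is much longer):</p>

```python
import numpy as np

def mutual_information(u, v):
    n = len(u)
    mi = 0.0
    for a in np.unique(u):
        for b in np.unique(v):
            p_ab = np.sum((u == a) & (v == b)) / n   # joint probability P(i, j)
            if p_ab == 0:
                continue
            p_a = np.sum(u == a) / n                 # marginal P(i)
            p_b = np.sum(v == b) / n                 # marginal P'(j)
            mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

u = np.array([0, 0, 1, 1])
# For identical partitions, MI equals the partition entropy H(U) = log 2.
mi = mutual_information(u, u)
```

          <p>For two balanced, identical partitions MI reaches its maximum H(U), while for independent partitions it is 0, exactly the bounds stated above.</p>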
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments and Results</title>
      <p>Experimental results obtained on IRIS data clustering when applying classical autoencoders:</p>
      <p>Silhouette Score: The values range from 0.3563 to 0.5869. The best result is for the k-means
clustering algorithm, indicating good cluster separation. The lowest result is for the Gaussian
Mixture Model (GMM), indicating weaker separation.</p>
      <p>Davies-Bouldin Index (DBI): The values range from 0.9561 to 1.0979. GMM has the lowest value,
which indicates better clustering by this metric.</p>
      <p>Adjusted Rand Index (ARI): The values range from 0.3622 to 0.8681. The best result is again for
k-means, which confirms the high quality of clustering compared to true labels.</p>
      <p>Mutual information indices (MI and AMI): MI: 0.4296 to 0.9331; AMI: 0.3933 to 0.8681. The
k-means algorithm performs better on both MI and AMI.</p>
      <p>[Table: Experimental results obtained on IRIS data clustering using VAE variational autoencoders; Silhouette Index, Davies-Bouldin Index, Adjusted Rand Index, Mutual Information and Adjusted Mutual Information for each clustering algorithm.]</p>
      <p>Variational Autoencoders (VAE). Silhouette Index: The values range from 0.0149 to 0.6951. Affinity Propagation has the best
value, which shows good cluster separation.</p>
      <p>Davies-Bouldin Index: Values from 0.0415 to 1.1864. Affinity Propagation has the best (lowest)
DBI value, indicating better clustering results.</p>
      <p>Adjusted Rand Index (ARI): Values from 0.0408 to 0.6813. k-means and Affinity Propagation
show the best results.</p>
      <p>Mutual Information Index (MI) and AMI: MI: 0.0459 to 0.7901. AMI: 0.0432 to 0.6813.
k-means shows the best score for both MI and AMI.</p>
      <p>Autoencoders show better clustering performance, especially when using k-means, as shown
by high Silhouette Score, ARI, MI and AMI values.</p>
      <p>Variational autoencoders show competitive results, especially when using Affinity
Propagation, although in some cases (e.g., GMM) the quality of clustering is lower, as indicated
by the low values of the metrics. Thus, when comparing the two approaches, autoencoders
(especially when combined with k-means) show more stable and better results in clustering
tasks compared to variational autoencoders.</p>
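      <p>The autoencoder-plus-k-means pipeline compared here can be sketched end to end. The snippet below is an illustrative stand-in, not the authors' implementation: it uses scikit-learn's MLPRegressor trained to reconstruct its own input as a small autoencoder, manually extracts the 2-unit bottleneck activations as the latent code, and clusters them with k-means:</p>

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPRegressor
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)
X = MinMaxScaler().fit_transform(X)

# "Autoencoder": a 4 -> 2 -> 4 MLP trained to reproduce its own input.
ae = MLPRegressor(hidden_layer_sizes=(2,), activation="tanh",
                  max_iter=5000, random_state=0)
ae.fit(X, X)

# Encoder = forward pass through the bottleneck layer only.
latent = np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])

# Cluster in the learned latent space, then score against the true labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(latent)
print("ARI on latent codes:", round(adjusted_rand_score(y, labels), 4))
```

      <p>In the study's setting the encoder would instead come from a trained deep AE or VAE; the same KMeans call and metric functions apply unchanged to those latent codes.</p>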
      <p>Evaluation of clustering results based on the Silhouette Index from the graphical
representations. Autoencoders:
Figure 1: k-means. The k-means silhouette index indicates well-separated, high-quality
clusters, with most samples showing high positive values and balanced cluster sizes.
Figure 2: Spectral clustering. Spectral clustering shows less distinct separation and less
balanced cluster sizes than k-means, with a lower Silhouette Index but still good quality.
Figure 3: Affinity Propagation. Clustering results using Affinity Propagation show a low
Silhouette Index, indicating less clear separation of clusters. The clusters overlap, making them
difficult to interpret.</p>
      <p>Figure 4: Gaussian Mixture Model (GMM). The GMM silhouette index is lower than for k-means
and spectral clustering, indicating poor separation with significant overlap and many negative
values, particularly in one cluster.</p>
      <p>Variational autoencoders (VAE):</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Classical autoencoders (AE) generally perform more stably and yield better clustering results
than variational autoencoders (VAE). K-means is most effective with AE, while Affinity
Propagation works best with VAE, particularly for cluster separation. GMM shows the worst
performance for both. Silhouette Score graphs confirm AE's better separation, especially with
k-means and spectral clustering, while VAE shows more variability, performing well with some
algorithms (e.g., Affinity Propagation) and poorly with others (e.g., spectral clustering). The
study highlights the need for tailored choices of autoencoder and clustering algorithm,
especially for small datasets like Iris, and underscores the importance of using multiple metrics
for a comprehensive evaluation.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Future Research</title>
      <p>
        Our future research will be directed toward real-world data and problems. In particular,
we plan to use these algorithms to predict the toxic properties of different pesticides
depending on their structural 3D formulas.
      </p>
      <p>
        [17] Peter J. Rousseeuw. "Silhouettes: a Graphical Aid to the Interpretation and Validation of
Cluster Analysis". Journal of Computational and Applied Mathematics. 20, (1987): 53-65.
https://doi.org/10.1016/0377-0427(87)90125-7
      </p>
      <p>
        [18] Davies, David L.; Bouldin, Donald W. "A Cluster Separation Measure". IEEE Transactions
on Pattern Analysis and Machine Intelligence. PAMI-1 (2), (1979): 224-227.
https://doi.org/10.1109/TPAMI.1979.4766909. S2CID 13254783.
      </p>
      <p>
        [19] Gates, A. J.; Ahn, Y. Y. "The impact of random models on clustering similarity". Journal of
Machine Learning Research. 18 (87), (2017): 1-28. https://doi.org/10.48550/arXiv.1701.06508
      </p>
      <p>
        [20] Vinh, N. X.; Epps, J.; Bailey, J. "Information theoretic measures for clusterings comparison".
Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09.
2009, p. 1. https://doi.org/10.1145/1553374.1553511. ISBN 9781605585161
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Hinton</surname>
            <given-names>G.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            <given-names>R.R.</given-names>
          </string-name>
          “
          <article-title>Reducing the dimensionality of data with neural networks</article-title>
          ”.
          <source>Science</source>
          .
          <volume>313</volume>
          (
          <issue>5786</issue>
          ), (
          <year>2006</year>
          ):
          <fpage>504</fpage>
          -
          <lpage>507</lpage>
          . https://doi.org/10.1126/science.1127647. PMID: 16873662
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Farhadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>Unsupervised deep embedding for clustering analysis</article-title>
          ,
          <source>in: International conference on machine learning.  PMLR</source>
          .
          <year>2016</year>
          , pp.
          <fpage>478</fpage>
          -
          <lpage>487</lpage>
          . https://doi.org/10.48550/arXiv.1511.06335
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D. P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Welling</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Auto-encoding variational bayes</article-title>
          .
          <source> arXiv preprint arXiv:1312.6114</source>
          .
          <year>2013</year>
          . https://doi.org/10.48550/arXiv.1312.6114
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Dizaji</surname>
            ,
            <given-names>K. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herandi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization</article-title>
          ,
          <source>in: Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>5736</fpage>
          -
          <lpage>5745</lpage>
          . https://doi.org/10.48550/arXiv.1704.06327
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Makhzani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shlens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaitly</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Frey</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>Adversarial autoencoders</article-title>
          .
          <source> arXiv preprint arXiv:1511.05644</source>
          .
          <year>2015</year>
          . https://doi.org/10.48550/arXiv.1511.05644
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Improved Deep Embedded Clustering with Local Structure Preservation</article-title>
          , in:
          <source>International Joint Conference on Artificial Intelligence (IJCAI)</source>
          .
          <year>2017</year>
          , pp.
          <fpage>1753</fpage>
          -
          <lpage>1759</lpage>
          . URL https://www.ijcai.org/proceedings/2017/243
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidiropoulos</surname>
            ,
            <given-names>N. D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hong</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Towards K-means-friendly spaces: Simultaneous deep learning and clustering</article-title>
          ,
          <source>in: International Conference on Machine Learning</source>
          , PMLR.
          <year>2017</year>
          , pp.
          <fpage>3861</fpage>
          -
          <lpage>3870</lpage>
          . URL: http://proceedings.mlr.press/v70/yang17b.html
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <article-title>Auto-encoder based data clustering</article-title>
          ,
          <source>in: Iberoamerican Congress on Pattern Recognition</source>
          . Springer, Berlin, Heidelberg, pp.
          <fpage>117</fpage>
          -
          <lpage>124</lpage>
          . URL: https://link.springer.com/chapter/10.1007/978-3-642-41822-8_15
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D. P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Welling</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Auto-Encoding Variational Bayes</article-title>
          .
          <year>2014</year>
          . https://arxiv.org/abs/1312.6114
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Variational deep embedding: An unsupervised and generative approach to clustering</article-title>
          ,
          <year>2017</year>
          . https://arxiv.org/abs/1611.05148
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Dilokthanakul</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mediano</surname>
            ,
            <given-names>P. A. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garnelo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>M. C. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salimbeni</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arulkumaran</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Shanahan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Deep unsupervised clustering with Gaussian mixture variational autoencoders</article-title>
          ,
          <year>2016</year>
          . https://arxiv.org/abs/1611.02648
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Variational Autoencoder with Deep Embedding: A Generative Approach for Clustering</article-title>
          ,
          <year>2019</year>
          . https://arxiv.org/abs/1906.11242
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Hartigan</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          Algorithm AS 136:
          <article-title>A k-means clustering algorithm</article-title>
          .
          <source>Journal of the Royal Statistical Society</source>
          . Series C (Applied Statistics),
          <volume>28</volume>
          (
          <issue>1</issue>
          ), (
          <year>1979</year>
          ):
          <fpage>100</fpage>
          -
          <lpage>108</lpage>
          . https://doi.org/10.2307/2346830
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M. I.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <article-title>On spectral clustering: Analysis and an algorithm</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          ,
          <volume>14</volume>
          , (
          <year>2002</year>
          ):
          <fpage>849</fpage>
          -
          <lpage>856</lpage>
          . https://doi.org/10.7551/mitpress/1198.003.0104
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Fruhwirth-Schnatter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <source>Finite Mixture and Markov Switching Models</source>
          . Springer.
          <year>2006</year>
          . ISBN 978-0-387-32909-3.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Bishop</surname>
            ,
            <given-names>C. M.</given-names>
          </string-name>
          <article-title>Pattern Recognition and Machine Learning</article-title>
          .
          <source>Springer. (Chapter 9: Mixture Models and EM)</source>
          ,
          <year>2006</year>
          . https://doi.org/10.5555/1162264
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>