<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparison of the efficiency of autoencoders for solving clustering problems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Volodymyr Lytvynenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serge Olszewski</string-name>
          <email>Olszewski.Serge@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volodymyr Osypenko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Violeta Demchenko</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daria Dontsova</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kherson National Technical University</institution>
          ,
          <addr-line>st. Instytutska, 11 Khmelnytskyi 29016</addr-line>
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kyiv National University of Technology and Design</institution>
          ,
          <addr-line>Mala Shyianovska str., 2,01011 Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Saksaganskogo Str.</institution>
          ,
          <addr-line>75, 01033 Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>State Institution «Kundiiev Institute of Occupational Health of the National Academy of Medical Sciences of Ukraine»</institution>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>64/13, Volodymyrska Street, 01601 Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This study compares the effectiveness of autoencoders and variational autoencoders for clustering tasks, using the Iris dataset with k-means, spectral clustering, Affinity Propagation, and the Gaussian Mixture Model. Clustering quality was assessed with metrics such as the Silhouette Index, Davies-Bouldin Index, Adjusted Rand Index, and Mutual Information. Results showed that classical autoencoders performed more reliably and effectively, particularly with k-means, while variational autoencoders excelled with Affinity Propagation. The Gaussian Mixture Model was the least effective for both types. The study underscores the importance of choosing the right autoencoder and clustering algorithm based on the task and data structure, paving the way for future research on more complex datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>autoencoders</kwd>
        <kwd>variational autoencoders</kwd>
        <kwd>k-means</kwd>
        <kwd>affinity propagation</kwd>
        <kwd>spectral clustering</kwd>
        <kwd>Gaussian mixture model</kwd>
        <kwd>silhouette score</kwd>
        <kwd>Davies-Bouldin index</kwd>
        <kwd>adjusted rand index</kwd>
        <kwd>mutual information</kwd>
        <kwd>latent space</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In recent decades, data volume has grown exponentially across various fields, becoming
complex, multidimensional, and irregular, necessitating advanced analysis methods. Classical
clustering methods like k-means often struggle with such data, especially with nonlinear
dependencies or noise. Classical autoencoders, a type of neural network for dimensionality
reduction, address this by compressing input data into a lower-dimensional latent space and
reconstructing it, enabling more accurate clustering. Autoencoders automate feature selection
by representing data in a low-dimensional space where clusters are more defined. They are used
in fields such as image and text analysis, time series, and biomedical data, improving the
extraction of valuable insights.</p>
      <p>Autoencoders can integrate with other methods like variational autoencoders and GANs,
enhancing adaptability for complex tasks. They are especially effective for high-dimensional
data, addressing the "curse of dimensionality" by compressing data for better clustering. The
growth in GPU and computing power has made training these models more accessible, boosting
their relevance in modern data analysis. The need to analyze complex data, extract features, and
integrate deep learning methods highlights the importance of autoencoders for achieving
accurate results.</p>
      <p>The main contributions of this paper are as follows: a) A comprehensive comparative analysis
of the effectiveness of classical (AE) and variational (VAE) autoencoders for clustering tasks,
applying various algorithmic approaches; b) Empirical evidence of the superiority of classical
autoencoders in terms of stability and clustering quality, especially in combination with the
k-means algorithm; c) The high efficiency of the Affinity Propagation algorithm in combination
with VAE, indicating its potential for specific types of data; d) The implementation of a
multicriteria approach to clustering quality assessment, integrating several metrics, which provided a
deep understanding of the effectiveness of the studied methods.</p>
      <p>The rest of the paper is structured as follows. Section 2 provides a literature review, Section 3
presents the problem statement. Section 4 describes the materials and methods used in this
study. Section 5 presents the results of testing the proposed clustering methods. Finally, Section
6 concludes the study.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Review of the literature</title>
      <p>Autoencoders are effectively used for clustering by leveraging their latent representations.
Clustering groups objects based on similarities, and autoencoders help extract latent data
features and reduce dimensionality before clustering. Hinton and Salakhutdinov [1] introduced
a neural network methodology for dimensionality reduction, showing that autoencoders create
compact latent representations suitable for clustering. This foundational work advanced the use
of autoencoders in data clustering. The Deep Embedded Clustering (DEC) model [2] uses an
autoencoder to learn latent data representations and cluster them, jointly optimizing data
reconstruction and cluster distribution for more accurate and stable results. Xie and colleagues'
work was a significant advancement in autoencoder-based clustering methods. Variational
autoencoders (VAEs) [3] are key in data analysis for modeling complex distributions and
uncertainty. They provide a theoretical foundation for creating latent representations, enabling
effective data clustering and making them popular for generative modeling and clustering tasks.
In [4], the authors proposed a method combining convolutional autoencoders with deep
learning for clustering, enhancing feature extraction and clustering quality, especially for
complex data like images. In [5], Adversarial Autoencoders (AAEs) were introduced, combining
autoencoders with GANs to create more structured latent representations, enhancing clustering
and generative capabilities for data analysis. The paper [6] introduces the Deep Embedded
Clustering (DEC) method, noting its limitations in preserving local data structure. An improved
method, IDEC, addresses this by combining clustering with local structure preservation through
a new objective function. Results demonstrate IDEC's superiority across metrics, with
visualizations highlighting enhanced local structure retention and clustering performance. This
work underscores the importance of local structure in deep clustering and offers a practical
enhancement. The article [7] introduces the Deep Clustering Network (DCN), which combines
deep learning with the K-means algorithm to perform nonlinear data mapping and clustering
simultaneously. DCN outperforms classical and modern methods by transforming data into a
more "K-means-friendly" space, improving clustering quality and advancing data analysis
through the integration of deep learning with traditional clustering. The article [8] introduces a
clustering approach combining autoencoders and traditional algorithms. It employs
autoencoders for nonlinear dimensionality reduction, applies K-means to the hidden
representation, and iteratively optimizes parameters. The method excels in handling complex,
nonlinear data, automatically extracting relevant features and outperforming traditional
clustering techniques. The study highlights the potential of integrating autoencoders into
unsupervised analysis and discusses variational autoencoders (VAEs), which model
probabilistic distributions and create latent representations for clustering and generative tasks.
Variational Autoencoders (VAEs), introduced by Kingma and Welling [9], model the latent
space as a probabilistic distribution, enabling data generation and effective organization in the
latent space. This makes VAEs ideal for clustering complex, high-dimensional data. The
Variational Deep Embedding (VaDE) method [10] combines VAEs with clustering by modeling
latent variables as a Gaussian mixture, enabling direct clustering in the latent space. It
outperforms traditional methods like K-means, particularly on complex datasets. The Gaussian
Mixture Variational Autoencoder (GMVAE) [11] combines variational autoencoders with
Gaussian mixtures, modeling the latent space as a Gaussian mixture to enhance clustering and
handle complex data structures. In [12], the authors enhance VaDE by combining variational
autoencoders with deep embedding, improving clustering for high-dimensional data like images
and texts. Autoencoders effectively extract and utilize latent representations for clustering. This
study compares classical and variational autoencoders for clustering, analyzing their
integration with algorithms like K-means, spectral clustering, GMM, and Affinity Propagation.
It aims to identify their strengths, limitations, and suitability for different data types and
requirements. Clustering quality will be evaluated using metrics such as the Silhouette Score,
Davies-Bouldin Index, Adjusted Rand Index, and Mutual Information, providing objective insights and
practical recommendations.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Problem statement</title>
      <p>A formal statement of the problem can be formulated as follows:</p>
      <p>Given: 1. Data set D = {x₁, x₂, ..., xₙ}, where xᵢ ∈ ℝᵐ is an m-dimensional vector of features.
2. Multiple clustering methods based on autoencoders M = {M₁, M₂, ..., Mₖ}, where each
method Mᵢ can be a classical or variational autoencoder.</p>
      <p>3. Set of clustering algorithms A = {K-means, Spectral Clustering, Gaussian Mixture Model,
Affinity Propagation}.</p>
      <p>4. Set of clustering quality metrics Q = {Silhouette Score, Davies-Bouldin Index, Adjusted
Rand Index, Mutual Information, Adjusted Mutual Information}.</p>
      <p>Required: 1) For each method Mᵢ ∈ M and algorithm Aⱼ ∈ A: a) Learn the model Mᵢ on the data
D; b) Apply the algorithm Aⱼ to the latent representation of the data obtained by the Mᵢ; c)
Calculate the values of all quality metrics q ∈ Q for the resulting clustering; 2) Carry out a
comparative analysis of the results obtained: a) Evaluate the effectiveness of each combination
(Mᵢ, Aⱼ) across all metrics q ∈ Q; b) Identify the advantages and limitations of each method Mᵢ ∈
M; c) Identify the most effective combinations of methods and algorithms for different types of
data and clustering tasks; 3) To formulate recommendations for selecting the optimal
autoencoder based clustering method and clustering algorithm depending on data
characteristics and clustering quality requirements.</p>
      <p>Limitations and assumptions: 1. The true cluster labels for the dataset D are assumed to be
unknown (unsupervised learning problem); 2. The number of clusters K is assumed to be given
or determined automatically depending on the clustering algorithm used; 3. Computational
resources and model training time are not considered as limiting factors in this problem
formulation. This formal problem statement covers all aspects mentioned in the research
objective and provides a clear structure for a comparative analysis of autoencoder based
clustering methods.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Materials and Methods</title>
      <p>4.1. Data
The IRIS dataset (https://www.kaggle.com/uciml/iris) was chosen for comparing classical (AE)
and variational (VAE) autoencoders in clustering due to its known structure, interpretability,
and manageable size, enabling efficient training and testing. Its explicit clustering structure
(three classes) allows assessment of the models' ability to extract informative latent
representations, making it an ideal baseline for such analyses.</p>
      <sec id="sec-4-1">
        <title>4.2. Autoencoders</title>
        <p>A classical autoencoder encodes data into a compact representation and reconstructs it, extracting features for clustering. Below is its mathematical description and application in clustering.</p>
        <p>Autoencoder architecture. The autoencoder consists of two main parts:</p>
        <p>Encoder: the function f_θ converts the input data x into the hidden representation z: z = f_θ(x), where θ are the encoder parameters. (1)</p>
        <p>Decoder: the function g_φ reconstructs the original data from the hidden representation z: x̂ = g_φ(z), where x̂ is the reconstructed input and φ are the decoder parameters. (2)</p>
        <p>Loss function. The autoencoder is trained using a loss function that measures the difference between the original data x and the recovered data x̂. Typically, the mean squared error (MSE) is used: L(θ, φ) = (1/N) Σᵢ ||xᵢ − x̂ᵢ||², where N is the number of examples in the batch. (3)</p>
        <p>Application for clustering. After training the autoencoder and obtaining hidden representations for all the data, one can use these representations for clustering. The process can be described as follows:</p>
        <p>a) Obtaining latent representations: pass the entire dataset through the encoder to get the hidden representations zᵢ = f_θ(xᵢ). (4)</p>
        <p>b) Clustering: apply a clustering algorithm such as k-means or another method to the hidden representations to obtain cluster labels yᵢ = C(zᵢ; ψ), where ψ are the parameters of the clustering algorithm. (5)</p>
        <p>Thus, a classical autoencoder can be described as a model that first encodes data into a hidden representation and then reconstructs data from that representation. The hidden representations obtained from the encoder are then clustered using a clustering algorithm.</p>
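        <p>As a concrete illustration of this pipeline, the following is a minimal numpy sketch of a one-hidden-layer autoencoder trained by gradient descent on synthetic two-blob toy data (an illustrative stand-in, not the architecture or data used in the experiments). It computes z = f_θ(x), x̂ = g_φ(z) and the MSE loss, and extracts the latent codes that a clustering algorithm would then receive.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs in 4-D (a stand-in for the Iris features).
X = np.vstack([rng.normal(0.0, 0.3, (50, 4)),
               rng.normal(3.0, 0.3, (50, 4))])

# Single-hidden-layer autoencoder: z = tanh(x W1 + b1), x_hat = z W2 + b2.
d_in, d_lat = X.shape[1], 2
W1 = rng.normal(0, 0.1, (d_in, d_lat)); b1 = np.zeros(d_lat)
W2 = rng.normal(0, 0.1, (d_lat, d_in)); b2 = np.zeros(d_in)

def forward(X):
    Z = np.tanh(X @ W1 + b1)      # encoder f_theta
    X_hat = Z @ W2 + b2           # decoder g_phi
    return Z, X_hat

lr, losses = 0.01, []
for _ in range(500):
    Z, X_hat = forward(X)
    err = X_hat - X
    losses.append(np.mean(np.sum(err ** 2, axis=1)))   # MSE loss
    # Backpropagation: gradients of the MSE w.r.t. all parameters.
    n = X.shape[0]
    dXhat = 2 * err / n
    dW2 = Z.T @ dXhat; db2 = dXhat.sum(0)
    dZ = (dXhat @ W2.T) * (1 - Z ** 2)                 # tanh derivative
    dW1 = X.T @ dZ; db1 = dZ.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

Z, _ = forward(X)   # latent representations, ready for any clustering algorithm
```

        <p>After training, the rows of Z are the low-dimensional representations to which k-means or any other algorithm from Section 4.4 can be applied.</p>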
      </sec>
      <sec id="sec-4-2">
        <title>4.3. Variational Autoencoder</title>
        <p>A Variational Autoencoder (VAE) is a probabilistic model that encodes data as a probability distribution rather than a fixed representation. In clustering, a VAE generates compact, informative data representations. Like the classical autoencoder it has an encoder and a decoder, but instead of a pointwise latent representation a distribution q_φ(z|x) over the latent space is used.</p>
        <p>Encoder. The encoder converts the input data x into distribution parameters, usually a mean μ_φ(x) and a standard deviation σ_φ(x): q_φ(z|x) = N(z; μ_φ(x), σ²_φ(x)), where μ_φ(x) and σ_φ(x) are parameters that depend on the input data and the network parameters φ. (6)</p>
        <p>Decoder. The decoder recovers the data from a hidden representation z sampled from the distribution q_φ(z|x): x̂ ~ p_θ(x|z), where p_θ(x|z) is a distribution (e.g., Gaussian) defined by the decoder. (7)</p>
        <p>The variational autoencoder is trained by minimizing the two components of the loss function:</p>
        <p>1. Reconstruction error (mean squared error or cross-entropy) between the original data x and the reconstructed data x̂: L_rec = ||x − x̂||². (8)</p>
        <p>2. Kullback-Leibler divergence (KL-divergence) between the posterior distribution q_φ(z|x) and the prior distribution p(z), which is usually assumed to be the standard normal distribution N(0, I): D_KL(q_φ(z|x) ‖ p(z)). (9)</p>
        <p>Complete loss function: L(θ, φ) = L_rec + β · D_KL(q_φ(z|x) ‖ p(z)), where β is a weighting factor that can be adjusted to control the effect of the KL-divergence (e.g., in the β-VAE model). (10)</p>
        <p>After training, the VAE produces probabilistic representations of the data, and these representations can be used for clustering:</p>
        <p>a) Obtaining latent representations: pass the entire dataset through the encoder to get the distribution parameters μ_φ(xᵢ) and σ_φ(xᵢ), and then sample the latent representations zᵢ ~ q_φ(z|xᵢ). (11)</p>
        <p>b) Clustering: apply the clustering algorithm to the sampled hidden representations to obtain cluster labels yᵢ = C(zᵢ; ψ), where ψ are the parameters of the clustering algorithm (e.g., k-means). (12)</p>
        <p>A variational autoencoder (VAE) is a probabilistic model that uses latent variables to
represent data. After training, it generates latent representations for clustering and effectively
handles complex, multi-level data structures.</p>
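        <p>The two VAE-specific ingredients, sampling z via the reparameterization trick and the closed-form KL term for a diagonal Gaussian posterior against the standard normal prior, can be sketched numerically as follows (the μ and log σ² values here are illustrative stand-ins for encoder outputs, not a trained model):</p>

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in encoder outputs for a batch of 8 points with a 2-D latent space:
# mean mu_phi(x) and log-variance log sigma_phi(x)^2 of q(z|x).
mu = rng.normal(0.0, 1.0, (8, 2))
log_var = rng.normal(0.0, 0.5, (8, 2))

# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I),
# so gradients can flow through the sampling step.
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# Closed-form KL divergence of N(mu, sigma^2 I) from the prior N(0, I),
# summed over latent dimensions and averaged over the batch.
kl = np.mean(-0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=1))

# The total loss would be L = L_rec + beta * kl, with L_rec the MSE (or
# cross-entropy) between x and the decoder output, and beta = 1 for a plain VAE.
```

        <p>The sampled z plays the role of the latent representation that is subsequently clustered.</p>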
      </sec>
      <sec id="sec-4-3">
        <title>4.4. Clustering Algorithms Applied to Hidden Representations</title>
        <p>The following algorithms were used to cluster the hidden representations: k-means, spectral
clustering, Affinity Propagation and Gaussian Mixture Model.</p>
        <p>k-means [14]. K-means clustering is a vector quantization method that divides n
observations into k clusters by assigning each to the nearest centroid, which represents the
cluster. This partitions the data space into Voronoi cells. The algorithm minimizes intra-cluster
variance (sum of squared Euclidean distances), unlike the more complex Weber problem, which
minimizes Euclidean distances. For greater accuracy in Euclidean distance minimization,
k-median or k-medoid methods can be used.</p>
        <sec id="sec-4-3-1">
          <title>Explanation:</title>
          <p>Step 1: Initialises the centroids by randomly selecting k points from the data. While loop:
Iteratively performs the following steps until convergence or until the maximum number of
iterations is reached: Step 2: Assigns each data point to the nearest centroid, forming clusters.</p>
          <p>Step 3: Updates the centroids by computing the average of all points assigned to each cluster.
Step 4: Checks convergence by evaluating whether the change in centroids is less than a given
tolerance ε. If the change is small enough or the maximum number of iterations is reached, the
loop terminates. Return: Outputs the final clusters and their corresponding centroids.</p>
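          <p>The four steps above can be sketched directly in numpy (an illustrative implementation on toy data, not the library routine used in the experiments):</p>

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialise centroids with k randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        # Step 4: stop when the centroids move less than the tolerance.
        if np.linalg.norm(new - centroids) < tol:
            centroids = new
            break
        centroids = new
    return labels, centroids

# Two well-separated blobs; k-means should recover them.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (30, 2)), rng.normal(5, 0.2, (30, 2))])
labels, centroids = kmeans(X, k=2)
```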
          <p>Spectral Clustering [15]. Spectral clustering transforms data into a lower-dimensional
space using eigenvectors of the Laplace matrix derived from a similarity graph, then applies
standard clustering methods like k-means.</p>
          <p>Step 1: Constructs the Laplacian matrix from the similarity matrix.</p>
          <p>Step 2: Computes the first k eigenvectors of the Laplacian matrix.</p>
          <p>Step 3: Forms a matrix U from these eigenvectors.</p>
          <p>Step 4: Normalizes the rows of U.</p>
          <p>Step 5: Applies the k-means algorithm to cluster the rows of U.</p>
          <p>While loop: Iteratively refines the clustering by recalculating centroids and reassigning
points until clusters stabilize.</p>
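          <p>The steps above can be sketched as follows, here using the normalised Laplacian, a common variant, with a Gaussian similarity kernel and a simple deterministic k-means initialisation (an illustrative toy example, with parameter choices such as sigma assumed for the sketch):</p>

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0):
    # Similarity matrix with a Gaussian (RBF) kernel.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Step 1: normalised Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Steps 2-3: the k eigenvectors with smallest eigenvalues form U.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    # Step 4: normalise the rows of U.
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Step 5: k-means on the rows of U (Lloyd iterations,
    # deterministic spread-out initialisation for this sketch).
    C = U[np.linspace(0, len(U) - 1, k).astype(int)]
    for _ in range(100):
        labels = np.linalg.norm(U[:, None] - C[None], axis=2).argmin(1)
        C = np.array([U[labels == j].mean(0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (25, 2)), rng.normal(6, 0.3, (25, 2))])
labels = spectral_clustering(X, k=2)
```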
          <p>Gaussian Mixture Model [16]. A mixture model is a probabilistic approach for
representing subpopulations in data without identifying their membership. In model-based
clustering, such as Gaussian Mixture Models (GMMs), data are modeled as a mixture of
parametric distributions, with each cluster represented by a separate Gaussian distribution.</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>Explanation:</title>
          <p>Step 1: Initializes the parameters of the Gaussian components: the mixing coefficients πₖ
determine the proportion of each component; the means μₖ represent the center of each Gaussian
component; the covariance matrices Σₖ describe the spread of each component.</p>
          <p>While loop: Iteratively performs the Expectation-Maximization (EM) steps until
convergence or the maximum number of iterations is reached.</p>
          <p>E-Step: Calculates the responsibilities γᵢₖ, which represent the probability that data point xᵢ
belongs to component k.</p>
          <p>M-Step: Updates the parameters based on the current responsibilities.</p>
          <p>Check Convergence: The algorithm checks whether the change in log-likelihood is below a
certain tolerance ε.</p>
          <p>Return: After convergence, returns the final parameters of the Gaussian components.</p>
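          <p>A minimal sketch of the EM loop for a one-dimensional two-component mixture follows (illustrative toy data and initialisation, not the experimental setup):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
# 1-D data drawn from two Gaussians with means 0 and 5.
x = np.concatenate([rng.normal(0, 0.5, 100), rng.normal(5, 0.5, 100)])

# Step 1: initialise mixing coefficients, means and variances.
pi = np.array([0.5, 0.5])
mu = np.array([x.min(), x.max()])
var = np.array([1.0, 1.0])

def log_likelihood(x, pi, mu, var):
    pdf = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return np.log(pdf.sum(axis=1)).sum()

ll_old = -np.inf
for _ in range(100):
    # E-step: responsibilities gamma_ik = P(component k | x_i).
    pdf = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    gamma = pdf / pdf.sum(axis=1, keepdims=True)
    # M-step: update pi, mu and var from the responsibilities.
    Nk = gamma.sum(axis=0)
    pi = Nk / len(x)
    mu = (gamma * x[:, None]).sum(axis=0) / Nk
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    # Convergence check on the log-likelihood.
    ll = log_likelihood(x, pi, mu, var)
    if ll - ll_old < 1e-8:
        break
    ll_old = ll
```

          <p>EM guarantees that the log-likelihood never decreases, which is why the convergence check on its change is a valid stopping rule.</p>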
          <p>The Affinity Propagation (AP) clustering algorithm is a method that identifies cluster
centers (exemplars) through message passing between data points. Unlike traditional methods
such as K-means, AP does not require the number of clusters to be specified in advance.</p>
          <p>AP uses a similarity matrix S, where S(i, j) measures how similar points i and j are. The
diagonal values S(i, i) indicate the preference of each point to be chosen as an exemplar.</p>
          <p>The process involves updating two matrices: the "responsibility" matrix R, which shows how
suitable point j is as a center for point i, and the "availability" matrix A, which reflects how
appropriate point j is as a center considering all other points. These matrices are updated
iteratively until convergence. The number of clusters is determined automatically based on the
data, making AP useful for tasks where the number of clusters is not known in advance.</p>
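          <p>The R and A updates described above can be sketched in numpy with damping (a compact illustrative implementation on toy data; the preference is set to the median off-diagonal similarity, a common default, and the damping and iteration count are assumptions of the sketch):</p>

```python
import numpy as np

def affinity_propagation(X, damping=0.7, iters=200):
    n = len(X)
    # Similarity: negative squared Euclidean distance between points.
    S = -((X[:, None] - X[None]) ** 2).sum(-1)
    # Diagonal "preference": median off-diagonal similarity.
    S[np.diag_indices(n)] = np.median(S[~np.eye(n, dtype=bool)])
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    for _ in range(iters):
        # Responsibility: r(i,k) = s(i,k) - max_{k'!=k} [a(i,k') + s(i,k')].
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[np.arange(n), idx].copy()
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * R_new
        # Availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # and a(k,k) = sum_{i'!=k} max(0, r(i',k)).
        Rp = np.maximum(R, 0)
        Rp[np.diag_indices(n)] = R.diagonal()
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        A_new[np.diag_indices(n)] = diag
        A = damping * A + (1 - damping) * A_new
    # Exemplars are points k with r(k,k) + a(k,k) > 0.
    exemplars = np.where((R + A).diagonal() > 0)[0]
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(len(exemplars))
    return labels, exemplars

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(4, 0.2, (20, 2))])
labels, exemplars = affinity_propagation(X)
```

          <p>With a low (median) preference on two well-separated blobs, the message passing settles on one exemplar per blob, illustrating how AP determines the number of clusters automatically.</p>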
        </sec>
      </sec>
      <sec id="sec-4-4">
        <title>4.5. Clustering Quality Assessment</title>
        <p>Silhouette Score. The silhouette index evaluates cluster compactness and separation, ranging from -1 (misclassified) to 1 (well-separated), with 0 indicating overlap.</p>
        <p>(1) Average intra-cluster distance a(i): the average distance from object i to all other objects in the same cluster, a(i) = (1/(|Cᵢ| − 1)) Σ_{j∈Cᵢ, j≠i} d(i, j), where d(i, j) is the distance between objects i and j (for example, the Euclidean distance). (13)</p>
        <sec id="sec-4-4-1">
          <title>Silhouette index</title>
          <p>(2) Average inter-cluster distance b(i): the average distance from object i to all objects in the nearest cluster to which the object does not belong, b(i) = min_{C≠Cᵢ} (1/|C|) Σ_{j∈C} d(i, j). (14)</p>
          <p>For each object i its silhouette is defined as s(i) = (b(i) − a(i)) / max(a(i), b(i)). (15)</p>
          <p>If s(i) is close to 1, the object is well classified; if s(i) is close to 0, the object lies on the boundary between clusters; if s(i) is close to -1, the object was probably misclassified.</p>
          <p>(3) Average silhouette index for the entire dataset: the overall silhouette for the entire clustering is calculated as the average of s(i) across all objects, S = (1/N) Σᵢ s(i), where N is the total number of objects in the dataset. (16)</p>
          <p>The Silhouette Plot evaluates clustering quality by showing silhouette values for objects
within clusters. Positive values (close to 1) indicate well-separated clusters, while negative
values suggest misclassified objects.</p>
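          <p>The silhouette computation can be sketched directly from its definition, with a(i), b(i) and s(i) computed from the pairwise distance matrix (an illustrative toy example):</p>

```python
import numpy as np

def silhouette_scores(X, labels):
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None], axis=2)   # pairwise distances
    s = np.zeros(n)
    for i in range(n):
        same = labels == labels[i]
        # a(i): mean distance to the other members of i's own cluster.
        a = D[i, same & (np.arange(n) != i)].mean()
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(5, 0.2, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
s = silhouette_scores(X, labels)
```

          <p>For these two tight, well-separated blobs the mean silhouette is close to 1, the pattern the silhouette plot makes visible per cluster.</p>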
          <p>Davies-Bouldin Index (DBI) is a metric for assessing the quality of clustering [18]. It measures the average "similarity" of each cluster to its most similar cluster; the smaller the DBI value, the better separated the clusters are. The index is based on two key concepts: the scatter within a cluster and the distance between clusters.</p>
          <p>(1) Intra-cluster scatter: Sᵢ = (1/|Cᵢ|) Σ_{x∈Cᵢ} d(x, μᵢ), where d(x, μᵢ) is the distance (e.g., Euclidean) between the point x and the centroid μᵢ. It measures the average distance between all points within a cluster and its centroid. (17)</p>
          <p>(2) Inter-cluster distance: Mᵢⱼ = d(μᵢ, μⱼ) is the distance between the centroids of the two clusters Cᵢ and Cⱼ. (18)</p>
          <p>(3) Similarity index between two clusters: Rᵢⱼ = (Sᵢ + Sⱼ) / Mᵢⱼ. It is defined as the ratio of the sum of the scatters of two clusters to the distance between them. (19)</p>
          <p>(4) Davies-Bouldin index for cluster i: Dᵢ = max_{j≠i} Rᵢⱼ. This is the maximum of the similarity index of cluster i with any other cluster. (20)</p>
          <p>(5) Davies-Bouldin Index (DBI) for the entire clustering: DBI = (1/K) Σᵢ Dᵢ, the average of Dᵢ across all clusters, where K is the total number of clusters. (21)</p>
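          <p>Steps (1)-(5) translate line by line into code (an illustrative toy example of the metric, computed from scatters, centroid distances and the max similarity index):</p>

```python
import numpy as np

def davies_bouldin(X, labels):
    ks = np.unique(labels)
    # Centroids mu_i and intra-cluster scatters S_i.
    mus = np.array([X[labels == k].mean(0) for k in ks])
    Ss = np.array([np.linalg.norm(X[labels == k] - mus[i], axis=1).mean()
                   for i, k in enumerate(ks)])
    # For each cluster, the worst-case similarity R_ij = (S_i + S_j) / M_ij,
    # with M_ij the distance between centroids; DBI is the mean of these maxima.
    K = len(ks)
    total = 0.0
    for i in range(K):
        total += max((Ss[i] + Ss[j]) / np.linalg.norm(mus[i] - mus[j])
                     for j in range(K) if j != i)
    return total / K

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(5, 0.2, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
dbi = davies_bouldin(X, labels)
```

          <p>For compact, well-separated blobs the scatters are small relative to the centroid distance, so the DBI is close to 0, consistent with "smaller is better".</p>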
          <p>Adjusted Rand Index (ARI). The Rand Index (RI) evaluates clustering quality by
comparing predicted clusters to true labels. Its chance-adjusted version, ARI, ranges from -1 to 1,
with 1 indicating perfect clustering [19].</p>
          <p>The Adjusted Rand Index (ARI) corrects for this shortcoming by providing a normalized
value that accounts for random matches.</p>
          <p>Formal definition of the Adjusted Rand Index. 1. Matching matrix: suppose we have two
partitions of a dataset: U, the true cluster labels (ground truth), and V, the predicted cluster labels.
Construct a matching (contingency) matrix whose element nᵢⱼ is the number of points falling
simultaneously into cluster i of partition U and cluster j of partition V.</p>
          <p>2. Rand Index (RI): the Rand index measures the proportion of point pairs that either belong
to the same clusters or to different clusters in both partitions [21].</p>
        </sec>
        <sec id="sec-4-4-2">
          <title>Adjusted Rand Index (ARI)</title>
          <p>ARI accounts for the probability of random matches by normalizing the Rand index: ARI = (Σᵢⱼ C(nᵢⱼ, 2) − [Σᵢ C(aᵢ, 2) · Σⱼ C(bⱼ, 2)] / C(n, 2)) / (½[Σᵢ C(aᵢ, 2) + Σⱼ C(bⱼ, 2)] − [Σᵢ C(aᵢ, 2) · Σⱼ C(bⱼ, 2)] / C(n, 2)), where n is the total number of data points; aᵢ is the i-th row sum of the matching matrix (the number of points in cluster i of partition U); bⱼ is the j-th column sum of the matching matrix (the number of points in cluster j of partition V); and C(n, 2) is the total number of object pairs. (22)-(24)</p>
          <p>ARI = 1: Complete correspondence between predicted and true partitioning (perfect clustering);
ARI = 0: Clustering is no better than random partitioning; ARI &lt; 0: The result is worse than
random clustering.</p>
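          <p>The ARI formula above can be computed directly from the matching matrix (an illustrative implementation; C(n, 2) is the binomial coefficient "n choose 2"):</p>

```python
import numpy as np
from math import comb

def adjusted_rand_index(true, pred):
    tu, pu = np.unique(true), np.unique(pred)
    # Matching (contingency) matrix n_ij.
    C = np.array([[np.sum((true == t) & (pred == p)) for p in pu] for t in tu])
    sum_ij = sum(comb(int(n), 2) for n in C.ravel())
    a = sum(comb(int(n), 2) for n in C.sum(axis=1))   # row sums a_i
    b = sum(comb(int(n), 2) for n in C.sum(axis=0))   # column sums b_j
    total = comb(int(C.sum()), 2)                     # C(n, 2)
    expected = a * b / total
    max_index = (a + b) / 2
    return (sum_ij - expected) / (max_index - expected)

true = np.array([0, 0, 0, 1, 1, 1])
perfect = np.array([1, 1, 1, 0, 0, 0])   # same partition, labels renamed
ari = adjusted_rand_index(true, perfect)
```

          <p>Note that ARI is invariant to label renaming: the "perfect" partition above scores 1 even though the cluster names are swapped, while lumping everything into one cluster scores 0.</p>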
          <p>Mutual Information (MI) and its corrected version, Adjusted Mutual Information
(AMI), are metrics used to assess the quality of clustering. They measure how well one
partitioning of data (clustering) agrees with another, taking into account information common
to both partitions [16].</p>
          <p>Mutual Information between two partitions U and V (e.g., the true partitioning and the predicted partitioning) measures the amount of information common to both partitions.</p>
          <p>Definition: mutual information measures the extent to which knowledge of partitioning U reduces the uncertainty about partitioning V: MI(U, V) = Σᵢ Σⱼ P(i, j) · log(P(i, j) / (P(i) · P′(j))), where P(i, j) is the probability that a randomly selected object belongs simultaneously to cluster i of U and cluster j of V, and P(i) and P′(j) are the marginal probabilities that the object belongs to cluster i of U and cluster j of V, respectively.</p>
          <p>MI takes values from 0 to min(H(U), H(V)), where H denotes the partition entropy. The value 0 means that the partitions U and V are independent (no common information). Higher MI values mean more dependence between partitions, i.e. better cluster matching.</p>
          <p>Adjusted Mutual Information (AMI)</p>
          <p>AMI is a corrected version of MI that accounts for the probability of random matches
between partitions. It is a normalized metric that removes the positive bias of MI: AMI(U, V) = (MI(U, V) − E[MI]) / (mean(H(U), H(V)) − E[MI]), where E[MI] is the mathematical expectation of mutual information between random partitions, and H(U) and H(V) are the entropies of partitions U and V, respectively.</p>
          <p>AMI takes values from -1 to 1.</p>
          <p>AMI = 1: Full correspondence between the partitions.</p>
          <p>AMI = 0: Agreement is no better than chance agreement.</p>
          <p>AMI &lt; 0: Agreement is worse than random (which is extremely rare).</p>
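          <p>The MI definition can be evaluated directly from joint and marginal label frequencies (an illustrative sketch; the expectation term E[MI] needed for AMI is omitted, as its combinatorial formula is much longer):</p>

```python
import numpy as np

def mutual_information(u, v):
    n = len(u)
    mi = 0.0
    for a in np.unique(u):
        for b in np.unique(v):
            p_ab = np.sum((u == a) & (v == b)) / n   # joint probability P(i, j)
            if p_ab == 0:
                continue
            p_a = np.sum(u == a) / n                 # marginal P(i)
            p_b = np.sum(v == b) / n                 # marginal P'(j)
            mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

u = np.array([0, 0, 1, 1])
# For identical partitions, MI equals the partition entropy H(U) = log 2.
mi = mutual_information(u, u)
```

          <p>For two balanced, identical partitions MI reaches its maximum H(U), while for independent partitions it is 0, exactly the bounds stated above.</p>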
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments and Results</title>
      <p>Experimental results obtained on IRIS data clustering when applying classical autoencoders:</p>
      <p>Silhouette Score: The values range from 0.3563 to 0.5869. The best result is for the k-means
clustering algorithm, indicating good cluster separation. The lowest result is for the Gaussian
Mixture Model (GMM), indicating weaker separation.</p>
      <p>Davies-Bouldin Index (DBI): The values range from 0.9561 to 1.0979. GMM has the lowest value,
which indicates better clustering by this metric.</p>
      <p>Adjusted Rand Index (ARI): The values range from 0.3622 to 0.8681. The best result is again for
k-means, which confirms the high quality of clustering compared to true labels.</p>
      <p>Mutual information indices (MI and AMI): MI: 0.4296 to 0.9331; AMI: 0.3933 to 0.8681. The
k-means algorithm performs better on both MI and AMI.</p>
      <p>[Table: Experimental results obtained on IRIS data clustering using VAE variational autoencoders; Silhouette Index, Davies-Bouldin Index, Adjusted Rand Index, Mutual Information and Adjusted Mutual Information for each clustering algorithm.]</p>
      <p>Variational Autoencoders (VAE). Silhouette Index: The values range from 0.0149 to 0.6951. Affinity Propagation has the best
value, which shows good cluster separation.</p>
      <p>Davies-Bouldin Index: Values from 0.0415 to 1.1864. Affinity Propagation has the best (lowest)
DBI value, indicating better clustering results.</p>
      <p>Adjusted Rand Index (ARI): Values from 0.0408 to 0.6813. k-means and Affinity Propagation
show the best results.</p>
      <p>Mutual Information Index (MI) and AMI: MI: 0.0459 to 0.7901. AMI: 0.0432 to 0.6813.
k-means shows the best score for both MI and AMI.</p>
      <p>Autoencoders show better clustering performance, especially when using k-means, as shown
by high Silhouette Score, ARI, MI and AMI values.</p>
      <p>Variational autoencoders show competitive results, especially when using Affinity
Propagation, although in some cases (e.g., GMM) the quality of clustering is lower, as indicated
by the low values of the metrics. Thus, when comparing the two approaches, autoencoders
(especially when combined with k-means) show more stable and better results in clustering
tasks compared to variational autoencoders.</p>
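      <p>The autoencoder-plus-k-means pipeline compared here can be sketched end to end. The snippet below is an illustrative stand-in, not the authors' implementation: it uses scikit-learn's MLPRegressor trained to reconstruct its own input as a small autoencoder, manually extracts the 2-unit bottleneck activations as the latent code, and clusters them with k-means:</p>

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPRegressor
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)
X = MinMaxScaler().fit_transform(X)

# "Autoencoder": a 4 -> 2 -> 4 MLP trained to reproduce its own input.
ae = MLPRegressor(hidden_layer_sizes=(2,), activation="tanh",
                  max_iter=5000, random_state=0)
ae.fit(X, X)

# Encoder = forward pass through the bottleneck layer only.
latent = np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])

# Cluster in the learned latent space, then score against the true labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(latent)
print("ARI on latent codes:", round(adjusted_rand_score(y, labels), 4))
```

      <p>In the study's setting the encoder would instead come from a trained deep AE or VAE; the same KMeans call and metric functions apply unchanged to those latent codes.</p>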
      <p>Evaluation of clustering results based on the Silhouette Index from the graphical
representations. Autoencoders:
Figure 1: k-means. The k-means silhouette index indicates well-separated, high-quality
clusters, with most samples showing high positive values and balanced cluster sizes.
Figure 2: Spectral clustering. Spectral clustering shows less distinct separation and less
balanced cluster sizes than k-means, with a lower Silhouette Index but still good quality.
Figure 3: Affinity Propagation. Clustering results using Affinity Propagation show a low
Silhouette Index, indicating less clear separation of clusters. The clusters overlap, making them
difficult to interpret.</p>
      <p>Figure 4: Gaussian Mixture Model (GMM). The GMM silhouette index is lower than for k-means
and spectral clustering, indicating poor separation with significant overlap and many negative
values, particularly in one cluster.</p>
      <p>Variational autoencoders (VAE):</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Classical autoencoders (AE) generally perform more stably and yield better clustering results
than variational autoencoders (VAE). K-means is most effective with AE, while Affinity
Propagation works best with VAE, particularly for cluster separation. GMM shows the worst
performance for both. Silhouette Score graphs confirm AE's better separation, especially with
k-means and spectral clustering, while VAE shows more variability, performing well with some
algorithms (e.g., Affinity Propagation) and poorly with others (e.g., spectral clustering). The
study highlights the need for tailored choices of autoencoder and clustering algorithm,
especially for small datasets like Iris, and underscores the importance of using multiple metrics
for a comprehensive evaluation.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Future Research</title>
      <p>
        Our future research will be directed toward real-world data and problems. In particular,
we plan to use these algorithms to predict the toxic properties of different pesticides
depending on their structural 3D formulas.
      </p>
      <p>
        [17] Peter J. Rousseeuw. "Silhouettes: a Graphical Aid to the Interpretation and Validation of
Cluster Analysis". Journal of Computational and Applied Mathematics. 20, (1987): 53-65.
https://doi.org/10.1016/0377-0427(87)90125-7
      </p>
      <p>
        [18] Davies, David L.; Bouldin, Donald W. "A Cluster Separation Measure". IEEE Transactions
on Pattern Analysis and Machine Intelligence. PAMI-1 (2), (1979): 224-227.
https://doi.org/10.1109/TPAMI.1979.4766909. S2CID 13254783.
      </p>
      <p>
        [19] Gates, A. J.; Ahn, Y. Y. "The impact of random models on clustering similarity". Journal of
Machine Learning Research. 18 (87), (2017): 1-28. https://doi.org/10.48550/arXiv.1701.06508
      </p>
      <p>
        [20] Vinh, N. X.; Epps, J.; Bailey, J. "Information theoretic measures for clusterings comparison".
Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09.
2009, p. 1. https://doi.org/10.1145/1553374.1553511. ISBN 9781605585161
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Hinton</surname>
            <given-names>G.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
            <given-names>R.R.</given-names>
          </string-name>
          “
          <article-title>Reducing the dimensionality of data with neural networks</article-title>
          ”.
          <source>Science</source>
          .
          <volume>313</volume>
          (
          <issue>5786</issue>
          ), (
          <year>2006</year>
          ):
          <fpage>504</fpage>
          -
          <lpage>507</lpage>
          . https://doi.org/10.1126/science.1127647. PMID: 16873662
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Farhadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>Unsupervised deep embedding for clustering analysis</article-title>
          ,
          <source>in: International conference on machine learning.  PMLR</source>
          .
          <year>2016</year>
          , pp.
          <fpage>478</fpage>
          -
          <lpage>487</lpage>
          . https://doi.org/10.48550/arXiv.1511.06335
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D. P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Welling</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Auto-encoding variational bayes</article-title>
          .
          <source> arXiv preprint arXiv:1312.6114</source>
          .
          <year>2013</year>
          . https://doi.org/10.48550/arXiv.1312.6114
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Dizaji</surname>
            ,
            <given-names>K. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herandi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization</article-title>
          ,
          <source>in: Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>5736</fpage>
          -
          <lpage>5745</lpage>
          . https://doi.org/10.48550/arXiv.1704.06327
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Makhzani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shlens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaitly</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Frey</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>Adversarial autoencoders</article-title>
          .
          <source> arXiv preprint arXiv:1511.05644</source>
          .
          <year>2015</year>
          . https://doi.org/10.48550/arXiv.1511.05644
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Improved Deep Embedded Clustering with Local Structure Preservation</article-title>
          , in:
          <source>International Joint Conference on Artificial Intelligence (IJCAI)</source>
          .
          <year>2017</year>
          , pp.
          <fpage>1753</fpage>
          -
          <lpage>1759</lpage>
          . URL https://www.ijcai.org/proceedings/2017/243
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidiropoulos</surname>
            ,
            <given-names>N. D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hong</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Towards K-means-friendly spaces: Simultaneous deep learning and clustering</article-title>
          ,
          <source>in: International Conference on Machine Learning</source>
          , PMLR.
          <year>2017</year>
          , pp.
          <fpage>3861</fpage>
          -
          <lpage>3870</lpage>
          . URL: http://proceedings.mlr.press/v70/yang17b.html
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <article-title>Auto-encoder based data clustering</article-title>
          ,
          <source>in: Iberoamerican Congress on Pattern Recognition</source>
          . Springer, Berlin, Heidelberg, pp.
          <fpage>117</fpage>
          -
          <lpage>124</lpage>
          . URL: https://link.springer.com/chapter/10.1007/978-3-642-41822-8_15
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D. P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Welling</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Auto-Encoding Variational Bayes</article-title>
          .
          <year>2014</year>
          . https://arxiv.org/abs/1312.6114
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Variational deep embedding: An unsupervised and generative approach to clustering</article-title>
          ,
          <year>2017</year>
          . https://arxiv.org/abs/1611.05148
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Dilokthanakul</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mediano</surname>
            ,
            <given-names>P. A. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garnelo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>M. C. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salimbeni</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arulkumaran</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Shanahan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Deep unsupervised clustering with Gaussian mixture variational autoencoders</article-title>
          ,
          <year>2016</year>
          . https://arxiv.org/abs/1611.02648
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Variational Autoencoder with Deep Embedding: A Generative Approach for Clustering</article-title>
          ,
          <year>2019</year>
          . https://arxiv.org/abs/1906.11242
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Hartigan</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          Algorithm AS 136:
          <article-title>A k-means clustering algorithm</article-title>
          .
          <source>Journal of the Royal Statistical Society</source>
          . Series C (Applied Statistics),
          <volume>28</volume>
          (
          <issue>1</issue>
          ), (
          <year>1979</year>
          ):
          <fpage>100</fpage>
          -
          <lpage>108</lpage>
          . https://doi.org/10.2307/2346830
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M. I.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <article-title>On spectral clustering: Analysis and an algorithm</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          ,
          <volume>14</volume>
          , (
          <year>2002</year>
          ):
          <fpage>849</fpage>
          -
          <lpage>856</lpage>
          . https://doi.org/10.7551/mitpress/1198.003.0104
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Fruhwirth-Schnatter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <source>Finite Mixture and Markov Switching Models</source>
          . Springer.
          <year>2006</year>
          . ISBN 978-0-387-32909-3.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Bishop</surname>
            ,
            <given-names>C. M.</given-names>
          </string-name>
          <article-title>Pattern Recognition and Machine Learning</article-title>
          .
          <source>Springer. (Chapter 9: Mixture Models and EM)</source>
          ,
          <year>2006</year>
          . https://doi.org/10.5555/1162264
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>