<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analyzing Generated Images using Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Diana Miranda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aparna Rane</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bipin Naik</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Technology, Goa College of Engineering</institution>
          ,
          <addr-line>Farmagudi, Ponda, Goa</addr-line>
          ,
          <country country="IN">India</country>
          ,
          <addr-line>403401</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
<p>This paper provides a detailed explanation of the approaches proposed by the Biomedical Imaging Goa Lab for the ImageCLEFmedical GANs tasks. The proposed approaches use feature vectors extracted from the penultimate layer of a pre-trained CNN to represent CT scan images. For the first task, k-means clustering is first performed on the extracted features. Then, 1-nearest neighbour classifiers are used to determine if an input real CT scan image is used to generate the synthetic CT scan images. The best-performing model using this approach produced an F1-score of 0.5315. For the second task, a Gaussian Mixture Model is used to perform clustering of the input deep features extracted from the CNN. This produced an Adjusted Rand Index of 0.63812.</p>
      </abstract>
      <kwd-group>
<kwd>1-nearest neighbour classifier</kwd>
        <kwd>Gaussian Mixture Model</kwd>
        <kwd>Generative Adversarial Networks</kwd>
        <kwd>synthetic medical images</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Artificial intelligence has brought about significant improvements in the fields of medicine and health
care. The use of machine learning in automating diagnosis, content-based image retrieval (CBIR),
and treatment recommendation systems has attempted to lessen the burden on existing healthcare
professionals. However, the development of such automated systems requires enormous amounts of
training data. Such training data, especially data involving medical images, is often difficult to obtain.
This could be due to privacy concerns among patients as well as a lack of homogeneity in the techniques
used to capture the images. In the absence of such data, it is difficult to train machine learning models
to give accurate results.</p>
      <p>
        This issue could be solved by creating synthetic medical images using Generative Adversarial
Networks (GANs), in which synthetic medical images can be generated from real ones. If this generation
process could be performed without affecting the privacy of patients, it could be ideal for producing
large amounts of medical data with ease. However, if the original medical image could be recovered from
the synthetic image, it could raise privacy concerns. The ImageCLEF 2024 challenge is an initiative that
tackles such issues [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As part of this challenge, the ImageCLEFmedical GAN track invites researchers
to explore two tasks related to computed tomography (CT) scan images [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Task 1, Identify training
data ‘fingerprints’, involves investigating whether the original CT scan images can be detected from given
generated CT scan images. This is a classification problem in which the generated CT scan images
are provided along with two types of images: those that are used for the generation process and those
that are not used. Participants are also given another set of synthetic images and are then required to
predict if unknown test images are used to generate these synthetic images. Task 2, Detect generative
models’ ‘fingerprints’, is a clustering problem that aims to produce clusters of synthetic CT scan images
so that it can be investigated whether image fingerprints can be identified.
      </p>
      <p>In this paper, we have proposed to use deep features extracted from a convolutional neural network
(CNN) to represent the visual characteristics of the CT scan images. For the first task, we have assumed
that the test images are similar to the training images. Based on this assumption, we propose to use
a 1-nearest neighbour classification approach in which the test image is compared to cluster centers
obtained after performing k-means clustering on the training data. The second task aims to find image
fingerprints in the synthetic CT scan images. To do this, we propose to extract deep features from these
CT scan images and then use a Gaussian mixture model to perform clustering. This is based on the
assumption that the data follows a Gaussian distribution in a GMM. This is a preliminary study on
possible approaches to solve these problems and we will investigate other solutions in the future.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Proposed Approach</title>
      <p>Each CT scan image is given as input to a CNN and the output of one of the layers of the CNN is
considered as the feature vector for the input image. Identifying whether or not a test image is used to
generate synthetic CT scan images is done using a 1-nearest neighbour-based classification approach
using the feature vectors obtained from the CNN. For the clustering task, a GMM is used to perform
clustering on the extracted feature vectors.</p>
      <p>In this section, a description of the method used for feature extraction is provided. This is followed by
an explanation of the technique used for the classification of the CT scan images. Lastly, the clustering
approach used to identify the fingerprints in the generated images is elucidated.</p>
      <sec id="sec-2-1">
        <title>2.1. Feature Extraction</title>
        <p>
          A CNN performs two main functions: (1) feature extraction from the input, and (2) classification of the
extracted features to certain classes. A CNN consists of different layers that each perform a distinct
operation. An image, when passed through a CNN, undergoes a series of transformations across these
layers. The final output layer of the CNN performs the classification task. Once an image is passed
through a CNN, the output of any of the internal layers can be used to represent the given image. For
a CNN to effectively capture the visual characteristics of the input images, the CNN must be trained
on a large number of images. However, in the absence of sufficient training images, a CNN that is
pre-trained on images from a similar dataset can be used. As the number of CT scan images available
for both tasks was not adequate to train a CNN from scratch, we have used a CNN that is pre-trained
on natural images from the ImageNet dataset [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The work proposed in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], shows the effectiveness
of features extracted from the ResNet50 CNN in performing medical image modality classification
for datasets that contained a large number of radiographic images. Therefore, we propose to use the
features extracted from the penultimate layer of the ResNet50 CNN to represent the CT scan images [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
Each CT scan image x is passed through a pre-trained ResNet50 CNN and the output from the average
pooling layer after the last convolutional layer is extracted. This is a 2048-dimension vector x′ that is
used to represent the features of the input image x. These features are then used for classifying the CT
scan images as well as for clustering to identify the fingerprints in generated images.
        </p>
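        <p>As an illustration, a minimal PyTorch/torchvision sketch of this feature extraction step is given below. The exact preprocessing pipeline is not specified in the paper, so the standard ImageNet transform is assumed here.</p>
        <preformat>
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# ResNet50 pre-trained on ImageNet; drop the final fully connected layer,
# keeping everything up to and including the global average-pooling layer.
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])
backbone.eval()

# Assumed preprocessing: the standard ImageNet resize and normalization.
preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_feature(path: str) -> torch.Tensor:
    """Return the 2048-dimension average-pooling feature for one CT slice."""
    img = Image.open(path).convert("RGB")  # grayscale CT replicated to 3 channels
    with torch.no_grad():
        out = backbone(preprocess(img).unsqueeze(0))  # shape (1, 2048, 1, 1)
    return out.flatten()  # shape (2048,)
        </preformat>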
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Classification Approach for Task 1 Identify Training Data ‘Fingerprints’ to</title>
      </sec>
      <sec id="sec-2-3">
        <title>Detect Real Images Used to Generate Synthetic Images</title>
        <p>
          The proposed approach to determine if a given CT scan image is used to generate a set of synthetic
CT scan images is based on the assumption that the images used for generation in the training and
test datasets are similar. As shown in Figure 1, each image x in the training dataset is passed through
the pre-trained ResNet50 CNN, and the 2048-dimension output x′ of the average pooling layer after
the last convolutional layer is extracted as the input feature vector. If the training dataset contains m
training images U = {u1, . . . , um} that were used to generate the synthetic images, the corresponding
feature vectors are U′ = {u′1, . . . , u′m}. If the training dataset contains n images O = {o1, . . . , on}
that were not used to generate the synthetic images, then their corresponding feature vectors are
O′ = {o′1, . . . , o′n}. Both sets of feature vectors are separately clustered using the k-means clustering
approach where the value of k is varied [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Upon performing clustering on the set U′, k cluster
centers CU = {cU1, . . . , cUk} are obtained. Clustering the feature vectors present in O′ results in the
formation of k clusters with cluster centers CO = {cO1, . . . , cOk}.
        </p>
        <p>[Figure 1: Overview of the classification approach. Images used and images not used to generate the synthetic CT scans are each passed through a pre-trained CNN backbone to obtain the feature vectors U′ and O′.]</p>
        <p>
          For the testing phase, each test image t is also passed through the same pre-trained ResNet50 CNN,
and the 2048-dimension output is extracted as t′ to represent the test image t. To identify the closest
cluster center, the Manhattan distance is used as a similarity measure [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The Manhattan distance
between two d-dimensional vectors a = [a_1, a_2, . . . , a_d] and b = [b_1, b_2, . . . , b_d] is calculated as follows:

d(a, b) = \sum_{i=1}^{d} |a_i - b_i|          (1)
        </p>
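        <p>For illustration, Eq. (1) can be computed directly with NumPy; this is a minimal sketch, not code from the original work.</p>
        <preformat>
import numpy as np

def manhattan(a: np.ndarray, b: np.ndarray) -> float:
    # Sum of absolute coordinate-wise differences, as in Eq. (1).
    return float(np.sum(np.abs(a - b)))
        </preformat>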
        <p>The Manhattan distance between t′ and each of the k cluster centers in CU and CO is calculated,
giving the distance sets dU and dO, respectively. If the smallest distance in dU is less than the smallest
distance in dO, the test image t is classified as used, i.e., it is considered to be used to generate the
synthetic images. Otherwise, the test image t is considered not to be used to generate the synthetic CT
scan images. In this way, the class label of the 1-nearest neighbour cluster center is assigned to the test
image t.</p>
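        <p>A minimal sketch of this classification step is shown below, assuming scikit-learn for k-means and SciPy for the Manhattan (cityblock) distances; variable names such as U_feats and O_feats are illustrative, not from the paper.</p>
        <preformat>
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def fit_centers(features: np.ndarray, k: int) -> np.ndarray:
    """Cluster one class's feature vectors and return its k cluster centers."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(features).cluster_centers_

def classify(test_feats: np.ndarray, centers_used: np.ndarray,
             centers_unused: np.ndarray) -> np.ndarray:
    """Label each test vector via its 1-nearest cluster center (Manhattan distance).

    Returns 1 if the closest center comes from the 'used' class, else 0.
    """
    d_used = cdist(test_feats, centers_used, metric="cityblock").min(axis=1)
    d_unused = cdist(test_feats, centers_unused, metric="cityblock").min(axis=1)
    return (d_used &lt; d_unused).astype(int)

# Usage: U_feats, O_feats, T_feats are arrays of 2048-dimension feature vectors.
# cu = fit_centers(U_feats, k=4)
# co = fit_centers(O_feats, k=4)
# labels = classify(T_feats, cu, co)
        </preformat>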
      </sec>
      <sec id="sec-2-4">
        <title>2.3. Clustering Approach for Task 2 Detect Generative Models’ ‘Fingerprints’ to</title>
      </sec>
      <sec id="sec-2-5">
        <title>Identify Image Fingerprints</title>
        <p>
          For this task, the feature vectors for all the images in the training and test datasets are extracted from
the penultimate layer of the ResNet50 CNN that is pre-trained on the ImageNet dataset. The
2048-dimension vectors are then clustered using a Gaussian Mixture Model (GMM) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] as shown in
Figure 2. A GMM is effective when the data is heterogeneous. This is useful when a data point is close
to multiple clusters, making it difficult to assign that data point to a single cluster. In the proposed
approach using a GMM, the data is distributed into K components that are assumed to follow Gaussian
distributions, where each k-th component has its mean μk and covariance matrix Σk, with k = 1, . . . , K.
The means μk and covariance matrices Σk are initialized using four different methods. The expectation
maximization (EM) algorithm is then used to estimate the mean μk and covariance matrix Σk for each
of the K components.
        </p>
        <p>[Figure 2: Overview of the clustering approach. The training/test dataset is passed through a pre-trained CNN backbone before GMM clustering.]</p>
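        <p>A minimal sketch of this clustering step is given below, assuming scikit-learn, whose GaussianMixture estimator conveniently exposes the four initialization strategies used in the submitted runs.</p>
        <preformat>
import numpy as np
from sklearn.mixture import GaussianMixture

def cluster_features(features: np.ndarray, n_components: int, init: str) -> np.ndarray:
    """Fit a GMM with the EM algorithm and return a hard cluster label per image.

    init is one of: 'kmeans', 'k-means++', 'random', 'random_from_data',
    matching the four initialization methods described in Section 3.3.
    """
    gmm = GaussianMixture(n_components=n_components,
                          init_params=init,
                          # Full covariance matches the per-component Σk above;
                          # 'diag' may be more practical for 2048-d features.
                          covariance_type="full",
                          random_state=0)
    return gmm.fit_predict(features)

# e.g. labels = cluster_features(test_feats, n_components=4, init="kmeans")
        </preformat>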
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Results</title>
      <p>This section consists of a description of the datasets used for both tasks, followed by a detailed explanation
of the experiments conducted on the datasets.</p>
      <sec id="sec-3-1">
        <title>3.1. Datasets</title>
        <p>Both datasets consist of axial slices of three-dimensional CT scan lung images. These images are
256 × 256 pixels in size. Since the first task involves detecting which real CT scan images were used
to generate the synthetic CT scan images, this dataset consists of a mix of real and synthetic CT scan
images. The details of the dataset used for Task 1 Identify training data ‘fingerprints’ are as follows:
• The training dataset consists of two sets of image data. The first set contains 100 real images
used for the generation task, 100 images not used for the generation task, and 10,000 generated
images that are generated from the real CT scan images. The second set consists of 3,000 real
images used for the generation task, 3,000 images that are not used for the generation task, and
10,000 synthetic CT scan images generated from the real images.
• The test dataset also consists of two sets of image data. The first set contains 5,000 synthetic CT
scan images and 4,000 unknown real CT scan images. The second set consists of 7,200 synthetic
CT scan images and 4,000 unknown real CT scan images. The main aim of the task involving
this dataset is to determine if the unknown real CT scan images are used to generate synthetic
CT scan images.</p>
        <p>The goal of the second task is to identify the image fingerprints in the generated CT scan images in
the dataset. The details of the dataset used for Task 2 Detect generative models’ ‘fingerprints’ are as
follows:
• The training data contains three sets of synthetic CT scan images, with each set containing 200
images.</p>
        <p>• The test dataset consists of 3,000 synthetic CT scan images.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Details of the Submitted Runs for Task 1 Identify Training Data ‘Fingerprints’</title>
        <p>The training data consists of images U that are used to generate the synthetic CT scan images and
images O that are not used to generate the synthetic images. Once the feature vectors U′ and O′ are
extracted after passing the images through the CNN, they are separately clustered using the k-means
clustering method to obtain k cluster centers each in CU and CO, respectively. During testing, each
test image is compared with the 2 × k cluster centers in CU and CO, and the label of the closest
cluster is assigned to the test image. For this task, we have submitted 6 runs for different values of k
used for k-means clustering. The values of k are powers of 2, i.e., 1, 2, 4, 8, 16, 32. The results of this
task are provided in Table 1.</p>
        <p>The results indicate that the F1-scores vary with the value of k. The best result, an F1-score of
0.5315, was obtained by the model where k = 4. The second best result was
obtained by the model with k = 8, with an F1-score of 0.5145. The submitted run where k = 2 gave
the worst performance. This shows that an optimal value of k must be chosen.</p>
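        <p>As a usage illustration, the six runs correspond to sweeping k over these powers of 2, reusing the hypothetical helpers from the sketch in Section 2.2:</p>
        <preformat>
# One submission per value of k.
for k in [1, 2, 4, 8, 16, 32]:
    cu = fit_centers(U_feats, k)          # centers for images used in generation
    co = fit_centers(O_feats, k)          # centers for images not used
    labels = classify(T_feats, cu, co)    # 1-NN over the 2 * k centers
        </preformat>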
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Details of the Submitted Runs for Task 2 Detect Generative Models’ ‘Fingerprints’</title>
        <p>
          Once the features are extracted from the CNN, they are clustered using a Gaussian Mixture Model
(GMM). During the training process, we experimented with GMMs that had 3 components. This was
based on our observations of the training data. During the test phase, we assumed that the GMMs have
4 components. The 4 submitted runs differ based on the methods used to initialize the means of the
components in the GMMs. The 4 methods used for initializing are as follows:
1. k-means: Uses the k-means clustering algorithm for initialization [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]
2. k-means++: Uses the k-means++ clustering algorithm, in which the first cluster center is chosen
randomly while the subsequent cluster centers are chosen based on their
squared distance from the already chosen centers [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]
3. random: Chooses the component means randomly
4. random from data: Chooses random data points as component means
        </p>
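        <p>In terms of the Section 2.3 sketch, the four runs map onto the scikit-learn init_params values; this mapping is an assumption of this write-up, since the paper does not name its implementation.</p>
        <preformat>
# One run per initialization strategy, 4 components at test time.
for init in ["kmeans", "k-means++", "random", "random_from_data"]:
    labels = cluster_features(test_feats, n_components=4, init=init)
        </preformat>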
        <p>The results of this task are provided in Table 2. The performance of the proposed approach indicates
that initialization of the GMM components using the k-means algorithm gives the best results while
random initialization of the GMM components is not effective. However, the proposed approach is
based on the assumption that the data follows a Gaussian distribution. If this is not true, then the
proposed approach may not be an ideal solution to this problem.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This paper describes the methods used by the Biomedical Imaging Goa group for the ImageCLEFmedical
GANs task. The proposed approach uses features extracted from the penultimate layer of a CNN that is
pre-trained on natural images. For the classification task, the test image is compared with the cluster
centers formed from the training data and labeled according to a 1-nearest neighbour approach. For the
clustering task, the feature vectors are clustered using a GMM-based approach.</p>
      <p>Since this is a preliminary study conducted by our group, in the future we could explore other possible
methods to improve efficiency. This could involve using feature vectors from other CNNs as well as
different classification and clustering methods. Further, the proposed approaches are based on certain
assumptions that could be investigated to provide a better explanation of other possible solutions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] B. Ionescu, H. Müller, A. Drăgulinescu, J. Rückert, A. B. Abacha, A. G. S. de Herrera, L. Bloch, R. Brüngel, A. Idrissi-Yaghir, H. Schäfer, C. S. Schmidt, T. M. Pakull, H. Damm, B. Bracke, C. M. Friedrich, A. Andrei, Y. Prokopchuk, D. Karpenka, A. Radzhabov, V. Kovalev, C. Macaire, D. Schwab, B. Lecouteux, E. Esperança-Rodier, W. Yim, Y. Fu, Z. Sun, M. Yetisgen, F. Xia, S. A. Hicks, M. A. Riegler, V. Thambawita, A. Storås, P. Halvorsen, M. Heinrich, J. Kiesel, M. Potthast, B. Stein,
          <article-title>Overview of ImageCLEF 2024: Multimedia retrieval in medical applications</article-title>
          , in: Experimental IR Meets Multilinguality, Multimodality, and Interaction,
          <source>Proceedings of the 15th International Conference of the CLEF Association (CLEF 2024)</source>
          , Springer Lecture Notes in Computer Science LNCS, Grenoble, France,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karpenka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Overview of 2024 ImageCLEFmedical GANs Task - Investigating Generative Models’ Impact on Biomedical Synthetic Images</article-title>
          , in: CLEF2024 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Grenoble, France,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          ,
          <article-title>Imagenet: A large-scale hierarchical image database</article-title>
          ,
          <source>in: 2009 IEEE conference on computer vision and pattern recognition</source>
          , IEEE,
          <year>2009</year>
          , pp.
          <fpage>248</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Miranda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Thenkanidiyoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Dinesh</surname>
          </string-name>
          ,
          <article-title>Detecting the modality of a medical image using visual and textual features</article-title>
          ,
          <source>Biomedical Signal Processing and Control</source>
          <volume>79</volume>
          (
          <year>2023</year>
          )
          <fpage>104035</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learning for image recognition</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R. O.</given-names>
            <surname>Duda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Hart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Stork</surname>
          </string-name>
          , Pattern classification, John Wiley &amp; Sons,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Kushwaha</surname>
          </string-name>
          ,
          <article-title>Clustering cloud workloads: K-means vs gaussian mixture model</article-title>
          ,
          <source>Procedia computer science 171</source>
          (
          <year>2020</year>
          )
          <fpage>158</fpage>
          -
          <lpage>167</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Arthur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vassilvitskii</surname>
          </string-name>
          , et al.,
          <article-title>k-means++: The advantages of careful seeding</article-title>
          ,
          <source>in: Soda</source>
          , volume
          <volume>7</volume>
          ,
          <year>2007</year>
          , pp.
          <fpage>1027</fpage>
          -
          <lpage>1035</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>