<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>AI Multimedia Lab at ImageCLEFmedical GANs 2024: Deep Learning Approaches for Analyzing Synthetic Medical Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexandra-Georgiana Andrei</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mihai Gabriel Constantin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mihai Dogariu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bogdan Ionescu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AI Multimedia Lab, CAMPUS Research Center, National University of Science and Technology Politehnica Bucharest</institution>
          ,
          <addr-line>Bucharest</addr-line>
          ,
          <country country="RO">Romania</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the participation of AI Multimedia Lab in the 2024 ImageCLEFmedical GANs task. The task proposes two sub-tasks that study security and privacy concerns related to personal medical image data in the context of generating and using artificial images in different real-life scenarios: identifying "fingerprints" left by the original training data within synthetic medical images, and detecting unique "fingerprints" imprinted by different generative models on their synthetic outputs. For the first sub-task, we proposed algorithms that leverage medical image segmentation, deep feature extraction, and clustering techniques to identify fingerprints from the original data. For the second sub-task, we proposed clustering frameworks and feature extraction methods using both pre-trained deep learning models and handcrafted techniques to distinguish between synthetic images generated by various models. Our methods obtained a maximum F1-score of 0.627 for the first sub-task and an ARI of 0.996 for the second sub-task, highlighting their potential in addressing security and privacy concerns in the context of synthetic medical images.</p>
      </abstract>
      <kwd-group>
        <kwd>ImageCLEFmedical GANs</kwd>
        <kwd>Generative models</kwd>
        <kwd>Generative Adversarial Networks</kwd>
        <kwd>synthetic medical images</kwd>
        <kwd>CT images</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Medical imaging plays a crucial role in the diagnosis and treatment of various conditions by providing
information about the internal structures and functions of the human body. Even with current
technological developments and the abundance of data available in hospitals and other private institutions, it
is still difficult to make these data easily accessible to the scientific community for the purpose of
developing algorithms. This difficulty arises from the necessity of maintaining patient data confidentiality.
To address these impediments, generative models have been proposed to augment datasets and even
improve their quality.</p>
      <p>
        In its second edition, the ImageCLEFmedical GANs task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], part of the ImageCLEF 2024 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
evaluation campaign, focuses on leveraging generative models to improve the quality and utility of
medical images. The task is divided into two sub-tasks: the first addresses security and privacy concerns
related to personal medical data, and the second examines whether different generative models imprint
discernible signatures on the synthetic images they produce.
      </p>
      <p>This paper presents our methods for addressing the challenges proposed by the ImageCLEFmedical
GANs task. The paper is structured as follows: Section 2 presents the tasks and the datasets, Section 3
presents the proposed methods, and Section 4 presents and discusses the results. Finally, Section 5
presents the conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. The 2024 ImageCLEFmedical GANs Tasks</title>
      <sec id="sec-2-1">
        <title>2.1. Identify training data “fingerprints”.</title>
        <p>
          The task was introduced in the previous edition [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and involves investigating the hypothesis that
generative models produce medical images that bear similarities to the original images used for training
the generative model. This addresses security and privacy concerns related to personal medical image
data in the context of generating and using artificial images in various real-life scenarios.
        </p>
        <p>
          The objective is to detect “fingerprints” within synthetic biomedical image data to determine which
real images were used in the training process to produce the generated images. This task involves
analyzing test image datasets and assessing the likelihood that specific images of real patients were used
to train the generative models. This sub-task involved investigating the hypothesis for two different
generative models. The dataset provided for this task consists of both real and generated images, as
described in Table 1. More information about the sub-task and the provided data is available in the
overview paper of the task [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Detect generative models’ “fingerprints”</title>
        <p>The task involves exploring the hypothesis that generative models imprint unique fingerprints on the
synthetic images they produce. The primary focus is on determining whether different generative
models or architectures leave discernible signatures within these synthetic images.</p>
        <p>
          To address this, a set of synthetic images generated through various generative models is provided,
with the objective of identifying and detecting the distinct "fingerprints" associated with each model.
This task requires analyzing the embedded characteristics, patterns, or features in the synthetic images.
This investigation contributes to a deeper understanding of the unique imprints left by generative
models on their generated images, facilitating model attribution and recognition. The dataset provided
for this task consists of generated images: 600 images generated using three different generative models
for the training dataset (each model is represented by 200 images, annotated accordingly) and a
mixture of 3,000 images generated using four different generative models for the test dataset.
More information about the sub-task and the provided data is available in the overview paper of the
task [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Identify training data “fingerprints”.</title>
        <p>For the first sub-task we propose two different approaches. The first one is based on segmenting the
lung area from the CT image, extracting features for different relevant regions and, finally, clustering the
feature vectors such that images used for training will be grouped together and separated from the
ones not used for training. The second approach relies on training an autoencoder to reconstruct the
generated images. The hypothesis is that using the same autoencoder on the training images will yield
lower reconstruction errors than on images outside the training set.</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Analyzing medically relevant regions</title>
          <p>Our first set of approaches for identifying training data fingerprints aims at isolating and analyzing
different medically relevant regions of the target images. This approach is presented in Figure 1 and
is composed of three stages: (i) medical image segmentation, detecting the area of the
lungs, (ii) deep feature extraction, and (iii) clustering. This results in three types of individual runs.
The baseline run uses the entire image for stages (ii) and (iii). The second type isolates and
uses only the lung regions, which are the most medically relevant regions in the CT
images, in stages (ii) and (iii). The last type of run isolates and removes the lung regions and analyzes the rest of the CT
image; while this region may be less medically relevant, it could still contain fingerprints of the
original images.</p>
          <p>
            For medical image segmentation, we deploy a UNet [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] deep neural network, trained for lung
segmentation in CT images. The detected lung regions are then prepared for the second and third
stages as two sets of images: one set contains only the lung regions, cropped to the limits of the
lung regions and with all non-lung pixels set to 0, while the other set is a
copy of the original images with the lung region pixels set to 0.
          </p>
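          <p>The two image sets described above can be sketched with a few array operations. This is a minimal illustration, assuming a 2-D slice and a boolean lung mask of the same shape; the function name <monospace>split_lung_regions</monospace> and the toy data are hypothetical and are not taken from our pipeline.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def split_lung_regions(ct_slice, lung_mask):
    """Return (lung_only_crop, outer_only) for one 2-D slice and its lung mask."""
    lung_mask = np.asarray(lung_mask, dtype=bool)
    # Outer-region image: copy of the original with the lung pixels set to 0.
    outer_only = np.where(lung_mask, 0, ct_slice)
    # Lung-only image: all non-lung pixels set to 0 ...
    lung_only = np.where(lung_mask, ct_slice, 0)
    if not lung_mask.any():
        return lung_only, outer_only  # empty mask: nothing to crop to
    # ... then cropped to the bounding box of the mask.
    rows = np.flatnonzero(lung_mask.any(axis=1))
    cols = np.flatnonzero(lung_mask.any(axis=0))
    crop = lung_only[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    return crop, outer_only


# Toy 5x5 "slice" with a 2x2 "lung" region (illustrative values only).
ct = np.arange(1, 26).reshape(5, 5)
mask = np.zeros((5, 5), dtype=bool)
mask[1:3, 2:4] = True
lung_crop, outer = split_lung_regions(ct, mask)
```
]]></preformat>
          <p>In this sketch the cropped lung image and the masked outer image have different sizes, so a resizing step would still be needed before feeding them to a fixed-input network.</p>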
          <p>
            In the second stage we deploy two DNNs and extract features from their layers, namely the
ResNet50 [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] and the DenseNet121 [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] networks. We use the pretrained versions of these networks,
consisting of weights trained on the ImageNet dataset [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. We extract features from the penultimate
layer of each of the two networks, resulting in a vector of size (2048 × 1) for ResNet and a
(1024 × 8 × 8) tensor for DenseNet, which is then averaged over its spatial dimensions to obtain a vector of size (1024 × 1).
          </p>
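          <p>The averaging step for the DenseNet features amounts to global average pooling. The snippet below is a small sketch of that reduction only, using a random array as a stand-in for real network activations; the function name <monospace>global_average_pool</monospace> is an assumption made for illustration.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def global_average_pool(feature_map):
    """Average a (channels, height, width) feature map over its spatial axes."""
    return feature_map.mean(axis=(1, 2))


# Random stand-in with the DenseNet feature shape quoted in the text.
rng = np.random.default_rng(0)
densenet_like = rng.random((1024, 8, 8))
pooled = global_average_pool(densenet_like)  # shape (1024,)
```
]]></preformat>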
          <p>In the third stage, we test two different clustering methods, k-means and hierarchical clustering,
while varying the number of clusters <inline-formula><tex-math>k</tex-math></inline-formula>. The hypothesis we wish to test in this set of experiments
is also presented in Figure 1, and it is as follows. The training process of the two generative models
associated with this task will inevitably create new artificial images that can be associated with a
variable number of clusters. Thus, the entire set of generated images can be expressed as a set of clusters
<inline-formula><tex-math>C = C_1 \cup C_2 \cup \ldots \cup C_k</tex-math></inline-formula>, where <inline-formula><tex-math>k</tex-math></inline-formula> is a variable number of clusters. In theory, given a distance <inline-formula><tex-math>d</tex-math></inline-formula> and two
samples, <inline-formula><tex-math>s_u</tex-math></inline-formula> and <inline-formula><tex-math>s_n</tex-math></inline-formula>, with the former used in training the generative networks and the latter not used,
the used sample should be closer to the clusters than the sample that was not used during training:
<inline-formula><tex-math>d(s_u, C) &lt; d(s_n, C)</tex-math></inline-formula>. We also test two types of distances, one where only the distance to the closest
cluster is taken into account, <inline-formula><tex-math>d(s, C) = \min_{i} d(s, C_i)</tex-math></inline-formula>, and one where the average of all the distances
is taken into account, <inline-formula><tex-math>d(s, C) = \operatorname{avg}_{i} d(s, C_i)</tex-math></inline-formula>. We also test a large number of values for the number
of clusters, in total 30 possible values. We find the best value of <inline-formula><tex-math>k</tex-math></inline-formula> and the best setting for the distance
variation based on tests on the training set, which we then deploy on the testing set.</p>
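          <p>The closest-cluster and average-distance variants can be sketched as follows. This is an illustration under two assumptions that the text does not fix: each cluster is represented by its centroid, and the distance is Euclidean. The function name <monospace>distance_to_clusters</monospace> and the toy values are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def distance_to_clusters(sample, centroids, mode="min"):
    """Distance from one feature vector to a set of cluster centroids.

    mode="min" keeps only the closest cluster; mode="avg" averages over all.
    """
    dists = np.linalg.norm(centroids - sample, axis=1)  # one distance per cluster
    if mode == "min":
        return float(dists.min())
    if mode == "avg":
        return float(dists.mean())
    raise ValueError("mode must be 'min' or 'avg'")


# Two toy centroids and two toy samples in a 2-D feature space.
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])
s_used = np.array([3.0, 4.0])       # close to the first centroid
s_not_used = np.array([5.0, 12.0])  # far from both centroids

d_used = distance_to_clusters(s_used, centroids, mode="min")
d_not_used = distance_to_clusters(s_not_used, centroids, mode="min")
```
]]></preformat>
          <p>Under the hypothesis above, a sample used for training would tend to obtain the smaller of the two distances; turning the distance into a label still requires a decision rule tuned on the training set.</p>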
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Anomaly detection applied to generative models</title>
          <p>Our next approach focuses on the autoencoder’s ability to learn to reconstruct the samples of its
training dataset. The autoencoder is composed of two main blocks: the encoder <inline-formula><tex-math>E</tex-math></inline-formula>, which transforms
an input sample <inline-formula><tex-math>x</tex-math></inline-formula> into a feature vector, also called the bottleneck, and the decoder <inline-formula><tex-math>D</tex-math></inline-formula>, which transforms the
bottleneck back into the input sample’s domain. The output of the decoder should, usually, match the
input of the encoder. Formally, <inline-formula><tex-math>\tilde{x} = D(E(x)) \approx x</tex-math></inline-formula>. However, this reconstruction is almost never lossless.
In order to assess the reconstruction error, we use the mean squared error (MSE). Ideally, <inline-formula><tex-math>\mathrm{MSE}(x, \tilde{x}) = 0</tex-math></inline-formula>.
In practice, we aim for MSE values as low as possible. This is an indicator that the autoencoder adapted
to the training dataset’s probability density function and can successfully reconstruct any (or most) of
the inputs from the training dataset.</p>
          <p>Hypothetically, an autoencoder trained on a certain dataset should yield low reconstruction errors for
samples that were used for training the said autoencoder and higher reconstruction errors for samples
that are outside the training dataset. In this respect, we consider that an autoencoder trained on the
generated samples of a generative model should yield low reconstruction errors for samples that were
used for training the generative model and higher reconstruction errors for samples outside the training
dataset. This hypothesis is, of course, prone to alterations due to the large similarity between ‘used’ and
‘not_used’ subsets for the generative model’s training and the low complexity of the analyzed images.</p>
          <p>We follow two different approaches. In the first approach, we average the pixel-level MSE
of each reconstruction and determine a centroid value for the images that are known to have been
part of the training dataset. Similarly, we compute the mean MSE value of the reconstructions of images
that are known to be outside the training dataset. We thus obtain two central values, one for the images
that were used during training and one for the images that were not. Finally,
we compute the distance between the MSE reconstruction loss of each new (test) sample and the two central values and assign the sample to the
closest group, thus labeling it as ‘used’ or ‘not_used’ during training.</p>
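          <p>A minimal sketch of this first, scalar variant is given below. It assumes the per-image reconstruction errors have already been computed with a trained autoencoder; the helper names <monospace>mse</monospace> and <monospace>label_by_mse</monospace> and all numeric values are illustrative assumptions, not our measurements.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def mse(image, reconstruction):
    """Mean squared error averaged over all pixels of one image."""
    image = np.asarray(image, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    return float(np.mean((image - reconstruction) ** 2))


def label_by_mse(test_error, used_errors, not_used_errors):
    """Assign a test sample to the group whose mean MSE is closest."""
    used_center = float(np.mean(used_errors))
    not_used_center = float(np.mean(not_used_errors))
    if abs(test_error - used_center) <= abs(test_error - not_used_center):
        return "used"
    return "not_used"


# Made-up reconstruction errors for the two annotated groups.
used_errors = [0.010, 0.012, 0.014]
not_used_errors = [0.020, 0.022, 0.024]

example_error = mse(np.zeros((2, 2)), np.full((2, 2), 0.5))
label_a = label_by_mse(0.013, used_errors, not_used_errors)
label_b = label_by_mse(0.021, used_errors, not_used_errors)
```
]]></preformat>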
          <p>
            The second approach is similar to the first one. We compute the MSE loss for each input, but this
time at the pixel-level. We create a reconstruction error image by averaging the pixel-level MSE errors
for all images that were used for training the generative model and one for the images that were not
used for training the model, thus obtaining two centroid-like images. Lastly, we compute the SSIM [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]
between the reconstruction error of new samples and the two centroid-like images. The largest SSIM
value yields the class of images (‘used’/‘not_used’) the test image falls into.
          </p>
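          <p>The second variant can be illustrated in the same spirit. The sketch below uses a simplified, single-window SSIM computed over the whole error map, which is an assumption made to keep the example short; standard SSIM implementations average the index over local windows. The names <monospace>global_ssim</monospace> and <monospace>label_by_ssim</monospace>, the constants, and the toy error maps are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM over two whole images (simplified, not windowed)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    c1 = (0.01 * data_range) ** 2  # usual stabilising constants
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(numerator / denominator)


def label_by_ssim(error_map, used_template, not_used_template):
    """Pick the class whose centroid-like error image is most similar."""
    s_used = global_ssim(error_map, used_template)
    s_not_used = global_ssim(error_map, not_used_template)
    return "used" if s_used >= s_not_used else "not_used"


# Toy centroid-like error images: errors on the left half vs. the right half.
used_template = np.zeros((4, 4))
used_template[:, :2] = 0.2
not_used_template = np.zeros((4, 4))
not_used_template[:, 2:] = 0.2

# A test error map whose errors are also located on the left half.
test_map = np.zeros((4, 4))
test_map[:, :2] = 0.18
label = label_by_ssim(test_map, used_template, not_used_template)
```
]]></preformat>
          <p>The toy maps have the same average error magnitude but different error locations, which is the situation where a structural comparison can separate cases that a mean MSE value cannot.</p>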
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Detect generative models’ “fingerprints”</title>
        <p>
          We proposed different variations of the pipeline presented in Figure 2 for detecting the "fingerprints" of
the generative models. The presented methods use pattern recognition and feature extraction techniques
to analyze the features embedded in the generated images. Building on the method presented by our
team in the previous edition [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], we employed two approaches for feature extraction. The first approach
uses Transfer Learning, where a pre-trained model is used, while the second approach involves using a
handcrafted extraction technique.
        </p>
        <p>
          Feature extraction – For the deep-learning feature extraction method, we employed different
pretrained models that were originally trained on the ImageNet dataset [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]: i) VGG-16 [11], ii)
ResNet [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], iii) MobileNetV2 [12], iv) EfficientNet [13] and v) DenseNet-121 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. These models were
selected for feature extraction due to their proven efficacy and robustness in various computer vision
tasks.
        </p>
        <p>VGG-16 generates rich hierarchical features due to its architecture comprising 16 weight layers,
including 13 convolutional layers and 3 fully connected layers. It uses 3 × 3 filters, which allow it to capture
intricate patterns and details in the input images. Like VGG-16, ResNet50 captures complex
patterns and hierarchical features from input images; its architecture is characterized by residual
blocks, where each block contains a series of convolutional layers with batch normalization and ReLU
activation. ResNet50 is a variant of the Residual Network (ResNet) model, which is known for its
powerful capabilities in deep learning applications.</p>
        <p>DenseNet-121, a variant of the Densely Connected Convolutional Networks (DenseNet), is well
regarded for its effective and efficient feature extraction capabilities. DenseNet-121 uses an
architecture where each layer is directly connected to every subsequent layer within a dense block.
This dense connectivity pattern enhances information flow and gradient propagation
throughout the network, facilitating better learning and reducing the risk of vanishing gradients.</p>
        <p>MobileNetV2 is a suitable option for feature extraction in non-mobile scenarios as well, despite its
initial design for mobile applications. The model’s architecture, characterized by inverted residuals and
linear bottlenecks, significantly reduces the number of parameters and computational complexity while
maintaining high accuracy and providing high-quality feature representations.</p>
        <p>EfficientNet is a Convolutional Neural Network (CNN) capable of efficient and scalable feature extraction.
Because of its scalability, it can adjust to different application needs, ranging from high-performance
computer systems to situations with limited resources. Moreover, EfficientNet’s architecture, which is
based on squeeze-and-excitation modules and mobile inverted bottleneck convolutions, helps
capture complex patterns and hierarchical features efficiently.</p>
        <p>All these models have in common the fact that the initial layers capture low-level features such as
edges and textures, while the deeper layers extract complex features such as shapes and high-level
patterns.</p>
        <p>Feature selection – Using pre-trained networks for feature extraction resulted in a substantial
number of features, as presented in Table 2 for the training dataset. To manage the high dimensionality
of these features, Principal Component Analysis (PCA) was applied to reduce them to the minimum
number of components required to capture a target share of the variance in the original data. During this process,
different values for the number of components were tested to determine the optimal balance between
dimensionality reduction and information retention. This approach ensured that the selected number
of components was neither too small, which would risk losing significant information, nor too large,
which would undermine the benefits of dimensionality reduction.</p>
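        <p>Selecting the smallest number of components that retains a given share of the variance can be sketched with a singular value decomposition. This is an illustrative implementation on synthetic data, not the code used for our runs; the function name <monospace>pca_keep_variance</monospace> and the 0.95 threshold in the example are assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def pca_keep_variance(features, variance_to_keep=0.95):
    """Project features onto the fewest principal components that retain
    at least `variance_to_keep` of the total variance."""
    centered = features - features.mean(axis=0)
    # Rows of `components` are principal directions; squared singular values
    # are proportional to the variance explained by each direction.
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2
    cumulative = np.cumsum(explained) / explained.sum()
    n_components = int(np.searchsorted(cumulative, variance_to_keep)) + 1
    n_components = min(n_components, len(cumulative))
    return centered @ components[:n_components].T, n_components


# Synthetic features: one dominant direction, one weak, eight near-constant.
rng = np.random.default_rng(0)
scales = np.array([10.0, 1.0] + [0.1] * 8)
features = rng.normal(size=(200, 10)) * scales
reduced, n_kept = pca_keep_variance(features, variance_to_keep=0.95)
```
]]></preformat>
        <p>Library implementations offer the same behaviour; for example, scikit-learn's PCA accepts a fraction between 0 and 1 as the number of components and keeps enough components to reach that share of explained variance.</p>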
        <p>We have also employed a handcrafted feature extraction technique for extracting the Local Binary
Pattern (LBP). It captures information about the local spatial patterns and the grayscale contrast of the
images.</p>
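        <p>For readers unfamiliar with LBP, the basic 3 × 3 variant can be sketched as follows. The neighbourhood size, bit ordering, and histogram normalisation shown here are assumptions chosen for illustration and may differ from the configuration used in our runs; the function name <monospace>lbp_histogram</monospace> is hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def lbp_histogram(gray):
    """Normalised 256-bin histogram of basic 8-neighbour LBP codes."""
    gray = np.asarray(gray, dtype=float)
    height, width = gray.shape
    center = gray[1:-1, 1:-1]
    # Eight neighbours, clockwise from the top-left; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:height - 1 + dy, 1 + dx:width - 1 + dx]
        # The bit is set when the neighbour is at least as bright as the centre.
        codes += (neighbour >= center).astype(np.int64) * (1 << bit)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


# A flat image: every neighbour equals the centre, so every code is 255.
flat_hist = lbp_histogram(np.full((5, 5), 7))

# A single bright centre pixel: no neighbour reaches it, so the code is 0.
peak = np.zeros((3, 3))
peak[1, 1] = 9
peak_hist = lbp_histogram(peak)
```
]]></preformat>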
        <p>Clustering – Knowing the number of classes/clusters to which the images belong (3 for the
development dataset and 4 for the test dataset), we used two different clustering methods: i) k-means and ii)
hierarchical clustering. Both aim to group data points into clusters
based on similarity, but they do so in fundamentally different ways. We chose to use both
to identify patterns and group CT slices with similar features within the provided
dataset. By comparing the outcomes, we can check that the feature extraction method effectively
captures the data’s intrinsic properties.</p>
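        <p>Cluster assignments are compared with the ground truth using the Adjusted Rand Index (ARI), which is invariant to how the cluster labels are numbered. The following sketch computes the ARI from pair counts using only the standard library; the function name <monospace>adjusted_rand_index</monospace> and the toy label lists are illustrative assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same samples."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare
    # Count pairs of samples that share a cell, a true class, or a cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (sum_cells - expected) / (maximum - expected)


# Same grouping with permuted cluster ids: perfect agreement.
ari_perfect = adjusted_rand_index([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1])

# A grouping that splits every true class: worse than chance.
ari_split = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```
]]></preformat>
        <p>A value of 1 indicates identical groupings, values near 0 indicate agreement at chance level, and negative values indicate agreement below chance.</p>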
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <sec id="sec-4-1">
        <title>4.1. Identify training data “fingerprints”.</title>
        <p>For the first task, the best results for identifying the training data “fingerprints” were obtained with
different methods on the two datasets. On the first dataset, the method focusing on analyzing the
medically relevant regions obtained the highest F1 score, 0.5385. On the second dataset, the method
relying on detecting anomalies by computing SSIM on reconstruction errors obtained the highest
F1 score, 0.62753. We present both methods in detail next.</p>
        <p>[Table: feature extraction methods (VGG-16, VGG-16 + PCA (95%), VGG-16 + PCA (90%), hand-crafted LBP, ResNet50, ResNet50 + PCA (95%), ResNet50 + PCA (90%), MobileNetV2, EfficientNet, EfficientNet + PCA (95%), DenseNet-121), each paired with k-means and hierarchical clustering. Numeric columns not available.]</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Analyzing medically relevant regions</title>
          <p>Overall, the performances of hierarchical clustering on the training set were significantly superior
to those of the k-means approach in all preliminary tests. We therefore decided to proceed and test
only predictions associated with the hierarchical clustering approaches. Table 3 presents the results
associated with this approach, as well as the features and the clustering setup defined as (distance type,
number of clusters) for each of the six submitted runs. The best result on the testing set is achieved with
a ResNet-based feature extractor, processing outer regions not containing the lung segments, and using
the minimum distance variation with a number of 10 clusters for the hierarchical clustering method.
This approach achieved an F1 score of 0.5295 for the first dataset, 0.5475 for the second dataset, and an
average F1 value of 0.5385.</p>
          <p>When analyzing the differences between the expected performance (F1 values on the training set)
and the real performance (F1 values on the testing set), a noticeable difference can be observed between the
two datasets. For the first dataset, the decrease in performance is between 8.21% and 20.53% across all
six runs. For the second dataset, only one of the six runs registered a lower performance on the
testing set, with a 0.2% decrease, while the other five runs surprisingly registered higher
performance on the testing set compared with the training set, an increase between 0.52% and 8.71%.
While this different behaviour must be more thoroughly studied in future experiments, at this moment
we believe it to be the outcome of the number of samples in the training set for the two datasets. The
first dataset is composed of only 200 real images, while the second one has 6000. This would result in a
sub-optimal search for the parameters of the clustering method (the type of distance and the number of
clusters), as the low number of real images in the first dataset may not be representative enough for the
entire dataset; alternatively, a better clustering method may be needed when the training set contains
fewer samples.</p>
          <p>Finally, the cropping and masking method seems to have an important effect on the final results,
seemingly regardless of the dataset analyzed, feature extractor, or clustering setup. The original
uncropped images show a maximum F1 performance of 0.50225 across both datasets, while the lung
region and outer region images show a maximum F1 performance of 0.51525 and 0.5385 respectively.
This is another interesting phenomenon that must be further studied but, at this point, we theorize
that analyzing smaller patches of the original images may allow the feature extractors to focus
their responses on specific areas and features. Another
point of discussion here is whether the nature of the patches has any influence on the improvement of the
results. In our approaches, we specifically targeted regions of the image that have a medical significance
(regions containing only lungs, and the rest of the body and CT scan). However, in our future studies we
should test whether any random, sufficiently large patch of the images would improve the result compared
with a full image analysis.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Anomaly detection applied to generative models</title>
          <p>The anomaly detection approaches seem to yield good results for the second dataset, whereas they fail
on the first dataset. On the first dataset, the ‘used’ and ‘not_used’ centroid-like features, which were
computed on the development data and tuned to ensure a good separation between the
data that was used to train the generative model and the external data, do not seem to generalize to the
test dataset, outputting a single label for the entire test set. This might happen due to a large difference
between the dataset used to train the development generator and the dataset used to train the test
generator.</p>
          <p>For the second dataset, however, the proposed approaches seem to have a better fitting, suggesting
that the second model has a stronger correlation between the training dataset and the test dataset.</p>
          <p>Between the two applied methods it is clear that the SSIM approach is superior to the vanilla one.
This is somewhat to be expected, since for images it is also important where the reconstruction error
is located and not only its absolute value. SSIM takes into account the entire structure (pixel error
placement) instead of averaging over all points, obtaining an F1 score of 0.5938 for DB2, as opposed to
0.4683, as per Table 4.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Detect generative models’ “fingerprints”.</title>
        <p>Analyzing the results obtained on the development (training) set presented in Table 2, it is observed that the best results,
achieving the highest possible ARI score of 1, were obtained using DenseNet-121 for feature extraction.
This result was consistent across both clustering methods. The next best results were obtained using the
MobileNetV2 network for feature extraction, which achieved an ARI of 0.9949 with the k-means method
and an ARI of 0.99 with hierarchical clustering. For this method, PCA was also applied to reduce the
number of features, but the results were consistent. High ARI values were also achieved using the
VGG-16 network for feature extraction. Moreover, applying PCA for feature reduction in conjunction
with VGG-16 feature extraction and k-means clustering resulted in an increased ARI value.</p>
        <p>Based on the evaluation of various methods tested on the development dataset, we selected the
top-performing approaches to be applied for the test dataset and we obtained the results presented in
Table 5. Specifically, we chose DenseNet-121, MobileNetV2, ResNet and VGG-16, which demonstrated
superior ARI on the development dataset. In addition to this, we included the method that employs
hand-crafted feature – LBP – to provide a comparative analysis between automated feature extraction
through deep learning and traditional hand-crafted feature extraction.</p>
        <p>When comparing the expected performance, measured by ARI on the training set, with the actual
performance obtained on the testing set, a noticeable difference can be observed for the results
obtained using the VGG-16 and ResNet CNNs for feature extraction. The decrease in performance is
between 22.8% and 33% for all runs that used VGG-16 and ResNet for feature extraction. This divergence
in performance can be attributed to differences in the composition of the training and testing datasets
in terms of the number of generative models employed. It is plausible that the features
extracted by VGG-16 and ResNet from the training set do not generalize well to the testing set due to
variations in the distribution or characteristics of the generative models represented in each dataset. As
a result, clustering on VGG-16 and ResNet features may struggle to separate instances
in the testing set, leading to a notable decline in performance compared to the training set. Similar
performances were obtained using MobileNetV2 and DenseNet-121 for feature extraction. The highest
ARIs on the testing set, 0.9971 and 0.9965, were obtained using DenseNet and MobileNet, respectively, for
feature extraction together with hierarchical clustering. Similar results were also obtained using the hand-crafted
LBP feature.</p>
        <p>The achieved ARI value of 0.9971 indicates an exceptionally high level of agreement between our
clustering results and the ground truth. This result underscores the effectiveness of the proposed
method (feature extraction using MobileNetV2 and hierarchical clustering) in accurately capturing the
inherent patterns and relationships within the images generated using different generative models.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>This paper presents the methods developed by AI Multimedia Lab for the 2024 ImageCLEFmedical GANs
task, focusing on identifying training data "fingerprints" and detecting generative models’ "fingerprints"
within medical images. We provided various methods, including analyzing medically relevant regions,
anomaly detection using autoencoders, and employing deep learning-based feature extraction together
with clustering techniques. Our results demonstrate promising capabilities in detecting the images used
for training a generative model, obtaining an F1-score of 0.627. Using feature extraction and clustering
methods, we achieved an ARI close to the maximum when clustering synthetic images based on the
generative model that generated them. Further work can be conducted to optimize the approaches for
broader applicability across diverse medical imaging scenarios.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The contribution to this task is supported under project AI4Media, A European Excellence Centre
for Media, Society and Democracy, H2020 ICT-48-2020, grant #951911. Mihai Dogariu’s work was
supported by a grant from the National Program for Research of the National Association of Technical
Universities - GNAC ARUT 2023.</p>
      <p>[11] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition,
arXiv preprint arXiv:1409.1556 (2014).</p>
      <p>[12] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, MobileNetV2: Inverted residuals
and linear bottlenecks, in: Proceedings of the IEEE conference on computer vision and pattern
recognition, 2018, pp. 4510–4520.</p>
      <p>[13] B. Koonce, EfficientNet, in: Convolutional neural networks with Swift for Tensorflow:
image recognition and dataset categorization, 2021, pp. 109–123.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karpenka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Overview of 2024 ImageCLEFmedical GANs Task - Investigating Generative Models' Impact on Biomedical Synthetic Images</article-title>
          , in: CLEF2024 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Grenoble, France,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Drăgulinescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rückert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brüngel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Idrissi-Yaghir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Pakull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Damm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bracke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karpenka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macaire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schwab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lecouteux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Esperança-Rodier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yetisgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Thambawita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Storås</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heinrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiesel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of ImageCLEF 2024: Multimedia retrieval in medical applications</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction, Proceedings of the 15th International Conference of the CLEF Association (CLEF 2024)</source>
          , Springer Lecture Notes in Computer Science LNCS, Grenoble, France,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.-G.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Coman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Overview of ImageCLEFmedical GANs 2023 task: identifying training data “fingerprints” in synthetic biomedical images generated by GANs for medical image security</article-title>
          , in:
          <source>Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023)</source>
          , volume
          <volume>3497</volume>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Ronneberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          ,
          <article-title>U-net: Convolutional networks for biomedical image segmentation</article-title>
          , in:
          <source>Medical image computing and computer-assisted intervention - MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>241</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Koonce</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Koonce</surname>
          </string-name>
          ,
          <article-title>Resnet 50</article-title>
          ,
          <source>Convolutional neural networks with swift for tensorflow: image recognition and dataset categorization</source>
          (
          <year>2021</year>
          )
          <fpage>63</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Iandola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Moskewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Karayev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Keutzer</surname>
          </string-name>
          ,
          <article-title>Densenet: Implementing efficient convnet descriptor pyramids</article-title>
          ,
          <source>arXiv preprint arXiv:1404.1869</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O.</given-names>
            <surname>Russakovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satheesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karpathy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          , et al.,
          <article-title>Imagenet large scale visual recognition challenge</article-title>
          ,
          <source>International journal of computer vision</source>
          <volume>115</volume>
          (
          <year>2015</year>
          )
          <fpage>211</fpage>
          -
          <lpage>252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Bovik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Sheikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. P.</given-names>
            <surname>Simoncelli</surname>
          </string-name>
          ,
          <article-title>Image quality assessment: from error visibility to structural similarity</article-title>
          ,
          <source>IEEE transactions on image processing</source>
          <volume>13</volume>
          (
          <year>2004</year>
          )
          <fpage>600</fpage>
          -
          <lpage>612</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.-G.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <article-title>AIMultimediaLab at ImageCLEFmedical GANs 2023: determining “fingerprints” of training data in generated synthetic images</article-title>
          , in:
          <source>CLEF2023 Working Notes, CEUR Workshop Proceedings</source>
          , Thessaloniki, Greece,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          ,
          <article-title>Imagenet: A large-scale hierarchical image database</article-title>
          ,
          in:
          <source>2009 IEEE Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>248</fpage>
          -
          <lpage>255</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/CVPR.2009.5206848</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>