<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting Training Data Usage in Synthetic Images Using Machine Learning Techniques</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Krithikha Sanju Saravanan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohanavalli Subramaniam</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R B Anirudh Narayanan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bharath P</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Karunanidhi Ayyamperumal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Sri Sivasubramaniya Nadar College of Engineering</institution>
          ,
          <addr-line>Kalavakkam, Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents an investigation into detecting hidden "fingerprints" of real training images within synthetic medical images generated by Generative Adversarial Networks (GANs). The potential for GAN-generated images to retain traces of the source data poses a significant privacy risk, especially in sensitive domains like medical imaging where data sharing is restricted by privacy regulations. While GANs offer a powerful solution to data scarcity for training diagnostic models, it is crucial to ensure their outputs do not compromise patient confidentiality. This study evaluates the effectiveness of various machine learning strategies in identifying whether a specific real image was used in a GAN's training process. We explore a range of methods, from supervised deep learning classifiers to a comprehensive unsupervised framework that uses dimensionality reduction via PCA and clustering techniques. By measuring the feature-space proximity between real and synthetic images, we aim to uncover signatures of training data usage. The primary objective is to determine the privacy implications of employing GAN-generated data in medical applications, thereby providing a clearer understanding of the risks and benefits.</p>
      </abstract>
      <kwd-group>
        <kwd>Synthetic Data</kwd>
        <kwd>GAN</kwd>
        <kwd>Medical Imaging</kwd>
        <kwd>Privacy</kwd>
        <kwd>Fingerprint Detection</kwd>
        <kwd>PCA</kwd>
        <kwd>Clustering</kwd>
        <kwd>Training Data Leakage</kwd>
        <kwd>Supervised Learning</kwd>
        <kwd>Unsupervised Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The proliferation of deep learning has significantly advanced the capabilities of Computer-Aided
Diagnosis (CAD) systems in medical imaging. However, the performance of these data-intensive models
is often hampered by the scarcity of large, diverse, and publicly available datasets, a direct consequence
of stringent patient privacy regulations and the high cost of data annotation. Generative Adversarial
Networks (GANs) have emerged as a promising technology to mitigate this issue by producing
high-fidelity synthetic images, which can be used to augment datasets for training more robust and accurate
models.</p>
      <p>Despite their potential, GANs introduce a critical and unresolved privacy concern: the risk that
synthetic images may implicitly retain signatures or "fingerprints" of the real patient data used to
train them. The existence of such fingerprints would mean that synthetic images carry similar privacy
liabilities as the original data, thereby undermining their utility as a shareable, anonymized resource.
This paper addresses the central question: can we reliably detect if a specific real image was used in the
training of a GAN?</p>
      <p>
        This working note details the participation of the "Challengers" lab in the ImageCLEFmedical 2025
GANs challenge, specifically addressing Subtask 1: "Detect Training Data Usage" [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We present
a systematic evaluation of various machine learning strategies, comparing a fully supervised deep
learning approach against a comprehensive unsupervised framework. Our goal is to rigorously assess the
feasibility of fingerprint detection and contribute to a clearer understanding of the privacy guarantees
of GAN-generated data in the broader context of the ImageCLEF 2025 initiative [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. To enhance
reproducibility, our code is available online.<sup>1</sup>
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>The application of Generative Adversarial Networks (GANs), first introduced by Goodfellow et al. [3],
has expanded rapidly, particularly within the domain of medical imaging. Early research demonstrated
their profound utility for data augmentation, a critical task where acquiring large, labeled datasets is
often prohibitive. For instance, Frid-Adar et al. [4] successfully utilized Deep Convolutional GANs
(DCGANs) to synthesize CT images of liver lesions, thereby augmenting their training set to improve
CNN classification performance. Similarly, other works have explored GANs for augmenting datasets
of lung nodules and chest X-rays, establishing them as a powerful tool for enhancing the robustness
and accuracy of diagnostic models when faced with limited data [5, 6].</p>
      <p>However, the power of GANs to learn and replicate complex data distributions also introduces
significant privacy concerns. The very process that enables high-fidelity image generation raises questions
about whether the models are inadvertently memorizing and exposing sensitive information from the
training data. This has led to a growing body of research on the privacy risks of generative models,
including the threat of membership inference attacks, where an adversary aims to determine if a specific
data point was part of a model’s training set [7, 8]. The concept of "training data fingerprinting"—the
central theme of our work—is a nuanced form of this risk, suggesting that unique, traceable signatures
of training samples may be embedded within the synthetic outputs.</p>
      <p>
        The ImageCLEFmedical challenges have been instrumental in formalizing this problem and directing
research efforts toward solutions [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. For instance, in the 2023 edition, the DMK SSN team
tackled this problem using an integrated approach of Convolutional Neural Networks (CNNs), SIFT with
KNN classification, and Perceptual Hashing [9]. They calculated the Hamming distances
between real images and their synthesized counterparts to determine whether the GAN outputs were significantly
similar to the training data. Their study provided an initial confirmation of the fingerprinting hypothesis and
motivated efforts toward more robust detection frameworks. To build on such efforts and
address the current challenge, our work employs a diverse suite of well-established machine learning
techniques.
      </p>
      <p>For deep feature extraction, we utilize several landmark convolutional neural network architectures:
ResNet, known for its deep residual learning framework [10]; VGGNet, which established the importance
of depth in network design [11]; DenseNet, which introduced dense connectivity patterns [12]; and the
highly efficient MobileNetV2 [13] and EfficientNet [14] architectures. For our unsupervised pipeline,
we pair these deep learners with classical methods, including Principal Component Analysis (PCA)
for dimensionality reduction [15], Local Binary Patterns (LBP) for texture analysis [16], and clustering
algorithms like K-Means [17] and DBSCAN [18]. While prior work has identified the fingerprinting
problem, our primary contribution is the systematic and rigorous comparative analysis of these varied
techniques. By evaluating both supervised and extensively tuned unsupervised pipelines, we provide a
clear benchmark on the difficulty of the ImageCLEF 2025 fingerprint detection task and offer practical
insights into the privacy guarantees of synthetic medical data.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>
        Our study utilizes the official benchmarking dataset provided for the ImageCLEFmedical 2025 GANs
Task, specifically Subtask 1: "Detect Training Data Usage" [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The dataset is composed of grayscale
axial CT slices of lungs from tuberculosis patients, saved in PNG format with a uniform resolution of
256 × 256 pixels. It covers a wide spectrum of pathological conditions, from normal lung tissue to severe
lesion cases.
<sup>1</sup> https://github.com/karuna207/ImageCLEF-Subtask1
      </p>
      <p>The dataset is divided into training and test sets, each structured to facilitate the fingerprint detection
task.</p>
      <p>• Training Set: This set is provided to develop and tune the detection models. It contains three
distinct subsets:
– real_used: A folder containing 100 real CT images that were used to train the provided
GAN model.
– real_not_used: A folder containing 100 real CT images that were explicitly not used in
the GAN’s training process.
– generated: A collection of 5,000 synthetic images produced by the GAN, which was
trained exclusively on the images from the real_used set.
• Test Set: This set is used for the final evaluation of the models. It consists of two folders:
– real_unknown: A folder containing 500 real CT images. Some of these images were part
of the GAN’s training set, while others were not.
– generated: An additional set of 2,000 synthetic images produced by the same GAN
architecture under identical training conditions.</p>
      <p>The primary objective of the subtask is to classify each image in the real_unknown folder with a
binary label: ‘1’ for "used" (i.e., the image was part of the GAN’s training data) or ‘0’ for "not used".</p>
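      <p>The labeling convention described above can be made concrete with a minimal sketch. It assumes, purely for illustration, that the training archive is unpacked so that real_used and real_not_used are sibling folders of PNG files under a single root directory. The function name list_labeled_images and this directory layout are hypothetical and are not taken from the official distribution or from the released code.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import tempfile
from pathlib import Path


def list_labeled_images(train_root):
    """Return (path, label) pairs: 1 for real_used, 0 for real_not_used."""
    root = Path(train_root)
    pairs = []
    for folder, label in (("real_used", 1), ("real_not_used", 0)):
        # Sorting keeps the order deterministic across runs and platforms.
        for path in sorted((root / folder).glob("*.png")):
            pairs.append((path, label))
    return pairs


# Tiny self-check on a throwaway directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    for folder in ("real_used", "real_not_used"):
        (Path(tmp) / folder).mkdir()
        (Path(tmp) / folder / "example.png").touch()
    demo = [(p.parent.name, label) for p, label in list_labeled_images(tmp)]
```
]]></preformat>
      <p>The images in real_unknown carry no label and would be listed separately at inference time.</p>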
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>To address the challenge of detecting GAN training data "fingerprints" in the ImageCLEF 2025
Medical Task 1, we developed and evaluated two distinct methodological pillars: a comprehensive,
semi-unsupervised pipeline based on feature-space distances, and a fully supervised deep learning classifier.</p>
      <sec id="sec-4-1">
        <title>4.1. Unsupervised Distance-Based Pipeline</title>
        <p>The fundamental hypothesis of our unsupervised approach is that real images used to train a GAN will
lie closer in a high-dimensional feature space to the manifold of generated images than real images
not used in training. This pipeline was designed to systematically find the optimal combination of
components to exploit this relationship.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Pipeline Stages</title>
          <p>Our approach follows a structured, multi-stage process to ensure that the final models are robust and
optimized. The architecture of the pipeline is illustrated in Figure 1.</p>
          <p>1. Hyperparameter Tuning: We first perform an extensive search over a large space of models
and parameters on a validation set created by splitting the initial labeled data (real_used and
real_not_used). This identifies the most promising model configurations.
2. Model Evaluation and Selection: The best-performing configurations from the tuning phase are
then re-instantiated, and models that achieve a performance metric above a predefined threshold
(Cohen’s Kappa &gt; 0.1) on the full labeled set are selected.
3. Final Inference: The selected, validated models are used to predict labels for the final, unseen test
dataset.</p>
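          <p>For reference, Cohen’s Kappa compares the observed agreement between predictions and ground truth with the agreement expected by chance. The standard binary formulation is given below; the symbols p_o, p_e, pi_1 and pi_1-hat are notation introduced here for illustration and do not appear elsewhere in this note.</p>
          <disp-formula><tex-math><![CDATA[
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
p_e = \pi_1 \hat{\pi}_1 + (1 - \pi_1)(1 - \hat{\pi}_1)
]]></tex-math></disp-formula>
          <p>Here p_o is the fraction of images labeled correctly, pi_1 is the fraction of images that are truly "used", and pi_1-hat is the fraction predicted as "used". A value of 0 corresponds to chance-level agreement and 1 to perfect agreement, so the selection threshold of 0.1 only requires a configuration to be slightly better than chance.</p>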
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Component-wise Breakdown</title>
          <p>The pipeline is modular, allowing for the systematic testing of different components.
• Feature Extraction: We evaluated a diverse set of pre-trained CNN backbones: ResNet50 [10],
DenseNet121 [12], VGG16 [11], MobileNetV2 [13], and EfficientNet-B0 [14]. Additionally, we
tested a traditional approach using Local Binary Patterns (LBP) [16].
• Dimensionality Reduction: We integrated an optional step using StandardScaler and PCA [15],
testing configurations that retained 95% and 90% of the variance.
• Clustering: We applied K-Means [17], Agglomerative Clustering, and DBSCAN [18] to the feature
vectors of the generated images to find representative centroids.
• Classification Logic: Real images were classified based on the Euclidean distance of their features
to the generated centroids, using either the minimum (‘min’) or average (‘avg’) distance
compared against an optimized threshold.</p>
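          <p>The classification logic can be illustrated with a minimal sketch in plain Python. It assumes that feature vectors and centroids are already available as numeric sequences, and that an image is labeled "used" when its distance is at or below the threshold, which follows the proximity hypothesis of Section 4.1. The function names and the toy values are hypothetical and are not taken from the released code.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import math


def centroid_distance(feature, centroids, mode="min"):
    """Euclidean distance from one feature vector to a set of centroids."""
    distances = [math.dist(feature, c) for c in centroids]
    if mode == "min":
        return min(distances)
    if mode == "avg":
        return sum(distances) / len(distances)
    raise ValueError(f"unknown mode: {mode!r}")


def classify_used(features, centroids, threshold, mode="min"):
    """Label 1 ('used') when the distance is at or below the threshold, else 0."""
    return [
        1 if centroid_distance(f, centroids, mode) <= threshold else 0
        for f in features
    ]


# Toy 2-D example: two synthetic centroids and two real images.
centroids = [(0.0, 0.0), (4.0, 0.0)]
real_features = [(0.0, 1.0), (2.0, 5.0)]
labels_min = classify_used(real_features, centroids, threshold=2.0, mode="min")
labels_avg = classify_used(real_features, centroids, threshold=2.0, mode="avg")
```
]]></preformat>
          <p>In this toy case the first image is within the threshold of its nearest centroid but not of the centroids on average, which shows why the two distance modes were tuned as separate options.</p>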
        </sec>
        <sec id="sec-4-1-3">
          <title>4.1.3. Systematic Hyperparameter Tuning</title>
          <p>The key to this pipeline is the rigorous tuning process. We created a 30% validation split and performed
a grid search over all component combinations. The final classification threshold was optimized by
testing 95 percentile values on the validation set’s distance distribution. The performance of every
unique parameter combination was evaluated using the Cohen’s Kappa score, and the best configuration
for each feature extractor was selected for the final evaluation.</p>
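          <p>The threshold search can be sketched as follows, as an illustration rather than the released implementation. The sketch assumes linear interpolation between order statistics for the percentile and assumes that smaller distances map to the "used" label. The helper names cohen_kappa, percentile and tune_threshold and the toy distances are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa for binary labels encoded as 0/1."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_true1 = sum(y_true) / n
    p_pred1 = sum(y_pred) / n
    p_e = p_true1 * p_pred1 + (1 - p_true1) * (1 - p_pred1)
    return 0.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)


def percentile(values, q):
    """Linearly interpolated q-th percentile, with 0 <= q <= 100."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def tune_threshold(distances, y_true, percentiles=range(1, 96)):
    """Return (best_kappa, best_percentile, best_threshold) over 95 candidates."""
    best = None
    for q in percentiles:
        threshold = percentile(distances, q)
        y_pred = [1 if d <= threshold else 0 for d in distances]
        kappa = cohen_kappa(y_true, y_pred)
        # Strict comparison keeps the lowest percentile among ties.
        if best is None or kappa > best[0]:
            best = (kappa, q, threshold)
    return best


# Toy validation distances: the four "used" images lie closer to the centroids.
distances = [0.5, 0.7, 0.9, 1.1, 2.0, 2.2, 2.5, 3.0]
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
best_kappa, best_q, best_threshold = tune_threshold(distances, y_true)
```
]]></preformat>
          <p>Because the threshold is chosen on the same labeled images on which Kappa is measured, the selected score is optimistic by construction.</p>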
        </sec>
        <sec id="sec-4-1-4">
          <title>4.1.4. Step-by-Step Unsupervised Workflow</title>
          <p>Our unsupervised pipeline is executed through a systematic, multi-step process for each feature extractor
configuration. The entire workflow is automated in a single script, ensuring reproducibility.
1. Data Preparation and Validation Split: The process begins by identifying the paths to all images in
the generated and real image folders. The set of all available labeled real images (real_used
and real_not_used) is then split into a training/test set and a validation set using a 70/30 ratio,
controlled by a VALIDATION_SPLIT_SIZE parameter of 0.3. This split is stratified and shuffled
with a fixed RANDOM_SEED of 42 to ensure consistency across experiments. The smaller validation
set (60 images in our case) is used exclusively for the hyperparameter tuning loop.
2. Feature Extraction: For each feature extractor and PCA configuration defined in our
experimental setup, the pipeline processes every image. A UniversalFeatureExtractor class
loads the specified pre-trained model (e.g., resnet50), and a dedicated function computes
high-dimensional feature vectors for all images in both the generated set and the validation set.
3. Dimensionality Reduction: If a PCA configuration is specified (e.g., retain 90% or 95% of variance),
a StandardScaler and a PCA model are employed. Critically, the scaler and PCA models are
fitted only on the feature vectors of the generated images. This prevents any data leakage from
the validation set into the feature transformation process. The fitted models are then used to
transform the features of both the generated and validation sets.
4. Synthetic Data Clustering: With the features prepared, a clustering algorithm (KMeans,
AgglomerativeClustering, or DBSCAN) is applied. This clustering is performed exclusively
on the feature vectors of the generated images. The objective is not to classify, but to model
the distribution of the synthetic data by finding its cluster centroids. These centroids serve as
representatives of the GAN’s output manifold.
5. Distance Calculation and Threshold Tuning: The core classification logic is tuned in this step. For
each image in the validation set, the pipeline calculates its Euclidean distance to the synthetic
data centroids computed in the previous step. Both the minimum and average distances are
considered as potential metrics. This produces a set of distances for the validation images. To
find an optimal decision boundary, the pipeline iterates through 95 different percentile values of
this distance distribution (from 1 to 95). Each percentile is treated as a potential classification
threshold.
6. Optimal Configuration Selection: For every unique combination of parameters (feature extractor,
PCA, clustering algorithm, distance metric, etc.), a set of predictions is made for the validation set.
The quality of these predictions is measured by calculating the Cohen’s Kappa score against the
ground truth labels. The full set of parameters that results in the highest Kappa score is stored as
the single best configuration for that specific experimental branch.</p>
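          <p>Steps 1 and 3 of the workflow can be sketched in plain Python. This is a simplified illustration under stated assumptions: the stratified split is implemented by shuffling each class separately, and only the standardization part of the transformation is shown (a PCA model would be fitted on the generated features in the same way). The function names and toy feature values are hypothetical; only the RANDOM_SEED and VALIDATION_SPLIT_SIZE values come from the description above.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import random
import statistics

RANDOM_SEED = 42
VALIDATION_SPLIT_SIZE = 0.3


def stratified_split(items, labels, val_size=VALIDATION_SPLIT_SIZE, seed=RANDOM_SEED):
    """Shuffle each class separately and move `val_size` of it to validation."""
    rng = random.Random(seed)
    train, val = [], []
    for cls in sorted(set(labels)):
        members = [(x, y) for x, y in zip(items, labels) if y == cls]
        rng.shuffle(members)
        n_val = round(len(members) * val_size)
        val.extend(members[:n_val])
        train.extend(members[n_val:])
    return train, val


def fit_standardizer(generated_features):
    """Per-dimension mean and population std, estimated from generated features only."""
    columns = list(zip(*generated_features))
    means = [statistics.fmean(col) for col in columns]
    # Constant columns get a unit scale to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(features, means, stds):
    """Apply a previously fitted standardizer without refitting it."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in features]


# 200 labeled real images: 100 "used" (1) and 100 "not used" (0).
names = [f"img_{i:03d}" for i in range(200)]
labels = [1] * 100 + [0] * 100
train_pairs, val_pairs = stratified_split(names, labels)

# The statistics come from generated features; validation features are only transformed.
generated = [[0.0, 10.0], [2.0, 10.0], [4.0, 10.0]]
means, stds = fit_standardizer(generated)
validation_scaled = standardize([[2.0, 12.0]], means, stds)
```
]]></preformat>
          <p>Keeping the fit and transform steps in separate functions makes it harder to refit the transformation on validation data by accident.</p>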
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Supervised Classification Baseline</title>
        <p>In parallel to our unsupervised approach, we implemented a fully supervised binary classification model
to serve as a high-performance baseline. This model directly learns to differentiate between images
that were used in GAN training and those that were not. The architecture of the pipeline is illustrated
in Figure 2.</p>
        <p>• Architecture: The model uses a pre-trained ResNet50V2 [10] backbone, leveraging transfer
learning from ImageNet. We replaced the original classifier with a custom classification head
composed of a Global Average Pooling (GAP) layer, a Dense layer with ReLU activation, and a
final Dense layer with a Sigmoid activation function to output a probability score for the "used"
class.
• Training and Data Handling: The model was trained directly on the labeled real_used and
real_not_used image sets. Images were resized to 256 × 256 pixels and normalized. We
employed the Adam optimizer [19] and the Binary Crossentropy loss function, which are standard
choices for binary classification tasks. Training was accelerated using a GPU.
• Inference: Once trained, the model was applied directly to the real_unknown test set to predict
a binary label for each image. This approach tests the upper bound of performance when explicit
labels are available for training.</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Step-by-Step Supervised Workflow</title>
          <p>Our supervised baseline model was implemented using the TensorFlow and Keras frameworks. The
workflow is detailed below.</p>
          <p>1. Data Preparation and Labeling: The file paths for all images in the real_used and
real_not_used directories were loaded. Crucially, each image path was explicitly assigned a
corresponding integer label: ‘1’ for images in real_used and ‘0’ for images in real_not_used.
This labeled dataset was then split into training and validation sets for model training and
evaluation.
2. Model Architecture and Transfer Learning: The architecture is built upon a ResNet50V2 model
pre-trained on ImageNet, leveraging the power of transfer learning. The convolutional base of
the pre-trained model was frozen (base_model.trainable = False) to act as a fixed feature
extractor. The complete model, including the custom classification head stacked on top, consists of:
• A Conv2D layer to adapt the single-channel grayscale input to the three-channel format
expected by ResNet50V2.
• The frozen ResNet50V2 base.
• A GlobalAveragePooling2D layer to reduce the feature maps to a vector.
• A Dense layer with 128 units and a ReLU activation function.
• A Dropout layer with a rate of 0.5 for regularization.
• A final Dense output layer with a single unit and a sigmoid activation for binary probability
prediction.
3. Data Augmentation and Training: To improve model generalization, on-the-fly data
augmentation was applied to the training set using Keras’s ImageDataGenerator. Augmentations
included random rotations, width and height shifts, horizontal flips, and zooming. The model was
compiled with the Adam optimizer and binary_crossentropy as the loss function. Training
was performed for 20 epochs with a batch size of 32.
4. Evaluation and Final Inference: After training, the model’s performance was assessed on the
held-out validation set, achieving a final validation Kappa score of 0.05. The fully trained model
was then used to make predictions on the final, unlabeled test dataset provided by the task
organizers, with the results saved to a submission file.</p>
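          <p>The layer stack listed in step 2 can be sketched with the Keras functional API. This is an illustrative reconstruction, not the released code: the 3 × 3 kernel of the channel-adapting Conv2D layer, the accuracy metric, and the function name build_classifier are assumptions, since the description above does not specify them. The example instantiates the model with weights=None only to avoid downloading the ImageNet weights; passing "imagenet" would mirror the described setup.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import tensorflow as tf


def build_classifier(weights="imagenet", input_size=256):
    """Frozen ResNet50V2 backbone with a small binary classification head."""
    inputs = tf.keras.Input(shape=(input_size, input_size, 1))
    # Map the single grayscale channel to the three channels the backbone expects.
    # The kernel size is an assumption; the text does not state it.
    x = tf.keras.layers.Conv2D(3, kernel_size=3, padding="same")(inputs)
    base_model = tf.keras.applications.ResNet50V2(
        include_top=False,
        weights=weights,
        input_shape=(input_size, input_size, 3),
    )
    base_model.trainable = False  # use the backbone as a fixed feature extractor
    x = base_model(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


# weights=None skips the ImageNet download in this illustration.
model = build_classifier(weights=None)
```
]]></preformat>
          <p>With the backbone frozen, only the Conv2D adapter and the two Dense layers contribute trainable weights, which keeps the number of fitted parameters small relative to the 200 labeled images.</p>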
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Analysis</title>
      <p>In this section, we present the empirical results of our work. We first detail the performance of our
models on our internal validation set, which guided our selection process. We then report the final
results for our four submitted runs on the official ImageCLEF 2025 test dataset and provide an analysis
of the key findings.</p>
      <sec id="sec-5-1">
        <title>5.1. Internal Evaluation and Model Selection</title>
        <p>Our methodology involved two primary streams of experimentation: a systematic, unsupervised pipeline
and a supervised baseline. The top-performing configurations from both streams were selected for final
submission based on their performance on our internal labeled validation data.</p>
        <sec id="sec-5-1-1">
          <title>5.1.1. Unsupervised Pipeline Validation</title>
          <p>The extensive hyperparameter search across our 16 unsupervised configurations yielded a wide range
of results. The performance of the best configuration for each feature extractor and PCA setting on the
full 200 labeled images is shown in Table 1. While most models struggled, three configurations achieved
a Cohen’s Kappa score greater than our selection threshold of 0.1: ResNet50 (None), ResNet50 (PCA
0.9), and LBP (None). These were selected for submission as runs 1778, 1779, and 1777, respectively.</p>
        </sec>
        <sec id="sec-5-1-2">
          <title>5.1.2. Supervised Baseline Validation</title>
          <p>The supervised ResNet50V2 model was trained and evaluated on a separate train/validation split of the
labeled data. On its validation set of 40 images, it achieved a Kappa score of 0.05. Although modest, this
positive result justified its submission as a strong baseline, submitted as run 1811.</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Final Test Set Performance</title>
        <p>The four selected models were submitted to the official challenge platform for evaluation against the
hidden test set. The official results, linking each model to its submission run ID, are detailed in Table 3.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Analysis</title>
        <p>First, the supervised approach was the only method to achieve a positive Cohen’s Kappa score (0.012). While
this score is only marginally better than random chance, it suggests that directly training
a classifier on labeled examples was the most effective of our strategies for capturing the subtle differences
between the ‘used’ and ‘not used’ classes.</p>
        <p>Second, the unsupervised models failed to generalize from the validation set to the official test set.
Models 1, 2, and 4, which had shown some promise with positive Kappa scores during internal validation,
all yielded negative Kappa scores on the final test data. This suggests that the feature-space proximity
hypothesis, while appealing, is not a robust signal on its own and may be highly sensitive to the specific
data distribution, leading to poor generalization.</p>
        <p>Finally, the low scores across all submissions underscore the extreme difficulty of the GAN
fingerprinting task. The signatures left by the GAN training process are very subtle and not easily
detected by either standard deep feature comparison or direct supervised classification, so the task
remains a challenging and open area for future research.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this paper, we presented the work of our team, the Challengers, in the ImageCLEFmedical 2025
GANs Task 1, addressing the critical challenge of detecting training data "fingerprints" in synthetic
medical images. To this end, we developed and evaluated two distinct methodological pillars: a fully
supervised binary classifier using a ResNet50V2 architecture and an extensive, semi-unsupervised
pipeline designed to find an optimal configuration for distance-based detection.</p>
      <p>Our final results on the official test set reveal the profound difficulty of this task. The supervised
baseline was the only approach to achieve a performance marginally better than random chance,
yielding a positive Cohen’s Kappa score of 0.012. In contrast, our systematically tuned unsupervised
models, which were based on feature-space proximity, failed to generalize from our internal validation
set to the final test set, all yielding negative Kappa scores. This finding strongly suggests that the subtle
traces left by the training process are not easily captured by standard feature distance metrics alone.</p>
      <p>The primary limitation of our study is the low absolute performance across all attempted models,
which indicates that the signals of data usage are extremely faint. In addition, our extensive unsupervised
pipeline relied on a labeled validation set for its hyperparameter tuning, a resource that may not be
available in real-world scenarios, which limits its practical applicability as a truly unsupervised method.
Although the task remains open, we hope our work
provides a comprehensive benchmark for it, highlighting the limitations of current standard
approaches and informing future research into ensuring the privacy of generative models
in medicine.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Google Gemini 2.5 and Grammarly in order
to correct grammatical errors and review the intelligibility of the paper. After using these tools, the
authors reviewed and edited the content as needed and take full responsibility for the publication’s
content.</p>
      <p>IR Meets Multilinguality, Multimodality, and Interaction, Proceedings of the 16th International
Conference of the CLEF Association (CLEF 2025), Springer Lecture Notes in Computer Science
LNCS, Madrid, Spain, 2025.
[3] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio,
Generative adversarial nets, in: Advances in Neural Information Processing Systems 27 (NIPS
2014), 2014.
[4] M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, H. Greenspan, GAN-based
synthetic medical image augmentation for increased CNN performance in liver lesion classification,
Neurocomputing 321 (2018) 321–331.
[5] H.-C. Shin, N. A. Tenenholtz, J. K. Rogers, C. G. Schwarz, M. L. Senjem, J. L. Gunter, K. P. Andriole,
M. Michalski, Medical image synthesis for data augmentation and anonymization using generative
adversarial networks, arXiv preprint arXiv:1807.10225 (2018).
[6] A. Salehi, A. Sari, D. Erdogmus, Generative adversarial network with handcrafted features for
data augmentation of medical images, in: 2017 IEEE 14th International Symposium on Biomedical
Imaging (ISBI 2017), IEEE, 2017, pp. 706–710.
[7] J. Hayes, L. Melis, G. Danezis, E. De Cristofaro, LOGAN: Membership inference attacks against
generative models, in: Proceedings of the 2019 ACM SIGSAC Conference on Computer and
Communications Security, 2019, pp. 579–595.
[8] B. Hilprecht, M. Härterich, D. Bernau, Monte carlo and anom-based membership inference attacks
against generative models, in: 2019 First IEEE International Conference on Trust, Privacy and
Security in Intelligent Systems and Applications (TPS-ISA), IEEE, 2019, pp. 174–183.
[9] D. Subburam, S. M. SathyaNarayanan, B. Anand, K. Srinivasan, M. Subramaniam, DMK-SSN at
ImageCLEF 2023 medical: Controlling the quality of synthetic medical images created via GANs using
machine learning and image hashing techniques, in: CLEF 2023 Working Notes, CEUR-WS.org,
2023.
[10] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[11] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition,
arXiv preprint arXiv:1409.1556 (2014).
[12] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks,
in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
2017, pp. 4700–4708.
[13] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, MobileNetV2: Inverted residuals and
linear bottlenecks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2018, pp. 4510–4520.
[14] M. Tan, Q. V. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, in:
International Conference on Machine Learning (ICML), PMLR, 2019, pp. 6105–6114.
[15] I. Jolliffe, Principal component analysis, Springer series in statistics, 2002.
[16] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture
classification with local binary patterns, IEEE Transactions on Pattern Analysis and Machine
Intelligence 24 (2002) 971–987.
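[17] J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, 1967.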
[18] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in
large spatial databases with noise, in: Proceedings of the Second International Conference on
Knowledge Discovery and Data Mining (KDD-96), 1996, pp. 226–231.
[19] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980
(2014).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.-G.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Constantin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dogariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-D.</given-names>
            <surname>Ştefan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <article-title>Overview of ImageCLEFmedical 2025 GANs task: Training data analysis and fingerprint detection</article-title>
          ,
          <source>in: CLEF2025 Working Notes, CEUR Workshop Proceedings</source>
          , CEUR-WS.org, Madrid, Spain,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-C.</given-names>
            <surname>Stanciu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-G.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
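          ,
          <string-name>
            <given-names>L.-D.</given-names>
            <surname>Ştefan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-G.</given-names>
            <surname>Constantin</surname>
          </string-name>
          ,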
          <string-name>
            <given-names>M.</given-names>
            <surname>Dogariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Damm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rückert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brüngel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Idrissi-Yaghir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M. G.</given-names>
            <surname>Pakull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bracke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Eryilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-W.</given-names>
            <surname>Yim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Codella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Novoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Malvehy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Shan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Koychev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gautam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Thambawita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fabre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macaire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lecouteux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schwab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heinrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiesel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wolter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of ImageCLEF 2025: Multimedia retrieval in medical, social media and content recommendation applications</article-title>
          , in: Experimental
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>