<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reverse Engineering Generative Fingerprints in Medical Images: A Deep Learning Approach to Training Data Attribution</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mitra Barve</string-name>
          <email>barve.mitra@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nikita Bhedasgaonkar</string-name>
          <email>nikitaedu7@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isha Shah</string-name>
          <email>ishahmshah@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sara Nambiar</string-name>
          <email>nambiar.sara@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Atharva Date</string-name>
          <email>atharva2718@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Geetanjali Kale</string-name>
          <email>gvkale@pict.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <addr-line>Keywords: Autoencoders, DCLGANs, GANs, Latent Fingerprint Detection, Training Data Attribution, Synthetic Biomedical Images, Medical Image Privacy, Deep Representation Learning</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Pune Institute of Computer Technology</institution>
          ,
          <addr-line>Dhankawadi, Pune, Maharashtra</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Growing concern over data privacy in AI models trained on sensitive medical data has led to increasing use of synthetically generated medical images. ImageCLEFmedical GANs 2025 investigates whether such synthetic data contains fingerprints that could be used to identify the real images implicitly used to generate them, thereby posing a threat to patient privacy. We used multiple approaches to identify whether a given image was part of the training set of a generative model whose outputs we had access to. The central idea was self-supervised training of autoencoders and GANs on the synthetic images, followed by clustering or classification on the encoder or critic features. Our findings suggest that encoder-based feature representations can retain some training signal from generative models, highlighting potential risks to patient privacy. We also observed that Vision Transformers, especially when pretrained on domain-specific data, help models learn more informative representations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In recent years, the rise of generative models has enabled the creation of highly realistic synthetic
medical images, providing valuable assistance for applications such as data augmentation. This is
particularly beneficial in scenarios where there is limited access to real medical data. However, this
advancement introduces a critical privacy concern: can synthetic images unintentionally leak sensitive
information about the real training data used in their generation?</p>
      <p>
        Our team, Neural Nexus, investigates this question through our participation in the
ImageCLEFmedical GANs 2025 challenge, specifically Subtask 1: Detect Training Data Usage [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
      <p>To explore potential training data leakage, we use self-supervised representation learning methods,
including both Vision Transformer (ViT)-based and CNN-based autoencoders, trained directly on
synthetic image sets. We also evaluate the use of feature extractors derived from GAN critics,
specifically Dual Contrastive Learning GAN (DCLGAN). Additionally, we examine the effects of supervised
pretraining on an external labeled tuberculosis dataset to guide encoder feature learning.</p>
      <p>To determine whether these extracted features carry fingerprints of training data, we apply both
supervised classification methods and unsupervised clustering techniques, including KMeans, Gaussian
Mixture Models (GMM), and Spectral Clustering.</p>
      <p>Our findings demonstrate that, under specific experimental setups, it is indeed possible to
distinguish between synthetic images generated from “real_used” training samples and those that were not.
These results raise important concerns about data traceability and privacy in synthetic medical image
generation workflows. The code required for this work is publicly available in our GitHub repository.1</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>This work addresses the problem of detecting whether real medical images were used in the training
of generative models, with a focus on synthetic lung CT slices produced by a Generative Adversarial
Network (GAN). Specifically, Subtask 1, Detect Training Data Usage, aims to determine whether a
given real image contributed to the training of a GAN that produced synthetic CT images. Identifying
such instances is critical for assessing privacy risks, as the presence of training data “fingerprints” in
generated images can indicate potential data leakage and privacy violations. The complete dataset
comprises standardized lung CT scan slices in PNG format, each of size 256 × 256 pixels and encoded
at 8 bits per pixel. The training data is divided into three folders:
• generated: 5,000 synthetic CT images generated by a GAN trained on real lung CT data.
• real_used: 100 real images that were included in the training set of the GAN.
• real_not_used: 100 real images that were excluded from the GAN’s training process.</p>
      <p>The test data is similarly structured and consists of:
• generated: 2,000 new synthetic images produced by the same GAN under the same training configuration.
• real_unknown: 500 real CT images, a mixture of “used” and “not used” images. The task is to predict, for each of these, whether it was used (label 1) or not used (label 0) during GAN training.</p>
      <p>The goal is to train a model on the provided training set and produce a final output file with 500
binary predictions corresponding to the real_unknown images. A correct prediction (1 for used, 0
for not used) indicates the model successfully identified subtle signatures that distinguish between
training and held-out data in the synthetic output space. The implications of this work are substantial,
particularly in the context of medical imaging, where the inadvertent exposure of patient-specific
information through generative models raises significant ethical and legal concerns.
1https://github.com/saranambiar/Neural-Nexus-ImageClef-2025</p>
      <sec id="sec-2-1">
        <title>External Dataset Description</title>
        <p>
          The ViT encoder and the encoder used in the spectral clustering approach were pretrained on a publicly
available external tuberculosis classification dataset [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], consisting of lung CT slices in JPG/PNG format.
The dataset includes images from four classes: Adenocarcinoma, Large Cell Carcinoma, Squamous Cell
Carcinoma, and Normal. It is split into training (70%), validation (10%), and test (20%) sets. Pretraining on
this domain-specific dataset helped the encoder learn medically relevant features prior to reconstruction
tasks.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Pretrained ViT Autoencoder</title>
        <p>
          The proposed autoencoder architecture is composed of a Vision Transformer (ViT)-based encoder and a
convolutional decoder enhanced with residual connections [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The encoder utilizes a modified ViT-B/16
model, where the original classification head is removed and replaced with a projection layer that
maps the 768-dimensional token embeddings to a lower-dimensional latent space. Since ViT requires a
3-channel input, a lightweight 1 × 1 convolution is applied to convert grayscale images into a format
suitable for the transformer.
        </p>
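        <p>The adapter described above can be sketched as follows. The tiny transformer stack here stands in for ViT-B/16; apart from the 1 × 1 grayscale-to-RGB convolution, the 16-pixel patches, and the 768-dimensional token width, all sizes are illustrative choices of ours.</p>
        <preformat>
```python
# Sketch of the encoder adaptation: a 1x1 conv lifts 1-channel CT slices to
# 3 channels, patches are embedded, and the classification head is replaced
# by a projection to a smaller latent space. The 2-layer transformer is a
# stand-in for ViT-B/16, not the paper's exact model.
import torch
import torch.nn as nn

class TinyViTEncoder(nn.Module):
    def __init__(self, img_size=256, patch=16, dim=768, latent_dim=128):
        super().__init__()
        self.to_rgb = nn.Conv2d(1, 3, kernel_size=1)        # grayscale to 3ch
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_tokens = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.project = nn.Linear(dim, latent_dim)           # replaces cls head

    def forward(self, x):
        x = self.to_rgb(x)                     # (B,1,H,W) -> (B,3,H,W)
        x = self.patch_embed(x)                # (B,dim,H/16,W/16)
        x = x.flatten(2).transpose(1, 2)       # (B,tokens,dim)
        x = self.blocks(x + self.pos)
        return self.project(x.mean(dim=1))     # mean-pooled latent vector

enc = TinyViTEncoder()
z = enc(torch.randn(2, 1, 256, 256))
print(z.shape)  # torch.Size([2, 128])
```
        </preformat>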
        <p>
          Here, the encoder is pretrained on an external tuberculosis classification dataset, enabling it to extract
domain-specific features[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Unlike conventional CNNs, the ViT encoder employs self-attention to
capture long-range dependencies and global context at an early stage. This mechanism is defined as:
Attention(Q, K, V) = softmax(QK⊤ / √d_k) V    (1)
        </p>
        <p>
          Here, Q, K, and V are the query, key, and value matrices derived from the token embeddings, and d_k is
the dimensionality of the key vectors. This attention mechanism allows ViT to extract semantically rich
and discriminative features, which are particularly crucial in medical imaging and anomaly detection,
where subtle, spatially distributed patterns across the image are important [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
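        <p>For concreteness, the scaled dot-product attention above can be written in a few lines of NumPy; the shapes and batch layout are our choice.</p>
        <preformat>
```python
# Scaled dot-product attention, written out in plain NumPy for clarity.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (B, n_q, n_k)
    return softmax(scores, axis=-1) @ V                # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 4, 8))
K = rng.normal(size=(1, 6, 8))
V = rng.normal(size=(1, 6, 8))
out = attention(Q, K, V)
print(out.shape)  # (1, 4, 8)
```
        </preformat>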
        <p>
          The decoder reconstructs the input image from the latent representation using a sequence of
transposed convolutional layers and residual blocks. These residual connections promote efficient feature
propagation and training stability [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The decoder progressively upsamples the compact representation
back to the original image resolution. Finally, a Tanh activation is applied to produce pixel values
in a normalized range. After analyzing the reconstruction error distribution on the validation set, a
threshold is empirically set to classify the images.
        </p>
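        <p>The empirical thresholding step can be sketched as follows; the percentile rule and the decision direction (lower reconstruction error means label 1, “used”) are illustrative assumptions, not the authors’ exact procedure.</p>
        <preformat>
```python
# Sketch: score images by per-image MSE reconstruction error and classify
# them by comparing against a threshold chosen from the validation error
# distribution. Random arrays stand in for images and reconstructions.
import numpy as np

def per_image_mse(images, recons):
    diff = images - recons
    return diff.reshape(len(diff), -1).mean(axis=1)

rng = np.random.default_rng(1)
val = rng.random((50, 256, 256))
rec = val + rng.normal(0.0, 0.05, val.shape)      # stand-in reconstructions
val_err = per_image_mse(val, rec)
threshold = np.percentile(val_err, 50.0)          # empirical validation cut

test_err = per_image_mse(val[:10], rec[:10])
pred = np.less_equal(test_err, threshold).astype(int)  # 1 = "used" (assumed)
print(pred.shape)  # (10,)
```
        </preformat>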
        <sec id="sec-3-1-1">
          <title>3.1.1. Experimental Setup</title>
          <p>The experiment involved two phases: supervised pretraining of the ViT-based encoder for classification,
followed by unsupervised training of a full autoencoder using the pretrained encoder.</p>
          <p>In the first phase, the encoder, based on the ViT-B/16 architecture, was pretrained for 100 epochs on
a multi-class classification task using cross-entropy loss. The model was optimized using the Adam
optimizer with a StepLR scheduler, and accuracy and loss were tracked on both training and validation
sets. Training was conducted using PyTorch with GPU acceleration.</p>
          <p>In the second phase, the pretrained encoder weights were reused to initialize an autoencoder, which
was trained to minimize the Mean Squared Error (MSE) between input and reconstructed CT images.
The decoder consisted of transposed convolutions and residual blocks to progressively upsample the
latent representation back to the image resolution. Training was conducted for 10 epochs using the
Adam optimizer with a learning rate of 1 × 10−4 and a weight decay of 1 × 10−5, and convergence
was monitored using average reconstruction loss per epoch.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Spectral Clustering on AutoEncoder Features</title>
        <p>
          We implemented a convolutional autoencoder to learn compact representations of the given generated
images. The encoder was first pretrained in a supervised fashion on an external tuberculosis classification
dataset [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], allowing it to extract domain-relevant features. This encoder was then integrated into a full
autoencoder and fine-tuned on the available data using reconstruction loss. Specifically, for an input
image x ∈ R^(C×H×W), the encoder E(·) produced a latent representation z = E(x), which was then
passed through the decoder D(·) to reconstruct the image: x̂ = D(z). The training objective was to
minimize the reconstruction MSE:
ℒ_recon = ‖x − x̂‖²₂    (2)
        </p>
        <p>After training, the encoder was used to extract latent features  for all images in the dataset. We then
applied various clustering algorithms, including KMeans, Gaussian Mixture Models (GMM), and Spectral
Clustering, to group these features into two clusters corresponding to the real_used and real_not_used
labels. Among these, Spectral Clustering yielded the highest performance in terms of both accuracy
and F1-score.</p>
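        <p>A minimal sketch of this clustering stage, run here on synthetic stand-in features rather than the encoder latents:</p>
        <preformat>
```python
# Sketch: group latent features into two clusters with the three algorithms
# named above. Two Gaussian blobs stand in for encoder features.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 1, (100, 32)),
                   rng.normal(3, 1, (100, 32))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
gm = GaussianMixture(n_components=2, random_state=0).fit_predict(feats)
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        random_state=0).fit_predict(feats)
print(len(set(km)), len(set(gm)), len(set(sc)))  # 2 2 2
```
        </preformat>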
        <p>
          Spectral Clustering [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] operates by computing a similarity graph G = (V, E) over the features,
forming the graph Laplacian L = D − W, where W is the affinity matrix and D is the degree matrix.
The eigenvectors corresponding to the k smallest eigenvalues of the normalized Laplacian L_sym =
I − D^(−1/2) W D^(−1/2) are used to embed the data, followed by KMeans:
        </p>
        <p>L_sym = I − D^(−1/2) W D^(−1/2).    (3)</p>
        <p>Clustering on the features extracted from the encoder indicated significant differences between the
used and unused images.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Experimental Setup</title>
          <p>We used an encoder with 4 Residual Blocks (having 32, 64, 128, and 256 filters), each consisting of multiple
Convolutional, BatchNorm, Dropout and Pooling layers. We initially pretrained it on an external
classification dataset by appending a linear classification head to this encoder and training it for 100
epochs against Cross Entropy Loss using the Adam Optimizer and Step LR Scheduler.</p>
          <p>We built the autoencoder using this pre-trained encoder and a decoder with 4 Residual Blocks
(having 256, 128, 64, 32 filters). The Auto-Encoder was then trained for 100 epochs using L1 loss to
model reconstruction error and the Adam optimizer. Spectral clustering was performed on the features
produced by the trained encoder to generate final results.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. ResNet Autoencoder and Feature-Based Detection Framework</title>
        <sec id="sec-3-3-1">
          <title>3.3.1. ResNet Autoencoder</title>
          <p>We adopted a ResNet-based convolutional autoencoder with residual connections to enable stable
training and deep feature extraction. The encoder consists of a total of four Conv2D layers with
increasing filters (32 →64→128→256) and stride 2, each followed by a residual block comprising two
3 × 3 Conv2D layers, BatchNorm, and LeakyReLU activation, along with a shortcut connection.</p>
          <p>
            A GlobalAveragePooling2D layer reduced the spatial dimensions, and a Dense layer mapped the result
to a 512-dimensional latent vector  = (). Following Wickramasinghe et al. [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], this residual design
prevented performance degradation as the depth increased, enabling deeper networks with better
generalization.
          </p>
          <p>The decoder inverts this process: the latent vector  is passed through a Dense layer and reshaped to
16 × 16 × 128, followed by three Conv2DTranspose layers (128→64→32), each paired with a residual
block. A final Conv2DTranspose layer with 3 filters and sigmoid activation reconstructed the image.
The network was trained end-to-end using the mean absolute error (MAE) loss.</p>
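          <p>A shape-level sketch of this encoder follows. The paper’s implementation is in TensorFlow; this PyTorch mirror only reproduces the stage widths (32→64→128→256), the residual blocks, and the 512-dimensional latent head, so treat it as an approximation.</p>
          <preformat>
```python
# Shape-level sketch of the residual encoder: four stride-2 conv stages,
# each followed by a residual block, then global pooling and a dense latent.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
            nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        return self.act(self.body(x) + x)      # shortcut connection

def encoder(latent_dim=512):
    layers, in_ch = [], 3
    for ch in (32, 64, 128, 256):              # four stride-2 stages
        layers += [nn.Conv2d(in_ch, ch, 3, stride=2, padding=1),
                   nn.LeakyReLU(0.2), ResBlock(ch)]
        in_ch = ch
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),
               nn.Linear(256, latent_dim)]     # global pool + dense latent
    return nn.Sequential(*layers)

z = encoder()(torch.randn(2, 3, 128, 128))
print(z.shape)  # torch.Size([2, 512])
```
          </preformat>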
          <p>
            Optimization is performed with the Adam optimizer, allowing the encoder to effectively learn compact
representations of GAN-generated samples. A similar multi-scale residual autoencoder design was
successfully applied by Li et al. [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] for CT lung nodule classification, further supporting its applicability
in capturing fine-grained medical image features.
          </p>
        </sec>
        <sec id="sec-3-3-2">
          <title>3.3.2. Feature-based classification</title>
          <p>
            The ResNet encoder, trained on synthetic images, produces 512-dimensional latent embeddings. However,
these features must be enriched. We accomplish this using handcrafted descriptors such as first-order radiomic
statistics, GLCM-based texture measures [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ], wavelet subband energies [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ], Gabor filter responses
[
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] and morphological characteristics.
          </p>
          <p>
            Next, we train a Random Forest classifier on the labeled real images using the enriched features. We
compute Mahalanobis distances [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] from each real image to the synthetic feature distribution, based
on the top-K most informative features. This is done to encourage the model to better distinguish used
and unused images. These distances are used as a meta anomaly signal.
          </p>
          <p>As per the classifier’s probability outputs, predictions are generated using thresholds selected via
cross-validation methods. This approach helps optimize F1-score and Cohen’s Kappa.</p>
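          <p>This pipeline can be sketched as follows on synthetic stand-in features; the feature dimensions, class balance, and 0.5 decision threshold are illustrative, not the values tuned via cross-validation.</p>
          <preformat>
```python
# Sketch: Random Forest on enriched features, with Mahalanobis distance of
# each real image from the synthetic feature distribution appended as a
# meta anomaly signal. Random arrays stand in for the extracted features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
synth = rng.normal(0.0, 1.0, (500, 16))       # features of generated images
real = rng.normal(0.5, 1.0, (200, 16))        # features of labeled real images
y = rng.integers(0, 2, 200)                   # 1 = used, 0 = not used (toy)

mu = synth.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(synth, rowvar=False))
diff = real - mu
maha = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

X = np.hstack([real, maha[:, None]])          # distance as a meta feature
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
proba = clf.predict_proba(X)[:, 1]
pred = np.greater_equal(proba, 0.5).astype(int)   # threshold via CV in paper
print(pred.shape)  # (200,)
```
          </preformat>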
        </sec>
        <sec id="sec-3-3-3">
          <title>3.3.3. Mathematical Expressions for Handcrafted Features</title>
        </sec>
        <sec id="sec-3-3-4">
          <title>GLCM Contrast</title>
          <p>Contrast = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} (i − j)² P(i, j)
where P(i, j) is the normalized co-occurrence probability of gray levels i and j, and N is the number
of gray levels.</p>
        </sec>
        <sec id="sec-3-3-5">
          <title>Wavelet Subband Energy</title>
          <p>E_s = Σ_{m=1}^{M} Σ_{n=1}^{N} |W_s(m, n)|²
where W_s(m, n) denotes the wavelet coefficient at position (m, n) in subband s, and E_s is the total
energy of that subband. M and N are the dimensions of the subband.</p>
          <p>Multiple subbands (e.g., horizontal, vertical, diagonal details at various scales) can be used to form a
feature vector of subband energies.</p>
        </sec>
        <sec id="sec-3-3-6">
          <title>Gabor Filter Response</title>
          <p>g(x, y) = exp(−(x′² + γ² y′²) / (2σ²)) cos(2π x′ / λ + ψ)    (4)
x′ = x cos θ + y sin θ,  y′ = −x sin θ + y cos θ    (5)
where λ is the wavelength, θ is the orientation, ψ is the phase offset, σ is the Gaussian standard deviation,
and γ is the spatial aspect ratio.</p>
          <p>Mahalanobis Distance:
D_M(x) = √((x − μ)⊤ Σ⁻¹ (x − μ))    (6)
where x is the feature vector, μ is the mean of the distribution, and Σ is the covariance matrix.</p>
        </sec>
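        <p>As a concrete instance of the first of these descriptors, GLCM contrast can be computed directly; the quantization depth and the horizontal distance-1 offset below are our choices, not fixed by the definition.</p>
        <preformat>
```python
# Pure-NumPy GLCM contrast for horizontal neighbors at distance 1:
# quantize gray levels, accumulate the co-occurrence matrix P, normalize,
# and weight each entry by the squared gray-level difference.
import numpy as np

def glcm_contrast(img, levels=8):
    q = np.floor(img.astype(float) / 256.0 * levels).astype(int)
    q = np.clip(q, 0, levels - 1)              # quantized gray levels
    pairs = np.stack([q[:, :-1].ravel(), q[:, 1:].ravel()], axis=1)
    P = np.zeros((levels, levels))
    for i, j in pairs:
        P[i, j] += 1
    P /= P.sum()                               # normalized co-occurrence
    i_idx, j_idx = np.indices(P.shape)
    return float(((i_idx - j_idx) ** 2 * P).sum())

flat = np.full((16, 16), 128, dtype=np.uint8)
print(glcm_contrast(flat))  # 0.0  (uniform image has no contrast)
```
        </preformat>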
        <sec id="sec-3-3-7">
          <title>3.3.4. Experimental Setup</title>
          <p>The above-mentioned ResNet-based convolutional autoencoder was implemented using the TensorFlow
deep learning framework. We trained the model for 50 epochs using the Adam optimizer with default
hyperparameters (β1 = 0.9, β2 = 0.999) and a fixed learning rate of 1 × 10−4. The training was
performed with a batch size of 32 on normalized RGB images resized to 128 × 128 pixels. The mean
absolute error loss function was used to improve pixel-wise reconstruction accuracy.</p>
          <p>The software environment included Python 3.8, TensorFlow 2.12, and CUDA 11.8. We observed the
model convergence within 40 to 45 epochs. GPU acceleration was utilized throughout the training
process to efficiently handle high-dimensional image data and support deep network training.</p>
          <p>The meta-classification piece was made using Python 3.11 and scikit-learn 1.4, NumPy 1.26,
Pandas 2.2, and OpenCV 4.9 for preprocessing and feature extraction. Latent features were extracted
from the previously mentioned autoencoder. Gabor filters and wavelet transforms were calculated on
default settings. The Random Forest classifier was trained with 100 estimators and default depth, and
with Gini impurity as the split criterion. Mahalanobis distances were calculated using the empirical
covariance matrix corresponding to the synthetic image features. All experiments were executed on
the Kaggle GPU runtime with CUDA 11.8 support.</p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Dual-Contrastive GAN for Feature Extraction</title>
        <p>We use a DCLGAN framework to extract and compare discriminator-driven feature maps across three
CT image domains. By analyzing attention patterns from intermediate discriminator activations, we
estimate semantic similarity to infer which real images influenced the generator’s learning. The
generator starts with a 7 × 7 convolution, followed by downsampling (128→256 channels) and ResNet
blocks with identity shortcuts [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. It then upsamples (256→128→64) and ends with a 7 × 7 convolution
and Tanh activation.
        </p>
        <p>
          The PatchGAN discriminator consists of five Conv2D layers (64 →1 channels) with LeakyReLU and
Instance Normalization [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. We attach forward hooks to the deepest post-activation layers to extract
feature maps, enabling spatial attention analysis [15]. This supports evaluating feature overlap and
understanding the generator’s dependency on real images. While this approach is not directly linked to
a specific submission ID, initial testing indicated promising performance for feature extraction in CT
scan imagery.
        </p>
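        <p>The hook mechanism can be sketched as follows. The three-layer stack here is a stand-in for the five-layer PatchGAN discriminator, and the attention map is a simple channel-mean of the captured activation; both are illustrative choices.</p>
        <preformat>
```python
# Sketch: register a forward hook on a deep discriminator layer and capture
# its post-activation feature map for spatial attention analysis.
import torch
import torch.nn as nn

disc = nn.Sequential(
    nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, 4, padding=1))

captured = {}
def hook(module, inputs, output):
    captured["feat"] = output.detach()        # post-activation feature map

handle = disc[3].register_forward_hook(hook)  # deepest LeakyReLU layer
_ = disc(torch.randn(1, 1, 64, 64))
handle.remove()                               # always detach hooks when done

attn = captured["feat"].abs().mean(dim=1)     # simple spatial attention map
print(attn.shape)  # torch.Size([1, 16, 16])
```
        </preformat>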
        <sec id="sec-3-4-1">
          <title>3.4.1. Experimental Setup</title>
          <p>The model was trained for 200 epochs using the Adam optimizer with TTUR: learning rates for the
generator and discriminator were set to 1 × 10−4 and 4 × 10−4, respectively. Cosine annealing
schedulers were used for both optimizers. Spectral normalization was applied to all Conv2D and Linear
layers in the discriminator to enforce 1-Lipschitz continuity. We conducted training with batch size 1
using CUDA acceleration on PyTorch.</p>
          <p>Training includes three loss functions:
• Hounsfield Unit (HU) Loss: Preserves clinically relevant CT intensity distributions by aligning
histogram distributions of real and synthetic images for each slice. Let h_r(i) and h_f(i) denote
the normalized histogram values of the i-th bin for real and fake images, respectively. The HU loss
is computed as the Kullback–Leibler divergence over B histogram bins:</p>
          <p>ℒ_HU = Σ_{i=1}^{B} h_r(i) log( (h_r(i) + ε) / (h_f(i) + ε) )    (9)
where ε is a small constant added for numerical stability.
• PatchNCE Loss: Enforces localized contrastive alignment between real and generated patches,
improving the fidelity of fine-grained features [15].
• Feature Matching Loss: Stabilizes adversarial training by minimizing the L1 distance between
discriminator feature activations for real and synthetic inputs.</p>
          <p>We apply gradient penalty regularization to enhance training stability. This setup ensures that precise
attention overlaps are captured between discriminator feature activations, supporting our hypothesis
that real_used and generated images exhibit greater alignment than real_not_used.</p>
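          <p>The HU loss above can be sketched as a small function; the bin count, intensity range, and ε value are illustrative choices of ours.</p>
          <preformat>
```python
# Sketch of the HU histogram loss: an epsilon-smoothed KL divergence between
# per-slice intensity histograms of a real and a synthetic image.
import numpy as np

def hu_loss(real, fake, bins=64, eps=1e-8):
    hr, _ = np.histogram(real, bins=bins, range=(0.0, 1.0))
    hf, _ = np.histogram(fake, bins=bins, range=(0.0, 1.0))
    hr = hr / hr.sum()                         # normalized histograms
    hf = hf / hf.sum()
    return float(np.sum(hr * np.log((hr + eps) / (hf + eps))))

rng = np.random.default_rng(0)
a = rng.random(10000)
print(hu_loss(a, a))  # 0.0 for identical slices
```
          </preformat>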
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>The quantitative results in Table 1 demonstrate the superiority of the ViT-based encoder, which obtained
the highest F1-score (0.568) and Cohen’s Kappa (0.148). Having been pretrained on a lung CT dataset, the
encoder learns anatomical priors and pathological patterns unique to thoracic imaging. During
reconstruction, this initialization lets the model extract more significant and medically relevant
representations. ViT’s self-attention mechanism, in contrast to CNNs, enables the model to concentrate
on contextually related but spatially distant regions, which is essential for detecting subtle or diffuse
abnormalities in chest CT scans. Furthermore, by stabilizing training and maintaining low-level spatial
details, the residual decoder architecture makes high-fidelity reconstruction possible. The compact latent
space acts as a semantic bottleneck, so only the most important features are kept and rebuilt. The higher
Cohen’s Kappa score indicates stronger agreement with ground-truth labels; together, these factors
minimize incorrect classifications and help the model differentiate between structurally similar classes.
Reconstructions produced by this architectural synergy are both aesthetically pleasing and diagnostically
significant.</p>
      <p>The spectral clustering approach outperformed the ResNet-based framework in F1-score (0.442 vs.
0.437) and Cohen’s Kappa (0.072 vs. 0.032), despite being unsupervised, proving the effectiveness of
clustering in learned latent spaces. Lower recall, on the other hand, implies sensitivity trade-offs in
capturing subtler image characteristics, perhaps as a result of noise in fine-grained structural regions
that are important for medical imaging or more difficult-to-cluster borderline cases.</p>
      <p>The modest results of the ResNet autoencoder with handcrafted features and anomaly scoring via
Mahalanobis distance are probably the result of either domain shift effects between real and synthetic
samples or the limited generalization of handcrafted features. Handcrafted features frequently lack the
flexibility to adapt to unseen data distributions, especially in complex medical contexts, even though
the architecture captures low- and mid-level patterns fairly well.</p>
      <p>The DCLGAN method demonstrated excellent qualitative performance in capturing intensity
distributions and attention overlaps, indicating its potential for targeted feature attribution and interpretable
GAN evaluation in CT domains, even though this was not evident in the leaderboard submissions.</p>
      <p>It is important to note that not all submissions made during the challenge are discussed in this
working note. We have intentionally focused on presenting only those configurations that yielded the
most competitive results in terms of quantitative metrics or qualitative insights. Lower-performing or
exploratory runs have been excluded to maintain clarity and focus.</p>
      <p>In this paper, we explore the ImageCLEF GANs 2025 task of identifying GAN fingerprints of training
data. Specifically, we determined whether a particular image had been part of the training set of the
generative model used to create the given synthetic images. We used multiple methods in which we trained
ViT/ResNet-based autoencoders or GANs on the provided synthetic images to capture the generated
distribution, and performed clustering or classification on the features extracted from the encoder or GAN
critic. A classifier on the features extracted from a ViT-based autoencoder provided the best results,
with a Cohen’s Kappa score of 0.148.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>Thanks to SCTR’s Pune Institute of Computer Technology (PICT), Pune, India for their support and the
resources provided, which greatly assisted in the research and preparation of this work.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used X-GPT-4 and Gramby for grammar and
spelling checking. After using these tool(s)/service(s), the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-7">
      <title>5. Conclusion</title>
      <p>Table 1 reports Accuracy, Precision, Recall, F1-Score, and Cohen’s Kappa for the submitted runs.</p>
      <p>[15] H. Zhang, I. Goodfellow, D. Metaxas, A. Odena, Self-attention generative adversarial networks, in:
Proceedings of the International Conference on Machine Learning (ICML), 2019. doi:10.48550/arXiv.1805.08318.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.-G.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Constantin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dogariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-D.</given-names>
            <surname>Ştefan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <article-title>Overview of ImageCLEFmedical 2025 GANs task: Training data analysis and fingerprint detection</article-title>
          ,
          <source>in: CLEF2025 Working Notes, CEUR Workshop Proceedings</source>
          , CEUR-WS.org, Madrid, Spain,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-C.</given-names>
            <surname>Stanciu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-G.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          , L.-D. Ştefan, M.-G. Constantin,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dogariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Damm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rückert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brüngel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Idrissi-Yaghir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M. G.</given-names>
            <surname>Pakull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bracke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Eryilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-W.</given-names>
            <surname>Yim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Codella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Novoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Malvehy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Shan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Koychev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gautam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Thambawita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fabre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macaire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lecouteux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schwab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heinrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiesel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wolter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of ImageCLEF 2025: Multimedia retrieval in medical, social media and content recommendation applications</article-title>
          ,
          <source>in: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 16th International Conference of the CLEF Association (CLEF 2025)</source>
          , Springer Lecture Notes in Computer Science LNCS, Madrid, Spain,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hany</surname>
          </string-name>
          ,
          <article-title>Chest ct-scan images dataset</article-title>
          , https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images/data,
          <year>2020</year>
          . Accessed: 2025-05-30.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Prabhakar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wiestler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Menze</surname>
          </string-name>
          ,
          <article-title>Masked autoencoders are effective for medical image reconstruction and synthesis</article-title>
          ,
          <source>arXiv preprint arXiv:2301.07382</source>
          (
          <year>2023</year>
          ). URL: https://arxiv.org/abs/2301.07382.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dosovitskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Beyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kolesnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weissenborn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Unterthiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dehghani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Minderer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Heigold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Houlsby</surname>
          </string-name>
          ,
          <article-title>An image is worth 16x16 words: Transformers for image recognition at scale</article-title>
          ,
          <source>International Conference on Learning Representations (ICLR)</source>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2010.11929.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learning for image recognition</article-title>
          ,
          <source>in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          . doi:10.1109/CVPR.2016.90.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <article-title>On spectral clustering: Analysis and an algorithm</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>14</volume>
          (
          <year>2001</year>
          ). doi:10.5555/2980539.2980649.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Wickramasinghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Marino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Manic</surname>
          </string-name>
          ,
          <article-title>Resnet autoencoders for unsupervised feature learning from high-dimensional data: Deep models resistant to performance degradation</article-title>
          ,
          <source>IEEE Access</source>
          (
          <year>2021</year>
          ). doi:10.1109/ACCESS.2021.3064819.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. N. Y.</given-names>
            <surname>Sherazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>A new multi-scale dilated deep resnet model for classification of lung nodules in ct images</article-title>
          ,
          <year>2022</year>
          . doi:10.1145/3507971.3507988.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Haralick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shanmugam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Dinstein</surname>
          </string-name>
          ,
          <article-title>Textural features for image classification</article-title>
          ,
          <source>IEEE Transactions on Systems, Man, and Cybernetics</source>
          (
          <year>1973</year>
          ). doi:10.1109/TSMC.1973.4309314.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mallat</surname>
          </string-name>
          ,
          <article-title>A theory for multiresolution signal decomposition: The wavelet representation</article-title>
          .,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          (
          <year>1989</year>
          ). doi:
          <volume>10</volume>
          .1109/34.192463.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Daugman</surname>
          </string-name>
          ,
          <article-title>Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters</article-title>
          .,
          <source>Journal of the Optical Society of America A</source>
          (
          <year>1989</year>
          ). doi:
          <volume>10</volume>
          .1364/josaa.2.001160.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>De Maesschalck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jouan-Rimbaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Massart</surname>
          </string-name>
          ,
          <article-title>The Mahalanobis distance</article-title>
          ,
          <source>Chemometrics and Intelligent Laboratory Systems</source>
          (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Isola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Efros</surname>
          </string-name>
          ,
          <article-title>Image-to-image translation with conditional adversarial networks</article-title>
          ,
          <source>in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2017</year>
          . doi:10.48550/arXiv.1611.07004.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>