<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ViT-based Generative Model Fingerprinting</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yijiang Zhou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haiyan Ding</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Information Science and Engineering, Yunnan University</institution>
          ,
          <addr-line>Kunming 650504, Yunnan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In the task of detecting privacy-leakage risks in synthetic medical images, our team (zhouyijiang1) proposed a dynamic detection framework based on visual fingerprints for Subtask 1 ("Detecting Training Data Usage") of the ImageCLEF Medical 2025 GANs task. The scientific core of this task lies in verifying whether medical images synthesized by Generative Adversarial Networks (GANs) contain implicit fingerprint information from the training data, thereby assessing the risk of patient-privacy leakage. Building on the Vision Transformer (ViT) architecture, we integrated a dynamic block-masking mechanism with a cross-layer attention feature pyramid to construct a two-stage detection pipeline: the first stage leverages high-dimensional feature-similarity matching (FAISS-L2) to filter candidate samples from a pre-built fingerprint library; the second stage performs precise judgment via a Hybrid Supervised Contrastive Network (Hybrid CANet), which combines cross-entropy loss with a contrastive constraint loss to significantly mitigate false positives caused by model drift. Experimental results on the validation set demonstrate that the proposed method can effectively identify latent training-data "fingerprint" information in synthetically generated images, achieving F1 and kappa scores of 0.619 and 0.172, respectively, indicating strong discriminative capability. Notably, a significant performance discrepancy was observed on the competition test set (best kappa coefficient 0.136). This contrast not only reveals the complexity of data distributions in real-world application scenarios but also indirectly verifies the effectiveness of the method under constrained validation conditions. Future research will focus on cross-domain generalization to enhance the model's robustness to distribution shifts. 
The related code has been open-sourced and is available at https://github.com/ZhouYiJiang88/Image_vit.</p>
      </abstract>
      <kwd-group>
        <kwd>Vision Transformer</kwd>
        <kwd>medical images</kwd>
        <kwd>GAN</kwd>
        <kwd>hybrid similarity measurement</kwd>
        <kwd>two-stage detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Deep learning is widely used in speech and image recognition [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], and in medical image processing,
with typical applications including image classification and image segmentation [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        However, recent studies [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ] have confirmed that synthetic medical imaging technology based on
generative adversarial networks (GANs) may introduce new privacy-leakage risks: biometric identification
makes chest X-ray and magnetic resonance imaging information usable for patient identity
re-identification, and the memorization behavior of generative models may create implicit associations
between a synthetic image and the high-dimensional features of specific training samples, forming
a privacy penetration channel. In response to this risk, ImageCLEFmedical GANs 2025 introduces a new
challenge[
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]: determining whether a synthetic image has a potential association
with a given real training set (i.e., whether it was generated from a real sample), which is essentially a binary
classification problem [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], in which real images can be classified as "used" or "unused".
      </p>
      <p>A generative model learns from real image data: the closer the two data distributions, the higher
the generated quality. In this work, we are tasked with binary classification of real images as used or unused.
To accomplish this, we construct a medical-image fingerprint detection framework
based on the Vision Transformer (ViT) and feature-space alignment: high-dimensional semantic
features are extracted by a pre-trained ViT model, dynamic data augmentation and a dual-channel
attention mechanism are combined to capture memory traces, and a dual-verification architecture achieves
accurate detection of medical-image training-set leakage, reducing the false-positive rate and providing
technical support for the compliant use of generative medical data.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        In recent years, Generative Adversarial Networks (GANs) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] have received extensive attention in
the medical field for image generation and transformation tasks. Many studies have explored
their applications in medical image synthesis and translation, showing that GANs can exploit the latent
information of medical images [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ] and generate virtual images that are conducive to diagnosis.
      </p>
      <p>
        Sheng-Yu Wang et al.[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] revealed inherent defects of early GAN-generated images (e.g., ProGAN,
StyleGAN) in the frequency domain. Through experiments, they found that the high-frequency noise
patterns of GAN-generated images differ significantly from those of real images, and they were the
first to systematically propose the concept of a "fingerprint" of generated images and to verify
its detectability. FakeCLR [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], proposed by Haodong Li et
al., pioneered the application of contrastive-learning frameworks to the detection of GAN-generated
images, filling the gap left by self-supervised learning in this direction. By constructing contrastive
sample pairs in feature space, it mines domain-invariant latent artifacts in generated images in an
unsupervised manner, which gives it significant advantages in cross-domain detection across a variety
of heterogeneous generator architectures.
      </p>
      <p>Synthetic images open up a new way to construct typical case samples in the medical field, enabling
medical researchers and clinicians to deepen their understanding of pathological mechanisms, optimize
clinical diagnostic methods, and validate treatment options based on standardized data. At the same time,
this method effectively alleviates the patient-privacy dilemma involved in real medical imaging: because
the original medical data often contains traceable biometric information, traditional data sharing faces
ethical and legal constraints. By generating synthetic images that preserve the anatomical features of
the human body and are desensitized, the data distribution required for research is maintained, and
large-scale secure data flow and cross-agency collaboration are realized. This technology balances the
needs of information utilization and privacy protection in medical research and provides key technical
support for the construction of an open scientific research ecosystem.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <sec id="sec-3-1">
        <title>3.1. System Architecture</title>
        <p>In this study, we propose a leakage-detection system for medical-imaging training sets based on a
dual-stream heterogeneous feature-learning framework (Fig. 1). Its core innovation is the
hierarchical coupling of high-dimensional feature-space matching with deep semantic-association
modeling, reducing the false-positive rate through a two-stage screening mechanism. The system
adopts a divide-and-conquer design, constructing a heterogeneous dual-channel architecture
of a fast feature-screening module and an attention-based precision-judgment network (Fig. 1), and it makes
the final decision through a dynamic weight-fusion mechanism. The coarse-screening module builds a
high-dimensional approximate-nearest-neighbour search system on the FAISS (Facebook AI Similarity Search)
engine and refines the results with cosine-similarity spectral clustering: the similarity
matrix of the Top-50 candidate set is treated as a graph, potential outliers are
separated by spectral clustering, and the candidate pool size is then adjusted dynamically. The precision-judgment
module builds the Hybrid CANet, which aligns generated and real features semantically
through a cross-modal gating unit. The network input is a concatenated feature
vector (1536 dimensions), a multi-head self-attention mechanism captures cross-region
correlations, and the output layer applies adaptive temperature scaling to calibrate the confidence
distribution.</p>
        <p>Figure 1: (a) the dual-stream detection architecture, comprising a coarse feature-screening module (FAISS engine) and an
attention-based precision-judgment network (CANet);
(b) schematic diagram of the components of the ViT-based feature encoder.</p>
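As a hedged illustration of the coarse-screening stage described above, the sketch below approximates the FAISS Top-k L2 search with NumPy and replaces the spectral-clustering refinement with a simpler mean-cosine-similarity pruning rule; the function names, the threshold value, and the pruning rule itself are our own illustrative choices, not the exact competition implementation.

```python
import numpy as np

def topk_candidates(query, bank, k=50):
    # Stage-1 coarse screen: exact L2 search over the fingerprint bank,
    # a NumPy stand-in for a FAISS IndexFlatL2 over 768-d features.
    d2 = ((bank - query) ** 2).sum(axis=1)
    return np.argsort(d2)[:k]

def prune_outliers(bank, idx, sim_thresh=0.5):
    # Simplified stand-in for the spectral-clustering refinement: drop any
    # candidate whose mean cosine similarity to the rest of the Top-k set
    # falls below sim_thresh, dynamically shrinking the candidate pool.
    feats = bank[idx] / np.linalg.norm(bank[idx], axis=1, keepdims=True)
    sim = feats @ feats.T
    mean_sim = (sim.sum(axis=1) - 1.0) / (len(idx) - 1)  # exclude self-similarity
    return idx[mean_sim >= sim_thresh]
```

With a tight cluster of near-duplicate fingerprints plus one orthogonal candidate in the Top-k set, the orthogonal candidate is pruned while the cluster survives.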
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Feature Encoding Network</title>
        <p>Relying on the vit_base_patch16_224 architecture, we removed the original classification head,
injected a hybrid position-encoding strategy, and introduced a normalized coordinate
vector on top of the original absolute position embedding, enhancing spatial-proportion awareness so
as to accurately capture distortions of anatomical structure in medical images:

PE(i, j) = Concat(PE_abs(i, j), i/H, j/W) (1)

where (i, j) is the patch position in the grid and H, W are the grid dimensions. In the feature-extraction
stage, a dynamic data-augmentation module with probability threshold p = 0.6
(including perspective deformation of amplitude 0.2, brightness jitter of ±20%, and ±15° random rotation)
perturbs the input image; at the same time, image patches are randomly masked with
20% probability in feature space and directional Gabor noise is superimposed to simulate
common artifacts of medical images.</p>
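The hybrid position encoding of Eq. (1) can be sketched as follows; this is a minimal NumPy illustration assuming a 14×14 patch grid (vit_base_patch16_224 at 224-pixel input), with function and argument names of our own choosing.

```python
import numpy as np

def hybrid_position_encoding(pe_abs, grid_h=14, grid_w=14):
    # Eq. (1): concatenate the learned absolute position embedding of each
    # patch with its normalized (row, col) grid coordinates, giving the
    # encoder an explicit sense of spatial proportion.
    rows, cols = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    coords = np.stack(
        [rows.ravel() / (grid_h - 1), cols.ravel() / (grid_w - 1)], axis=1
    )
    return np.concatenate([pe_abs, coords], axis=1)
```

The first patch receives coordinates (0, 0) and the last patch (1, 1), so the two extra channels span the full normalized range.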
        <p>(, ;  ) = − ′2+ 2′2
2 2</p>
        <p>︂(
cos 2</p>
        <sec id="sec-3-2-1">
          <title>Hierarchy combinations</title>
        </sec>
        <sec id="sec-3-2-2">
          <title>Kappa</title>
          <p>L3+9+15
L6+12+18
L8+16+24
L5+10+15+20
′ )︂

the combined efects of diferent levels
fusion = LayerNorm ⎝
⎛</p>
          <p>∑︁
∈{6,12,18}</p>
          <p>⎞
  · GELU(C(L)S)⎠</p>
          <p>In order to solve the problem of false positive suppression, the dynamic weight decay of linear
growth is embedded in the AdamW optimizer, the momentum comparison loss function is constructed,
and the temperature scaling mechanism of negative sample similarity (initial temperature  = 0.07,
base temperature 0 = 0.05) strengthens the model to identify fingerprints and noise features, and
ifnally realizes eficient and reliable privacy leakage detection through the two-level decision-making
mechanism of "FAISS approximate search + Hybrid CANet fine judgment". Experiments show that the
encoding network shows strong generalization ability in cross-domain testing, and the kappa score on
the dataset oficially provided by ImageCLEFmed GAN 2025 is increased by 0.06.
(2)
(3)
(4)
(5)</p>
        </sec>
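A minimal NumPy sketch of the cross-layer class-token fusion in Eq. (3), assuming equal initial fusion weights α_l; the helper names and the tanh approximation of GELU are our own illustrative choices.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def fuse_cls(cls_tokens, alpha):
    # Eq. (3): F_fusion = LayerNorm(sum_l alpha_l * GELU(CLS_l))
    fused = sum(a * gelu(t) for a, t in zip(alpha, cls_tokens))
    return layer_norm(fused)
```

After fusion, each output vector is normalized to approximately zero mean and unit standard deviation along the feature axis, as LayerNorm guarantees.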
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Deep Attention Network</title>
        <p>By fusing multi-layer perceptrons with multi-head attention, we construct an end-to-end
correlation-detection framework between generated and real images. The model takes the concatenated
features of a generated image and a real image (dimension 768 × 2 = 1536) as input and achieves accurate
matching through three stages: feature mapping, global-dependency modeling, and decision refinement.
In the feature-mapping stage, the input is transformed by a fully connected layer of width 1536, with
nonlinear expressiveness enhanced by LayerNorm and the GELU activation function, expressed as
follows:</p>
        <p>h₁ = GELU(LayerNorm(W₁x + b₁))
where W₁ ∈ ℝ^{1536×1536} is the weight matrix and x ∈ ℝ^{1536} is the input feature vector. A high-ratio
Dropout (0.6) is then introduced to suppress overfitting, and the features are progressively compressed
by two dimensionality reductions (1536 → 1024 → 512) to focus on the key discriminative information.
In the global-dependency modeling stage, the model embeds an 8-head multi-head attention mechanism,
computed as
Attention(Q, K, V) = Softmax(QKᵀ / √d_k) V
where Q = K = V ∈ ℝ^{1×512} is the self-attention input and d_k = 64 is the dimension of each attention
head. The multi-head mechanism divides the input into 8 subspaces (512 / 8 = 64), learns the correlations
of different semantic patterns (such as texture, outline, and color distribution) in parallel, and finally
fuses the heads' outputs through concatenation and a linear transformation to strengthen the modeling
of cross-region dependencies. In the decision-refinement stage, the attention output is further
reduced to 256 dimensions through a fully connected layer and finally mapped to a one-dimensional
matching confidence:</p>
        <p>ŷ = σ(W₃ · GELU(W₂h_attn + b₂))
where W₂ ∈ ℝ^{256×512} and W₃ ∈ ℝ^{1×256} are the weight matrices and σ is the sigmoid function. The
model is trained using the AdamW optimizer, the learning rate is set to 1 × 10⁻⁴, a weight decay of
1 × 10⁻⁴ is used to prevent overfitting, and the loss function is the binary cross-entropy

ℒ = −(1/N) Σ_{i=1}^{N} [ y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ]</p>
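The scaled dot-product attention above can be sketched in NumPy as follows; the per-head slicing with identity Q/K/V projections is a deliberate simplification of the learned projection matrices in the actual model.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, heads=8):
    # x: (seq_len, 512); each head works on a 512 / 8 = 64-d subspace.
    # Identity Q = K = V projections keep the sketch short; the real model
    # learns separate projection matrices per head.
    seq_len, dim = x.shape
    dk = dim // heads
    outputs = []
    for h in range(heads):
        q = k = v = x[:, h * dk:(h + 1) * dk]
        attn = softmax(q @ k.T / np.sqrt(dk))  # Softmax(QK^T / sqrt(d_k))
        outputs.append(attn @ v)
    return np.concatenate(outputs, axis=-1)
```

A quick sanity check: with a single token (seq_len = 1), each attention weight is exactly 1, so the output equals the input.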
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Two-stage Decision Engine</title>
        <p>In the detection stage, a dynamic decision mechanism combines coarse and fine screening. The
coarse stage uses a fixed learning rate of 1e-4 together with a CNN feature-freezing strategy to stabilize
the initialization of the feature space; the fine stage introduces cosine-annealing decay to reduce the
learning rate gradually to 5e-6, activates full-parameter fine-tuning, and embeds an adversarial-perturbation
term in the loss function to enhance robustness, optimizing inference efficiency while preserving
high-precision detection. First, the system applies a coarse-screening strategy based on the FAISS
high-dimensional index engine, projecting input features into a 768-dimensional Euclidean space to
construct a spherical feature distribution, from which the top 100 candidate samples are selected subject
to a probability threshold (default 0.6). In the fine-screening stage, the candidates are classified by the
dynamic calibration layer of the Comparative Attention Network (CANet), which compares the original
similarity against an attention-weighted calibration value to produce the final judgment. A final match
must satisfy two conditions: the confidence exceeds the 0.6 threshold and the feature-similarity score
ranks in the top 100. Experiments show that this design reduces the false-positive rate on public datasets.</p>
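The coarse-plus-fine decision flow can be sketched as below, with a plain cosine similarity standing in for the CANet fine scorer; the 0.6 confidence threshold follows the default quoted above, while the function names are illustrative assumptions rather than the actual implementation.

```python
import numpy as np

def cosine(a, b):
    # simple stand-in for the CANet fine-judgment score
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def two_stage_decide(query, bank, fine_scorer, top_k=100, conf_thresh=0.6):
    # Stage 1: FAISS-style L2 coarse screen keeps the top_k nearest fingerprints.
    d2 = ((bank - query) ** 2).sum(axis=1)
    candidates = np.argsort(d2)[:top_k]
    # Stage 2: fine judgment re-scores each candidate; a match is declared only
    # when the best fine score clears the confidence threshold.
    scores = {int(i): fine_scorer(query, bank[i]) for i in candidates}
    best = max(scores, key=scores.get)
    if scores[best] > conf_thresh:
        return best, scores[best]
    return None, scores[best]
```

A query that is a slightly perturbed copy of one bank fingerprint should be matched back to that fingerprint with high confidence.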
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <p>The benchmark dataset officially provided for the ImageCLEFmed GAN 2025 "Detect Training Data
Usage" competition includes both real and synthetic biomedical images; examples are shown in Figure 2
((a) real used images, (b) real not-used images, (c) generated images). The real images are axial slices
taken from 3D CT scans of approximately 8,000 tuberculosis patients, stored as 8-bit PNG files of
256×256 pixels. The synthetic images are also 256×256 pixels and are produced by a variety of
generative models, including generative adversarial networks (GANs). The training dataset contains
5,000 synthetic images, 100 real images that were used to generate them, and 100 unused real images.
The test dataset consists of 2,000 images generated by the same GAN models and 500 real images. The
competition is a binary classification task evaluated with several key performance indicators:
kappa, accuracy, precision, recall, and F1, with kappa selected as the main indicator for this
year's evaluation. These metrics are defined as follows:

κ = (p_o − p_e) / (1 − p_e) (6)

Accuracy = (TP + TN) / (TP + TN + FP + FN) (7)

Precision = TP / (TP + FP) (8)

Recall = TP / (TP + FN) (9)

F1 = 2 · Precision · Recall / (Precision + Recall) (10)

where p_o is the observed agreement, p_e the agreement expected by chance, and TP, TN, FP, and FN
the true/false positive and negative counts.</p>
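As a concrete companion to the metric definitions, this self-contained snippet computes Cohen's kappa for a binary task from the confusion counts; it is a generic illustration of the formulas, not the official evaluation script.

```python
def confusion(y_true, y_pred):
    # Confusion counts for a binary task (1 = "used", 0 = "not used").
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def cohen_kappa(y_true, y_pred):
    # kappa = (p_o - p_e) / (1 - p_e): observed accuracy corrected for the
    # agreement expected from the marginal label frequencies alone.
    tp, tn, fp, fn = confusion(y_true, y_pred)
    n = tp + tn + fp + fn
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return (p_o - p_e) / (1 - p_e)
```

For example, truth [1, 1, 0, 0] against prediction [1, 0, 0, 0] gives p_o = 0.75, p_e = 0.5, and hence κ = 0.5.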
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Experimental Results</title>
        <p>Our team systematically developed three technical approaches: a baseline feature-extraction
architecture based on ResNet50, a multi-modal feature-fusion scheme combining ResNet50 (2048-D) and
EfficientNet (2048-D), and a Vision Transformer-based encoder architecture. During the competition,
the team submitted a total of 15 sets of distinct experimental results; Table 2 highlights the best
outcome of each of the three approaches. The experiments show that although the traditional
convolutional method (ResNet50) has an advantage in single-sample processing efficiency, the limitation
of its local receptive field leads to insufficient capture of global semantic information, giving a kappa
of only 0.06 on this task. The improved dual-stream feature-fusion architecture raises the Top-100
retrieval accuracy to 0.552 through a dynamic weighting strategy (ResNet confidence 0.6, EfficientNet
0.4), but its 306 ms inference latency and 89.2% increase in GPU-memory usage significantly restrict
deployment feasibility. Finally, the Vision Transformer-based scheme reaches a validation loss of 0.0146
through joint optimization of the self-attention mechanism and dynamic data augmentation (color
distortion combined with geometric transformation at a probability threshold of 0.6), and its two-stage
mechanism of candidate screening plus attention matching reduces the false-positive rate to 18% while
maintaining a real-time inference speed of 132 ms. After comprehensively weighing model performance,
resource consumption, and edge-deployment cost, we adopted the dual-stream ViT architecture with
cross-modal attention fusion as the final competition solution.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this study, we mainly used a ViT feature-extraction plus two-stage judgment pipeline: the
high-dimensional features of biomedical images (feature_dim = 768) are extracted by the ViT model and
retrieved efficiently by feature similarity with the FAISS engine (IndexFlatIP + normalize_L2) to capture
potential associations between images. High-similarity candidate pairs are then judged a second time by
an enhanced neural network (EnhancedTraceModel), whose attention mechanism strengthens
feature-interaction verification and eliminates noise interference. The pipeline can accurately identify
potential training-data "fingerprints" in synthetic biomedical images, providing a reliable technical tool
for verifying the compliance of generative models, such as detecting training-data privacy leaks.
Follow-up work should further optimize feature representation and noise robustness for the
characteristics of medical images, so as to adapt to more complex clinical application scenarios.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, ChatGPT-4o and Grammarly were used to check grammar and
spelling. After using these tools, the authors reviewed and edited the content as needed and take full
responsibility for the publication's content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          , et al.,
          <article-title>Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups</article-title>
          ,
          <source>IEEE Signal Processing Magazine</source>
          <volume>29</volume>
          (
          <year>2012</year>
          )
          <fpage>82</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Chéron</surname>
          </string-name>
          , I. Laptev,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schmid</surname>
          </string-name>
          ,
          <article-title>P-CNN: Pose-based CNN features for action recognition</article-title>
          ,
          <source>in: Proceedings of the IEEE International Conference on Computer Vision</source>
          (ICCV),
          <year>2015</year>
          , pp.
          <fpage>3218</fpage>
          -
          <lpage>3226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Greenspan</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. van Ginneken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Summers</surname>
          </string-name>
          ,
          <article-title>Guest editorial: Deep learning in medical imaging-overview and future promise of an exciting new technique</article-title>
          ,
          <source>IEEE Transactions on Medical Imaging</source>
          <volume>35</volume>
          (
          <year>2016</year>
          )
          <fpage>1153</fpage>
          -
          <lpage>1159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Avendi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kheradvar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jafarkhani</surname>
          </string-name>
          ,
          <article-title>A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI</article-title>
          ,
          <source>Medical Image Analysis</source>
          <volume>30</volume>
          (
          <year>2016</year>
          )
          <fpage>108</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          , et al.,
          <article-title>Patient re-identification in chest radiographs via metric learning</article-title>
          ,
          <source>Nature Communications</source>
          <volume>5</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Ghesu</surname>
          </string-name>
          , et al.,
          <article-title>Anatomical fingerprinting of brain MR images</article-title>
          ,
          <source>Medical Image Analysis</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carlini</surname>
          </string-name>
          , et al.,
          <article-title>Extracting training data from difusion models</article-title>
          ,
          <source>in: Proceedings of the USENIX Security Symposium</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-C.</given-names>
            <surname>Stanciu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-G.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          , Ştefan, LiviuDaniel, M.-G. Constantin,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dogariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Damm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rückert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brüngel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Idrissi-Yaghir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M. G.</given-names>
            <surname>Pakull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bracke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Eryilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Becker</surname>
          </string-name>
          , W.-W. Yim,
          <string-name>
            <given-names>N.</given-names>
            <surname>Codella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Novoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Malvehy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Shan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Koychev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gautam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Thambawita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fabre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macaire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lecouteux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schwab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heinrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiesel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wolter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of ImageCLEF 2025: Multimedia retrieval in medical, social media and content recommendation applications</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 16th International Conference of the CLEF Association (CLEF 2025)</source>
          , Springer Lecture Notes in Computer Science (LNCS), Madrid, Spain,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.-G.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Constantin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dogariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-D.</given-names>
            <surname>Ştefan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <article-title>Overview of ImageCLEFmedical 2025 GANs task: Training data analysis and fingerprint detection</article-title>
          , in:
          <source>CLEF 2025 Working Notes, CEUR Workshop Proceedings</source>
          , CEUR-WS.org, Madrid, Spain,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tokozume</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ushiku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Harada</surname>
          </string-name>
          ,
          <article-title>Between-class learning for image classification</article-title>
          ,
          <source>in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>5486</fpage>
          -
          <lpage>5494</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pouget-Abadie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mirza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warde-Farley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ozair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Generative adversarial networks</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>63</volume>
          (
          <year>2020</year>
          )
          <fpage>139</fpage>
          -
          <lpage>144</lpage>
          . doi:10.1145/3422622.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Dao</surname>
          </string-name>
          ,
          <article-title>Mamba: Linear-time sequence modeling with selective state spaces</article-title>
          ,
          <year>2023</year>
          . arXiv:2312.00752.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hameed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Shabut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hossain</surname>
          </string-name>
          ,
          <article-title>Multi-class multi-level classification algorithm for skin lesions classification using machine learning techniques</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>141</volume>
          (
          <year>2020</year>
          )
          <fpage>112961</fpage>
          . doi:10.1016/j.eswa.2019.112961.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.-Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Owens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Efros</surname>
          </string-name>
          ,
          <article-title>CNN-generated images are surprisingly easy to spot... for now</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>8692</fpage>
          -
          <lpage>8701</lpage>
          . doi:10.1109/CVPR42600.2020.00872.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>FakeCLR: Exploring contrastive learning for GAN-generated image detection</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>36</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>1224</fpage>
          -
          <lpage>1232</lpage>
          . doi:10.1609/aaai.v36i1.20018.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>