<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Identifying the Origins of Synthetic Biomedical Images: An Ensemble Approach with Pseudo-Labeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Amilcare Gentili</string-name>
          <email>agentili@ucsd.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff>
          <institution>San Diego VA Health Care System</institution>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>University of California</institution>
          ,
          <addr-line>San Diego</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The increasing use of synthetic data in biomedical image analysis necessitates robust methods to ensure data provenance and detect potential links to the original training datasets. This working note presents the results of the SDVAHCS/UCSD team in the ImageCLEFmedical 2025 GANs Task, specifically Subtask 2: "Identify Training Data Subsets". Our approach involved training an ensemble of EfficientNet models (b0, b1, and b2) using a multiclass classification framework. We explored a comprehensive hyperparameter space and employed a max voting ensembling strategy to improve prediction accuracy. Furthermore, we investigated the benefits of pseudo-labeling the unlabeled test data to augment our training set. To assess for overfitting on the validation data, we utilized a sequestered portion of the original training data to evaluate the reliability of our pseudolabeling process by comparing prediction accuracy on both datasets. Our final submissions demonstrated the effectiveness of this combined approach, with ensemble models leveraging pseudolabeled data achieving strong performance in identifying the origins of synthetic biomedical images. We discuss the implications of our findings and propose avenues for future research, including exploring alternative architectures and advanced ensembling techniques to further enhance the traceability and security of synthetic medical data.</p>
      </abstract>
      <kwd-group>
        <kwd>EfficientNet</kwd>
        <kwd>Ensemble Learning</kwd>
        <kwd>Pseudo-Labeling</kwd>
        <kwd>Synthetic Biomedical Images</kwd>
        <kwd>ImageCLEF</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The development of AI systems for medical image analysis, including disease prediction, detection,
and classification, hinges on the availability of large and diverse training datasets. High-quality data
enable these models to learn intricate patterns, enhancing their accuracy and reliability. However,
the acquisition of real medical data is often restricted due to patient privacy concerns, limiting the
data available for effective AI model training and hindering advancements in healthcare applications.
Synthetic data, artificially generated to resemble real medical data without coming from actual patients,
presents a potential solution to this challenge. Generative models, such as Generative Adversarial
Networks (GANs), can produce these datasets, allowing researchers to develop and evaluate AI systems
while safeguarding patient privacy and facilitating the collection of diverse training information. A
critical concern with synthetic data is to ensure the absence of hidden links or "fingerprints" from the
real data used for its generation. The potential traceability of synthetic data back to the original patient
information poses a privacy risk. Therefore, guaranteeing that synthetic data are completely devoid of
such imprints is paramount to maintain patient confidentiality while leveraging its benefits for AI-driven
healthcare innovation. Previous iterations of this task at ImageCLEF[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] (2023 and 2024)[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] training
strategies, resulting in four distinct submissions to the competition investigated the presence of these
"fingerprints" in synthetic images generated by various models. The findings consistently revealed that
generative models retain and embed features of their training data, highlighting significant implications
for security and privacy. These results underscore the urgent need for efective methods to detect and
mitigate these imprints to ensure the privacy and utility of synthetic medical images for research and
development. The second edition further confirmed the existence of unique "fingerprints" that could
be used to attribute synthetic images to specific generative models based on identifiable patterns and
features. This working note presents the results of the SDVAHCS/UCSD team in the ImageCLEFmedical
2025 GANs Task, specifically Subtask 2: "Identify Training Data Subsets". Our approach to identifying
the training data subset for each synthetic biomedical image involved training deep learning models
to perform a six-class classification (real vs. five generated subsets). We explored various model
architectures and training strategies, resulting in four distinct submissions to the competition [3].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <sec id="sec-2-1">
        <title>2.1. Data Description</title>
        <p>The benchmarking dataset for this task comprises both real and synthetic biomedical images. The real
images consist of axial slices from 3D CT scans of approximately 8,000 patients with lung tuberculosis.
These slices exhibit variability in appearance, ranging from relatively normal to displaying distinct lung
lesions, including severe cases. The real images are provided in an 8-bit per pixel PNG format with
a standardized resolution of 256x256 pixels. The synthetic images, also sized at 256x256 pixels, were
generated using various generative models, including Generative Adversarial Networks (GANs) and
Diffusion Neural Networks. By providing both real and synthetic datasets, the task enables participants
to analyze and compare the characteristics of synthetic images with their real counterparts, investigating
potential "fingerprints" and patterns related to the training process. The training data set is organized
into two main folders: "generated" and "real." The "generated" folder contains five subfolders, each
holding synthetic images produced using a different training dataset. The "real" folder also contains
five corresponding subfolders, each representing a specific training data set used to train the generative
model. The real images within each of these subfolders were used to generate the synthetic images
found in the correspondingly named subfolder within the "generated" directory. The specific mapping
is as follows:
• Folder t1 (real images) → Used to generate synthetic images in gen_t1
• Folder t2 (real images) → Used to generate synthetic images in gen_t2
• Folder t3 (real images) → Used to generate synthetic images in gen_t3
• Folder t4 (real images) → Used to generate synthetic images in gen_t4
• Folder t5 (real images) → Used to generate synthetic images in gen_t5</p>
        <p>The test dataset includes 25,000 generated images, each derived from one of the real subgroups of
images within the training dataset. The images were divided into six classes (0 for real images, 1 through
5 for synthetic images generated from the 't1' through 't5' real subsets, respectively); see Table 1. For the
first two submissions, only the training images were used during training; the classifier obtained from
this training was then used to assign pseudolabels to the test images, which were subsequently added to
the training set but not the validation set. For the third submission, some of the training images were
sequestered and used neither for training nor validation, but only to check the accuracy of the model.
For the final submission, all images were used for training.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Data Preparation and Loading</title>
        <p>Standardized data loading and preprocessing were applied, including resizing images to the input size
of the chosen EfficientNet variant [4], converting to tensors, and normalization.</p>
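<p>This pipeline can be sketched without committing to a particular framework. A minimal NumPy version (the input resolutions and ImageNet channel statistics are standard published values; the nearest-neighbour resize and the function name are our simplifications):</p>

```python
import numpy as np

# Native input resolutions for the EfficientNet variants we used.
INPUT_SIZE = {"b0": 224, "b1": 240, "b2": 260}

# ImageNet channel statistics, used because the models are ImageNet-pretrained.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image_uint8, variant="b1"):
    """Resize (nearest-neighbour here for brevity), scale to [0, 1],
    normalize per channel, and move channels first to (C, H, W)."""
    size = INPUT_SIZE[variant]
    h, w = image_uint8.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image_uint8[rows][:, cols].astype(np.float32) / 255.0
    normalized = (resized - IMAGENET_MEAN) / IMAGENET_STD
    return normalized.transpose(2, 0, 1)
```

<p>In practice a library routine (e.g. torchvision transforms) with bilinear interpolation would replace the nearest-neighbour resize used here for brevity.</p>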
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Ensembling Strategy</title>
        <p>For ensemble submissions, the final prediction for each test image was determined using the max
voting ensembling technique [5].</p>
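<p>Max (majority) voting over the class predictions of several models can be sketched as follows (a minimal NumPy version; the tie-breaking rule of lowest class index winning is our assumption, since the text does not specify one):</p>

```python
import numpy as np

def max_vote(predictions):
    """predictions: array of shape (n_models, n_images) of class indices.
    Returns the most frequent class per image."""
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    voted = []
    for column in predictions.T:               # one column per test image
        counts = np.bincount(column, minlength=n_classes)
        voted.append(int(np.argmax(counts)))   # ties fall to the lowest index
    return np.array(voted)
```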
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Pseudo-Labeling and Sequestered Data</title>
        <p>Pseudolabeling was used to leverage information from the unlabeled test set. The use of a sequestered
dataset (in submission 1782) helped to verify the generalization of the pseudolabeling process.</p>
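<p>A pseudo-labeling round of the kind described above can be sketched as follows (the confidence-threshold filter and all names are our illustrative choices; the text does not state whether a threshold was applied):</p>

```python
import numpy as np

def pseudo_label(probs, threshold=0.95):
    """probs: (n_images, n_classes) softmax outputs of a trained model.
    Returns (indices, labels) for the images whose top probability clears
    the threshold; those images are then added to the training set."""
    probs = np.asarray(probs)
    confidence = probs.max(axis=1)
    keep = np.flatnonzero(confidence >= threshold)
    return keep, probs[keep].argmax(axis=1)
```

<p>Accuracy on a sequestered, never-trained-on split can then be compared with validation accuracy to check that the pseudolabels are not amplifying overfitting.</p>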
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Model Architectures and Training Parameters</title>
        <p>We primarily utilized the EficientNet family of convolutional neural networks (specifically versions b0,
b1, and b2), pretrained on ImageNet. We conducted extensive hyperparameter tuning, exploring batch
sizes of 16, 32, 64, and 128, learning rates of 0.0001, 0.00005, and 0.0005, and training for 5, 10, 20, and
30 epochs.</p>
      </sec>
      <sec id="sec-2-6">
        <title>2.6. Individual Submissions</title>
        <p>
• Submission 1425 (Eficientnet1_lr_0.0001_bs_32_epoch.zip): This submission represents the
results of a single EfficientNet-b1 model. The specific model submitted was selected as the best
performing model after training across the aforementioned hyperparameter grid.
• Submission 1426 (EficientNet0-2ensemble.zip): This submission was an ensemble of models
based on EfficientNet-b0, EfficientNet-b1, and EfficientNet-b2. These individual models were
trained using the same hyperparameter search space as described for submission 1425. Only
models that achieved at least 98% accuracy on the validation set were included in this ensemble. The
final prediction was determined by max voting over the individual model predictions.
• Submission 1782 (EficientNet1Pseudolabelval2Ens26Ac.zip): This submission involved an
ensemble of 26 models trained with EfficientNet-b0, EfficientNet-b1, and EfficientNet-b2. The
models in this ensemble were trained with batch sizes of 32 and 64, learning rates of 0.0001,
0.00005, and 0.0005, and trained for 20 and 30 epochs. A key aspect was the use of pseudo-labeled
images generated from the test set, together with a sequestered group of training data used to check
for overfitting. Models were initially trained, and their predictions were tested on the sequestered
group. The accuracy of these predictions on the sequestered data was compared to the accuracy
on the validation set to ensure that the pseudolabeling process was not introducing overfitting on
the validation data. Only models that achieved at least 99% accuracy on the validation set were
included in this ensemble.
• Submission 1871 (EficientNet1PseuelAlldolab99.36.zip): This submission utilized an
ensemble of models based solely on EfficientNet-b1, trained on the entire original training dataset
combined with the pseudo-labeled test images. No training data was sequestered in this case.
The models in this ensemble were trained with batch sizes of 32 and 64, learning rates of 0.0001,
0.00005, and 0.0005, and trained for 20 and 30 epochs.</p>
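<p>The hyperparameter search space described in Section 2.5 can be enumerated exhaustively; a minimal sketch (the dictionary keys are our naming):</p>

```python
from itertools import product

VARIANTS = ["b0", "b1", "b2"]
BATCH_SIZES = [16, 32, 64, 128]
LEARNING_RATES = [0.0001, 0.00005, 0.0005]
EPOCHS = [5, 10, 20, 30]

# Every (variant, batch size, learning rate, epochs) combination explored.
grid = [
    {"variant": v, "batch_size": bs, "lr": lr, "epochs": ep}
    for v, bs, lr, ep in product(VARIANTS, BATCH_SIZES, LEARNING_RATES, EPOCHS)
]
```

<p>This yields 3 × 4 × 3 × 4 = 144 candidate configurations, from which the best models were selected by validation accuracy.</p>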
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>The results of our submission are presented in Table 2.</p>
      <p>Run 1425 using a single model had the worst results, even if the model with the best accuracy on
validation set was used. Ensembling multiple models, Run 1426 gave a significant improvement. Adding
pseudolabeled test images, Run 1782, to the training set provided a small additional improvement.
Training on the entire training set and pseudolabeled test dataset, Run 1871, did not provide any further
improvement.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>Our experiments demonstrated the efectiveness of using pretrained EficientNet models for the task
of identifying the training data subsets of synthetic biomedical images. The high validation
accuracies achieved by individual models, particularly those included in our ensembles, suggest that these
architectures are well-suited for capturing subtle diferences between real and synthetically generated
images from diferent origins.</p>
      <p>The use of ensemble learning consistently improved performance over single models, highlighting the
benefit of combining the strengths of diferent model instances. The mode-based ensembling strategy
proved efective in aggregating the predictions.</p>
      <p>Pseudolabeling provided a valuable mechanism to leverage the information present in the unlabeled
test data. By incorporating these pseudo-labels into the training set, we were able to further refine
our models. The strategy of using a sequestered dataset to generate and validate the quality of the
pseudo-labels (as in submission 1782) was crucial in ensuring that this process contributed positively
to generalization and did not lead to overfitting on the validation set. The comparison of prediction
accuracy on the sequestered data with that on the validation set provided a useful metric for assessing
the reliability of the generated pseudolabels.</p>
      <p>The fact that adding sequestered data did not change the submission classification for the final
submission (submission 1871) suggests that the performance gains from increasing training data can
plateau if the models are already capturing the underlying data patterns efectively.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Ideas for Future Work</title>
      <p>Several avenues for future research and improvement could be explored:
• Exploring state-of-the-art Vision Transformers (ViTs): While EfficientNet models
proved effective, investigating the performance of ViTs [6], which have shown remarkable success
in various computer vision tasks, could yield further improvements. Their attention mechanisms
might be particularly adept at identifying subtle "fingerprints."
• Advanced Ensembling Techniques: Instead of simple vote-based ensembling, more
sophisticated techniques like weighted averaging based on validation performance, or even using a
meta-learner to combine the predictions of individual models could be investigated [7].
• Larger and More Diverse Datasets: Training on larger and more diverse datasets of both real
and synthetic biomedical images could improve the generalizability of the models.</p>
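<p>The weighted-averaging idea above can be sketched as follows (a hypothetical illustration of the future-work direction, not something we implemented; per-model validation accuracies serve as the weights):</p>

```python
import numpy as np

def weighted_average_ensemble(model_probs, val_accuracies):
    """model_probs: (n_models, n_images, n_classes) softmax outputs.
    val_accuracies: one validation accuracy per model, used as weights.
    Returns the weighted-average class prediction for each image."""
    probs = np.asarray(model_probs, dtype=float)
    weights = np.asarray(val_accuracies, dtype=float)
    weights = weights / weights.sum()              # normalize to sum to 1
    blended = np.tensordot(weights, probs, axes=1) # (n_images, n_classes)
    return blended.argmax(axis=1)
```

<p>Unlike max voting, this keeps each model's confidence and lets stronger models pull the blended prediction toward their answer.</p>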
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Our participation in this task demonstrated a successful approach to identifying the training data
subsets of synthetic biomedical images using deep learning techniques. The combination of pretrained
EfficientNet models, careful hyperparameter tuning, effective ensemble strategies, and a principled
approach to pseudolabeling allowed us to achieve competitive results. The comparison of performance
on a sequestered dataset with the validation set provided a crucial step in ensuring the reliability of our
methods. Future work exploring alternative architectures and advanced ensembling techniques holds
the potential for further advancements in this important area of research.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Gemini and Perplexity to convert the Word
document to LaTeX format to follow the publication formatting guideline. After using these tools, the
author reviewed and edited the content as needed and takes full responsibility for the publication’s
content.</p>
    </sec>
    <sec id="sec-8">
      <title>References</title>
      <p>[3] A.-G. Andrei, M. G. Constantin, M. Dogariu, A. Radzhabov, L.-D. Ştefan, Y. Prokopchuk, V. Kovalev,
H. Müller, B. Ionescu, Overview of ImageCLEFmedical 2025 – GANs Task, in: CLEF2025 Working
Notes, CEUR Workshop Proceedings, CEUR-WS.org, Madrid, Spain, 2025.</p>
      <p>[4] M. Tan, Q. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, in:
K. Chaudhuri, R. Salakhutdinov (Eds.), Proceedings of the 36th International Conference on Machine
Learning, volume 97 of Proceedings of Machine Learning Research, PMLR, 2019, pp. 6105–6114. URL:
https://proceedings.mlr.press/v97/tan19a.html.</p>
      <p>[5] B. Hopkinson, A. King, D. Owen, M. Johnson-Roberson, M. Long, S. Bhandarkar, Automated
classification of three-dimensional reconstructions of coral reefs using convolutional neural networks,
PLoS One 15 (2020) e0230671.</p>
      <p>[6] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani,
M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An image is worth 16x16 words:
Transformers for image recognition at scale, CoRR abs/2010.11929 (2020). URL: https://arxiv.org/
abs/2010.11929. arXiv:2010.11929.</p>
      <p>[7] S. Imran, P. N, A review on ensemble machine and deep learning techniques used in the classification
of computed tomography medical images, International Journal of Health Sciences and Research 14
(2024) 201–213. URL: https://doi.org/10.52403/ijhsr.20240124. doi:10.52403/ijhsr.20240124.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-C.</given-names>
            <surname>Stanciu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-G.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-D.</given-names>
            <surname>Ştefan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-G.</given-names>
            <surname>Constantin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dogariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Damm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rückert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brüngel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Idrissi-Yaghir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M. G.</given-names>
            <surname>Pakull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bracke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Eryilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-W.</given-names>
            <surname>Yim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Codella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Novoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Malvehy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          ,
          <string-name><given-names>R. J.</given-names> <surname>Das</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Xie</surname></string-name>,
          <string-name><given-names>H. M.</given-names> <surname>Shan</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Nakov</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Koychev</surname></string-name>,
          <string-name><given-names>S. A.</given-names> <surname>Hicks</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Gautam</surname></string-name>,
          <string-name><given-names>M. A.</given-names> <surname>Riegler</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Thambawita</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Halvorsen</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Fabre</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Macaire</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Lecouteux</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Schwab</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Potthast</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Heinrich</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Kiesel</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Wolter</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Stein</surname></string-name>
          , Overview of ImageCLEF 2025:
          <article-title>Multimedia retrieval in medical, social media and content recommendation applications</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 16th International Conference of the CLEF Association (CLEF 2025)</source>
          , Springer Lecture Notes in Computer Science LNCS, Madrid, Spain,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karpenka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Overview of 2024 ImageCLEFmedical GANs task</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 15th International Conference of the CLEF Association (CLEF 2024)</source>
          , Springer Lecture Notes in Computer Science LNCS, Grenoble, France,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>