<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fingerprint Identification of Generative Models Using a MultiFormer Ensemble Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Md. Ismail Siddiqi Emon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mahmudul Hoque</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Md Rakibul Hasan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fahmi Khalifa</string-name>
          <email>fahmi.khalifa@morgan.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Md Mahmudur Rahman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, SCMNS School, Morgan State University</institution>
          ,
          <addr-line>Baltimore, Maryland 21251</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Electrical &amp; Computer Engineering Dept., School of Engineering, Morgan State University</institution>
          ,
          <addr-line>Baltimore, Maryland 21251</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
<p>In the ever-changing realm of medical image processing, ImageCLEF brought a new dimension with the Identifying GAN Fingerprint task, catering to the advancement of visual media analysis. This year, the organizers presented the task of detecting training-image fingerprints, to control the quality of synthetic images, for the second time (as Task 1) and introduced the task of detecting generative-model fingerprints for the first time (as Task 2). Both tasks aim to discern fingerprints in images, originating from both the real training images and the generative models. The dataset utilized encompassed 3D CT images of lung tuberculosis patients, with a development dataset featuring a mix of real and generated images, and a separate test dataset. Our team 'CSMorgan' contributed several approaches leveraging multiformer networks (combined features extracted using BLIP2 and DINOv2), additive and mode thresholding techniques, and late fusion methodologies, bolstered by morphological operations. In Task 1, our optimal performance was attained through a late fusion-based reranking strategy, achieving an F1 score of 0.51, while the additive average thresholding approach closely followed with a score of 0.504. In Task 2, our multiformer model garnered an impressive Adjusted Rand Index (ARI) score of 0.90, and a fine-tuned variant of the multiformer yielded a score of 0.8137. These outcomes underscore the efficacy of the multiformer-based approach in accurately discerning both real-image and generative-model fingerprints.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In 2004 ImageCLEF [
        <xref ref-type="bibr" rid="ref1">1</xref>
] was established for medical imaging, and it has pioneered advancements in this field. After advancing to its second edition [
        <xref ref-type="bibr" rid="ref2">2</xref>
], it marked a significant evolution with the inclusion of medical Generative Adversarial Network (GAN) tasks, the first iteration of which came in 2023. This task aimed to explore a specific hypothesis: that GANs might embed "fingerprints" of real images within the synthetic medical images they generate. Confirming this hypothesis could have significant implications; it might prompt a reconsideration of the copyright status of synthetic images, challenging the conventional view that such images are entirely artificial. In recent years, there has been a substantial surge in the application of GANs and diffusion models within the medical domain [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5</xref>
]. These sophisticated architectures can generate synthetic images and facilitate their translation across different modalities. This burgeoning utilization underscores the transformative potential of GANs and diffusion models in enhancing medical imaging and diagnostics. Medical imaging professionals have been exploring various applications of GANs in medical image analysis, such as creating artificial medical images and distinguishing between real and fake images. They have developed effective architectures like "Attention GAN" [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and "ABC GAN" [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to produce
lifelike medical images, which assist with various tasks, including training AI models and protecting
patient privacy. However, despite these advancements [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
], a major challenge remains: differentiating between real and synthetic medical images, an area where scientists continue to focus their efforts. Generative models [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12, 13</xref>
], a recent AI innovation, have driven significant advancements across multiple domains [14, 15, 16], including generative medical imaging [17, 18, 19]. For example, the authors of [17] show that synthesizing high-resolution images of skin lesions with Generative Adversarial Networks (GANs) can address the lack of labeled data and skewed class distributions in skin image analysis; using progressive growing, they produce realistic dermoscopic images that are difficult for expert dermatologists to distinguish from real ones, outperforming other GAN architectures such as DCGAN and LAPGAN. Synthetic biomedical images serve vital roles in research, healthcare professional training [20], and patient care enhancement, from anisotropic diffusion [21] to AnoGAN, an unsupervised deep convolutional GAN that identifies anomalies in imaging data as disease markers, demonstrated by accurately detecting anomalies in retinal OCT images [22]. A further recent advance [23] notes that discriminating between malignant and benign lung nodules remains challenging, necessitating CAD systems to assist radiologists. Using unsupervised learning with Deep Convolutional Generative Adversarial Networks (DC-GANs), the authors aim to generate realistic lung nodule samples, hypothesizing that difficult-to-differentiate imaging features will be highly discriminative, thereby improving diagnostic accuracy, training radiologists, and generating realistic samples for deep network training. They also address challenges such as data scarcity, cost, and ethical considerations associated with acquiring real patient data.
      </p>
<p>The 2024 ImageCLEFmedical GANs Task provides a forum to investigate the influence of GANs on the generation of artificial biomedical images, facilitating the examination of the potential advantages and ethical concerns of their application. It includes two basic objectives: (1) identifying fingerprints of training data in synthetic biomedical images (inspection), and (2) finding fingerprints of different generative models in the images they are designed to generate (differentiation). This allows researchers to compare models and highlight the characteristics, patterns, or features present in synthetic images that distinguish one model from another. The dataset includes axial slices of 3D CT images from around 8,000 lung tuberculosis patients and serves as a great resource for research.</p>
<p>Task 1 aims to identify the real images from which GAN images were generated, so as to address privacy and security concerns associated with the use of artificial images [18]. The datasets were provided by the organizers, with a development set divided into labeled artificial and real images for training; no information on the percentage of used and unused images from this set was disclosed. This extensive dataset makes it possible to rigorously test the hypotheses and push the community forward with discoveries in biomedical image synthesis.</p>
<p>Task 2 is to detect the fingerprints of generative models in GAN-generated images. The authors of ImageCLEFmed GANs argue that one way to understand this behavior is to envision that each AI model imparts its own distinct “signature” on the images it generates. Our aim here is to reveal these hidden signatures and identify what makes each model unique; it is like recognizing differently styled handwriting from different authors, but in the AI domain. We do not simply aim to differentiate models but rather to gain a deeper understanding of them by examining the hidden patterns and subtleties contained within the synthetic images.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Datasets</title>
      <p>In the second edition of the ImageCLEFmed GANs challenge, the organizers presented two tasks:
"Identify training data fingerprints" and "Detect generative models’ fingerprints."</p>
<p>For the first task, the hypothesis is that when images are generated using diffusion models, the original real images leave specific fingerprints in the generated images. If this hypothesis is true, it could impose additional restrictions on publishing or sharing such images publicly, as they would be as sensitive as the original images. Conversely, if the hypothesis is false, it could lead to vast datasets of artificially generated images produced with diffusion models, potentially revolutionizing the medical imaging field.</p>
<p>The second task, although different in nature, shares a similar objective: identifying generative model fingerprints in images produced by various diffusion models. The challenge is to determine whether these models imprint unique fingerprints on the generated images. The organizers did not disclose the number of diffusion models used, but the approach remains consistent regardless of the quantity.</p>
<p>The organizers of the ImageCLEFmed GANs task provided both a development set and a test set. For Task 1, there were two datasets consisting of axial slices of 3D CT images from approximately 8,000 lung tuberculosis patients. The artificial slice images, sized at 256 × 256 pixels, were generated using various undisclosed generative adversarial networks and diffusion neural networks. Over 12,000 generated images were included, and the test set contained a total of 8,000 images. For Task 2, a dataset comprising 3,000 generated image files was provided. Figure 1 depicts examples of images from the original dataset.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Methodology</title>
<p>Here we explain the details of the approaches utilized in our submissions for both tasks: “Identify training data fingerprints” and “Detect generative models’ fingerprints”. For the task of identifying training data fingerprints, we first performed morphological operations to reduce noise in the CT images. This preprocessing step was crucial for improving the quality of the synthetic medical images generated by GANs. Subsequently, we implemented BLIP and DINOv2 as image signature generators. As illustrated in Figure 2, these morphological operations helped control the quality of the images. After preprocessing, we computed individual feature rankings for each model and then concatenated the feature rankings from both models. We also performed dimensionality reduction on the concatenated features to enhance the ranking process. Finally, we applied late fusion to combine the results from the previous steps, optimizing the identification of training data fingerprints.</p>
      <sec id="sec-3-1">
        <title>3.1. Morphological operations</title>
<p>Image opening is a widely used morphological operation in image processing. It primarily aims to remove small objects from 3D axial CT images while preserving the size and shape of larger structures, making it an effective noise-reduction mechanism. As illustrated in Figure 3, electronic noise in CT images often originates from the combination of the detector system and the reconstruction kernel, with sharper kernels typically resulting in noisier images. According to [24], this noise is a consequence of efforts to enhance image quality without increasing the radiation dose. Our hypothesis is that applying image opening will eliminate small, noisy, and irrelevant details, thereby potentially enhancing the efficiency of fingerprint detection in medical images. By preserving key features, image opening maintains the integrity of essential image details, ensuring that important structures remain intact while noise is reduced.</p>
<p>Image opening consists of two main steps: erosion (Eq. 1) followed by dilation (Eq. 2). Image opening (Eq. 3) can be represented as:</p>
        <p>A ⊖ B = { z | B_z ⊆ A }   (1)</p>
        <p>A ⊕ B = ⋃_{z ∈ B} A_z   (2)</p>
        <p>A ∘ B = (A ⊖ B) ⊕ B   (3)</p>
        <p>In equation (1), image erosion is performed, where A and B are the input image and the structuring element, respectively, and B_z is the translation of B by z. In equation (2), image dilation is performed, where A_z is the translation of A by z. In equation (3), ∘ denotes the image opening operation.</p>
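As an illustrative sketch (not our exact preprocessing code), the opening operation above can be applied with SciPy's grayscale morphology; the 3 × 3 structuring-element size and the toy image are assumptions for demonstration only.

```python
import numpy as np
from scipy.ndimage import grey_opening

# Toy "CT slice": a large bright square (anatomy) plus isolated bright pixels (noise).
img = np.zeros((64, 64), dtype=np.uint8)
img[20:40, 20:40] = 200          # large structure, should survive opening
img[5, 5] = img[50, 10] = 255    # single-pixel noise, should be removed

# Opening = erosion followed by dilation with the same structuring element.
opened = grey_opening(img, size=(3, 3))

print(opened[5, 5], opened[50, 10])   # isolated noise pixels suppressed
print(opened[25:35, 25:35].min())     # interior of the large square preserved
```

The isolated pixels vanish because erosion (a local minimum) removes anything smaller than the structuring element, while the subsequent dilation restores the extent of the surviving large structure.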
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Multiformer</title>
<p>To build the multiformer, we chose two foundation models, BLIP2 and DINOv2, as backbone architectures (see Fig. 4); details are given below.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. BLIP Architecture</title>
<p>Bootstrapping Language-Image Pre-training 2 (BLIP2) [25, 26] utilizes a Vision Transformer (ViT) [27] for image encoding, dividing images into patches and encoding them into a sequence of embeddings with a [CLS] token representing the global image feature. This method is computationally efficient compared to traditional object detectors. BLIP is a multimodal Mixture of Encoder-Decoder (MED) that operates in three modes: a unimodal encoder [28] (similar to BERT for text), an image-grounded text encoder (incorporating visual information via cross-attention layers), and an image-grounded text decoder (using causal self-attention layers for text generation). During pre-training, three objectives are jointly optimized: two for understanding and one for generation. Each image-text pair undergoes one forward pass through the ViT and three through the text transformer for different tasks. The Image-Text Contrastive Loss (ITC), inspired by [29, 30], aligns the visual and textual feature spaces, improving vision-language understanding by encouraging positive image-text pairs to have similar representations. The Image-Text Matching Loss (ITM) [30] learns fine-grained multimodal representations through a binary classification task that determines whether image-text pairs are matched, utilizing a hard negative mining strategy. The Language Modeling Loss (LM) trains the model to generate textual descriptions from images using cross-entropy loss, enhancing the model’s capability to convert visual information into coherent captions. To maximize efficiency and leverage multi-task learning, the text encoder and decoder share parameters except for the self-attention layers, which capture the differences between encoding and decoding tasks. This shared architecture benefits from improved training efficiency and effective multi-task learning.</p>
<p>The BLIP2 feature extractor is a multi-modal model designed to extract and integrate features from both images and text. It begins with patch embedding for images, converting each image I into patches P using a convolutional layer, P = Conv(I). These patches are then processed through a transformer encoder, F_img = Transformer(P), to capture visual relationships. Text data T is tokenized and embedded into a high-dimensional space, E = Embedding(T), and passed through another transformer encoder, F_txt = Transformer(E). Cross-modal attention mechanisms, F = CrossAttention(F_img, F_txt), are used to align and integrate the visual and textual features. The model is pre-trained on large datasets of paired images and text to learn these representations, and the pre-trained model can be fine-tuned for downstream tasks to improve performance on specific datasets. The BLIP2 feature extractor thus provides robust, high-level features suitable for tasks such as image classification, object detection, and text-image matching.</p>
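The patch-embedding step P = Conv(I) can be sketched in NumPy as a plain matrix product, since a stride-16 convolution over non-overlapping 16 × 16 patches is equivalent to a linear projection of each flattened patch; the patch size, embedding width, and random weights below are illustrative assumptions, not BLIP2's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_embed(image, patch=16, dim=64):
    """Split an HxW image into non-overlapping patches and project each
    patch to `dim` dimensions (the P = Conv(I) step as a matmul)."""
    h, w = image.shape
    patches = (image.reshape(h // patch, patch, w // patch, patch)
                    .transpose(0, 2, 1, 3)
                    .reshape(-1, patch * patch))       # (num_patches, patch*patch)
    W = rng.normal(0, 0.02, (patch * patch, dim))      # stand-in projection weights
    return patches @ W                                 # (num_patches, dim)

image = rng.random((256, 256))                # one 256x256 slice, as in Task 1
tokens = patch_embed(image)                   # (256, 64): a 16x16 grid of patches
cls = np.zeros((1, tokens.shape[1]))          # [CLS] token (learnable in practice)
sequence = np.concatenate([cls, tokens])      # transformer input, (257, 64)
print(sequence.shape)
```

The [CLS] row is what ultimately carries the global image feature after the transformer encoder has mixed information across patches.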
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. DINOv2 Architecture</title>
<p>DINOv2 [31] is an enhanced version of DINO [32], integrating various improvements and using a larger, more diverse dataset to accelerate and stabilize training at scale. It utilizes the curated LVD-142M dataset, which includes data from sources like ImageNet [33], Google Landmarks [34], Mapillary SLS [35], and Food-101 [36]. The dataset is de-duplicated using FAISS [37] batch searches over embeddings. Training combines DINO and iBOT losses with SwAV centering, involving a learnable student and an EMA teacher. Key techniques include multi-crop cross-entropy for global image representation, patch-level masking, and separate weights for image and patch objectives. The teacher’s softmax centering is replaced by Sinkhorn-Knopp batch normalization, and the KoLeo regularizer ensures batch uniformity. High-resolution images are used towards the end of pre-training. DINOv2’s implementation features FlashAttention [38, 39] for efficiency, nested tensors from xFormers, stochastic depth, and mixed-precision PyTorch FSDP. Distillation [40, 41] of smaller models from a larger teacher model is also included. Ablation studies cover model selection, data curation strategies, model scaling, and loss objectives. DINOv2 achieves results comparable to weakly supervised text models like EVA-CLIP on ImageNet and performs better than SSL methods like Mugs, EsViT, and iBOT. It shows strong performance in domain generalization, image and video classification, instance recognition, and semantic segmentation. For depth estimation, DINOv2 uses a linear layer on frozen tokens and experiments with concatenation of ViT [27] layers/blocks and regression over a DPT decoder, achieving superior results on datasets like NYUd, KITTI, and SUN-RGBd. Qualitative results demonstrate effective semantic matching and foreground extraction using PCA on patch features.</p>
<p>We utilized the DINO Vision Transformer because it is a sophisticated image processing model capable of handling and analyzing images through a systematic series of steps; its architecture operates as follows. Initially, the input image is divided into smaller patches, which are then embedded into a higher-dimensional space using a convolutional layer. This embedding captures the local features of each patch. The embedded patches pass through multiple nested tensor blocks, each consisting of several components: layer normalization to standardize inputs, memory-efficient attention mechanisms to focus on different parts of the image, and multi-layer perceptrons for feature refinement. Each block also includes layer scaling and dropout layers to improve training stability and prevent overfitting. The model incorporates 12 of these blocks, creating a deep network capable of learning complex image representations. After processing through all blocks, a final normalization layer is applied, followed by an identity layer that prepares the features for subsequent tasks. This architecture allows the DINO ViT to effectively learn and process detailed image features, making it suitable for various image processing applications.</p>
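The block structure just described (layer norm, attention, MLP, residual connections, stacked 12 times) can be sketched minimally in NumPy; single-head attention, a 64-dimensional width, and fresh random weights per block are simplifications for illustration, not DINOv2's real configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 64  # embedding width (illustrative)

def layer_norm(x, eps=1e-6):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(x.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                      # softmax over tokens
    return w @ v

def block(x, params):
    Wq, Wk, Wv, W1, W2 = params
    x = x + attention(layer_norm(x), Wq, Wk, Wv)       # attention sub-layer + residual
    h = np.maximum(layer_norm(x) @ W1, 0.0)            # MLP with ReLU
    return x + h @ W2                                  # MLP sub-layer + residual

def make_params():
    return [rng.normal(0, 0.02, s) for s in
            [(D, D), (D, D), (D, D), (D, 4 * D), (4 * D, D)]]

tokens = rng.normal(size=(257, D))      # [CLS] + 256 patch tokens
for _ in range(12):                     # 12 blocks, as described above
    tokens = block(tokens, make_params())
features = layer_norm(tokens)[0]        # final norm; take the [CLS] feature
print(features.shape)
```

Layer scaling and dropout are omitted here for brevity; they would multiply each sub-layer output by a learned per-channel scale and randomly zero activations during training.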
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Autoencoder for feature reduction</title>
<p>Initially, we employed raw features for clustering to assign labels to the test set. Subsequently, we integrated robust features from both the BLIP and DINOv2 models. To enhance the clustering results, we implemented an autoencoder to encode this extensive feature set into a more meaningful, reduced representation; this dimensionality reduction facilitated improved clustering performance. Autoencoders (AEs) have become quite popular in AI research over recent years, and consequently there have been many studies and advancements on the subject. The simplest ones, as used in our work, are auto-associative neural networks realized as multi-layer perceptrons in which the input is reconstructed. They consist of an encoder that compresses an input vector into a code vector using recognition weights, and a decoder that reconstructs the input vector from the code vector using generative weights. This structure allows each layer of a deep network to be trained separately using the basic AE as a building block. The hidden representation y of an input vector x is computed as y = f_θ(x) = s_f(Wx + b), where W is a weight matrix, b is a bias vector, and s_f is the encoder activation function (e.g., sigmoid or hyperbolic tangent). The hidden representation y is then decoded back to a reconstruction vector z using z = g_θ′(y) = s_g(W′y + b′), where s_g is the activation function of the decoder. The aim is to make the reconstruction z as close to the input x as possible. To simplify training, the weight matrix W′ is often constrained to be the transpose of W (tied weights), reducing the number of free parameters. This mapping process ensures that each input is transformed into a hidden representation and then reconstructed, enabling effective learning and data representation.</p>
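A minimal tied-weight autoencoder of the form just described (y = s_f(Wx + b), z = s_g(W′y + b′) with W′ = Wᵀ) can be trained by gradient descent on the squared reconstruction error; the layer sizes and random stand-in "features" below are illustrative, not our actual BLIP/DINOv2 feature dimensions.

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

n_in, n_hid = 32, 8                      # illustrative sizes
W = rng.normal(0.0, 0.1, (n_hid, n_in))  # tied weights: encoder W, decoder W.T
b, b2 = np.zeros(n_hid), np.zeros(n_in)
X = rng.random((200, n_in))              # stand-in for concatenated features

losses = []
for _ in range(300):
    Y = sigmoid(X @ W.T + b)             # code vectors (dim 8)
    Z = sigmoid(Y @ W + b2)              # reconstructions (dim 32)
    err = Z - X
    losses.append((err ** 2).mean())
    dZ = 2 * err * Z * (1 - Z) / len(X)  # gradient at decoder pre-activation
    dY = (dZ @ W.T) * Y * (1 - Y)        # gradient at encoder pre-activation
    W -= 0.5 * (Y.T @ dZ + dY.T @ X)     # both paths update W (tied weights)
    b -= 0.5 * dY.sum(0)
    b2 -= 0.5 * dZ.sum(0)

print(f"reconstruction loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

After training, the 8-dimensional code vectors Y are what would be passed on to the clustering stage in place of the raw concatenated features.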
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Result Analysis</title>
      <sec id="sec-4-1">
        <title>4.1. System Specification and Parameter Settings</title>
        <p>The proposed model is implemented on a Google Cloud Vertex AI instance, utilizing a NVIDIA V100
Tensor Core GPU. This GPU boasts 5120 CUDA cores, 640 Tensor cores, up to 32 GB of HBM2 memory,
and a memory bandwidth reaching up to 900 GB/s. These specifications allow for highly efficient
processing and accelerated computation, which are essential for handling the complex tasks and large
datasets involved in our model’s execution.</p>
<p>For the BLIP architecture, the parameters were chosen to improve feature extraction. The input image size was normalized to 256 × 256 pixels, followed by data augmentations of random cropping, horizontal flipping with probability p = 0.5, rotation within ±20°, and color jittering with brightness adjustment factors in [0.75, 1.25]. Feature extraction used the pre-trained ViT large model as the backbone. The learning rate η was set to 0.0005 and a batch size B of 16 was employed. The AdamW optimizer with weight decay was used to minimize the loss function ℒ(θ). Normalization layers were used to scale the features to a common range across the dataset.</p>
<p>For the DINOv2 architecture, a number of key parameters were defined to optimize performance. The model was pre-trained on a large, well-curated proprietary image set, making it a strong base for later fine-tuning. The input image size was kept uniform during training at 224 × 224 pixels, and several data augmentations, such as center cropping, random affine transformations, resizing, and normalization, were performed. We used a learning rate η of 0.001 and a batch size B of 32. The Adam optimizer with β1 = 0.9 and β2 = 0.999 was used to minimize the loss function ℒ(θ), where θ denotes the model parameters. We applied dropout with a probability of p = 0.5 and layer normalization to prevent overfitting and maintain training stability.</p>
<p>For DINOv2 and BLIP, we then tuned these parameters carefully to achieve the highest possible performance in identifying generative model fingerprints and test image fingerprints. Executing these procedures with appropriate data augmentation, learning rates, and optimization techniques yielded high-quality, stable image embeddings.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Identify Training Data Fingerprints Experiments</title>
        <p>For identifying training data fingerprints, we first reduced noise in the CT images through morphological
operations. We then used BLIP and DINOv2 as image signature generators. As shown in Figure 2,
these steps improved the quality of synthetic medical images generated by GANs. We ranked features
individually and after concatenation, performed dimensionality reduction, and used late fusion to refine
our fingerprint identification results.</p>
        <p>For submission 1, we implemented the "additive mode thresholding" technique, which considers
local variations in image intensity to enhance image processing. First, we reduced the dimension of
the feature vector using Principal Component Analysis (PCA). We then combined all the features and
weighted them by the total. The mode of this final weighted result was used as the threshold. For the
test images, we applied a similar weighting approach: if the weighted value was less than the mode, the
image was tagged as not used; otherwise, it was tagged as used. This method allowed us to account for
local intensity variations, improving the accuracy of our thresholding.</p>
        <p>In submission 2, which we titled "additive average thresholding," we took a diferent approach. We
calculated the final result for each subject and then averaged these results across all subjects. This
average became the threshold value for classification. By using the average, we aimed to create a more
generalized threshold that could efectively classify the images based on the overall distribution of the
data.</p>
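The two thresholding schemes above can be sketched as follows; the PCA dimensionality, the simple column-sum weighting, and the random stand-in features are assumptions for illustration, not our exact pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)

# Stand-in multiformer features for development images (real dims differ).
feats = rng.normal(size=(500, 128))

reduced = PCA(n_components=10).fit_transform(feats)   # dimensionality reduction
scores = reduced.sum(axis=1)                          # additive score per image

# "Additive mode thresholding": threshold at the mode of the rounded scores.
vals, counts = np.unique(np.round(scores, 1), return_counts=True)
mode_thr = vals[counts.argmax()]
# "Additive average thresholding": threshold at the mean score instead.
mean_thr = scores.mean()

labels_mode = np.where(scores < mode_thr, "not used", "used")
labels_mean = np.where(scores < mean_thr, "not used", "used")
print(labels_mode[:5], labels_mean[:5])
```

The only difference between the two submissions in this sketch is the choice of threshold statistic; everything upstream of the comparison is shared.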
        <p>For submission 3, we used an encoder model to handle the extensive feature set generated by the
backbone models. The encoder compressed this concatenated feature set, reducing its dimensionality.
With the reduced feature set, we applied both mode and mean thresholding techniques. This dual
approach allowed us to leverage the strengths of both thresholding methods, providing a robust
classification mechanism.</p>
        <p>In submission 5, we employed a late fusion strategy to combine the decisions from the previous four
methods. Late fusion involves aggregating the results at the decision level rather than at the feature
level. We used majority voting to finalize the classification, ensuring that the combined decisions of the
different methods provided a more accurate and reliable result. This ensemble approach helped to
mitigate the weaknesses of individual methods and improved the overall performance of our classification
system.</p>
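Majority-vote late fusion reduces to a column-wise vote over the per-method decisions; the 0/1 matrix below is an illustrative stand-in for the four methods' outputs, not our actual predictions.

```python
import numpy as np

# Each row is one method's binary decisions ("used" = 1) for the same images.
decisions = np.array([
    [1, 0, 1, 1, 0],   # e.g. additive mode thresholding
    [1, 0, 0, 1, 0],   # e.g. additive average thresholding
    [0, 0, 1, 1, 1],   # e.g. encoder + mode threshold
    [1, 1, 1, 0, 0],   # e.g. encoder + mean threshold
])

votes = decisions.sum(axis=0)
fused = (votes > decisions.shape[0] / 2).astype(int)   # strict majority (ties -> 0)
print(fused)   # [1 0 1 1 0]
```

Because fusion happens at the decision level, no method's raw feature space needs to be commensurable with the others; only the label outputs are combined.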
        <p>For the final submission 6, we performed reranking using the Agglomerative Clustering algorithm.
This algorithm conducts hierarchical clustering with a bottom-up approach, allowing us to specify
parameters such as the number of clusters, distance metric, and linkage criterion. The reranking was
based on decisions from the previous submissions.</p>
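A reranking step of this kind can be sketched with scikit-learn's bottom-up AgglomerativeClustering; the random decision-level matrix and the choice of two clusters with Ward linkage are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(4)

# Decision-level vectors for the test images: one column per earlier
# submission (random 0/1 stand-ins for those binary outputs).
decision_matrix = rng.integers(0, 2, size=(200, 5)).astype(float)

# Hierarchical clustering, bottom-up, into "used" / "not used" groups.
clusterer = AgglomerativeClustering(n_clusters=2, linkage="ward")
groups = clusterer.fit_predict(decision_matrix)
print(np.bincount(groups))   # number of images per cluster
```

The number of clusters, distance metric, and linkage criterion mentioned in the text map directly onto the estimator's constructor parameters.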
      </sec>
      <sec id="sec-4-3">
<title>4.3. Identify Generative Model Fingerprints Experiments</title>
        <p>To identify generative models’ fingerprints, we initially reduced noise in the CT images using
morphological operations. We then employed the pre-trained BLIP2 and DINOv2 architectures for feature
extraction. As illustrated in Figure 2, our objective was to accurately label each subject with the
corresponding model number, determining which generative or diffusion model produced each image.</p>
        <p>For submissions 1 and 2, we utilized a combination of feature sets from BLIP and DINOv2 titled
’multiformer’ architecture with different augmentation techniques. In submission 1, we applied center cropping and random affine transformations, along with resizing, normalizing, and other standard
setups. In submission 2, we used random cropping, random horizontal flipping, random rotation,
and color jittering. These augmentations introduced variations in size, orientation, and brightness
to the training dataset, enhancing the model’s robustness and accuracy. We then used k-means and
agglomerative clustering to assign labels to each subject.</p>
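The label-assignment step can be sketched with scikit-learn's k-means and agglomerative clustering on toy feature blobs; the three well-separated "models" and the feature dimensions are assumptions (the real number of generative models was undisclosed).

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

rng = np.random.default_rng(5)

# Toy feature blobs standing in for multiformer features from three
# hypothetical generative models.
centers = rng.normal(size=(3, 32)) * 5
feats = np.vstack([c + rng.normal(size=(100, 32)) for c in centers])

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(feats)
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(feats)
print(len(set(kmeans_labels)), len(set(agglo_labels)))
```

With well-separated blobs the two algorithms recover the same partition up to a permutation of cluster ids; on real features their disagreements are one motivation for the ensemble voting described later.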
<p>Submissions 3 and 4 followed a similar feature extraction method, but applied PCA and an autoencoder, respectively, for dimensionality reduction of the combined features. The same clustering algorithms were then used for label assignment.</p>
        <p>In submissions 5 and 6, we leveraged solely the BLIP architecture. Submission 5 used a BLIP base
model for feature extraction, while submission 6 utilized the BLIP pre-trained ViT large model. The
normalized feature sets were subsequently fed into clustering algorithms for labeling.</p>
        <p>For submissions 7 and 8, we performed ensemble voting and reranking based on the decisions from
previous submissions. Ensemble voting combined results at the decision level rather than the feature
level, employing majority voting to determine the final classification, ensuring a more accurate and
reliable outcome. For reranking, we applied Density-Based Spatial Clustering (DBSCAN). This algorithm
identifies clusters by ensuring each point within a cluster has a neighborhood defined by a specified
radius, containing at least a minimum number of points, thereby separating dense regions from areas
with fewer points.</p>
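The DBSCAN behavior described above (dense neighborhoods become clusters, sparse points become noise) can be sketched as follows; the 2-D point clouds, `eps`, and `min_samples` values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(6)

# Two dense decision-level groups plus one far-away outlier point.
dense_a = rng.normal(0.0, 0.1, size=(50, 2))
dense_b = rng.normal(5.0, 0.1, size=(50, 2))
outlier = np.array([[20.0, 20.0]])
points = np.vstack([dense_a, dense_b, outlier])

# eps = neighborhood radius; min_samples = points required for a dense core.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(points)
print(sorted(set(labels)))   # -1 marks the noise point
```

Unlike k-means or agglomerative clustering, DBSCAN does not need the number of clusters in advance, which is convenient here since the number of generative models was not disclosed.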
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Results and Discussions</title>
        <p>The results presented in Table 1 show the metrics for datasets 1 and 2, referred to as db1 and db2, for the
task of identifying training data fingerprints. Initially, we applied morphological opening operations,
which first erode the image and then dilate the eroded image using the same structuring element.
After performing these operations, we passed the images to our vision transformer models for further
processing.</p>
        <p>Submission 1 utilized the Dinov2 model with additive mode thresholding. This model employs
task-agnostic and cognitive approaches through self-supervised learning and features a multipurpose
backbone derived from a pre-trained, extensive, and well-curated image set, powered by a vision
transformer model. Additive mode thresholding was then used to assign labels to each image. This
method achieved an accuracy and precision of 0.5025 for dataset 1, and an accuracy and F1 score of
0.491 for dataset 2 (shown in Table 1). As shown in Table 2, the overall accuracy was 0.492.
Submission 2 used the Blip architecture with additive average thresholding. This model employs
a unimodal decoder to extract pattern signatures from images. Subsequently, additive average
thresholding was applied to assign labels to each image. This approach resulted in an accuracy and
precision of 0.50425 for dataset 1, and an accuracy and F1 score of 0.4995 for dataset 2. As shown in
Table 2, the overall accuracy for submission 2 was 0.501875.</p>
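        <p>The additive average thresholding rule is not fully specified here; one plausible reading, sketched below, sums each feature vector into a scalar score and labels images whose score exceeds the average. The function name and the summation reduction are assumptions for illustration:</p>

```python
import numpy as np

def additive_average_threshold(features):
    # Hypothetical reading of "additive average thresholding":
    # collapse each feature vector to an additive score, then label
    # images scoring above the mean as 1 and the rest as 0.
    scores = np.asarray(features, dtype=float).sum(axis=1)
    return (scores > scores.mean()).astype(int)

labels = additive_average_threshold([[0.2, 0.1],
                                     [0.9, 0.8],
                                     [0.1, 0.0]])
```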
        <p>Submissions 3 and 4 are based on concatenated MultiFormer feature fusion. We subsequently
applied PCA and an autoencoder for dimensionality reduction to the feature sets derived from the Dinov2
and Blip models. This approach yielded our best results, with submission 4 achieving an accuracy
and precision of 0.49575 for dataset 1. For dataset 2, the highest accuracy and F1 score of 0.5005 were
attained, making it the best among these submissions. As shown in Table 2, the overall accuracy for
submissions 3 and 4 was 0.496357 and 0.4957, respectively.</p>
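        <p>The fusion step can be sketched as concatenating per-image embeddings from the two backbones and compressing the result with PCA; in this minimal sketch random vectors stand in for real Dinov2/Blip features, and the dimensions are illustrative:</p>

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
dino_feats = rng.normal(size=(100, 64))  # stand-in for Dinov2 embeddings
blip_feats = rng.normal(size=(100, 32))  # stand-in for Blip embeddings

# MultiFormer feature fusion: concatenate the two feature sets per image,
# then reduce dimensionality before thresholding/clustering.
fused = np.concatenate([dino_feats, blip_feats], axis=1)
reduced = PCA(n_components=16).fit_transform(fused)
```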
        <p>In the final two submissions, 5 and 6, we employed a reranking technique. Submission 6 provided the
highest accuracy for dataset 1, reaching 0.51. Meanwhile, for dataset 2, submission 5 achieved the best
accuracy at 0.506. As shown in Table 2, the overall accuracy for submissions 5 and 6 was 0.500875 and
0.502625, respectively.</p>
        <p>
          Moreover, for Task 2, we investigated the intriguing notion that generative models might leave distinct
marks on the images they create. Our goal was to determine whether different models have unique
"fingerprints" within the synthetic images they produce. By closely examining these images, we aimed
to uncover the specific characteristics that define each model’s output. This time, the organizers of
ImageCLEFmed GANs [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] provided results in terms of the Adjusted Rand Index (ARI), which
measures the similarity between two clusterings. The ARI improves upon the Rand Index by correcting
for chance agreement between clusterings, thus enhancing its reliability.
        </p>
        <p>ARI = [ Σ_ij C(n_ij, 2) − ( Σ_i C(a_i, 2) · Σ_j C(b_j, 2) ) / C(n, 2) ] /
[ ½ ( Σ_i C(a_i, 2) + Σ_j C(b_j, 2) ) − ( Σ_i C(a_i, 2) · Σ_j C(b_j, 2) ) / C(n, 2) ]
(4)
As shown in Eqn (4), the ARI follows the general form ARI = (Index − Expected Index) / (Max Index − Expected Index),
where n_ij are the entries of the contingency table between the true and predicted clusterings, a_i and b_j
are its row and column sums, n is the total number of elements, and C(x, 2) = x(x − 1)/2 is the binomial
coefficient. Here, the Index represents the raw agreement index, which counts pairs of elements that are
either in the same or different clusters in both the true and predicted clusterings. The Expected Index is
the expected value of the raw index if the cluster assignments were random, while the Max Index is the
maximum value of the raw index, attained for perfect clustering. The ARI ranges from -1 to 1, where 1
signifies perfect agreement, 0 indicates chance-level clustering, and negative values indicate
worse-than-chance agreement. The formula leverages a contingency table and binomial coefficients to
adjust the Rand Index for chance, providing a more accurate measure of clustering similarity.</p>
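        <p>Eqn (4) can be checked numerically against a library implementation. The sketch below builds the contingency table and applies the formula directly, with scikit-learn's adjusted_rand_score used as a reference:</p>

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def ari_from_contingency(true_labels, pred_labels):
    # Contingency table n_ij with row sums a_i and column sums b_j,
    # plugged into Eqn (4); C(x, 2) = x * (x - 1) / 2.
    comb2 = lambda x: x * (x - 1) / 2.0
    t_vals, p_vals = np.unique(true_labels), np.unique(pred_labels)
    n = np.array([[np.sum((true_labels == t) & (pred_labels == p))
                   for p in p_vals] for t in t_vals], dtype=float)
    index = comb2(n).sum()                       # raw agreement index
    sum_a = comb2(n.sum(axis=1)).sum()
    sum_b = comb2(n.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(n.sum())    # chance-level value
    max_index = 0.5 * (sum_a + sum_b)            # perfect clustering
    return (index - expected) / (max_index - expected)

true_labels = np.array([0, 0, 1, 1, 2, 2])
pred_labels = np.array([0, 0, 1, 2, 2, 2])
ari = ari_from_contingency(true_labels, pred_labels)
```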
        <p>Among our eight submissions, shown in Table 3, Submission 2 stood out with the highest ARI
score of 0.900, demonstrating strong agreement between the predicted clustering and the ground truth
and showcasing its exceptional performance in data clustering. In contrast, Sub5 and Sub6 recorded
very low ARI scores of 0.001 and 0.002, respectively, indicating poor alignment between the predicted
clusters and the actual data.</p>
        <p>Our analysis of the results revealed a spectrum of scores for each submission, illustrating how well
they matched the real data. While submissions like Sub2 performed exceptionally well, others such as
Sub5 and Sub6 fell short of expectations. Overall, our findings highlight the distinctive marks generative
models leave on the images they produce, which could aid in recognizing and attributing these images
to specific models in the future.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In the dynamic field of medical image processing, ImageCLEFmed GANs has launched a pioneering
initiative with the Identifying GAN Fingerprint task, alongside the task of detecting generative model
fingerprints. In this paper, we tackled the first task by proposing six approaches, utilizing additive
thresholding, autoencoder, and reranking techniques to classify images as used or not used for generating
synthetic images. To address the task of detecting generative models’ fingerprints, we implemented
eight approaches involving Dinov2, Blip, and ensemble feature fusion. These findings highlight the
significance of our efforts in advancing medical image analysis techniques. Moving forward, we plan to
apply advanced noise-removal techniques to leverage pixel-level connectivity. Additionally, we aim to
develop a unified framework that integrates classical and transformer-based architectures to enhance
our ability to detect imprint signatures.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>This work was supported by the National Science Foundation (NSF) grant (ID. 2131307) “CISE-MSI: DP:
IIS: III: Deep Learning-Based Automated Concept and Caption Generation of Medical Images Towards
Developing an Effective Decision Support”.</p>
      <p>[12]  -vae: Curvature regularized variational autoencoders for uncovering emergent low dimensional
geometric structure in high dimensional data, 2024. arXiv:2403.01078.
[13] D. Ahn, H. Cho, J. Min, W. Jang, J. Kim, S. Kim, H. H. Park, K. H. Jin, S. Kim, Self-rectifying diffusion
sampling with perturbed-attention guidance, 2024. arXiv:2403.17377.
[14] Y. Skandarani, P.-M. Jodoin, A. Lalande, Gans for medical image synthesis: An empirical study,
2021. arXiv:2105.05318.
[15] W. Ahmad, H. Ali, Z. Shah, S. Azmat, A new generative adversarial network for medical images
super resolution, Scientific Reports 12 (2022). URL: http://dx.doi.org/10.1038/s41598-022-13658-4.
doi:10.1038/s41598-022-13658-4.
[16] W. Zhang, W.-K. Cham, Hallucinating face in the dct domain, IEEE Transactions on Image</p>
      <p>Processing 20 (2011) 2769–2779. doi:10.1109/TIP.2011.2142001.
[17] C. Baur, S. Albarqouni, N. Navab, Generating highly realistic images of skin lesions with gans,
in: OR 2.0 Context-Aware Operating Theaters, Computer Assisted Robotic Endoscopy, Clinical
Image-Based Procedures, and Skin Image Analysis: First International Workshop, OR 2.0 2018, 5th
International Workshop, CARE 2018, 7th International Workshop, CLIP 2018, Third International
Workshop, ISIC 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16 and
20, 2018, Proceedings 5, Springer, 2018, pp. 260–267.
[18] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio,</p>
      <p>Generative adversarial nets, Advances in neural information processing systems 27 (2014).
[19] A. F. Frangi, S. A. Tsaftaris, J. L. Prince, Simulation and synthesis in medical imaging, IEEE
transactions on medical imaging 37 (2018) 673–679.
[20] C. Wang, G. Yang, G. Papanastasiou, S. Tsaftaris, D. Newby, C. Gray, G. Macnaught, T. MacGillivray,
Dicyc: Gan-based deformation invariant cross-domain information fusion for medical image
synthesis, Information Fusion 67 (2021) 147–160. doi:10.1016/j.inffus.2020.10.015.
[21] P. Perona, J. Malik, Scale-space and edge detection using anisotropic diffusion, IEEE Transactions
on pattern analysis and machine intelligence 12 (1990) 629–639.
[22] T. Schlegl, P. Seeböck, S. M. Waldstein, U. Schmidt-Erfurth, G. Langs, Unsupervised anomaly
detection with generative adversarial networks to guide marker discovery, in: International
conference on information processing in medical imaging, Springer, 2017, pp. 146–157.
[23] M. J. Chuquicusma, S. Hussein, J. Burt, U. Bagci, How to fool radiologists with generative
adversarial networks? a visual turing test for lung cancer diagnosis, in: 2018 IEEE 15th international
symposium on biomedical imaging (ISBI 2018), IEEE, 2018, pp. 240–244.
[24] M. Diwakar, M. Kumar, Ct image noise reduction based on adaptive wiener filtering with wavelet
packet thresholding, 2014 International Conference on Parallel, Distributed and Grid Computing
(2014) 94–98. URL: https://api.semanticscholar.org/CorpusID:15070114.
[25] J. Li, D. Li, C. Xiong, S. C. H. Hoi, Blip: Bootstrapping language-image pre-training for unified
vision-language understanding and generation, in: International Conference on Machine Learning,
2022. URL: https://api.semanticscholar.org/CorpusID:246411402.
[26] J. Li, D. Li, S. Savarese, S. Hoi, Blip-2: Bootstrapping language-image pre-training with frozen
image encoders and large language models, 2023. arXiv:2301.12597.
[27] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani,
M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An image is worth 16x16 words:
Transformers for image recognition at scale, 2021. arXiv:2010.11929.
[28] J. Devlin, M. Chang, K. Lee, K. Toutanova, Pre-training of deep bidirectional transformers for
language understanding, Minneapolis, MN: Association for Computational Linguistics (2019)
4171–4186.
[29] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell,
P. Mishkin, J. Clark, G. Krueger, I. Sutskever, Learning transferable visual models from
natural language supervision, in: International Conference on Machine Learning, 2021. URL:
https://api.semanticscholar.org/CorpusID:231591445.
[30] J. Li, R. R. Selvaraju, A. D. Gotmare, S. R. Joty, C. Xiong, S. C. H. Hoi, Align before fuse: Vision and
language representation learning with momentum distillation, in: Neural Information Processing
Systems, 2021. URL: https://api.semanticscholar.org/CorpusID:236034189.
[31] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza,
F. Massa, A. El-Nouby, M. Assran, N. Ballas, W. Galuba, R. Howes, P.-Y. Huang, S.-W. Li, I. Misra,
M. Rabbat, V. Sharma, G. Synnaeve, H. Xu, H. Jegou, J. Mairal, P. Labatut, A. Joulin, P. Bojanowski,
Dinov2: Learning robust visual features without supervision, 2024. arXiv:2304.07193.
[32] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, A. Joulin, Emerging properties
in self-supervised vision transformers, 2021. arXiv:2104.14294.
[33] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: A large-scale hierarchical image
database, in: Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on,
IEEE, 2009, pp. 248–255. URL: https://ieeexplore.ieee.org/abstract/document/5206848/.
[34] T. Weyand, A. Araujo, B. Cao, J. Sim, Google landmarks dataset v2 – a large-scale benchmark for
instance-level recognition and retrieval, 2020. arXiv:2004.01804.
[35] F. Warburg, S. Hauberg, M. Lopez-Antequera, P. Gargallo, Y. Kuang, J. Civera, Mapillary
street-level sequences: A dataset for lifelong place recognition, in: Proceedings of IEEE
Conference on Computer Vision and Pattern Recognition 2020, IEEE, United States, 2020. URL:
http://cvpr2020.thecvf.com/,https://ieeexplore.ieee.org/xpl/conhome/9142308/proceeding, 2020
IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 ; Conference date:
14-06-2020 Through 19-06-2020.
[36] L. Bossard, M. Guillaumin, L. Van Gool, Food-101 – mining discriminative components with
random forests, in: European Conference on Computer Vision, 2014.
[37] M. Douze, A. Guzhva, C. Deng, J. Johnson, G. Szilvasy, P.-E. Mazaré, M. Lomeli, L. Hosseini,</p>
      <p>H. Jégou, The faiss library, 2024. arXiv:2401.08281.
[38] T. Dao, D. Y. Fu, S. Ermon, A. Rudra, C. Ré, Flashattention: Fast and memory-efficient exact
attention with io-awareness, 2022. arXiv:2205.14135.
[39] T. Dao, Flashattention-2: Faster attention with better parallelism and work partitioning, 2023.</p>
      <p>arXiv:2307.08691.
[40] N. Reimers, I. Gurevych, Making monolingual sentence embeddings multilingual using knowledge
distillation, 2020. arXiv:2004.09813.
[41] N. Brown, A. Williamson, T. Anderson, L. Lawrence, Efficient transformer knowledge distillation:
A performance review, 2023. arXiv:2311.13657.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Drăgulinescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rückert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcıa Seco de Herrera</surname>
          </string-name>
          , L. Bloch,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brüngel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Idrissi-Yaghir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Pakull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Damm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bracke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karpenka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macaire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schwab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lecouteux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Esperança-Rodier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yetisgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Thambawita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Storås</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heinrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiesel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of ImageCLEF 2024: Multimedia retrieval in medical applications</article-title>
          , in: Experimental IR Meets Multilinguality, Multimodality, and Interaction,
          <source>Proceedings of the 15th International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ), Springer Lecture Notes in Computer Science LNCS, Grenoble, France,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karpenka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Overview of 2024 ImageCLEFmedical GANs Task - Investigating Generative Models' Impact on Biomedical Synthetic Images</article-title>
          , in: CLEF2024 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Grenoble, France,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Xun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , P. Huang,
          <article-title>Generative adversarial networks in medical image segmentation: A review</article-title>
          ,
          <source>Computers in Biology and Medicine</source>
          <volume>140</volume>
          (
          <year>2022</year>
          )
          <article-title>105063</article-title>
          . URL: https://www.sciencedirect.com/science/article/pii/S001048252100857X. doi:https://doi.org/10.1016/j.compbiomed.
          <year>2021</year>
          .
          <volume>105063</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Anantatamukala</surname>
          </string-name>
          ,
          <string-name>
            <surname>K. M. Krishna</surname>
            ,
            <given-names>N. B.</given-names>
          </string-name>
          <string-name>
            <surname>Dahotre</surname>
          </string-name>
          ,
          <article-title>Generative adversarial networks assisted machine learning based automated quantification of grain size from scanning electron microscope back scatter images</article-title>
          ,
          <source>Materials Characterization</source>
          <volume>206</volume>
          (
          <year>2023</year>
          )
          <article-title>113396</article-title>
          . URL: https://www.sciencedirect.com/science/article/pii/S1044580323007556. doi:https://doi.org/10.1016/j.matchar.
          <year>2023</year>
          .
          <volume>113396</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Makhlouf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maayah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Abughanam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Catal</surname>
          </string-name>
          ,
          <article-title>The use of generative adversarial networks in medical image augmentation</article-title>
          ,
          <source>Neural Comput. Appl</source>
          .
          <volume>35</volume>
          (
          <year>2023</year>
          )
          <fpage>24055</fpage>
          -
          <lpage>24068</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H. S.</given-names>
            <surname>Torr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sebe</surname>
          </string-name>
          ,
          <article-title>Attentiongan: Unpaired image-to-image translation using attention-guided generative adversarial networks</article-title>
          , CoRR abs/1911.11897 (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1911.11897. arXiv:1911.11897.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mindlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schilling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          , Abc-gan:
          <article-title>Spatially constrained counterfactual generation for image classification explanations</article-title>
          , in: L.
          <string-name>
            <surname>Longo</surname>
          </string-name>
          (Ed.),
          <source>Explainable Artificial Intelligence</source>
          , Springer Nature Switzerland, Cham,
          <year>2023</year>
          , pp.
          <fpage>260</fpage>
          -
          <lpage>282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Coman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          , Overview of ImageCLEFmedical GANs 2023 task
          <article-title>- Identifying Training Data "Fingerprints" in Synthetic Biomedical Images Generated by GANs for Medical Image Security</article-title>
          , in: CLEF2023 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Thessaloniki, Greece,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Drăgulinescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Snider</surname>
          </string-name>
          , G. Adams,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yetisgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rückert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcıa Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brüngel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Idrissi-Yaghir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Thambawita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Storås</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J. A. A. A. R. I. C. V. K. A. S. G. I. Nikolaos</given-names>
            <surname>Papachrysos</surname>
          </string-name>
          , Johanna Schöler,
          <string-name>
            <given-names>H.</given-names>
            <surname>Manguinhas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ştefan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Constantin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dogariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Deshayes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <article-title>Overview of ImageCLEF 2023: Multimedia retrieval in medical, social media and recommender systems applications</article-title>
          , in: Experimental IR Meets Multilinguality, Multimodality, and Interaction
          ,
          <source>Proceedings of the 14th International Conference of the CLEF Association (CLEF</source>
          <year>2023</year>
          ), Springer Lecture Notes in Computer Science LNCS, Thessaloniki, Greece,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Jang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-K.</given-names>
            <surname>Noh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <article-title>Geometrically regularized autoencoders for non-euclidean data</article-title>
          ,
          <source>in: The Eleventh International Conference on Learning Representations</source>
          ,
          <year>2023</year>
          . URL: https://openreview.net/forum?id=_q7A0m3vXH0.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Rhodes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bhattacharjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Learning from demonstration using a curvature regularized variational auto-encoder (CurvVAE)</article-title>
          ,
          <source>in: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>10795</fpage>
          -
          <lpage>10800</lpage>
          . doi:10.1109/IROS47612.2022.9981930.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Perrin-Gilbert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Narmanli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Myers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Waterfall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Sethna</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>