<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Classification of Real and Generated Images based on Feature Similarity Notebook for ImageCLEF Lab at CLEF 2024</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Huihui</forename><surname>Tang</surname></persName>
							<email>tanghh@gxi.gov.cn</email>
							<affiliation key="aff0">
								<orgName type="laboratory">Guangxi Key Laboratory of Digital Infrastructure</orgName>
								<orgName type="institution">Guangxi Zhuang Autonomous Region Information Center</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hancheng</forename><surname>Wang</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Guangxi Key Laboratory of Digital Infrastructure</orgName>
								<orgName type="institution">Guangxi Zhuang Autonomous Region Information Center</orgName>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">GUANGXI BEITOU IT INNOVATION TECHNOLOGY INVESTMENT GROUP CO.LTD</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jining</forename><surname>Chen</surname></persName>
							<email>chenjn@gxi.gov.cn</email>
							<affiliation key="aff0">
								<orgName type="laboratory">Guangxi Key Laboratory of Digital Infrastructure</orgName>
								<orgName type="institution">Guangxi Zhuang Autonomous Region Information Center</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Classification of Real and Generated Images based on Feature Similarity Notebook for ImageCLEF Lab at CLEF 2024</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">9E6ABA2CBC25D5E4371DA8EA2FA861B8</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:53+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>GANs</term>
					<term>Pre-trained Model</term>
					<term>Feature Extraction</term>
					<term>Similarity Calculation</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Deceptive images can be shared on social network services within seconds, posing a significant risk. In the application and research of artificial intelligence on medical images, data issues have always been a challenge, including insufficient amounts of medical image data and privacy concerns. Currently, generative models such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and diffusion models have achieved remarkable results, generating high-quality images. Using generative models to generate medical images is an active research focus. The essence of generative models is to learn the distribution of the data, not to create something out of nothing. We therefore investigate whether the real data used in training can be identified from the generated data. In this paper, we calculate the similarity between original images and images with added perturbations, using self-supervised Masked Autoencoders to reconstruct the images and then computing the similarity of the extracted features. By thresholding this similarity, we can identify the real images used for training the generative model. Our experimental results on the validation set show an accuracy of 0.743, a precision of 0.721, a recall of 0.720, and an F1 score of 0.726; the F1 score on the test set is 0.603. These results indicate that generated images also pose a threat to patient privacy.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Currently, artificial intelligence has numerous hot topics in medical research, including studies on medical imaging, medical image classification, detection, and segmentation tasks <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b2">3]</ref>. However, these tasks face a common challenge: the lack of data. Initially, data augmentation <ref type="bibr" target="#b3">[4]</ref> was used as a solution, transforming small amounts of data into larger datasets through operations such as flipping and cropping. This approach relies on the translation invariance of convolutional neural networks. However, data augmentation only addresses superficial issues, and in some tasks, the increased quantity is still insufficient. Additionally, the amount of data often determines the upper limit of the model. Large models require vast amounts of data for training, potentially exceeding tens of billions of data points. As models become larger, the demand for data also increases; otherwise, the models either overfit to noise or fail to fully utilize their capabilities.</p><p>Medical data, in particular, presents complex challenges due to privacy concerns and the difficulty of annotation, making data acquisition difficult and resulting in consistently small datasets. The advent of generative models represents a technological breakthrough, capable of creating large amounts of high-quality data. Currently, models based on Generative Adversarial Networks (GANs) <ref type="bibr" target="#b4">[5]</ref> and Variational Autoencoders (VAEs) <ref type="bibr" target="#b5">[6]</ref> can generate high-quality images, while the popular diffusion models not only produce high-quality images but also exhibit diversity. Therefore, generative models are an effective method for data augmentation. However, privacy concerns must be considered for medical data.
Even if the generated data differs, does it still pose a privacy risk? Can the original data be identified from the generated images?</p><p>To address this issue, this paper employs multiple methods, including similarity calculation, enhancing image details before calculating similarity, and using deep learning and Masked Autoencoders (MAE) <ref type="bibr" target="#b6">[7]</ref> methods to calculate the similarity of extracted image features. This approach aims to determine which real data were used for training the generative models based on the generated data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>In recent years, Generative Adversarial Networks (GANs) <ref type="bibr" target="#b4">[5]</ref> have garnered widespread attention in the medical field for various image generation and translation tasks. Numerous studies have explored the application of GANs in medical image synthesis and image-to-image translation, particularly in the recognition or detection of synthetic images. A substantial amount of work has investigated the use of GANs to generate synthetic medical images. For instance, Choi et al. proposed a method called "StarGAN" <ref type="bibr" target="#b7">[8]</ref>, a multi-domain image synthesis technique successfully applied to generate diverse and realistic brain MRI images. Similarly, the paper by Kench et al. introduced SliceGAN <ref type="bibr" target="#b8">[9]</ref>, an architecture that utilizes GANs to generate high-quality three-dimensional datasets from a single representative two-dimensional image.</p><p>Synthetic images play a crucial role in the medical field as they offer significant advantages and address major challenges <ref type="bibr" target="#b9">[10]</ref>. Firstly, the generation of synthetic images allows for the augmentation of limited or insufficient datasets. In many medical imaging applications, obtaining large and diverse annotated datasets can be challenging and time-consuming. By generating synthetic images, researchers can expand the training data, thereby enhancing the robustness and generalization of machine learning models. Secondly, synthetic images can simulate rare or difficult-to-obtain medical scenarios. 
Certain conditions or diseases may have low prevalence or be challenging to capture through traditional imaging methods, making synthetic images a valuable resource in these instances.</p><p>Synthetic images offer a novel approach to creating representative cases, allowing researchers and clinicians to study and understand these conditions better, develop more effective diagnostic tools, and explore various treatment strategies. Additionally, synthetic images address privacy concerns associated with patient data. Medical images often contain sensitive information, making data sharing and public release challenging. By generating synthetic images, it is possible to retain the statistical and anatomical characteristics of the data while removing specific patient information. This approach preserves privacy, facilitates more open collaboration, and advances research progress. In conclusion, synthetic images are indispensable in the medical field. They play a crucial role in data augmentation, simulation of rare conditions, and privacy protection. Their utilization empowers researchers, clinicians, and technologists to tackle key challenges, enhance diagnostic accuracy, improve patient care, and advance medical imaging technologies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Method</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Task Description and Dataset Analysis</head><p>After training the generative model, it is possible to produce new images. In order to identify the potential privacy threats of using and sharing synthetic medical data in various real-world scenarios, a new challenge (ImageCLEFmedical GANs <ref type="bibr" target="#b10">[11]</ref>) arose as part of the medical track of the ImageCLEF Challenge 2024 <ref type="bibr" target="#b11">[12]</ref>. Our team's username is robot. This task aims to verify whether the generated images can leak information from the original training data. By analyzing the generated images, we can distinguish which images from the real dataset were used to train the generative model. The dataset used in this study is as follows:</p><p>The data is divided into two folders: development data and test data. In the development data, the dataset includes both used and not used images.</p><p>In Table <ref type="table" target="#tab_0">1</ref> and Table <ref type="table" target="#tab_1">2</ref>, the dataset structures for Task 1 and Task 2 are shown, respectively. Although there is a significant difference in the number of real datasets between the two tasks during the development phase, the approach and methods for handling both tasks are consistent.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Data Visualization Analysis</head><p>The quality of a generative model is evaluated based on its ability to produce high-quality and diverse images. A generative model learns the distribution of real data, and a model that can generate high-quality images indicates that it has accurately learned the data distribution of real images. Diversity reflects the creative ability of the generative model, assessing whether it can create different images based on the learned data distribution. Visualizing the data distribution allows for an intuitive understanding of the relationships between the data. In this paper, we provide histogram visualizations of the statistical data for both the generated and real images. As illustrated in Figure <ref type="figure" target="#fig_1">1</ref>, we have compared the pixel values of the generated images with those of the real images. It is evident that the pixel value distributions are quite similar. The horizontal axis of the histogram represents the pixel values, while the vertical axis represents the number of pixels. Based on the results of the two histogram statistics, it can be seen that the generated images and the real images are highly similar. Additionally, it is evident that the real images are also highly similar to each other, even though they include two categories: used images and not used images.</p></div>
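The histogram comparison described above can be sketched as follows. This is an illustrative sketch, not the authors' code: `pixel_histogram` and the toy image batches are our own names, and the histogram-intersection score is one of several reasonable ways to quantify how similar the two distributions are.

```python
import numpy as np

def pixel_histogram(images, bins=256):
    """Aggregate a pixel-value histogram over a batch of 8-bit grayscale images."""
    counts = np.zeros(bins, dtype=np.int64)
    for img in images:
        h, _ = np.histogram(img, bins=bins, range=(0, 256))
        counts += h
    return counts

# Toy stand-ins for a batch of generated images and a batch of real images
rng = np.random.default_rng(0)
generated = [rng.integers(0, 256, size=(64, 64)) for _ in range(5)]
real = [rng.integers(0, 256, size=(64, 64)) for _ in range(5)]

h_gen = pixel_histogram(generated)
h_real = pixel_histogram(real)

# Histogram intersection of the normalized distributions (1.0 = identical)
overlap = np.minimum(h_gen / h_gen.sum(), h_real / h_real.sum()).sum()
```

Plotting `h_gen` and `h_real` side by side reproduces the kind of comparison shown in Figure 1: pixel value on the horizontal axis, pixel count on the vertical axis.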
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Feature Extraction</head><p>In this paper, we explore the inherent connection between generated images and used images, which were involved in the training process. As we know, in mainstream generative model methods, Generative Adversarial Networks (GANs) have both a generator and a discriminator that mutually enhance each other, ultimately generating the required image data through the generator. The generator's structure involves extracting features and then reconstructing them. The principle structure of Variational Autoencoders (VAEs) is similar, directly reconstructing the extracted features by building a reconstruction loss. Diffusion models work in the same way, adding noise and then reconstructing the noisy features. Although these generative models employ various ingenious designs when constructing loss functions, they essentially aim to achieve reconstruction loss. The calculation of reconstruction loss is typically done using the Mean Squared Error (MSE) function, which is also a method of image similarity comparison. Therefore, the overall approach adopted in this paper is reverse inference through similarity comparison.</p><p>MAE (Masked Autoencoder) is a self-supervised deep learning method. We know that image similarity can be compared, and similarly, features extracted by deep learning can also be compared for similarity. The features extracted by deep learning often contain highly integrated information. Therefore, calculating the similarity of extracted features is a worthwhile approach to consider. Feature extraction networks need to be trained to accurately extract information from images.
Although we can directly use models pre-trained on ImageNet for feature extraction and similarity calculation, the results are not satisfactory because there is a significant difference between ImageNet data and medical data.</p><p>To address this issue, unsupervised learning or self-supervised learning methods can be considered. As shown in Figure <ref type="figure" target="#fig_2">2</ref>, this is the structure of our overall network model. MAE is a self-supervised learning method that treats both data and labels as inputs for model reconstruction. The uniqueness of MAE lies in its approach of masking part of the information and then reconstructing it. This is an excellent idea because, in data with high image similarity, masking part of the information forces the model to focus more on details. This increases the difficulty of reconstruction, allowing the model to learn to distinguish image details from a small amount of data. This approach ensures that the model can approximate the true image even with occlusions, thus emphasizing detailed information. Consequently, we can calculate the similarity of the extracted features.</p></div>
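The feature-extraction step can be sketched with the interface below. Note the hedge: the paper uses the encoder half of a self-supervised MAE (a ViT trained with masked-patch reconstruction), but the placeholder architecture here is only a stand-in so the interface is runnable; any module mapping an image to a feature vector fits.

```python
import torch
import torch.nn as nn

# Placeholder for the trained MAE encoder. In the paper this would be a
# ViT encoder trained by masking image patches and reconstructing them;
# the tiny convolutional stack here only illustrates the interface.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=7, stride=4),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),  # -> (batch, 16) feature vectors
)

@torch.no_grad()
def extract_features(images: torch.Tensor) -> torch.Tensor:
    """Encode a batch of images into L2-normalized feature vectors."""
    encoder.eval()
    return nn.functional.normalize(encoder(images), dim=1)

batch = torch.randn(4, 1, 256, 256)  # four grayscale images
features = extract_features(batch)   # unit-norm rows, ready for cosine similarity
```

Normalizing the features here means the subsequent cosine-similarity step reduces to a plain dot product.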
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Feature Similarity</head><p>After completing feature extraction, similarity calculation becomes a critical step for image recognition and classification. This process aims to measure how close two images are in the feature space, and it employs common similarity measures such as Euclidean distance, cosine similarity, and Manhattan distance to achieve this. Each of these measures has its own advantages and applications depending on the nature of the data and the specific requirements of the task.</p><p>The specific steps for similarity calculation using features extracted by the MAE network are as follows. First, acquire the feature vectors for each image to be compared by using a pre-trained MAE network. This involves passing the images through the encoder part of the MAE network, which compresses the input into a latent representation capturing the essential features of the image.</p><p>Next, choose an appropriate similarity measure. For instance, cosine similarity can be particularly useful in cases where the magnitude of the feature vectors is not as important as the orientation, making it ideal for assessing the angle between vectors in high-dimensional space. Alternatively, Euclidean distance might be preferred for tasks where the absolute differences in feature values are more meaningful. Once the similarity measure is selected, calculate the distance or similarity score between the feature vectors of the two images. This score quantitatively expresses how similar or different the images are based on their extracted features.</p><p>Finally, based on the calculated similarity scores, perform operations such as classification, clustering, or retrieval of images. In classification tasks, images can be assigned to predefined categories based on their similarity to representative examples. 
In clustering, images are grouped into clusters of similar items, which can reveal inherent structures in the data without prior labeling. For image retrieval, the similarity scores can be used to rank a database of images, retrieving those that are most similar to a given query image. This comprehensive approach ensures that the images are analyzed and utilized effectively, leveraging the power of MAE-based feature extraction and similarity calculation to enhance various image processing tasks.</p></div>
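The similarity-and-classification pipeline of Sections 3.3 and 3.4 reduces to the sketch below. The function names are ours, and the 0.9 threshold is an illustrative free parameter, not a value from the paper (such decisions would be tuned on the development data).

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_real_image(real_feat, generated_feats, threshold=0.9):
    """Label a real image as 'used' (1) if its best match among the
    generated-image features exceeds the similarity threshold."""
    best = max(cosine_sim(real_feat, g) for g in generated_feats)
    return (1 if best >= threshold else 0), best

# Toy 2-D features: the query is nearly parallel to the second generated feature
generated = [np.array([1.0, 0.0]), np.array([0.6, 0.8])]
label, score = classify_real_image(np.array([0.58, 0.81]), generated)
```

Taking the maximum over all generated images reflects the intuition in the text: a real image needs only one close match among the model's outputs to be suspected of having been in the training set.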
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Experimental Design</head><p>In our experiment, we used the NVIDIA GeForce RTX 3090 graphics card to complete two tasks. The detailed process and time for each experiment are as follows:</p><p>Task 1: To ensure the accuracy and efficiency of the experiment, we loaded all the data onto the RTX 3090 graphics card for processing. Throughout the experiment, we leveraged its powerful computational capabilities and efficient parallel processing features, significantly enhancing the data processing speed. After multiple iterations and optimizations, the total experiment time for Task 1 was approximately 15 hours. During this period, the graphics card operated efficiently, ensuring the integrity of the data and the reliability of the experimental results.</p><p>Task 2: Following the completion of Task 1, we continued to utilize the RTX 3090 graphics card for the second experiment. Similar to Task 1, we performed multiple data loading and processing operations, fully exploiting the card's advantages in deep learning and large-scale data processing.</p><p>Through repeated experiments and optimizations, we successfully completed Task 2 in approximately 16 hours. Throughout this period, the graphics card maintained high efficiency, ensuring the continuity of the experiment and the consistency of the results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Evaluation Metrics</head><p>This task was approached as a binary classification problem, and its evaluation involved several key performance metrics: F1-score, accuracy, precision, recall, and specificity. Among these, the F1-score has been designated as the primary metric for this year's evaluation. The definitions of these metrics are as follows:</p><formula xml:id="formula_0">Precision = TP / (TP + FP)<label>(1)</label></formula><formula xml:id="formula_1">Recall = TP / (TP + FN)<label>(2)</label></formula><formula xml:id="formula_2">Specificity = TN / (TN + FP)<label>(3)</label></formula><formula xml:id="formula_3">Accuracy = (TP + TN) / (TP + TN + FP + FN)<label>(4)</label></formula><formula xml:id="formula_4">F1-score = 2 · Precision · Recall / (Precision + Recall)<label>(5)</label></formula></div>
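The five metrics follow directly from the confusion-matrix counts; a minimal helper (names ours):

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the five evaluation metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # F1 is the harmonic mean of precision and recall
        "f1": 2 * precision * recall / (precision + recall),
    }

m = binary_metrics(tp=8, fp=2, tn=7, fn=3)
```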
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Experimental Results</head><p>To conduct a more comprehensive and detailed experimental analysis, we divided the validation dataset into two parts: one as the validation set and the other as the test set. This division allows the validation set to be used for parameter tuning and performance evaluation of the model, ensuring that adjustments made during training are effective. Meanwhile, the test set is used for the final performance evaluation to assess the model's generalization ability on unseen data. This approach helps us to more accurately measure the actual performance and stability of the model, thereby obtaining more reliable and representative experimental results. The experimental results are shown in Table <ref type="table" target="#tab_2">3</ref>. We conducted an ablation study on different feature extraction modules to evaluate their performance in image classification tasks. Specifically, we used VGG, InceptionNet, ResNet50, ResNet101, MobileNetV2, MobileNetV3, EfficientNet, and MAE pre-trained models for feature extraction. Subsequently, we performed similarity calculations to classify the images based on these features.</p></div>
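One natural use of the validation split described above is to sweep the similarity threshold and keep the value that maximizes F1. The sketch below is our illustration of that tuning step (the paper does not publish its threshold-selection code); the toy score and label arrays are hypothetical.

```python
import numpy as np

def tune_threshold(scores: np.ndarray, labels: np.ndarray, n_steps: int = 101):
    """Sweep candidate thresholds over the score range and return the one
    maximizing F1 on the validation split."""
    best_t, best_f1 = 0.0, -1.0
    for t in np.linspace(scores.min(), scores.max(), n_steps):
        pred = (scores >= t).astype(int)
        tp = int(((pred == 1) & (labels == 1)).sum())
        fp = int(((pred == 1) & (labels == 0)).sum())
        fn = int(((pred == 0) & (labels == 1)).sum())
        # F1 written directly from the confusion counts
        f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t, best_f1

# Perfectly separable toy scores: any threshold in (0.2, 0.8] gives F1 = 1
t, f1 = tune_threshold(np.array([0.9, 0.8, 0.2, 0.1]), np.array([1, 1, 0, 0]))
```

The threshold chosen here is then frozen and applied unchanged to the held-out test portion, mirroring the validation/test separation the section describes.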
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.1.">Ablation Study</head><p>To comprehensively assess the effectiveness of these feature extraction modules, we evaluated multiple metrics including accuracy, precision, specificity, recall, and F1-score. By calculating and analyzing these metrics, we were able to compare the strengths and weaknesses of each model in similarity computation and image classification tasks.</p><p>The data in Table <ref type="table" target="#tab_2">3</ref> indicates that the MAE pre-trained model achieved the best performance. This model excelled across all the evaluated metrics, demonstrating its robust capability in feature extraction and image classification. These results suggest that the MAE pre-trained model not only captures detailed features of images but also effectively performs classification tasks, providing strong support for future research and applications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.2.">Submission</head><p>We submitted results for the IMAGECLEFmed GANS 2024: Identify Training Data Fingerprints competition. Each submission file had to contain predictions (1 = used, 0 = not used) for all 4,000 images generated by each model. The evaluation method primarily used the F1-score as the evaluation metric, while accuracy was used as the secondary metric. In total, we submitted eight results, with our best score being an F1-score of 0.711.</p><p>As shown in Table <ref type="table" target="#tab_3">4</ref>, the submitted results exhibited significant score differences, which may be due to the selection of less optimal features. When classifying based on the similarity scores, the resulting score differences were considerable. However, the overall experimental results indicate that our method is capable of identifying, from the generated images, which real images were used to train the image generation model.</p><p>The results obtained from the development dataset differed somewhat from those of the test dataset, likely due to dimensional variations between the datasets. Despite these differences, we successfully developed methods that achieved high F1-scores and accuracy in identifying used images across both datasets. These findings reinforce the hypothesis that synthetic images generated by deep generative models can potentially expose patient identities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>We performed feature extraction on the images, using advanced deep learning models to obtain high-dimensional feature representations. Subsequently, we calculated the similarity of these extracted features, using cosine similarity to evaluate the similarity scores between each pair of image features. Based on these similarity scores, we accomplished the binary classification task, categorizing the images as either used or unused. This method allows us to effectively identify and classify images, providing a solid foundation for subsequent image processing and analysis.</p><p>In conclusion, this paper and the ImageCLEFmed GANS challenge contribute to raising awareness about the potential privacy risks associated with the use and sharing of synthetic medical data in real-world applications. We underscore the importance of implementing privacy protection techniques when developing deep generative models using sensitive medical data.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>(a) Pixel statistics of generated images. (b) Pixel statistics of real images.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Pixel statistics are performed on the generated images and real images of the two tasks respectively, and the results of the two tasks are averaged.</figDesc><graphic coords="3,72.00,417.14,203.07,152.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Comparative Learning Diagram. The generated image is obtained from the used image, so the two are similar in comparative learning. When comparing the generated image with the not used image for learning, it is dissimilar.</figDesc><graphic coords="5,115.39,65.61,364.50,371.50" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Task1 dataset structure description.</figDesc><table><row><cell>Dataset</cell><cell>Generated</cell><cell>Real</cell></row><row><cell>Development</cell><cell>10000</cell><cell>100 (used) 100 (not used)</cell></row><row><cell>Test</cell><cell>5000</cell><cell>4000 (used and not used)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Task2 dataset structure description.</figDesc><table><row><cell>Dataset</cell><cell>Generated</cell><cell>Real</cell></row><row><cell>Development</cell><cell>10000</cell><cell>3000 (used) 3000 (not used)</cell></row><row><cell>Test</cell><cell>7200</cell><cell>4000 (used and not used)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>The average of the experimental results of task1 and task2 on the development dataset. We used different feature extraction models for ablation experiments.</figDesc><table><row><cell>Model</cell><cell cols="5">Accuracy Precision Specificity Recall F1-score</cell></row><row><cell>VGG[13]</cell><cell>0.655</cell><cell>0.652</cell><cell>0.661</cell><cell>0.706</cell><cell>0.676</cell></row><row><cell>InceptionNet[14]</cell><cell>0.616</cell><cell>0.637</cell><cell>0.590</cell><cell>0.850</cell><cell>0.677</cell></row><row><cell>Resnet50[15]</cell><cell>0.632</cell><cell>0.640</cell><cell>0.648</cell><cell>0.652</cell><cell>0.651</cell></row><row><cell>Resnet101[15]</cell><cell>0.650</cell><cell>0.873</cell><cell>0.812</cell><cell>0.484</cell><cell>0.587</cell></row><row><cell>Mobilenetv2[16]</cell><cell>0.721</cell><cell>0.732</cell><cell>0.748</cell><cell>0.715</cell><cell>0.720</cell></row><row><cell>Mobilenetv3[17]</cell><cell>0.722</cell><cell>0.717</cell><cell>0.744</cell><cell>0.694</cell><cell>0.708</cell></row><row><cell>EfficientNet[18]</cell><cell>0.648</cell><cell>0.752</cell><cell>0.815</cell><cell>0.487</cell><cell>0.644</cell></row><row><cell>MAE[7]</cell><cell>0.743</cell><cell>0.721</cell><cell>0.733</cell><cell>0.720</cell><cell>0.726</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Scores of the eight different submitted results.</figDesc><table><row><cell>Submission</cell><cell>Model</cell><cell>F1</cell><cell>Acc</cell><cell>Prec Recall</cell><cell>F1</cell><cell>Acc</cell><cell>Prec Recall</cell><cell>F1</cell></row><row><cell>VGG</cell><cell cols="6">0.312 0.583 0.812 0.216 0.341 0.559 0.761</cell><cell>0.174</cell><cell>0.283</cell></row><row><cell cols="7">InceptionNet 0.314 0.583 0.812 0.216 0.341 0.559 0.747</cell><cell>0.178</cell><cell>0.287</cell></row><row><cell>ResNet50</cell><cell cols="6">0.503 0.503 0.504 0.409 0.451 0.504 0.503</cell><cell>0.619</cell><cell>0.555</cell></row><row><cell>Mobilenetv2</cell><cell cols="6">0.350 0.615 0.600 0.688 0.641 0.504 0.579</cell><cell>0.031</cell><cell>0.058</cell></row><row><cell>Mobilenetv3</cell><cell cols="6">0.429 0.503 0.504 0.409 0.451 0.593 0.751</cell><cell>0.279</cell><cell>0.407</cell></row><row><cell>EfficientNet</cell><cell cols="6">0.524 0.615 0.600 0.688 0.641 0.593 0.751</cell><cell>0.279</cell><cell>0.407</cell></row><row><cell>MAE</cell><cell cols="6">0.603 0.711 0.824 0.538 0.651 0.504 0.503</cell><cell>0.619</cell><cell>0.555</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Acknowledgements</head><p>This work is supported by Open Project Program of Guangxi Key Laboratory of Digital Infrastructure No.GXDINB2024001.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A review of the application of deep learning in medical image classification and segmentation</title>
		<author>
			<persName><forename type="first">L</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Annals of translational medicine</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Focalmix: Semi-supervised learning for 3d medical image detection</title>
		<author>
			<persName><forename type="first">D</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE/CVF Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="3951" to="3960" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A review of medical image segmentation algorithms</title>
		<author>
			<persName><forename type="first">K</forename><surname>Ramesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Swapna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Datta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Rajest</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">EAI Endorsed Transactions on Pervasive Health and Technology</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="e6" to="e6" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Medical image enhancement based on histogram algorithms</title>
		<author>
			<persName><forename type="first">N</forename><surname>Salem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Malik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shams</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">163</biblScope>
			<biblScope unit="page" from="300" to="311" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Generative adversarial networks</title>
		<author>
			<persName><forename type="first">I</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pouget-Abadie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mirza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Warde-Farley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ozair</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">63</biblScope>
			<biblScope unit="page" from="139" to="144" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Welling</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1312.6114</idno>
		<title level="m">Auto-encoding variational bayes</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Masked autoencoders are scalable vision learners</title>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dollár</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Girshick</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</title>
				<meeting>the IEEE/CVF conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="16000" to="16009" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Stargan: Unified generative adversarial networks for multi-domain image-to-image translation</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-W</forename><surname>Ha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Choo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition</title>
				<meeting>the IEEE conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="8789" to="8797" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Chung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Ye</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2105.00194</idno>
		<title level="m">Feature disentanglement in generating three-dimensional structure from two-dimensional slice with slicegan</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Guibas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">S</forename><surname>Virdi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Li</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1709.01872</idno>
		<title level="m">Synthetic medical images from dual generative adversarial networks</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Overview of ImageCLEF 2024: Multimedia retrieval in medical applications</title>
		<author>
			<persName><forename type="first">B</forename><surname>Ionescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Drăgulinescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Rückert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ben Abacha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>García Seco de Herrera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bloch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Brüngel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Idrissi-Yaghir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schäfer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Pakull</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Damm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bracke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Friedrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Andrei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Prokopchuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Karpenka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Radzhabov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kovalev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Macaire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schwab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Lecouteux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Esperança-Rodier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yetisgen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Hicks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Riegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Thambawita</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Storås</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Halvorsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Heinrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kiesel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction, Proceedings of the 15th International Conference of the CLEF Association (CLEF 2024)</title>
		<title level="s">Lecture Notes in Computer Science (LNCS), Springer</title>
		<meeting><address><addrLine>Grenoble, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Overview of 2024 ImageCLEFmedical GANs Task - Investigating Generative Models&apos; Impact on Biomedical Synthetic Images</title>
		<author>
			<persName><forename type="first">A</forename><surname>Andrei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Radzhabov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Karpenka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Prokopchuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kovalev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ionescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Müller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF2024 Working Notes, CEUR Workshop Proceedings</title>
				<meeting><address><addrLine>Grenoble, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Simonyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1409.1556</idno>
		<title level="m">Very deep convolutional networks for large-scale image recognition</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Going deeper with convolutions</title>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sermanet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Reed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Anguelov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Erhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vanhoucke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rabinovich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition</title>
				<meeting>the IEEE conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Deep residual learning for image recognition</title>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition</title>
				<meeting>the IEEE conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="770" to="778" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Mobilenetv2: Inverted residuals and linear bottlenecks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sandler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Howard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zhmoginov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-C</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition</title>
				<meeting>the IEEE conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="4510" to="4520" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Searching for mobilenetv3</title>
		<author>
			<persName><forename type="first">A</forename><surname>Howard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sandler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-C</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vasudevan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF international conference on computer vision</title>
				<meeting>the IEEE/CVF international conference on computer vision</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1314" to="1324" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Efficientnet: Rethinking model scaling for convolutional neural networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="6105" to="6114" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
