KDE Lab at ImageCLEFmedical GANs 2024

Shota Fukuyama1, Tetsuya Asakawa1, Kazuki Shimizu2, Kei Nomura2 and Masaki Aono1

1 Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi, 441-8580, Japan
2 Toyohashi Heart Center, 21-1 Gobutori, Oyama-cho, Toyohashi, Aichi, 441-8530, Japan

Abstract
This paper describes the KDE Lab approach in ImageCLEFmedical GANs 2024. The ImageCLEFmedical GANs 2024 task consists of two subtasks: the "Identify training data fingerprints" task and the "Detect generative models' fingerprints" task. The former examines the existing hypothesis that GANs generate medical images that contain certain "fingerprints" of the real images used for training the generative network. The latter investigates the hypothesis that generative models imprint distinctive "fingerprints" onto the generated images. In this research, we studied the "Identify training data fingerprints" task. For the experiments, we attempted an approach using multiple pre-trained models, intending to compare their results, and we also attempted an approach using a super-resolution technique. In the end, most of the experiments failed and the results could not be compared. The competition result had an accuracy of 0.484.

Keywords
Medical Images, GANs, CNN, super-resolution

CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
Email: fukuyama.shota.jf@tut.jp (S. Fukuyama); asakawa.tetsuya.um@tut.jp (T. Asakawa); shimizu@heart-center.or.jp (K. Shimizu); kein312@gmail.com (K. Nomura); masaki.aono.ss@tut.jp (M. Aono)
URL: https://www.kde.cs.tut.ac.jp/~aono/ (M. Aono)
ORCID: 0009-0005-1269-039X (S. Fukuyama); 0000-0002-8345-7094 (T. Asakawa); 0009-0000-3448-7986 (K. Shimizu); 0000-0003-2838-7844 (K. Nomura); 0000-0003-1383-1076 (M. Aono)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction
In recent years, the emergence of various generative models has made it possible to construct highly accurate models from small amounts of image data. In the medical field, it has been difficult to prepare large, balanced sets of images with specific attributes, such as images of rare diseases, which results in biased data; medical image generation AI is solving this problem. However, a problem arises when medical images can be falsified using these technologies. Generated images must be useful data for training models, and it must be possible to determine whether an image is generated or not. This paper describes the KDE Lab approach to the "Identify training data fingerprints" task of ImageCLEFmedical GANs 2024 [1]. ImageCLEF has been held as part of CLEF since 2003. ImageCLEF 2024 [2] focuses on multiple applications across different tasks, and the ImageCLEFmedical GANs task is one of them. We took an approach that determines whether a given image is a real image or a generated image. For the CNN architectures, we used MobileNetV2 [3], DenseNet-121 [4], and ResNet-50 [5], pre-trained on ImageNet [6]. In the following, we describe the dataset, methods, experiments, and results.

2. Dataset
In this research, we use some of the images provided in the "Identify training data 'fingerprints'" task of ImageCLEFmedical GANs 2024. The image data are axial slices of 3D CT images of about 8,000 pulmonary tuberculosis patients, stored as 256x256-pixel, 8-bit/pixel PNG images. From this data, we used a total of 10,200 images: 10,000 images generated by GAN model 1, which is not revealed, and 100 images each annotated as having been used / not used for training the image generation of that model. The 10,000 images generated by the likewise unrevealed GAN model 2, and the 100 images each annotated as used / not used for training that model, were not used this time.
Table 1
CNN approaches

approach   model          super-resolution
Method 1   MobileNetV2    without
Method 2   DenseNet-121   without
Method 3   ResNet-50      without
Method 4   MobileNetV2    with

3. Methods

3.1. Preprocessing
The images in the dataset are grayscale. Therefore, we converted them to three-channel images using tensorflow.image.grayscale_to_rgb. To augment the data, horizontal flipping and Keras' ImageDataGenerator were applied with the following settings (a sketch of the whole pipeline is given at the end of this section):

• zoom_range: 0.97–1.03
  – Scaling is performed randomly within a scaling factor range of 0.97 to 1.03.
• rotation_range: 10
  – The image is rotated randomly in the range of -10 to 10 degrees.
• height_shift_range: 0.05
  – Vertical translation within ±0.05 of the original image height.
• width_shift_range: 0.05
  – Horizontal translation within ±0.05 of the original image width.

The default input size of the pre-trained models is (224, 224), so the images were resized from (256, 256) to (224, 224). Horizontal flipping is performed only on the real images, and 24 ImageDataGenerator images are added for each image before and after flipping. Each real image is thus expanded into 1 original image + 1 flipped image + 48 ImageDataGenerator images = 50 images, which means that the 200 real images can be augmented up to 10,000 images in total, a 1:1 ratio with the generated images. We thought that a super-resolution technique would allow us to obtain more of the features that are not present in the real images but are prominent only in the generated images, so we applied super-resolution to the real images using ESRGAN [7] and then performed the same operations.

3.2. Architecture
We attempted a CNN approach with three different backbones: MobileNetV2, DenseNet-121, and ResNet-50. These CNNs used models pre-trained on ImageNet (Methods 1–4). The input shape is set to (224, 224, 3), and each model's preprocess_input function is applied before the CNN to preprocess the data according to that model's weights. After the CNN, Dropout and Dense layers are attached. As for the experimental details, training runs for 50 epochs with the Adam optimizer at a learning rate of $10^{-4}$, and binary cross-entropy is used as the loss function. Figure 1 shows the architecture of our CNN approach; a sketch of the model definition is also given at the end of this section. Method 4 uses images enhanced by ESRGAN. The shape of each image changes from (224, 224) to (896, 896) when it is enlarged by ESRGAN, so the input shape of the architecture is adjusted accordingly. In addition, for this method the pre-trained weights are not used (the pre-training flag is set to False).

Figure 1: architecture

Table 2
Confusion matrix

                   prediction_Negative     prediction_Positive
Actual_Negative    TN (True Negative)      FP (False Positive)
Actual_Positive    FN (False Negative)     TP (True Positive)

3.3. Evaluation
Accuracy, Precision, Recall, and F1 values were used as evaluation indices. They can be calculated from scikit-learn's confusion_matrix (see the sketch below). The calculation formulas are as follows, in (1)–(4):

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{1} \]

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{2} \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{3} \]

\[ F1 = \frac{2 \cdot \mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \tag{4} \]
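The following sketches illustrate the pipeline described in this section. First, a minimal sketch of the preprocessing and augmentation of Section 3.1, assuming Keras' ImageDataGenerator; the function name augment_real_image and the batch handling are illustrative, not the exact training code.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random transforms as listed in Section 3.1.
datagen = ImageDataGenerator(
    zoom_range=[0.97, 1.03],  # random scaling factor in 0.97-1.03
    rotation_range=10,        # random rotation in -10 to 10 degrees
    height_shift_range=0.05,  # vertical shift within +/-5% of the height
    width_shift_range=0.05,   # horizontal shift within +/-5% of the width
)

def augment_real_image(gray_image):
    """Expand one real grayscale image of shape (256, 256, 1) into 50
    images: 1 original + 1 horizontal flip + 24 ImageDataGenerator
    variants of each (200 real images x 50 = 10,000, a 1:1 ratio)."""
    rgb = tf.image.grayscale_to_rgb(tf.convert_to_tensor(gray_image))  # -> (256, 256, 3)
    rgb = tf.image.resize(rgb, (224, 224))                             # match the models' input size
    flipped = tf.image.flip_left_right(rgb)
    out = [rgb.numpy(), flipped.numpy()]
    for base in out[:2]:
        flow = datagen.flow(np.expand_dims(base, 0), batch_size=1)
        out.extend(next(flow)[0] for _ in range(24))
    return np.stack(out)  # (50, 224, 224, 3)
```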
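For Method 4, the paper does not state which ESRGAN implementation was used. As one possibility, a sketch using the publicly available ESRGAN model on TensorFlow Hub, which performs the 4x upscaling from (224, 224) to (896, 896) described in Section 3.2:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Pre-trained 4x ESRGAN; this specific checkpoint is an assumption.
esrgan = hub.load("https://tfhub.dev/captain-pool/esrgan-tf2/1")

def upscale(images):
    """4x super-resolution: (N, 224, 224, 3) -> (N, 896, 896, 3)."""
    x = tf.cast(images, tf.float32)  # the model expects float32 in [0, 255]
    sr = esrgan(x)
    return tf.clip_by_value(sr, 0.0, 255.0)
```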
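The CNN of Figure 1 can be sketched as follows for Method 1. The Dropout rate and the use of global average pooling are assumptions, since the paper only states that Dropout and Dense layers follow the backbone; DenseNet-121 and ResNet-50 (Methods 2 and 3) are swapped in the same way.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

# Backbone pre-trained on ImageNet. For Method 4, weights=None and
# input_shape=(896, 896, 3) are used instead (Section 3.2).
base = MobileNetV2(weights="imagenet", include_top=False,
                   input_shape=(224, 224, 3), pooling="avg")

model = models.Sequential([
    base,
    layers.Dropout(0.5),                    # rate is an assumption
    layers.Dense(1, activation="sigmoid"),  # binary: used / not used
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy",
              metrics=["accuracy"])

# x_train = preprocess_input(x_train)  # per-model scaling before the CNN
# model.fit(x_train, y_train, epochs=50)
```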
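Finally, a minimal sketch of the evaluation of Section 3.3, computing (1)–(4) from scikit-learn's confusion_matrix; y_true and y_pred are hypothetical arrays of ground-truth and predicted binary labels.

```python
from sklearn.metrics import confusion_matrix

def scores(y_true, y_pred):
    # ravel() returns TN, FP, FN, TP for binary labels (Table 2).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # Eq. (1)
    precision = tp / (tp + fp)                          # Eq. (2)
    recall = tp / (tp + fn)                             # Eq. (3)
    f1 = 2 * recall * precision / (recall + precision)  # Eq. (4)
    return accuracy, precision, recall, f1
```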
4. Results
The results obtained from the Method 1–4 approaches were submitted, and the returned scores are shown in Tables 3–5. Methods 2–4 failed to submit; Method 1 received an accuracy_score of 0.48425, a precision_score of 0.480433, a recall_score of 0.44825, and an f1_score of 0.4542. The failed submissions were due to the fact that many of the predicted results were identical and that the submission format was incorrect. For Method 1, the values of Accuracy, Precision, Recall, and F1 were 0.484, 0.476, 0.317, and 0.380 for Dataset 1, and 0.481, 0.484, 0.579, and 0.527 for Dataset 2.

Table 3
Submission result (Dataset 1 & Dataset 2)

approach   accuracy   precision   recall   f1_score
Method 1   0.484      0.480       0.448    0.454
Method 2   Failed     Failed      Failed   Failed
Method 3   Failed     Failed      Failed   Failed
Method 4   Failed     Failed      Failed   Failed

Table 4
Submission result (Dataset 1)

approach   accuracy   precision   recall   f1_score
Method 1   0.484      0.476       0.317    0.380

Table 5
Submission result (Dataset 2)

approach   accuracy   precision   recall   f1_score
Method 1   0.481      0.484       0.579    0.527

5. Conclusion
This paper described our approach to the "Identify training data fingerprints" task of ImageCLEFmedical GANs 2024. We attempted to use multiple pre-trained models and a super-resolution technique in our approach. However, this time we were unable to produce correct values and could not compare the results across models, or between the image data with and without super-resolution. Therefore, it was not possible to identify features that are not present in the real images but are prominent only in the generated images. As a result of the competition, the CNN approach with MobileNetV2 achieved an accuracy of 0.484.

Acknowledgments
A part of this research was carried out with the support of the Grant for the Toyohashi Heart Center Smart Hospital Joint Research Course and the Grant-in-Aid for Scientific Research (C) (issue numbers 22K12149 and 22K12040).

References
[1] A. Andrei, A. Radzhabov, D. Karpenka, Y. Prokopchuk, V. Kovalev, B. Ionescu, H. Müller, Overview of 2024 ImageCLEFmedical GANs Task – Investigating Generative Models' Impact on Biomedical Synthetic Images, in: CLEF2024 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Grenoble, France, 2024.
[2] B. Ionescu, H. Müller, A. Drăgulinescu, J. Rückert, A. Ben Abacha, A. García Seco de Herrera, L. Bloch, R. Brüngel, A. Idrissi-Yaghir, H. Schäfer, C. S. Schmidt, T. M. G. Pakull, H. Damm, B. Bracke, C. M. Friedrich, A. Andrei, Y. Prokopchuk, D. Karpenka, A. Radzhabov, V. Kovalev, C. Macaire, D. Schwab, B. Lecouteux, E. Esperança-Rodier, W. Yim, Y. Fu, Z. Sun, M. Yetisgen, F. Xia, S. A. Hicks, M. A. Riegler, V. Thambawita, A. Storås, P. Halvorsen, M. Heinrich, J. Kiesel, M. Potthast, B. Stein, Overview of ImageCLEF 2024: Multimedia retrieval in medical applications, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction, Proceedings of the 15th International Conference of the CLEF Association (CLEF 2024), Springer Lecture Notes in Computer Science LNCS, Grenoble, France, 2024.
[3] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, MobileNetV2: Inverted residuals and linear bottlenecks, 2019. arXiv:1801.04381.
[4] G. Huang, Z. Liu, L. van der Maaten, K. Q. Weinberger, Densely Connected Convolutional Networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4700–4708.
[5] K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[6] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: A large-scale hierarchical image database, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009, pp. 248–255. doi:10.1109/CVPR.2009.5206848.
[7] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, C. C. Loy, Y. Qiao, X. Tang, ESRGAN: Enhanced super-resolution generative adversarial networks, 2018. arXiv:1809.00219.