Identifying Training Data "Fingerprints" Using Border Enhancing Image Processing Methods and Their Ensemble
Notebook for the inouekokiteam Lab at CLEF 2024

Koki Inoue1,*, Tetsuya Asakawa1, Kazuki Shimizu2, Kei Nomura2 and Masaki Aono1

1 Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi, 441-8580, Japan
2 Toyohashi Heart Center, 21-1 Gobutori, Ohyamacho, Toyohashi, Aichi, 441-8071, Japan

Abstract
This paper describes our approach to the Identify training data "fingerprints" task of ImageCLEFmedical GANs 2024. In Task 1, the goal is to detect "fingerprints" within the synthetic biomedical image data to determine which real images were used in training to produce the generated images. The proposed method applies image processing as a preprocessing step and uses a pre-trained ResNet-152 model for training. We also integrated the predictions of the individual models. As a result, the model with histogram equalization achieved a score of 66.6%, outperforming the baseline model trained without preprocessing (66.3%). The model with prediction integration achieved 63.1%.

Keywords
Image Processing, Prediction Integration, Histogram Equalization

1. Introduction
ImageCLEF has been held as part of CLEF since 2003, and ImageCLEF 2024 [1] covers several areas, including ImageCLEFmedical GANs 2024 [2]. In Task 1 (identifying training data "fingerprints"), the goal is to detect "fingerprints" within the synthetic biomedical image data to determine which real images were used in training to produce the generated images. We participated in this task as the inouekokiteam. This paper describes the approach we used to determine which images were used to train the generative models.

2. ImageCLEF 2024 Dataset
This section describes the dataset for the Identify training data "fingerprints" task of ImageCLEFmedical GANs 2024 [2]. This task uses two generative models.
The dataset contains images used to train each model, images not used for training, and images generated by the models.

2.1. Development Dataset
For the first generative model, the dataset provides 200 real images annotated as used/not used for training and 10k images generated by model 1. For the second generative model, it provides 6k real images annotated in the same way and 10k images generated by model 2.

CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
* Corresponding author.
Email: inoue.koki.we@tut.jp (K. Inoue); asakawa.tetsuya.um@tut.jp (T. Asakawa); shimizu@heart-center.or.jp (K. Shimizu); kein312@gmail.com (K. Nomura); masaki.aono.ss@tut.jp (M. Aono)
ORCID: 0009-0003-9600-1939 (K. Inoue); 0000-0002-8345-7094 (T. Asakawa); 0009-0000-3448-7986 (K. Shimizu); 0000-0003-2838-7844 (K. Nomura); 0000-0003-1383-1076 (M. Aono)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

2.2. Test Dataset
The test dataset contains two CSV files and two folders; it does not specify which images were used to train the generative models. The ratio of generated to real images differs between the two folders: the first folder contains 7200 generated images and 4000 real images, and the second folder contains 5000 generated images and 4000 real images.

3. Proposed Method
In this section, we describe our approach to the task of identifying the training data "fingerprints" of ImageCLEFmedical GANs 2024 [2]. We observed that the color boundaries of the generated images are often unclear. Therefore, we propose a method that captures boundary sharpness by applying a set of OpenCV [3] image processing functions as preprocessing for both training and prediction. We also propose a method to integrate the prediction results of each trained model into a single result.
The image processing methods used are shown below.
• Binarization
• Histogram Equalization
• Laplacian Process
• Contrast Adjustment

The integration uses a total of five models: one model trained without image processing and four models trained with the image processing methods listed above. Two integration rules are applied.
• Majority voting: take a majority vote of the five models' predictions and use it as the integrated prediction.
• Perfect agreement: if the predictions of the five models are not all in agreement, a negative result is assumed.

4. Preprocessing by Image Processing
This section describes the image processing applied as preprocessing to the development and test datasets.

4.1. Binarization
Binarization was performed using OpenCV [3]. The images were loaded as grayscale, and Otsu binarization [4] was applied, which selects the threshold that maximizes the separation between the two classes of pixel values.

4.2. Histogram Equalization
Histogram equalization was performed using OpenCV [3]. The images were loaded as grayscale and subjected to histogram equalization, which remaps the intensity values so that the histogram of pixel values becomes approximately uniform over the whole range.

4.3. Laplacian Process
Laplacian processing was performed using OpenCV [3]. The images were loaded as grayscale and processed with a Laplacian filter, which detects edges, that is, locations where pixel values change sharply.

Figure 1: Flow of model predictions

4.4. Contrast Adjustment
Contrast adjustment was performed using OpenCV [3]. The images were loaded as grayscale and the contrast was adjusted according to Equation (1), where v is the input pixel value and v′ is the output pixel value, with 𝛼 = 1.5 and 𝛽 = 0.

v′ = 𝛼 × v + 𝛽    (1)

5. Training
In this section, we describe the training of the model. A pre-trained ResNet-152 [5] model was used for training.
As training data, we used 3100 images each from generated_1 and generated_2 in the development dataset, for a total of 6200 generated images, and all images from not_used_1, used_1, not_used_2, and used_2, for a total of 6200 real images. In addition to the image processing preprocessing, random horizontal flipping was applied to the training images.

6. Prediction
In this section, we describe the prediction using the models described in the previous section and the integration of the prediction results. The test dataset is preprocessed with the image processing method that matches the model used for prediction. A total of five models are used for the prediction: one trained without image processing and four trained with different image processing methods.

6.1. Model Predictions
The prediction for each model is described in this section. The detailed flow is shown in Figure 1. For the model trained without image processing, no image processing is applied to the test dataset. For each model trained with image processing, the same image processing is applied to the test dataset before prediction.

6.2. Integration of Prediction
We describe the integration of the predictions of the five models: the one with no image processing applied to the test dataset and the four with different image processing methods. Two integration methods were used, majority voting and perfect agreement. The integration flow is shown in Figure 2. For perfect agreement, a result was accepted only when the predictions of all five models were in agreement, and rejected otherwise.

7. Submission Results
In this section, we describe the results of our team's submissions. The submissions included predictions for each of the five models (Run ID: 891-896) and the integration of the predictions (Run ID: 301, 890). The score of the model without preprocessing (Run ID: 896) was 66.3%.
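The two integration rules of Section 6.2 (majority voting and perfect agreement) can be sketched as follows. This is a minimal illustration, not the code used for the submissions: it assumes that each model outputs a binary label per image, with 1 as the positive class and 0 as the negative class, and all function names and example predictions are hypothetical.

```python
from typing import Callable, List, Sequence


def majority_vote(votes: Sequence[int]) -> int:
    # Positive if more than half of the models predict positive.
    return int(2 * sum(votes) > len(votes))


def perfect_agreement(votes: Sequence[int]) -> int:
    # Positive only if every model predicts positive; any disagreement
    # (or unanimous negative) gives a negative result.
    return int(all(v == 1 for v in votes))


def integrate(per_model: Sequence[Sequence[int]],
              rule: Callable[[Sequence[int]], int]) -> List[int]:
    # per_model[m][i] is the label of model m for image i.
    # zip(*per_model) regroups the labels image by image.
    return [rule(votes) for votes in zip(*per_model)]


# Hypothetical predictions of the five models for four test images.
predictions = [
    [1, 1, 0, 1],  # no preprocessing
    [1, 0, 0, 1],  # binarization
    [1, 1, 0, 0],  # histogram equalization
    [1, 1, 1, 0],  # Laplacian
    [1, 0, 0, 1],  # contrast adjustment
]

majority = integrate(predictions, majority_vote)     # [1, 1, 0, 1]
unanimous = integrate(predictions, perfect_agreement)  # [1, 0, 0, 0]
```

In this toy example, perfect agreement keeps only the first image as positive, which shows how the rule can turn predictions that most models support into negatives.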
The highest score, for the prediction using the model with histogram equalization (Run ID: 892), was 66.6%. No score was returned for majority voting (Run ID: 301), one of the proposed methods. We believe the reason is that it produced the same prediction for all test data. For perfect agreement (Run ID: 890), the score was 63.1%.

Figure 2: Integration of the five models' predictions

Table 1
Submission Results

Run ID  Method name             Score   M1 Acc  M1 Prec  M1 Recall  M1 F1   M2 Acc  M2 Prec  M2 Recall  M2 F1
896     Non-Preprocessed        0.663   0.495   0.497    0.987      0.661   0.5     0.5      0.996      0.66
895     Binarization            0.638   0.484   0.49     0.838      0.619   0.503   0.501    0.951      0.656
894     Contrast Adjustment     0.660   0.491   0.495    0.973      0.656   0.499   0.499    0.993      0.664
892     Histogram Equalization  0.666   0.499   0.499    0.998      0.665   0.501   0.5      0.999      0.667
891     Laplacian Process       0.663   0.484   0.49     0.838      0.619   0.503   0.501    0.951      0.656
301     Majority Voting         None    -       -        -          -       -       -        -          -
890     Perfect Agreement       0.631   0.473   0.484    0.805      0.604   0.508   0.504    0.945      0.657

8. Discussion
In this section, we discuss the submitted results. The models with histogram equalization (Run ID: 892, 66.6%) and Laplacian processing (Run ID: 891, 66.3%) performed at or above the baseline model with no preprocessing (Run ID: 896, 66.3%). The other models with additional preprocessing underperformed the baseline. This suggests that histogram equalization is an effective image processing method for detecting "fingerprints" within the synthetic biomedical image data to determine which real images were used in training to produce the generated images.

With perfect agreement, the integrated prediction did not exceed the baseline. One possible reason is that histogram equalization was effective but the other image processing methods were not. It is also possible that rejecting every prediction on which the five models did not all agree discarded accurate predictions. No results were returned for majority voting for prediction integration.
A possible reason is that the submission was not accepted because it assigned the same prediction to all images.

9. Conclusion
This paper describes an approach to the task of identifying training data "fingerprints" of ImageCLEFmedical GANs 2024 [2]. We applied image processing as a preprocessing step and attempted training and prediction. We also made predictions with each model and attempted to integrate the predictions using the majority voting and perfect agreement methods. The results showed that only the models with histogram equalization and Laplacian processing reached or exceeded the 66.3% of the model without image processing, which was set as the baseline. Neither prediction integration method outperformed the baseline.

10. Acknowledgments
A part of this research was carried out with the support of the Grant for Toyohashi Heart Center Smart Hospital Joint Research Course and the Grant-in-Aid for Scientific Research (C) (issue numbers 22K12149 and 22K12040).

References
[1] B. Ionescu, H. Müller, A. Drăgulinescu, J. Rückert, A. Ben Abacha, A. García Seco de Herrera, L. Bloch, R. Brüngel, A. Idrissi-Yaghir, H. Schäfer, C. S. Schmidt, T. M. Pakull, H. Damm, B. Bracke, C. M. Friedrich, A. Andrei, Y. Prokopchuk, D. Karpenka, A. Radzhabov, V. Kovalev, C. Macaire, D. Schwab, B. Lecouteux, E. Esperança-Rodier, W. Yim, Y. Fu, Z. Sun, M. Yetisgen, F. Xia, S. A. Hicks, M. A.
Riegler, V. Thambawita, A. Storås, P. Halvorsen, M. Heinrich, J. Kiesel, M. Potthast, B. Stein, Overview of ImageCLEF 2024: Multimedia retrieval in medical applications, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction, Proceedings of the 15th International Conference of the CLEF Association (CLEF 2024), Springer Lecture Notes in Computer Science LNCS, Grenoble, France, 2024.
[2] A. Andrei, A. Radzhabov, D. Karpenka, Y. Prokopchuk, V. Kovalev, B. Ionescu, H. Müller, Overview of 2024 ImageCLEFmedical GANs Task – Investigating Generative Models’ Impact on Biomedical Synthetic Images, in: CLEF2024 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Grenoble, France, 2024.
[3] G. Bradski, The OpenCV Library, Dr. Dobb’s Journal of Software Tools (2000).
[4] N. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man, and Cybernetics 9 (1979) 62–66. doi:10.1109/TSMC.1979.4310076.
[5] K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.