=Paper=
{{Paper
|id=Vol-3609/paper16
|storemode=property
|title=New CNN Stacking Model for Classification of Medical Imaging Modalities and Anatomical Organs on Medical Images
|pdfUrl=https://ceur-ws.org/Vol-3609/paper14.pdf
|volume=Vol-3609
|authors=Mamar Khaled,Djamel Gaceb,Fayçal Touazi,Chakib Ammar Aouchiche,Youcef Bellouche,Ayoub Titoun
|dblpUrl=https://dblp.org/rec/conf/iddm/KhaledGTABT23
}}
==New CNN Stacking Model for Classification of Medical Imaging Modalities and Anatomical Organs on Medical Images==
New CNN stacking model for classification of medical imaging modalities and anatomical organs on medical images

Mamar Khaled (a), Djamel Gaceb (a), Fayçal Touazi (a), Chakib Ammar Aouchiche (a), Youcef Bellouche (a), Ayoub Titoun (b)

(a) LIMOSE laboratory, Computer Science Department, University M'hamed Bougara, Independence Avenue, 35000 Boumerdes, Algeria
(b) F2i Institute, School of Computer Science, Digital & Commerce, Vincennes, France

Abstract

Decision making in medical diagnosis is a tedious and very rigorous task, hence the need for more advanced and intelligent medical imaging diagnostic support systems. Automating the recognition of medical imaging modalities and human anatomical organs gives these systems the ability to process different types of images in an automatic manner adapted to the different medical imaging modalities. It also offers better support to clinicians and patients by giving them access to more effective image analysis and diagnostic tools. In this context, three deep learning approaches were developed and tested on six different CNN models (VGG16, VGG19, ResNet-50, Xception, Inception and NASNet). Two deep transfer learning modes and an ensemble deep learning algorithm based on stacking were used. The experiments carried out on two datasets of medium and high challenges show very interesting results, with F-scores reaching 99% for the classification of image modalities and 98% for the classification of anatomical organs.

Keywords: anatomical organs, medical imaging modalities, deep transfer learning, ensemble deep learning, medical image processing, computer-aided diagnosis.

1. Introduction

The medical imaging field has undergone a spectacular evolution in recent decades, offering unprecedented opportunities for the early and accurate diagnosis of various pathologies. However, manual or conventional interpretation of medical images and segmentation of anatomical organs or lesions remain complex and time-consuming tasks for radiologists and clinicians. In this context, the use of machine learning techniques, and in particular deep learning, has shown promise in improving the efficiency and accuracy of these processes.

Today, the scientific community uses deep learning algorithms to improve diagnosis and help doctors in their work [1]. These algorithms offer more relevant automatic characterization and are capable of learning by developing broad knowledge from large-volume image datasets. The possibility of transferring learning in an incremental and scalable manner gives these algorithms a great advantage in recognition, prediction or classification tasks, with better precision. These properties are particularly interesting in the medical field, which is very demanding in terms of precision while its datasets are often limited. By leveraging models already pre-trained on image classification tasks, we can capitalize on learned visual features to aid in the automatic identification of medical imaging modalities and anatomical organs.

A large number of deep learning methods use deep convolutional neural networks (CNN). They are successfully applied in medical image analysis, giving promising results.
The application area covers the entire spectrum of medical image analysis, including detection, segmentation, classification and computer-aided diagnosis [2]. The automatic classification of medical imaging modalities and anatomical organs will enable the development of medical diagnosis support systems and the appropriate automatic processing of large corpora of images. It also facilitates the work of doctors and healthcare professionals: by automatically knowing the image modality, clinicians can correctly interpret the image and make precise medical decisions. For example, it allows clinicians to quickly find the medical images they need to make comparisons and diagnoses, plan treatments and track the progression of diseases.

This work presents several original contributions by addressing the problem of recognition of medical imaging modalities and anatomical organs, a problem that is rarely addressed in the literature and for which public datasets are very rare. The main contributions of this work are as follows:
- construction of a new dataset with different challenges, created from several sources;
- comparison of six existing CNN models, with simple and complex architectures, according to different transfer learning strategies (VGG16, VGG19, ResNet-50, Xception, Inception and NASNet networks). We also studied the ability to transfer knowledge and features, learned initially on the ImageNet dataset containing more than 14 million non-medical images, to medical images constituting a dataset of relatively small size. We therefore chose to explore different levels of fine-tuning, allowing partial transfer of the features learned on the ImageNet dataset to the images of our target dataset. This type of transfer can be very useful when deep neural networks are pre-trained on datasets very different from the subject area and when we have a very small and insufficient image dataset;
- application of ensemble deep learning using the stacking of different CNN sub-models, for the combination of the knowledge of different models and the pooling of their complementarity.

The rest of this paper is organized as follows: Section 2 presents related works on the classification of medical image modalities and anatomical organs using deep learning. In the third section, we present the different proposed approaches. The experimental results are presented and discussed in the fourth section.

2. Related works

2.1. Overview of existing deep learning-based approaches for classification of medical imaging modalities

The classification of medical imaging modalities is a very important preliminary step to bring more autonomy to intelligent medical diagnostic support systems and to help clinicians access the required medical imaging in the system. Among the existing works based on deep learning, we find the work of Yu et al. [3, 4], which focuses on the combination of two CNN architectures (VGG16 and ResNet-50), already pre-trained on the ImageNet dataset, using deep transfer learning and a voting system.
The experiments, which were carried out on two medical datasets (ImageCLEF2015 and ImageCLEF2016), showed that the proposed combination approach offers the best accuracy (90.22% on the ImageCLEF2015 dataset and 88.40% on the ImageCLEF2016 dataset) in comparison with the VGG16 architecture (87.27% on ImageCLEF2015 and 85.13% on ImageCLEF2016) and ResNet-50 (89.34% on ImageCLEF2015 and 87.47% on ImageCLEF2016).

Kim et al. [5] developed a new method called Class-selective Relevance Mapping (CRM) to locate and visualize RoIs (regions of interest) in a medical image in order to improve the predictions of CNN models for medical imaging classification. In addition to this model, a pre-trained VGG16 was used to classify seven different types of image modalities. An accuracy of 0.98 was obtained on the Open Access Biomedical Image Search Engine dataset from the United States National Library of Medicine (NLM) and on the ImageCLEF2013 dataset.

In another study, Remedios et al. [6] explored a CNN architecture known as Φ-Net to classify MRI images into different categories according to the acquisition modality (T1, T2, FLAIR and the subclasses T1 pre, T1 post, FLAIR pre and FLAIR post). This model was created by combining several CNN architectures with the concept of residual learning [7]. The experiments carried out on a dataset of 3418 MRI images showed that the Φ-Net model had an average accuracy of 97.57% for the classification (T1, T2, FLAIR), compared to 95.47% obtained by the ResNet architecture.

Chiang et al. [8] used a CNN model for the classification of 4 classes of medical imaging (CT of the abdomen, CT of the brain, MRI of the brain and MRI of the spine). The experiments carried out on the dataset from the Taiwanese Shuang-Ho hospital (700 images per class) showed an accuracy of 99.5% and an F-score of 99%. In the same direction, we find the work of Laribi et al. [9], where the authors developed a new progressive deep transfer learning approach to diagnose Alzheimer's disease, applied to the same kind of dataset (brain MRI images), and achieved the best results.

More recent work by Atrey et al. [10] developed a hybrid deep learning bimodal CAD algorithm for the classification of breast lesions using mammogram and ultrasound imaging modalities combined. A combined CNN and LSTM model was implemented using images obtained from both the mammogram and ultrasound modalities to improve the early diagnosis of breast cancer. The proposed bimodal approach achieved an accuracy of 99.35% for the classification.

According to the literature, there is a strong need to explore new approaches based on transfer learning and on ensemble deep learning in order to achieve better results in medical diagnostic support systems, especially for the classification of medical imaging modalities as well as for the classification of anatomical organs, which is the object of our study.

2.2. Overview of existing deep learning-based approaches for anatomical organs classification

Automated classification of anatomical organs is an important step and a prerequisite for many medical diagnostic support systems. The spatial complexity and variability of anatomy throughout the human body make this classification difficult.
In the literature, we can find the review of Jiang et al. [11], who reviewed in depth and analyzed deep learning-based methods used in multiple-lesion recognition; they were interested in multiple-lesion recognition in diverse body areas and in the recognition of whole-body multiple diseases. Roth et al. [12] trained a CNN model to identify anatomical organs (neck, lungs, liver, pelvis and legs) on axial tomography images. An accuracy of 0.998 was achieved on images from a hospital PACS dataset. Takiyama et al. [7] worked on the classification of endoscopic (esophagogastroduodenal) medical images to recognize the locations of anatomical organs. Images were categorized into four anatomical locations (larynx, esophagus, duodenum and stomach) and three additional sublocations of the stomach (upper, middle and lower), allowing for accurate anatomical classification of the images. The experiments were carried out on a dataset of 27,335 esophagogastroduodenoscopy (EGD) images from a Japanese hospital. An accuracy of 97% was achieved using the GoogLeNet model. In the study by Kolbinger et al. [13], we see the combination of two well-known methods (DeepLabv3 and SegFormer) on a new dataset of 13,195 laparoscopic images, with the aim of developing segmentation models for anatomical structures; the authors concluded that machine learning methods can improve assistance in anatomy recognition. Khan et al. [14] proposed a new CNN architecture (compared to three existing CNN architectures: LeNet, AlexNet and GoogLeNet) for the classification of images of different parts of the human body (head, neck, thorax, abdomen, pelvis, upper and lower limbs) coming from different medical imaging modalities, including CT, MRI, PET, ultrasound and X-rays. The proposed architecture gave a test accuracy of 81%, the best rate in comparison with the three existing CNN architectures (LeNet 59%, AlexNet 74% and GoogLeNet 45%), on a dataset of 37,198 images of various anatomical organs. This work illustrates the interest, in existing studies, of moving towards more powerful CNN architectures.

Deep learning-based approaches have yielded promising results in the classification of anatomical organs. However, they often require large amounts of data, which can be difficult to obtain in the medical field.

3. Proposed approaches

As part of this work, we present different approaches based on deep transfer learning that we have developed for the classification of medical imaging modalities and anatomical organs. Such classifications present several challenges that should not be overlooked during development. The main difficulties are often posed by the intra-class variability of medical images, the diversity of the imaging modalities used, the complexity of anatomical structures and the unavailability of datasets of sufficient size in the medical field. The use of deep transfer learning is a good choice for designing an approach that is more robust to these constraints. It involves the use of CNN models pre-trained on generic image datasets of sufficient volume, to benefit from representations that focus on generic and low-level image features, learned on massive and diverse data. These models are then re-trained (on our small dataset) and refined by seeking the best level of fine-tuning, making it possible to complete the initial low-level representation, valid for all kinds of images, with a second high-level representation, specific to our problem of classification of image modalities and anatomical organs.
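To make this partial-transfer idea concrete, the following minimal sketch (assuming TensorFlow/Keras and a VGG16 backbone; the 80% freeze level, the input size and the layer-level granularity are illustrative choices, not the exact configuration used in our experiments) shows how the level of fine-tuning can be controlled by freezing the first layers of a pre-trained network:

```python
# Sketch: control the level of fine-tuning of a pre-trained backbone.
# Freezing everything corresponds to pure feature extraction; unfreezing the
# last layers/blocks corresponds to fine-tuning them on the medical images.
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

freeze_until = int(0.8 * len(base.layers))   # e.g. keep the first 80% of layers frozen
for layer in base.layers[:freeze_until]:
    layer.trainable = False                  # preserve generic low-level ImageNet features
for layer in base.layers[freeze_until:]:
    layer.trainable = True                   # re-learn high-level features on medical images
```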
In this context, we chose to develop six very popular CNN models (VGG16, VGG19, ResNet-50, Inception, Xception and NASNet), whose performance has already been demonstrated in the medical field. Based on our previous study on CNN model combination [15] and its benefits, we also developed an ensemble deep learning approach, which involves combining several CNNs to take advantage of their complementarities. With the stacking mechanism, we propose to use a softmax meta-model which learns the best weighting and combination of these sub-models.

3.1. Proposed approaches for the classification of medical imaging modalities

At this level, we developed three different approaches to determine the best behavior to follow. The first two approaches concern the development of six CNN models using two transfer learning modes (feature-extractor and fine-tuning modes). Transfer learning makes it possible to address the problem of the reduced size of a dataset. It consists of reusing a model pre-trained on another large dataset (even outside the medical field), preserving part of it for the relevant extraction of generic characteristics and fine-tuning the remaining part on our small target dataset to extract specific characteristics. This approach allows for faster learning and a more reliable model from a very small dataset. The third approach consists of seeking the best combination of different CNN models with the stacking technique in order to take advantage of the complementarity between them. This combination has the advantage of being able to aggregate very different classifiers and of significantly improving the quality of the final prediction. The use of ensemble deep learning methods is necessary when we want to take a step forward in obtaining better predictions of medical imaging modalities.

• Approach 1: based on transfer learning in feature-extractor mode. The convolutional part (feature extractor) of the pre-trained CNN is completely frozen in order to preserve all the knowledge already acquired on the initial (very large) dataset. With this mode, only the classifier part (softmax) is adapted to the new image modality classification task. The use of pre-trained CNN models aims to extract high-level characteristics from the medical images; these features are then used to train modality-specific classifiers. This approach allows us to benefit from knowledge learned from large generic databases. Transfer learning in this mode is faster than fine-tuning; however, it requires certain similarities between the images of the original dataset and those of the target dataset. Six CNNs are compared using this approach (see Figure 1); a minimal code sketch of this mode is given after Figure 1.

Figure 1: Approach 1 architecture
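The sketch below illustrates this feature-extractor mode, assuming TensorFlow/Keras; the VGG16 backbone, the pooling head, the directory layout (MC4_Dataset/train, MC4_Dataset/val) and the training settings are illustrative assumptions rather than our exact pipeline:

```python
# Sketch of Approach 1: frozen convolutional base + new softmax classifier.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16
from tensorflow.keras.preprocessing.image import ImageDataGenerator

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                              # convolutional part completely frozen

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(4, activation="softmax"),          # 4 modalities: MRI, ultrasound, CT-scan, X-ray
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Hypothetical directory layout: one sub-folder per modality class.
train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "MC4_Dataset/train", target_size=(224, 224), batch_size=100, class_mode="categorical")
val_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "MC4_Dataset/val", target_size=(224, 224), batch_size=100, class_mode="categorical")

model.fit(train_gen, validation_data=val_gen, epochs=5)  # only the softmax head is trained
```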
• Approach 2: based on transfer learning in fine-tuning mode. In this mode, a pre-trained model is used as a starting point but, unlike in the feature-extractor mode, a set of the last layers of the convolutional part of the model is fine-tuned during training on the new dataset, specific to medical imaging. The layers which are not refined are frozen (this concerns the layers closest to the input) to preserve certain generic knowledge of the pre-trained CNN model, which has already learned on a large dataset to extract low-level features (valid for all kinds of images). For the convolutional part, the number of frozen blocks must be fixed empirically in order to obtain a better score. Since the number of classes in the target dataset is different from that of the original dataset, the structure of the classifier part (fully connected layers) must be adapted to recognize the new classes of the different image modalities. This method is composed of three steps: the first concerns the fine-tuning of the parameters of the non-frozen layers; the second is the extraction of image features; and the last step consists of generating the predictions for the classification using the softmax classifier.

Figure 2: Approach 2 architecture, using the fine-tuning mode of transfer learning

• Approach 3: based on the combination of different CNN models with the stacking technique. In this mode, several pre-trained models are used as a starting point. We train the models on a dataset which only contains MRI images of different types (e.g. FLAIR, T1w, T1wCE and T2w); we thus obtain trained models, which we then combine with stacking.

Figure 3: Synoptic diagram of approach 3, based on stacking of CNNs

The stacking process consists of training several CNN models independently on the same dataset. Each sub-model may have a different architecture with different settings. Once the sub-models are trained individually, a meta-model is added at the output of these models. It receives the predictions of the different CNNs as input and learns to combine these predictions to produce a final prediction. This meta-model can be based on any machine learning model. In our method, a softmax classifier is used as the stacking meta-model. It receives as input the different prediction probabilities coming from the outputs of the different stacked CNN sub-models. The objective of this step is to train a new model (softmax) to learn how to best combine the contributions of each CNN sub-model. The stacking process makes it possible to take advantage of the diversity of the individual models by combining their strengths and mitigating their weaknesses. This approach can result in better predictive performance than any single contributing model. It is also important to note that it is possible to deploy this approach in a real-time framework and to make the CNN sub-models work in multitasking, multiprocessor, multicore or parallel settings. This allows better management of the complexity generated by the stacking of sub-models.
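A minimal sketch of this stacking mechanism is given below, assuming TensorFlow/Keras; sub_models stands for the already trained CNN sub-models (e.g. Inception, Xception and NASNet), and X_meta / y_meta for a held-out image set with integer labels reserved for training the meta-classifier. Names, shapes and training settings are illustrative assumptions, not our exact implementation:

```python
# Sketch: softmax meta-model stacked on top of several trained CNN sub-models.
import numpy as np
from tensorflow.keras import layers, models

def stack_predictions(sub_models, images):
    """Concatenate the softmax probability vectors produced by each sub-model."""
    probs = [m.predict(images, verbose=0) for m in sub_models]   # each: (n_samples, n_classes)
    return np.concatenate(probs, axis=1)                         # (n_samples, n_classes * n_models)

def train_meta_model(sub_models, X_meta, y_meta, num_classes=4):
    """Train a softmax classifier that learns how to weight and combine the sub-models."""
    Z = stack_predictions(sub_models, X_meta)
    inp = layers.Input(shape=(Z.shape[1],))
    out = layers.Dense(num_classes, activation="softmax")(inp)
    meta = models.Model(inp, out)
    meta.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    meta.fit(Z, y_meta, epochs=20, batch_size=64, verbose=0)
    return meta

# Final prediction for new images: feed their stacked probabilities to the meta-model.
# meta = train_meta_model(sub_models, X_meta, y_meta)
# y_pred = meta.predict(stack_predictions(sub_models, new_images)).argmax(axis=1)
```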
3.2. Anatomical organs classification

For the classification of anatomical organs, we use transfer learning in feature-extractor mode, following the same idea as in approach 1 for the classification of medical imaging modalities.

4. Experiments and results

In this section, we present the experiments that we carried out as part of our study, as well as the results obtained. First, we present the evaluation metrics used. Then we present in detail the datasets used, with samples from each of them, followed by a description of the data augmentation technique used. Finally, the results of our experiments for each approach are illustrated in tables, followed by comparisons and comments.

4.1. Evaluation metrics

Several evaluation metrics exist in the literature; in our case, to evaluate the different proposed approaches, we used the following metrics: accuracy, precision, recall and F1-score.

• Confusion matrix: the confusion matrix (or error matrix) is one of the key concepts in classification problems. It is a two-dimensional array ("actual" and "predicted") with the set of classes along both dimensions. The actual classes are the columns and the predicted ones are the rows, as shown in the table below:

Table 1
Confusion matrix
                        Actual Positive (1) | Actual Negative (0)
Predicted Positive (1)  TP                  | FP
Predicted Negative (0)  FN                  | TN

Almost all performance measures are based on the confusion matrix and the numbers it contains:
• True positive (TP): real class = True and prediction = True.
• True negative (TN): real class = False and prediction = False.
• False positive (FP): real class = False and prediction = True.
• False negative (FN): real class = True and prediction = False.

• Accuracy: the number of correct predictions divided by the total number of samples. It is a good measure when the classes of the target variable in the data are almost balanced.

Accuracy = Number of correct predictions / Total number of samples = (TP + TN) / (TP + FP + FN + TN)    (1)

• Precision:

P = TP / (TP + FP)    (2)

• Recall:

R = TP / (TP + FN)    (3)

• F1-score:

F1-score = 2 × P × R / (P + R)    (4)

The difference between precision and recall in a classification problem is that recall gives us information about the performance of a classifier with respect to false negatives (how many we missed), while precision gives us information about its performance with respect to false positives (how many we caught).
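For reference, these metrics can be computed directly from a model's predictions with scikit-learn, as in the small sketch below; the toy labels and the macro averaging over the four modality classes are assumptions made for illustration:

```python
# Sketch: computing the metrics of Section 4.1 from predicted and true labels.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Toy example with 4 modality classes (0=MRI, 1=ultrasound, 2=CT-scan, 3=X-ray).
y_true = np.array([0, 0, 1, 2, 3, 3, 2, 1])
y_pred = np.array([0, 0, 1, 2, 3, 2, 2, 1])   # e.g. argmax of a CNN's softmax output

print(confusion_matrix(y_true, y_pred))
print("Accuracy :", accuracy_score(y_true, y_pred))
# In the multi-class case, per-class precision/recall/F1 are averaged;
# macro averaging is assumed here.
print("Precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("Recall   :", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("F1-score :", f1_score(y_true, y_pred, average="macro", zero_division=0))
```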
4.2. Datasets used

Our experiments were carried out on two different datasets with different sizes and challenges.

• First dataset, called MC4_Dataset (medium size and challenges): this image dataset was created by us from nine public image datasets (35,150 images in total). We divided it into 4 classes depending on the imaging modality used (MRI: 7023 images, ultrasound: 2116 images, CT-scan: 12988 images, X-ray: 12023 images). Each class is divided into subclasses according to the different anatomical organs (see Table 2). This dataset is created from different datasets and image sources presenting different degradations, resolutions, complexities, etc., which increases the challenges our dataset poses to the developed models.

Table 2
List of public datasets combined for each medical imaging modality
- MRI (7023 images): the public "Brain Tumor MRI Dataset" [16] (7023 images).
- Ultrasound (2116 images): combination of two datasets: 1) 780 medical images of women's breasts [17]; 2) 1336 ultrasound images of the fetal head [18].
- CT-scan (12988 images): combination of two datasets: 1) 988 images of the torso [19]; 2) 12,000 medical images of the kidneys [20].
- X-ray (12023 images): combination of three datasets: 1) Chest X-ray Dataset: 5856 images [21]; 2) Chest X-ray Masks and Labels: 1600 images of human torsos [21]; 3) Figure-detection: 5567 hand images [21].

This is an unbalanced dataset: for example, the ultrasound subset (2116 images) is a minority class compared to the CT-scan class (12988 images). A significant difference between class sizes can destabilize the model, limit its generalization and/or cause overfitting. To reduce these problems, we balanced the dataset by oversampling each minority class to give it the same weight as the majority class, using data augmentation: image rotation, zooming, shifting, scaling and shearing (see Section 4.3).

Figure 4: Example of MRI images [16]
Figure 5: Example of X-ray images: a) medical image of the torso, b) medical image of the hand [21]
Figure 6: Example of ultrasound images: a) medical image of a woman's chest, b) medical image of the fetal head acquired by ultrasound [18]
Figure 7: Example of CT-scan images: a) medical image of the torso, b) medical image of the kidney acquired by CT scan [19]

• Second dataset, RSNA-MICCAI (high size and challenges): the second dataset presents more challenges and more diversity than the first one. It is the public dataset "RSNA-MICCAI Brain Tumor Radiogenomic Classification" (290,923 images) [22], used to test the CNN combination in approach 3. It is divided into four classes of modalities (FLAIR, T1w, T1wCE and T2w). The following table summarizes the image distribution in this dataset according to the different classes. Like the first dataset, it is balanced using data augmentation.

Table 3
RSNA-MICCAI Brain Tumor Radiogenomic Classification dataset (PNG images, resized to 224x224x3) [22]
Repartition | FLAIR | T1w   | T1wCE | T2w   | Total
Train       | 50682 | 55440 | 68012 | 67722 | 241856
Valid       | 3000  | 3000  | 3000  | 3000  | 12000
Test        | 7926  | 8791  | 10482 | 9863  | 37067
Overall total: 290923 images

4.3. Data Augmentation

To improve deep learning on small datasets, a technique called data augmentation can be used. It consists of creating new training samples by applying simple transformations to existing data, which increases the size of the available dataset with the aim of improving model training. It has several advantages for deep learning: by exposing the model to a variety of transformations, it improves the generalization of models and makes them more resilient to such variations. The transformations used in our study for data augmentation are rotation, zooming, shifting, scaling, cropping, etc. By using data augmentation, it is possible to improve the performance of deep learning models considerably. Data augmentation makes it possible to avoid overfitting with complex CNN architectures (with a large number of parameters) applied to a small dataset. It also offers better stability and model generalization on an unbalanced dataset like the MC4_Dataset.
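A minimal sketch of such an augmentation pipeline, assuming Keras' ImageDataGenerator, is shown below; the parameter values and the MC4_Dataset/train directory layout with an "ultrasound" class folder are illustrative assumptions, not the exact configuration used in our experiments:

```python
# Sketch: data augmentation used to oversample a minority class (e.g. ultrasound).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,        # random rotation
    zoom_range=0.15,          # random zoom
    width_shift_range=0.1,    # horizontal shift
    height_shift_range=0.1,   # vertical shift
    shear_range=0.1,          # shearing
    rescale=1.0 / 255,
    fill_mode="nearest",
)

# Drawing augmented batches repeatedly from the minority-class folder until its
# size matches the majority class balances the dataset (directory is hypothetical).
minority_flow = augmenter.flow_from_directory(
    "MC4_Dataset/train", classes=["ultrasound"], target_size=(224, 224),
    batch_size=32, class_mode="categorical")
```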
4.4. Experiments and results: Classification of medical imaging modalities

In order to study the relevance of deep learning for the classification of medical imaging modalities, we tested the three proposed approaches using different CNN models with transfer learning on our datasets.

4.4.1. Results of approach 1: Feature-extractor mode

This approach was applied to each of the six CNN models (VGG16, VGG19, ResNet-50, Inception 3, Xception and NASNet), already pre-trained on the ImageNet dataset. The dataset used is MC4_Dataset, with the data augmentation technique to overcome the problem of its small size. The feature-extraction mode is applied with the adaptation of the softmax classifier part to the classification into 4 modalities (MRI, ultrasound, CT-scan and X-ray). The following table summarizes the results obtained by the six CNNs:

Table 4
Comparative results of the different models in feature-extraction mode for the classification of modalities on the first dataset (MC4_Dataset)
Model     | Accuracy | Precision | Recall | F1-Score
VGG16     | 0.96     | 0.96      | 0.96   | 0.96
VGG19     | 0.94     | 0.95      | 0.94   | 0.95
ResNet-50 | 0.81     | 0.82      | 0.80   | 0.81
Inception | 0.97     | 0.97      | 0.97   | 0.97
Xception  | 0.976    | 0.976     | 0.976  | 0.976
NASNet    | 0.99     | 0.99      | 0.99   | 0.99

According to this table, we see the effectiveness of the NASNet model compared to all the other models, with a value of 0.99 for each of the evaluation metrics, followed by the Xception model with values slightly lower than NASNet but still very convincing.

4.4.2. Results of approach 2: Fine-tuning mode

The goal of this experiment is to determine whether the performance of the first five CNNs (VGG16, VGG19, ResNet-50, Inception 3, Xception) can be enhanced by the fine-tuning mode, making them competitive with the NASNet CNN. Still working on the MC4_Dataset with four output classes (MRI, ultrasound, CT-scan and X-ray), we trained these five CNN models (already pre-trained on ImageNet) in fine-tuning mode on our medical image dataset. Each model was trained for 5 epochs with a batch size of 100, freezing a certain percentage of layers (the leftmost layers, which are closest to the input image). This process can be described as follows:
• VGG16: by default, this model contains 5 convolution blocks; in our case, we generated two models from the base model, the first frozen at 80% (4 blocks out of 5) and the second at 60% (3 blocks out of 5).
• VGG19: in the same way as for VGG16, we generated two models from this model, the first frozen at 80% and the second at only 60%.
• ResNet-50: this model contains 50 layers; we generated only one model from it, with a freezing percentage of 80%.
• Inception 3: we generated two models from this model, which contains 48 layers, the first frozen at 90% (43 layers) and the other at 80% (38 layers).
• Xception: based on its 71 layers, we generated two models from the base model, the first with 90% of its layers frozen (64 layers) and the other with 80% (57 layers).

The obtained results are displayed in the following table:

Table 5
Comparison of the results of the different models in fine-tuning mode for the classification of image modalities
Model             | Accuracy | Precision | Recall | F1-Score
VGG16 (60%)       | 0.42     | 0.37      | 0.47   | 0.41
VGG16 (80%)       | 0.94     | 0.95      | 0.94   | 0.94
VGG19 (60%)       | 0.46     | 0.46      | 0.46   | 0.46
VGG19 (80%)       | 0.96     | 0.96      | 0.96   | 0.96
ResNet-50 (80%)   | 0.84     | 0.87      | 0.82   | 0.84
Inception 3 (80%) | 0.99     | 0.99      | 0.99   | 0.99
Inception 3 (90%) | 0.99     | 0.99      | 0.99   | 0.99
Xception (80%)    | 0.99     | 0.99      | 0.99   | 0.99
NASNet (100%, feature-extraction mode) | 0.99 | 0.99 | 0.99 | 0.99

According to these results, we can see the effectiveness of the fine-tuning approach, especially for the two models Inception 3 and Xception, which gave very high values in terms of accuracy, precision, recall and F1-score (0.99) and are thus competitive with the NASNet architecture.

4.4.3. Results of approach 3: Stacking mode

Given that the performance of the CNNs (approaches 1 and 2) on the MC4_Dataset (average challenges) reached 0.99, we considered it unnecessary to apply the third approach (combination of CNNs), which is above all dedicated to higher-challenge datasets. This is why we tested approach 3 only on the second dataset ("RSNA-MICCAI Brain", with the 4 classes FLAIR, T1w, T1wCE and T2w), which presents greater diversity and difficulty. Initially, we tested each of the six CNNs separately (using approach 2). Subsequently, we combined the three best architectures (Inception 3, Xception and NASNet) using the stacking mode in order to obtain a more efficient model, with very promising and convincing results, displayed in the following table:

Table 6
Comparison of the results of the different models in fine-tuning mode for the classification of modalities, and of the combination of models
Model          | Accuracy | Precision | Recall | F1-Score
VGG16          | 0.58     | 0.75      | 0.24   | 0.37
VGG19          | 0.50     | 0.73      | 0.24   | 0.36
ResNet-50      | 0.63     | 0.70      | 0.41   | 0.51
Inception 3    | 0.84     | 0.89      | 0.81   | 0.85
Xception       | 0.88     | 0.89      | 0.87   | 0.88
NASNet         | 0.86     | 0.91      | 0.82   | 0.87
Stacking model | 0.91     | 0.95      | 0.89   | 0.92

As we can see in the table, the combination of models is effective and gave better results than each of the other models taken separately.

4.5. Experiments and results: Classification of anatomical organs

For the classification of anatomical organs, we tested the six CNN models, already pre-trained on the ImageNet dataset. The transfer learning technique was adopted in feature-extraction mode in order to improve the learning results and to take advantage of the power of the ImageNet dataset. In this case, the MC4_Dataset is subdivided into 8 classes of anatomical organs: human torso acquired by CT scanner, human kidney acquired by CT scanner, brain acquired by MRI, torso acquired by MRI, female chest acquired by ultrasound, fetal head acquired by ultrasound, and finally the torso and the hand, both acquired by X-ray.

Table 7
Comparative results of the different models in feature-extraction mode for the classification of anatomical organs
Model       | Accuracy | Precision | Recall | F1-Score
VGG16       | 0.94     | 0.94      | 0.93   | 0.95
VGG19       | 0.64     | 0.51      | 0.38   | 0.78
ResNet-50   | 0.66     | 0.54      | 0.39   | 0.89
Inception 3 | 0.98     | 0.99      | 0.96   | 0.97
Xception    | 0.98     | 0.99      | 0.97   | 0.98
NASNet      | 0.98     | 0.99      | 0.96   | 0.98

The results reported in this table show the effectiveness of the three models (NASNet, Inception 3 and Xception), well recognized in the literature, which give very convincing values.

4.6. Discussion

In order to select the best architecture among the six architectures studied previously, we adopted the F1-score as the performance comparison criterion, because the costs of false positives and false negatives differ: it is preferable to obtain additional false positives (false alarms) rather than to miss false negatives. In the first approach, we saw that the NASNet model outperformed the other models with an F1-score of 0.99, but it should be noted that NASNet in feature-extraction mode was not confronted with a high-challenge dataset. In the second approach, we noticed that the results increase or decrease depending on the number of layers frozen in fine-tuning mode. The Xception architecture beat the other models with fine-tuning at 80%, improving its F1-score compared to the first approach in feature-extraction mode; this shows that the choice of the fine-tuning level has a significant impact in overcoming the problem of the small size of the image database. The third approach combines the three CNN architectures (Inception 3, Xception and NASNet), chosen because of the best results these models obtained on the dataset of medical imaging with different MRI types; this approach gave us very good results on a very high-challenge dataset. The model that combines the three models improved the F1-score of the best-performing single model (Xception) by 5%. This leads us to the conclusion that the deep learning stacking approach is very powerful on high-challenge datasets. For the classification of anatomical organs, we noted the effectiveness of the two models (Xception and NASNet) with the softmax classifier, which reached a score of 0.98. Considering these sufficient results, the application of approach 3 was not necessary.
5. Conclusion

This work focuses on the classification of medical imaging modalities and the classification of anatomical organs. For this, six CNN architectures were tested and compared according to the three different approaches that we proposed. The objective was to explore deep transfer learning in two modes (feature extraction and fine-tuning) and ensemble deep learning using the stacking technique, which combines several of the best CNN models and exploits their complementarity. The experiments were carried out on two datasets with different challenges: the unbalanced MC4_Dataset (created from nine existing datasets, medium size and challenges) and RSNA-MICCAI Brain (very high size and challenges). The experimental results showed that the NASNet architecture is very powerful compared to the other five models on small or medium-challenge datasets. Its performance on larger and more challenging datasets is significantly increased when it is combined with other models. Overall, our approach represents a significant advancement in the classification of medical image modalities and anatomical organs via the use of deep transfer learning. These results open new perspectives for the automation and improvement of medical image analysis tools, thus contributing to the improvement of healthcare and medical decision-making. In future work, we plan to test stacking on higher-challenge datasets by combining CNN models with ViT models.

6. References

[1] P.K. Mall, P.K. Singh, S. Srivastav, V. Narayan, M. Paprzycki, T. Jaworska, M. Ganzha, A comprehensive review of deep neural networks for medical image processing: Recent developments and future opportunities, Healthcare Analytics 4 (2023) 100216.
[2] A.S. Panayides, A. Amini, N.D. Filipovic, A. Sharma, S.A. Tsaftaris, A. Young, D. Foran, N. Do, S. Golemati, T. Kurc, K. Huang, K.S. Nikita, B.P. Veasey, M. Zervakis, J.H. Saltz, C.S. Pattichis, AI in Medical Imaging Informatics: Current Challenges and Future Directions, IEEE Journal of Biomedical and Health Informatics 24(7) (2020) 1837-1857.
[3] F. Sica, G. Gobbi, P. Rizzoli, L. Bruzzone, Φ-Net: Deep Residual Learning for InSAR Parameters Estimation, IEEE Transactions on Geoscience and Remote Sensing 59(5) (2021) 3917-3941.
[4] Y. Yu, H. Lin, J. Meng, X. Wei, H. Guo, Z. Zhao, Deep Transfer Learning for Modality Classification of Medical Images, Information, 2017.
[5] I. Kim, S. Rajaraman, S. Antani, Visual Interpretation of Convolutional Neural Network Predictions in Classifying Medical Image Modalities, Diagnostics (Basel, Switzerland) 9(2) (2019).
[6] S. Remedios, D.L. Pham, J.A. Butman, S. Roy, Classifying magnetic resonance image modalities with convolutional neural networks, Medical Imaging 2018: Computer-Aided Diagnosis, SPIE, 2018, pp. 558-563.
[7] H. Takiyama, T. Ozawa, S. Ishihara, M. Fujishiro, S. Shichijo, S. Nomura, M. Miura, T. Tada, Automatic anatomical classification of esophagogastroduodenoscopy images using deep convolutional neural networks, Scientific Reports 8(1) (2018) 7497.
[8] C.H. Chiang, C.L. Weng, H.W. Chiu, Automatic classification of medical image modality and anatomical location using convolutional neural network, PLoS ONE 16(6) (2021) e0253205.
[9] N. Laribi, D. Gaceb, A. Benmira, S. Bakiri, A. Tadrist, A. Rezoug, A. Titoun, F. Touazi, A Progressive Deep Transfer Learning for the Diagnosis of Alzheimer's Disease on Brain MRI Images, in: M. Salem, J.J. Merelo, P. Siarry, R. Bachir Bouiadjra, M. Debakla, F. Debbat (Eds.), Artificial Intelligence: Theories and Applications, Springer Nature Switzerland, Cham, 2023, pp. 65-78.
[10] K. Atrey, B.K. Singh, N.K. Bodhey, R. Bilas Pachori, Mammography and ultrasound based dual modality classification of breast cancer using a hybrid deep learning approach, Biomedical Signal Processing and Control 86 (2023) 104919.
[11] H. Jiang, Z. Diao, T. Shi, Y. Zhou, F. Wang, W. Hu, X. Zhu, S. Luo, G. Tong, Y.-D. Yao, A review of deep learning-based multiple-lesion recognition from medical images: classification, detection and segmentation, Computers in Biology and Medicine 157 (2023) 106726.
[12] H.R. Roth, C.T. Lee, H.-C. Shin, A. Seff, L. Kim, J. Yao, L. Lu, R.M. Summers, Anatomy-specific classification of medical images using deep convolutional nets, IEEE International Symposium on Biomedical Imaging (ISBI), 2015, pp. 101-104.
[13] F.R. Kolbinger, F.M. Rinner, A.C. Jenke, M. Carstens, S. Krell, S. Leger, M. Distler, J. Weitz, S. Speidel, S. Bodenstedt, Anatomy segmentation in laparoscopic surgery: comparison of machine learning and human expertise – an experimental study, International Journal of Surgery (2023).
[14] S. Khan, S.P. Yong, A deep learning architecture for classifying medical images of anatomy object, 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017, pp. 1661-1668.
[15] M. Khaled, D. Gaceb, F. Touazi, A. Otsmane, F. Boutoutaou, Progressive and Combined Deep Transfer Learning for pneumonia diagnosis in chest X-ray images, IDDM'2022: 5th International Conference on Informatics & Data-Driven Medicine, 2022, pp. 160-173.
[16] M. Nickparvar, Brain Tumor MRI Dataset, 2021. https://www.kaggle.com/dsv/2645886, https://doi.org/10.34740/KAGGLE/DSV/2645886
[17] W. Al-Dhabyani, M. Gomaa, H. Khaled, A. Fahmy, Dataset of breast ultrasound images, Data in Brief 28 (2020) 104863.
[18] Fetal Head UltraSound Dataset For Image Segment, 2023. https://www.kaggle.com/datasets/ankit8467/fetal-head-ultrasound-dataset-for-image-segment.
[19] Chest CT-Scan images Dataset, 2020. https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images.
[20] M.N. Islam, M.H.K. Mehedi, CT KIDNEY Dataset, 2021. https://www.kaggle.com/datasets/nazmul0087/ct-kidney-dataset-normal-cyst-tumor-and-stone.
[21] L. Rubini, P. Soundarapandian, P. Eswaran, Chronic_Kidney_Disease, 2015.
[22] D.S. Kermany, M. Goldbaum, W. Cai, C.C.S. Valentim, H. Liang, S.L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan, J. Dong, M.K. Prasadha, J. Pei, M.Y.L. Ting, J. Zhu, C. Li, S. Hewett, J. Dong, I. Ziyar, A. Shi, R. Zhang, L. Zheng, R. Hou, W. Shi, X. Fu, Y. Duan, V.A.N. Huu, C. Wen, E.D. Zhang, C.L. Zhang, O. Li, X. Wang, M.A. Singer, X. Sun, J. Xu, A. Tafreshi, M.A. Lewis, H. Xia, K. Zhang, Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning, Cell 172(5) (2018) 1122-1131.e9.
[23] U. Baid, S. Ghodasara, S. Mohan, M. Bilello, E. Calabrese, E. Colak, K. Farahani, J. Kalpathy-Cramer, F.C. Kitamura, S. Pati, et al., The RSNA-ASNR-MICCAI BraTS 2021 benchmark on brain tumor segmentation and radiogenomic classification, arXiv preprint, 2021.