Progressive and Combined Deep Transfer Learning for
pneumonia diagnosis in chest X-ray images
Mamar Khaled1, Djamel Gaceb1, Fayçal Touazi1, Ahmed Otsmane1 and Farouk Boutoutaou1
 1 LIMOSE Laboratory, University M'Hamed Bougara of Boumerdes, Boumerdès, 35000, Algeria


             Abstract
             Pneumonia is a life-threatening lung disease caused by a bacterial or viral infection. It is
             difficult to diagnose by simply looking at chest X-rays (CXR), so diagnostic accuracy needs
             to be improved. This study aims to simplify the process of detecting and classifying
             pneumonia for both experts and patients, using a dataset containing 5247 CXR images. Five
             pretrained CNNs (AlexNet, VGG-16, ResNet-50, DenseNet-121 and InceptionV3) were used
             separately or together for transfer learning in a progressive way: they are first pretrained
             on the ImageNet dataset, and then on a medium-sized set of radiographic images that concerns
             another disease and is close in nature to our dataset. These models are refined according to
             different fine-tuning levels and strategies. A weighted classifier-based approach is
             introduced to combine their predictions. The results show that progressive transfer learning
             makes it easy to move from the classification of one disease to another for which only a
             limited number of images is available, by taking advantage of the knowledge already acquired
             on another very large dataset.

             Keywords
             Pneumonia; deep learning; progressive transfer learning; medical image processing;
             computer-aided diagnosis.

1. Introduction
   Medical imaging plays a key role in the detection and classification of diseases. Although chest
X-rays (CXR) have lower resolution than magnetic resonance imaging (MRI) or computed tomography
(CT), they rely on low-cost and easy-to-use technology. Pneumonia is an acute respiratory infection
that affects the lungs. This disease is considered the leading cause of infant mortality worldwide.
About 1.4 million children die of pneumonia every year, or 18% of the total number of children who
die under the age of five. There are mainly two types of pneumonia: viral and bacterial. Generally,
viral pneumonia tends to be mild while bacterial pneumonia is more severe, especially in children.
Inflammations seen as white patches on a chest X-ray can be due to various abnormalities such as
pneumonia, tuberculosis, pneumothorax, pleural effusion, etc., hence the ambiguity in the diagnosis of
pneumonia from chest X-ray images, whether by radiologists or by computer-aided diagnosis (CAD)
systems. This article studies the application of deep transfer learning in medical imaging to help
diagnose pneumonia. It is organized as follows: Section 2 presents related work; Section 3
explains the proposed method; Section 4 details the experiments and results; and Section 5 presents
the discussion and conclusions.

2. Related works
   Computer-aided diagnosis (CADe or CADx) has become very popular these days. Various
methods and works have been proposed in recent years to improve the precision of detection of
pulmonary diseases on CXRs.


    IDDM-2022: 5th International Conference on Informatics & Data-Driven Medicine, November 18–20, 2022, Lyon, France
2.1.    Machine learning based methods
    One of the primary use cases for machine learning in healthcare is in the early detection and
effective diagnosis of disease. Certain diseases such as hereditary or genetic disorders and certain
types of cancer are difficult for a specialist to identify at an early stage. However, intelligent
computing solutions based on machine learning can detect them well. These solutions have continued
to evolve using innovative learning algorithms on datasets representing different diseases with
varying complexity. In general, these systems consist of two main parts: a discriminant feature
extractor (which must be chosen by computer vision and health experts, or dedicated algorithms) and
a classifier (based on machine learning using a representative image base).
    The authors of [1] and [2] used the Random Forest (RF) classifier: the first work was devoted to
colon cancer, leukemia, breast cancer and lung cancer, while the second was devoted to breast cancer.
The RF classifier was combined with a selector of the most relevant features, and it obtained better
classification rates than the 15 classifiers with which it was compared. Several medical diagnostic
works are based on the SVM classifier, such as the work of El-Naqa et al. [3], which reached a rate of
94% for the detection of microcalcification (MC) clusters in digital mammograms. Ghumbre et al. [4]
combined the SVM classifier with a Sequential Minimal Optimization (SMO) algorithm for diagnosing
heart disease. Chaurasia and Pal [5] compared different classification techniques (naive Bayesian
classifiers, non-linear SVM with RBF kernel, RBF neural networks and decision trees) for the
diagnosis of breast cancer. The results showed the efficiency of the SVM-RBF classifier, with an
accuracy of 96.84%. Christobel and Sivaprakasa [6] obtained an accuracy of 97.13% using an SVM
classifier, which was compared to the Naive Bayes and KNN classifiers.
    The detection of pneumonia by chest X-ray has been an open problem for many years, the main
limitation being the scarcity of publicly available data. Yao et al. [7] achieved an accuracy of 80%
in the pneumonia detection task with the SVM classifier, while Naydenova et al. [8] compared the
SVM classifier with RF and logistic regression for automated detection of childhood pneumonia in
resource-constrained settings. Chandra et al. [9] developed an approach based on Multi-Layer
Perceptron (MLP), Random Forest (RF), sequential minimal optimization (SMO), regression
classification and logistic regression for pneumonia detection on chest X-rays. They segmented
lung regions from chest X-ray images and extracted eight statistical features from these regions.
They obtained an accuracy rate of 95.39% for the MLP classifier. Kuo et al. [10] used 11 features to
detect pneumonia in 185 schizophrenic patients, using decision tree, SVM and logistic regression
(LR) classifiers. The highest accuracy rate (94.5%) was obtained by the decision tree classifier.
Along the same lines, Yue et al. [11] used 6 features with LR and RF classifiers for predicting
hospital stay in patients with pneumonia associated with SARS-CoV-2 infection; the best AUC value
they obtained was 97%. Sousa et al. [12] used a pneumonia detection algorithm with five classifiers
(KNN, Naive Bayes, Multi-Layer Perceptron, Decision Tree and SVM), combined with different
dimensionality reduction techniques. Their algorithm achieved an accuracy of 96%.

2.2.    Deep learning-based methods
    Deep learning has quickly established itself as a standard in several fields, often with better
performance than the conventional machine learning algorithms presented previously. Recently,
a number of researchers have proposed different artificial intelligence (AI) based solutions for
different medical problems. Convolutional Neural Networks (CNNs) have enabled researchers to
achieve good results on a broad range of medical problems such as detection of thyroid nodules,
breast cancer, detection and segmentation of brain tumors, classification of diseases in X-ray
images, etc. The authors of [13] proposed a hybrid method for the diagnosis of thyroid nodules on
ultrasound images. They combined two pretrained CNNs with different convolutional layers and
different fully connected layers. The feature maps of the two networks obtained after refinement are
merged and used as input to the softmax classifier. This method has been validated on 15,000
ultrasound images. Gulshan et al. [14] applied a DCNN for the automatic detection of diabetic
retinopathy (DR) on the EyePACS-1 dataset with more than 10,000 images. Haloi [15] implemented a
five-layer CNN with a dropout mechanism for early-stage DR detection on two different datasets
(Retinopathy Online Challenge (ROC) and Messidor). Rakhlin et al. [16] classified images of H&E
stained breast tissue. For each image, 20 crops of 400×400 and 650×650 pixels were extracted. The
pretrained ResNet-50, InceptionV3 and VGG-16 networks were then used as feature extractors, and
their outputs were combined into a single feature vector. A LightGBM classifier with 10-fold cross
validation was used to classify the deep features. Van et al. [17] used the InceptionV3 CNN model
for multi-class classification of breast cancer images. Because histology images are very large,
they used InceptionV3 for patch-level classification. The predictions were then passed through an
ensemble fusion framework involving majority voting; their proposed ensemble classifier included a
Gradient Boosting Machine (GBM) and Logistic Regression to get the final prediction per frame. The
refinement achieved an accuracy of 87.50%.
    In [18], the authors carried out a comparative study of different CNN networks for the
diagnosis of pulmonary diseases. They collected a sample of 357 images of the different diseases and
a healthy sample (100 images), and analyzed 38 non-image features, including complaints of cough,
weakness, chest pain and high body temperature. The neural network with the Levenberg-Marquardt
algorithm achieved the highest accuracy for the detection of pneumonia (91.67%). In 2018, some
researchers highlighted a visualization technique used in conjunction with CNNs to locate and detect
Regions Of Interest (ROIs) that can be used to identify pneumonia and distinguish between bacterial
and viral types in pediatric patients. Along these lines, the authors of [19] evaluated the
performance of different CNN architectures (Sequential CNN, Inception CNN, Residual CNN and VGG16)
on a dataset of 5232 pediatric chest radiographs. Their approach was based on a visualization
technique to define the ROI. The customized VGG16 achieved 96.2% accuracy in pneumonia detection and
91.8% in its classification. Xianghong et al. [20] used an 8-layer FCN model with transfer learning
to segment anatomical lung regions. The model was trained and tested on both the JSRT (241 images)
and MC (138 images) datasets. After segmentation, an AlexNet DCNN model was used to classify the
lung regions. A binary classification [21] was performed using an SVM with an RBF kernel by
combining DCNN features and handcrafted features. Feature extraction by DCNN with transfer learning
achieved an accuracy of 0.8048 ± 0.0202 and a better sensitivity (0.7755 ± 0.0296). Rajpurkar et
al. [22] proposed the CheXNet algorithm for pneumonia detection and tested its accuracy against 4
radiologists, using the F1-score metric. The algorithm performed as well as an experienced
radiologist and exceeded the performance of an average radiologist. Jaiswal et al. [23] proposed an
identification model inspired by the Mask-RCNN model that included critical modifications to the
training process and a new post-processing step that merges the bounding boxes of multiple models.
Ayan et al. [24] used two CNN models, Xception and VGG16, for the diagnosis of pneumonia on the
Kermany dataset (5856 images). They relied on transfer learning and fine-tuning: the Xception model
was initialized with pretrained ImageNet weights before training, and ten of its layers were frozen.
The test results showed that the VGG16 network outperforms the Xception network. Varshani et al.
[25] used ResNet50, DenseNet-121 and DenseNet169 as the CNN models for the feature extraction step,
followed by different classifiers such as RF, SVM, etc. The best results were obtained with the SVM
classifier. In 2020, Vikash et al. [26] used AlexNet, DenseNet121, InceptionV3, GoogLeNet and
ResNet18 pretrained on ImageNet. They then proposed an ensemble model that combines the outputs of
all these models into a prediction vector, with a majority vote used to choose the final prediction.
This combination achieved a test accuracy of 96.39%, with an area under the ROC curve of 99.34% and
a sensitivity of 99.62%. Rahman et al. [27] attempted to automatically diagnose different classes of
pneumonia (bacterial and viral) on the Kaggle Chest X-Ray pneumonia dataset, which includes 5247
X-ray images. Training used the pretrained networks AlexNet, ResNet18, DenseNet201 and SqueezeNet.
They observed that DenseNet201 outperforms the other three CNNs, achieving 98% accuracy in pneumonia
detection and 93.3% accuracy in the differentiation between the two etiological variants. Hashmi et
al. [28] presented a method combining five CNNs (ResNet18, DenseNet121, InceptionV3, Xception and
MobileNetV2) for automatic pneumonia detection: after data augmentation, the five CNNs were
fine-tuned for pneumonia classification, and their predictions were then combined using a weighted
classifier to calculate the final prediction. Their model achieved an accuracy of 98.857%, a high
F1-score of 99.002 and an AUC score of 99.809.
    In 2021, the authors of [29] proposed a novel end-to-end deep transfer learning framework
using a deep convolutional neural network that detects and classifies three types of pneumonia from
chest X-ray scans. In the same year, Alqudah et al. [30] constructed two hybrid models, namely
CNN-KNN and CNN-SVM, using a 10-fold cross-validation methodology. The proposed CNN was trained,
validated and tested using a large chest X-ray image dataset of 5852 images. The hybrid CNN-KNN
model achieved an accuracy of 94.03%, while the CNN-SVM model achieved an accuracy of 93.9%. The
work of [31] presents a deep learning-based approach to diagnose and classify pneumonia from chest
X-ray images using transfer learning based on three pretrained architectures (ResNet50, InceptionV3
and InceptionResNetV2).
    Polat et al. [32] proposed a classification approach based on extracting relevant features from
digital chest radiographs. They used a binary CNN and a three-class CNN to detect pneumonia in
5840 pediatric CXR images. They also used a minimum distance classifier for classification. They
conducted three different parametric studies and found that the proposed method achieved 100%
accuracy in detecting pneumonia, 92% in distinguishing between two types of pneumonia, and 90% in
distinguishing normal, bacterial, or viral pneumonia.

3. Proposed approach
    CNNs perform better on large datasets; however, our target dataset is very small. This requires
a data augmentation step to enrich the small dataset and a transfer learning strategy to take
advantage of a network pretrained on a larger dataset. This strategy can significantly improve the
classification rates.
    The proposed approach is based on transfer learning of five CNN architectures (AlexNet,
ResNet-50V2, DenseNet-121, VGG-16 and InceptionV3) for pneumonia detection from chest X-rays. To
solve the dataset size issue, we also propose a new transfer learning strategy, called progressive
transfer learning, that adopts a CNN pretrained on a large amount of suitable medical images
(see Figures 1 and 2). In order to evaluate the performance of the proposed approach, three transfer
learning approaches have been developed, using different strategies (see Figures 1, 2, 3 and 4). In
the first, the five models are pretrained on the ImageNet dataset. In the second, they are
pretrained on ImageNet and then refined on an intermediate Tuberculosis image database (Kermany
dataset); the model pretrained progressively on these two datasets is then reused to learn how to
classify pneumonia on our dataset (Curated Xray dataset). The third approach consists of combining
the five CNN models, to take advantage of the performance of each of them.

   1.   Approach 1: Direct transfer learning from ImageNet to Curated Xray dataset:


Figure 1: Direct transfer learning from ImageNet to Curated Xray dataset.
   2. Approach 2: Progressive transfer learning, from ImageNet to Kermany dataset, then to
   Curated Xray dataset


Figure 2: Progressive transfer learning, from ImageNet to Kermany dataset, to Curated Xray dataset

   3. Approach 3: The combined architecture (called AVRDIS) is based on the combination by
   weighting of the five CNN models (AlexNet, VGG-16, ResNet-50, DenseNet-121 and
   InceptionV3) using the Softmax classifier (See Figure 3).

   [Diagram: the predictions P1 to P5 of AlexNet, VGG-16, ResNet, DenseNet and InceptionV3 are
   weighted by w1 to w5 and passed to the weighted classifier, which outputs the ensemble
   prediction E.]

Figure 3: Weighted classifier model of proposed combined approach.

   After feature extraction, different supervised classifiers (SVM, RF, Gaussian Naïve Bayes (GNB)
and softmax) were used for the pneumonia classification task in combination with the CNN
feature-extraction part. The proposed method is carried out in three steps:
        • The first step consists in extracting the features of the radiological images of
            pneumonia: the images are passed as inputs to the five CNN models, and the output of
            each model is a feature vector that depends on the image.
        • The second step consists in passing this feature vector as input to one of the four
            classifiers: SVM, RF, GNB and softmax.
        • In the last step, the predictions of these models are combined, using a weighted
            classifier to calculate the final prediction.
   In summary, this strategy consists in passing preprocessed radiological images of pneumonia as
input to each architecture, which then outputs the following predictions: PNEUMONIA or NORMAL for
the first dataset (binary classification); Bacterial Pneumonia, Viral Pneumonia, Normal and Covid-19
for the second dataset (4-class classification).
   Weighted average ensembling is a powerful classifier fusion mechanism. However, the choice of
weights assigned to the respective base learners plays a central role in ensuring the success of
the ensemble. In this work, we adopted a weighted mean ensemble technique for better classification.
To find the combination of weights that gives the maximum precision, we implemented a method that
calculates the final precision for each combination of weights and returns the weights with the best
precision found.
   Each weight (wk) takes a value between 0 and 1. Each model, after being refined, returns the
probabilities for each class label, i.e. 2 classes, in the form of a matrix (Pk). The weights (wk)
are multiplied by the corresponding base learners' probabilities Pk to calculate the weighted
average probability ensemble E, as shown in Equation (1).

$$E = \sum_{k=1}^{5} w_k P_k \qquad (1)$$
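
The weighted combination in Equation (1) and the exhaustive weight search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the grid step and the use of accuracy as the score to maximize are assumptions (the text speaks of "precision" without giving the exact scoring rule).

```python
from itertools import product


def weighted_ensemble(probs, weights):
    """Equation (1): E = sum_k w_k * P_k.

    probs[k][i][c] is the probability given by model k to class c for sample i.
    """
    n_samples = len(probs[0])
    n_classes = len(probs[0][0])
    ensemble = [[0.0] * n_classes for _ in range(n_samples)]
    for w, model_probs in zip(weights, probs):
        for i in range(n_samples):
            for c in range(n_classes):
                ensemble[i][c] += w * model_probs[i][c]
    return ensemble


def predict(ensemble):
    """Pick, for each sample, the class with the largest combined score."""
    return [max(range(len(row)), key=row.__getitem__) for row in ensemble]


def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def search_weights(probs, labels, step=0.5):
    """Try every weight combination on a grid in [0, 1] (the step is an assumption)."""
    n_levels = int(round(1 / step)) + 1
    grid = [i * step for i in range(n_levels)]
    best_weights, best_score = None, -1.0
    for weights in product(grid, repeat=len(probs)):
        if sum(weights) == 0:
            continue  # all-zero weights carry no information
        score = accuracy(predict(weighted_ensemble(probs, weights)), labels)
        if score > best_score:  # strict comparison keeps the first best combination
            best_weights, best_score = weights, score
    return best_weights, best_score
```

With five models the grid has `n_levels ** 5` combinations, so a coarse step keeps the search cheap; a finer step trades computing time for a more precise choice of weights.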


   The convolution layers of the five CNNs can be trained according to the following three transfer
learning strategies (see Figure 4).
        • Strategy 1: The pretrained model is used as a feature extractor. All the convolution
            blocks are frozen to keep the convolutional part of the CNN in its original form, and
            its outputs feed a new classifier. This strategy is generally used for small datasets
            and when computing resources are limited.
        • Strategy 2: Building on the idea that the lower layers of a CNN capture general
            (problem-independent) features while the upper layers capture specific
            (problem-dependent) features, the last fully connected layer is replaced by a new,
            randomly initialized classifier, and the parameters of the last convolution block of
            the pretrained network are fine-tuned.
        • Strategy 3: Same principle as the previous strategy, except that instead of training
            only one block, the last two convolution blocks are trained. The AlexNet architecture
            is not covered by this strategy because it contains only 5 convolution layers.
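
The three strategies differ only in how many of the last convolution blocks are updated during training. The sketch below makes this explicit in a framework-independent way; the helper name `trainable_flags` is hypothetical, and Strategy 2 is read here as training only the last convolution block, which is how Strategy 3 describes it.

```python
def trainable_flags(n_conv_blocks, strategy):
    """Return one flag per convolution block (True = weights are updated).

    Strategy 1: every convolution block is frozen (pure feature extractor).
    Strategy 2: only the last convolution block is trained.
    Strategy 3: the last two convolution blocks are trained.
    The new classifier head is always trained and is not listed here.
    """
    n_trainable = {1: 0, 2: 1, 3: 2}[strategy]
    if n_trainable > n_conv_blocks:
        raise ValueError("not enough convolution blocks for this strategy")
    n_frozen = n_conv_blocks - n_trainable
    return [False] * n_frozen + [True] * n_trainable
```

For example, VGG-16 has five convolution blocks, so `trainable_flags(5, 3)` freezes the first three blocks and trains the last two. In a deep learning framework, these flags would be applied to the layers of each block before compiling the model.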


Figure 4: The three Training strategies.
4. Experiments and results
4.1. Datasets
    To evaluate the proposed approach, we used two different datasets: 1) the Kermany dataset,
containing 5840 images labeled in two categories (Normal: healthy, and Pneumonia: viral or
bacterial), split into 90% for training and 10% for testing; 2) the Curated Xray dataset, with a
total of 18417 images labeled in 4 categories (Normal: healthy, Bacterial Pneumonia, Viral
Pneumonia, COVID-19), split into 91% for training and 9% for testing.


Figure 5: Curated Xray image samples [33].

4.2.    Evaluation metrics
   The proposed models were evaluated by using some evaluation metrics such as accuracy, recall,
precision, and f-score. The metrics formulas are given below:
         Accuracy =(TP+TN)/(TP+FN+TN+FP)                          (2)
         Precision =TP/(TP+FP)                                    (3)
         Recall =TP/(TP+FN)                                       (4)
         F1-Score=2 ×(precision × recall) / (precision + recall)  (5)

   TP, TN, FN and FP represent the numbers of true positives, true negatives, false negatives and
false positives, respectively.
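
Equations (2) to (5) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and it assumes that no denominator is zero (at least one positive prediction and one positive sample).

```python
def f1_from(precision, recall):
    """Equation (5): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, tn, fp, fn):
    """Equations (2) to (5) from the confusion-matrix counts of a binary task."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }
```

As a consistency check, `f1_from(71.99, 98.20)` gives about 83.08, which matches the F1-score reported for VGG-16 under Strategy One in Table 1.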

4.3. Results of approach 1: Direct transfer learning method from ImageNet dataset
4.3.1. To Kermany dataset
   Tables 1 and 2 summarize the results obtained by the 5 models on the Kermany dataset, without and
with image augmentation respectively, using transfer learning from the ImageNet dataset and the
three fine-tuning strategies described previously. Results are expressed in terms of accuracy,
precision, recall and F1-score. The rates obtained with image augmentation surpass the rates
obtained without data augmentation. These results confirm that this methodology is relevant to our
problem. The parameters used in data augmentation are:
(Rotation = 20°, Width shift = 0.1, Shear = 0.2, Zoom = 0.2).
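
To make these ranges concrete, the following sketch draws one random set of transform parameters per image. How each range is interpreted (symmetric intervals around zero, zoom as a factor around 1, width shift as a fraction of the image width) is an assumption, since the text only lists the four values; the names `AUGMENTATION` and `sample_augmentation` are hypothetical.

```python
import random

# Ranges quoted in the text; the interpretation of each one below is an assumption.
AUGMENTATION = {"rotation": 20.0, "width_shift": 0.1, "shear": 0.2, "zoom": 0.2}


def sample_augmentation(rng, cfg=AUGMENTATION):
    """Draw one random set of transform parameters for a single image."""
    return {
        # rotation angle in degrees, in [-20, 20]
        "rotation_deg": rng.uniform(-cfg["rotation"], cfg["rotation"]),
        # horizontal shift as a fraction of the image width, in [-0.1, 0.1]
        "width_shift_frac": rng.uniform(-cfg["width_shift"], cfg["width_shift"]),
        # shear intensity, in [-0.2, 0.2]
        "shear": rng.uniform(-cfg["shear"], cfg["shear"]),
        # zoom factor around 1, in [0.8, 1.2]
        "zoom_factor": rng.uniform(1 - cfg["zoom"], 1 + cfg["zoom"]),
    }
```

Calling `sample_augmentation(random.Random(0))` once per training image yields a different transform each time, which is what lets a small dataset be enriched with plausible variants of the same radiograph.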
Table 1
Models performances without data augmentation
                                                                Strategy One
                                               Accuracy       Precision     Recall       F1-score
                           VGG-16               75.00           71.99       98.20         83.08
                        ResNet-50V2             63.30           63.04       99.74         77.25
                        DenseNet-121            80.28           76.43       98.97         86.25
                         InceptionV3            82.37           79.04       97.69         87.38
                           AlexNet              80.60           77.28       97.96         86.29
                                                                Strategy Two
                                               Accuracy       Precision     Recall       F1-score
                           VGG-16               81.57           77.44       99.48         87.09
                        ResNet-50V2             79.32           77.35       94.61         85.12
                        DenseNet-121            78.06           74.37       99.74         85.21
                         InceptionV3            79.80           75.88       99.23         86.00
                           AlexNet              78.52           74.71       99.23         84.24
                                                               Strategy Three
                                               Accuracy       Precision     Recall       F1-score
                           VGG-16               75.48           71.90       99.74         83.56
                        ResNet-50V2             78.68           75.04       98.71         85.27
                        DenseNet-121            80.44           76.27       99.74         86.44
                         InceptionV3            80.28           76.43       98.97         86.25
                           AlexNet                -               -           -              -
Table 2
Models performances with data augmentation
                                                            Strategy One
                                            Accuracy      Precision     Recall       F1-score
                           VGG-16            81.89          80.17        94.5         86.69
                        ResNet-50V2          90.86          89.92       96.15         92.93
                        DenseNet-121         85.89          84.31       95.12         89.39
                         InceptionV3         84.45          83.67       93.33         88.24
                           AlexNet           86.69          88.08       91.02         89.53
                                                            Strategy Two
                                            Accuracy      Precision     Recall       F1-score
                           VGG-16            91.34          90.19       96.66         93.31
                        ResNet-50V2          93.42          92.45       97.43         94.55
                        DenseNet-121         89.74          88.26       96.41         92.15
                         InceptionV3         93.38          88.01       97.94         92.71
                           AlexNet           85.09          81.13       99.23         89.27
                                                           Strategy Three
                                            Accuracy      Precision     Recall       F1-score
                           VGG-16            91.98          89.90       98.20         93.87
                        ResNet-50V2          92.94          93.46       95.38         94.41
                        DenseNet-121         89.42          87.15       97.43         92.00
                         InceptionV3         94.87          94.08       97.94         95.97
                           AlexNet             -               -          -              -


   Tables 3 and 4 show the performance of different classifiers (Softmax, SVM, RF and GNB) on
the Kermany dataset (Table 3 without data augmentation and Table 4 with data augmentation). This
experiment allowed us to evaluate the impact of the choice of classifier on the performance of
the network. We can notice that machine learning algorithms (such as SVM and RF) achieve higher
accuracy than a simple supervised softmax classifier in most configurations.
Table 3
Classifiers performances (Softmax, SVM, RF and GNB) without data augmentation.
                                                                 Strategy One
                                              Accuracy        Precision       Recall    F1-score
                      VGG-16 (Softmax)         75.00            71.99         98.20      83.08
                       VGG-16 (SVM)            83.79            86.12         83.79      82.67
                        VGG-16 (RF)            84.92            86.84         84.92      84.04
                       VGG-16 (GNB)            85.25            85.14         85.25      85.16
                                                                 Strategy Two
                                              Accuracy        Precision       Recall    F1-score
                      VGG-16 (Softmax)         81.57            77.44         99.48      87.09
                       VGG-16 (SVM)            88.16            89.09         88.16      87.74
                        VGG-16 (RF)            86.87            88.30         86.70      86.25
                       VGG-16 (GNB)            86.22            87.56         86.22      85.57
                                                                Strategy Three
                                              Accuracy        Precision       Recall    F1-score
                      VGG-16 (Softmax)         75.48            71.90         99.74      83.56
                       VGG-16 (SVM)            88.16            87.68         88.16      89.41
                        VGG-16 (RF)            86.54            87.40         86.54      86.04
                       VGG-16 (GNB)            87.84            88.74         87.84      87.40


Table 4
Classifiers performances (Softmax, SVM, RF and GNB) with data augmentation

                                                                Strategy One
                                          Accuracy       Precision        Recall       F1-score
                   VGG-16 (Softmax)        81.89           80.17          94.35         86.69
                    VGG-16 (SVM)           88.00           88.15          88.00         87.78
                     VGG-16 (RF)           86.78           87.02          86.87         86.60
                    VGG-16 (GNB)           79.90           93.09          79.90         80.22
                                                                Strategy Two
                                          Accuracy       Precision        Recall       F1-score
                   VGG-16 (Softmax)        91.34           90.19          96.66         93.31
                    VGG-16 (SVM)           90.11           90.62          96.11         89.86
                     VGG-16 (RF)           90.92           91.03          90.92         91.03
                    VGG-16 (GNB)           89.14           89.33          89.14         89.14
                                                               Strategy Three
                                          Accuracy       Precision        Recall       F1-score
                   VGG-16 (Softmax)        91.98           89.90          98.20         93.87
                    VGG-16 (SVM)           93.35           93.39          93.35         93.90
                     VGG-16 (RF)           92.54           92.52          92.54         92.51
                    VGG-16 (GNB)           92.41           91.87          91.41         91.48


4.3.2. To Curated Xray dataset
   Table 5 reports the metric values of the five models for the three strategies, with data
augmentation, on the Curated Xray dataset. This experiment was conducted to evaluate our 5 models
on a larger dataset with more classes (4 classes: viral pneumonia, bacterial pneumonia, Covid-19
and normal). The DenseNet model outperformed all other models with an F1-score equal to 92.49%.
Table 5
Models performances with data augmentation on Curated Xray dataset.
                                                         Strategy One
                                         Accuracy    Precision        Recall    F1-score
                        VGG-16            72.79        73.50          68.27      64.00
                     ResNet-50V2          84.06        84.00          81.75      82.50
                     DenseNet-121         82.47        81.25          80.50      80.50
                      InceptionV3         90.07        89.50          88.75      89.25
                        AlexNet           79.65        79.00          77.50      80.00
                                                         Strategy Two
                                         Accuracy    Precision        Recall    F1-score
                        VGG-16            87.99        87.50          87.00      87.00
                     ResNet-50V2          91.91        92.25          90.50      91.00
                     DenseNet-121         88.23        88.25          86.25      86.75
                      InceptionV3         89.95        89.25          90.23      89.90
                        AlexNet           81.86        81.75          78.75      82.00
                                                        Strategy Three
                                         Accuracy    Precision        Recall    F1-score
                        VGG-16            87.13        88.00          83.75      84.50
                     ResNet-50V2          87.13        88.00          83.75      84.50
                     DenseNet-121         87.75        87.61          97.94      92.49
                      InceptionV3         88.84        89.00          86.50      87.25
                        AlexNet             -            -              -           -


4.4.   Results of approach 2: Progressive transfer learning
   Table 6 shows the performance of the models using progressive transfer learning (Kermany dataset
to Curated Xray dataset), with data augmentation and the three fine-tuning strategies.

Table 6
Model performance (%) using progressive transfer learning (Kermany dataset to Curated Xray
dataset), with data augmentation.

                                                         Strategy One
                                         Accuracy    Precision       Recall    F1-score
                         VGG-16           82.53        83.53         89.74      86.52
                      ResNet-50V2         87.50        94.31         85.12      89.48
                      DenseNet-121        89.42        87.85         96.41      91.93
                       InceptionV3        82.37        87.43         83.84      85.60
                         AlexNet          90.70        87.89         98.71      92.99
                                                         Strategy Two
                                         Accuracy    Precision       Recall    F1-score
                         VGG-16           90.06        87.27         98.46      92.53
                      ResNet-50V2         93.58        96.79         92.82      94.76
                      DenseNet-121        93.10        92.00         97.43      94.64
                       InceptionV3        91.50        92.87         93.58      93.23
                         AlexNet          92.78        92.17         99.66      94.36
                                                        Strategy Three
                                         Accuracy    Precision       Recall    F1-score
                         VGG-16           93.42        94.62         94.87      94.75
                      ResNet-50V2         92.94        96.01         92.56      94.25
                      DenseNet-121        88.30        85.14         98.46      91.31
                       InceptionV3        91.34        94.21         91.79      92.98
                         AlexNet            /            /             /           /
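
The F1-score column can be related to the precision and recall columns with a short calculation. The sketch below assumes the usual definition of the F1-score as the harmonic mean of precision and recall; the function name `f1_score` and the choice of rows are ours, used for illustration only, and the input values are copied from Table 6 (Strategy Two).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1-score) copied from Table 6, Strategy Two
rows = {
    "VGG-16": (87.27, 98.46, 92.53),
    "ResNet-50V2": (96.79, 92.82, 94.76),
    "DenseNet-121": (92.00, 97.43, 94.64),
}

for model, (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{model:13s} computed F1 = {computed:.2f}   reported F1 = {reported:.2f}")
```

Because the harmonic mean is pulled towards the smaller of its two inputs, a model with very high recall but modest precision (such as VGG-16 here) still ends up with an F1-score well below its recall.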
4.5.    Comparison between the two approaches
   The tables above show how the right choice of training dataset and the use of transfer learning
help to improve performance. Data augmentation also played a crucial role in this improvement, which
confirms its value in the medical field, where available data is scarce. Table 7 compares the two
approaches (progressive TL and direct TL) using the F1-score metric. Progressive transfer learning
(Approach 2) gives a higher F1-score than direct transfer learning from ImageNet (Approach 1) for
four of the five models; InceptionV3 is the exception.

Table 7
Comparison between the two approaches, based on F1-Score metric.
                                          Progressive (%)      ImageNet (%)
                             AlexNet          94.36               89.53
                             VGG-16           94.75               93.87
                              ResNet          94.76               94.55
                            DenseNet          94.64               92.15
                           InceptionV3        93.23               95.97
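
The per-model difference between the two approaches can be read directly from Table 7. The following sketch simply subtracts the two columns; the variable names are ours and the values are copied from the table.

```python
# F1-scores (%) copied from Table 7: (progressive TL, direct TL from ImageNet)
table7 = {
    "AlexNet": (94.36, 89.53),
    "VGG-16": (94.75, 93.87),
    "ResNet": (94.76, 94.55),
    "DenseNet": (94.64, 92.15),
    "InceptionV3": (93.23, 95.97),
}

# Gain in F1-score points when moving from direct to progressive transfer learning
gains = {model: round(prog - direct, 2) for model, (prog, direct) in table7.items()}

# Models for which progressive transfer learning gives the higher F1-score
improved = [model for model, gain in gains.items() if gain > 0]

for model, gain in gains.items():
    print(f"{model:12s} {gain:+.2f}")
print("Improved by progressive TL:", ", ".join(improved))
```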

4.6.    Approach 3: AVRDIS combined model
    Approach 3 combines the five models, which makes it possible to assess whether they are
complementary for pneumonia classification. Table 8 and Figure 6 compare the results of Approach 3
(called AVRDIS, see Figure 2) with those of each of the five models in its best configuration. The
combined approach outperforms every individual model, which confirms that the five models are
complementary. This combination gave the best performance and achieved a test accuracy of 96.79%
and an F1-score of 97.44%.

Table 8
Comparison of the individual architectures with AVRDIS.
                              Accuracy (%)     Precision (%)     Recall (%)   F1-score (%)
               VGG-16            93.42             94.72           94.87         94.75
                ResNet           93.58             96.76           92.86         94.76
              DenseNet           93.10             92.00           97.43         94.64
             InceptionV3         94.87             94.08           97.94         95.97
               AlexNet           92.87             92.17           99.66         94.36
               AVRDIS            96.15             95.52           98.46         96.96
Figure 6: ROC curves of the five models, with a zoom on the upper part.
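
To illustrate how the predictions of several models can be combined by weighting, the sketch below averages per-model class probabilities with normalized weights. It is only a generic example of weighted soft voting, not the exact AVRDIS rule, which this section does not spell out. The function `weighted_vote`, the probability vectors and the use of each model's F1-score from Table 8 as its weight are all assumptions made for illustration.

```python
from typing import Dict, List


def weighted_vote(probs: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Combine per-model class probabilities with a normalized weighted average."""
    total = sum(weights[model] for model in probs)
    n_classes = len(next(iter(probs.values())))
    combined = [0.0] * n_classes
    for model, p in probs.items():
        for k in range(n_classes):
            combined[k] += (weights[model] / total) * p[k]
    return combined


# Hypothetical softmax outputs for one image: [P(normal), P(pneumonia)]
probs = {
    "VGG-16": [0.30, 0.70],
    "ResNet": [0.55, 0.45],
    "DenseNet": [0.20, 0.80],
    "InceptionV3": [0.10, 0.90],
    "AlexNet": [0.40, 0.60],
}

# Assumed weights: each model's F1-score (%) from Table 8
weights = {
    "VGG-16": 94.75,
    "ResNet": 94.76,
    "DenseNet": 94.64,
    "InceptionV3": 95.97,
    "AlexNet": 94.36,
}

combined = weighted_vote(probs, weights)
# Index of the class with the highest combined probability (0 = normal, 1 = pneumonia)
predicted = max(range(len(combined)), key=combined.__getitem__)
print(combined, predicted)
```

With weights this close to each other the result is almost a plain average; a weighting scheme only changes the decision when the models disagree and their weights differ noticeably.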


4.7. Comparison of the AVRDIS approach to existing approaches in the
literature
   Table 9 compares the combined model (AVRDIS) with some existing methods using the metrics
mentioned above. It confirms the superiority of the proposed approach. This final model is therefore
the most suitable for our problem, with an accuracy of 96.76%, a sensitivity of 97.47% and an
F1-score of 97.44%.

Table 9
Comparison of existing architectures with the proposed AVRDIS approach.

                                       Accuracy (%)    Precision (%)    Recall (%)   F1-score (%)
           Ayan et al. [24]               87.00           91.00           94.00         84.00
           Polat et al. [32]               92                -              -             -
          Alqudah et al. [30]             94.03           94.22           96.68           -
          Chouhan et al. [26]              96             93.28           99.62           -
      Proposed method (AVRDIS)            96.15           95.52           98.46         96.96


5. Conclusion
   This paper deals with the classification of pneumonia on radiographic images, since pneumonia is
considered a leading cause of infant mortality in the world. The goal was to develop a robust and
automatic approach based on deep learning. In this context, we adopted the transfer learning
approach and used a combination of five pre-trained architectures: AlexNet, VGG-16, ResNet,
DenseNet-121 and InceptionV3, initially trained on the ImageNet dataset, which consists of 14 million
natural images. Our experiments revealed that progressive transfer learning from the tuberculosis
image dataset to pneumonia classification improves the performance of the studied models. In
addition, a comparison of the softmax classifier with other supervised classifiers (RF, SVM and
Naive Bayes) combined with the CNN models showed that the latter offered better results. Finally, we
used a model combining the five pretrained CNN models. It outperformed all other models, with an
F1-score of 97.44% and an accuracy of 96.76%. Although many methods have been developed for this
dataset, the proposed methodology obtained better results. We observed that performance could be
further improved by increasing the size of the databases through data augmentation and by using more
advanced models, such as Vision Transformers (ViT), alone or in combination with a CNN. Coupling
with a metaheuristic approach (such as PSO) would allow better control and optimization of the
combination of classifiers. We will explore this line of research in our future work.

6. References
[1] A. Ozcift, Enhanced Cancer Recognition System Based on Random Forests Feature Elimination
     Algorithm, Journal of Medical Systems, 2011, Vol. 36, Num 4, pp. 2577-2585.
[2] C. Nguyen, Y. Wang, H.N. Nguyen; Random Forest classifier combined with feature selection
     for breast cancer diagnosis and prognostic. 2013, Journal of Biomedical Science and
     Engineering. Vol 06, pp. 551-560.
[3] I. El-Naqa, Y. Yang, M. Wernick, N. Galatsanos, R. Nishikawa, A support vector machine
     approach for detection of micro calcifications, IEEE Transactions on Medical Imaging. Vol. 21,
     pp. 1552-1563.
[4] S. Ghumbre, C. Patil, A. Ghatol, Heart Disease Diagnosis using Support Vector Machine. 2011.
[5] V. Chaurasia and S. Pal, A Novel Approach for Breast Cancer Detection using Data Mining
     Techniques. 2017, International Journal of Innovative Research in Computer and
     Communication Engineering. Vol. 2, issue 1.
[6] A. Christobel and Y. Sivaprakasa, An Empirical Comparison of Data Mining Classification
     Methods. 2011, International Journal of Computer Information Systems, Vol. 3, No. 2.
[7] J. Yao, A. Dwyer, R. Summers; Computer-aided diagnosis of pulmonary infections using texture
     analysis and support vector machine classification. 2011, journal of Academic Radiology. Vol.
     18, pp 306-314.
[8] E. Naydenova, A. Tsanas, C. Casals-Pascual, M. De Vos, Smart diagnostic algorithms for
     automated detection of childhood pneumonia in resource-constrained settings. 2015, IEEE
     Global humanitarian Technology Conference. pp. 377–384.
[9] T. B. Chandra and K. Verma, Pneumonia Detection on Chest Xray Using Machine Learning
     Paradigm. 2020, Proceedings of the 3rd International Conference on Computer Vision and
     Image Processing, pp. 21–33.
[10] K.M. Kuo, P.C. Talley, C.H. Huang, L.C. Cheng, Predicting Hospital-acquired pneumonia among
     schizophrenic patients: a machine learning approach. 2019, PMC article, PMID: 30866913, DOI:
     10.1186/s12911-019-0792-1
[11] H. Yue, Q. Yu, C. Liu, Y. Huang, Z. Jiang, C. Shao, Machine learning-based CT radiomics
     method for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2
     infection: a multicenter study. 2020, PMC article, PMID: 32793703, DOI:
     10.21037/atm-20-3026
[12] R.T. Sousa, O. Marques, F. Alphonsus, A.M.N. Soares, I.I.G. Sene, L.L.G. de Oliveira, E.S.
     Spoto; Comparative performance analysis of machine learning classifiers in detection of
     childhood pneumonia using chest radiographs. 2013, journal of Procedia Computer Science. Vol.
     18, pp. 2579–2582.
[13] M. Jinlian, F. Wu, J. Zhu, D. Xu and D. Kong, A pretrained convolutional neural network-based
     method for thyroid nodule diagnosis. 2016, Ultrasonics, pp. 221-230.
[14] V. Gulshan, L. Peng, M. Coram, M. Stumpe, D. Wu, A. Narayanaswamy, S. Venugopalan;
     Development and validation of a deep learning algorithm for detection of diabetic
     retinopathy in retinal fundus photographs. 2016, article. DOI:
     http://jamanetwork.com/article.aspx?doi=10.1001/jama.2016.17216.
[15] M. Haloi, Improved Microaneurysm Detection using Deep Neural Networks. 2015, article. DOI:
     https://doi.org/10.48550/arXiv.1505.04424
[16] A. Rakhlin, A. Shvets, V. Iglovikov, A. Kalinin, Deep convolutional neural networks for breast
     cancer histology image analysis. 2018, International Conference Image Analysis and
     Recognition, pp. 737–744.
[17] Y. S. Vang, Z. Chen, X. Xie, Deep learning framework for multi-class breast cancer histology
     image classification. 2018, International Conference Image Analysis and Recognition, pp.
     914–922.
[18] O. Er, N. Yumusak, F. Temurtas; Chest diseases diagnosis using artificial neural networks. 2010,
     journal of Expert Systems with Applications, pp. 7648-7655.
[19] S. Rajaraman, S. Candemir, I. Kim, G. Thoma, S. Antani, Visualization and interpretation of
     convolutional neural network predictions in detecting pneumonia in pediatric chest radiographs.
     2018, article. DOI: https://doi.org/10.3390/app8101715
[20] X. Gu, L. Pan, H. Liang, R. Yang, Classification of Bacterial and Viral Childhood Pneumonia
     Using Deep Learning in Chest Radiography. 2018, In Proceedings of the 3rd International
     Conference on Multimedia and Image Processing, pp. 88–93.
[21] A.H. Alharbi, H.A. Hosni Mahmoud, Pneumonia Transfer Learning Deep Learning Model from
     Segmented X-rays. Healthcare 2022, 10, 987. https://doi.org/10.3390/healthcare10060987
[22] P. Rajpurkar, J. Irvin, R.L. Ball, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. P.
     Langlotz, Deep learning for chest radiograph diagnosis: A retrospective comparison of the
     CheXNeXt algorithm to practicing radiologists. 2018, PMC article, PMID: 30457988, DOI:
     https://doi.org/10.1371/journal.pmed.1002686.
[23] A. K. Jaiswal, P.T.S. Kumar, D. Gupta, A. Khanna, J. Rodrigues; Identifying pneumonia in chest
     X-rays: A deep learning approach. 2019, Journal of Measurement. Vol. 145, pp 511-518.
[24] E. Ayan and H. M. Ünver, Diagnosis of Pneumonia from Chest X-Ray Images Using Deep
     Learning. 2019, Conference, Scientific Meeting on Electrical-Electronics & Biomedical
     Engineering and Computer Science (EBBT). pp. 1–5.
[25] D. Varshni, K. Thakral, L. Agarwal, R. Nijhawan, A. Mittal, Pneumonia Detection Using CNN
     based Feature Extraction. 2019, International Conference on Electronics, Communication and
     Computing Technologies (ICECCT), pp. 1-7.
[26] V. Chouhan, S.K. Singh, A. Khamparia, D. Gupta, P. Tiwari, C. Moreira; A Novel Transfer
     Learning Based Approach for Pneumonia Detection in Chest X-ray Images. 2020, journal of
     Applied Sciences. Vol. 10, pp 559.
[27] T. Rahman, M. E. H. Chowdhury, A. Khandakar, R. Islam, Z. B. Mahboub, Transfer Learning
     with Deep Convolutional Neural Network (CNN) for Pneumonia Detection Using Chest X-ray.
     2020, article. DOI: https://doi.org/10.3390/app10093233
[28] M. F. Hashmi, S. Katiyar, A.G. Keskar, N.D. Bokde, Z.W. Geem; Efficient Pneumonia Detection
     in Chest Xray Images Using Deep Transfer Learning. 2020, PMC article, PMID: 32575475,
     DOI: https://doi.org/10.3390/diagnostics10060417
[29] Y. Brima, M. Atemkeng, S. Tankio Djiokap, J. Ebiele, F. Tchakounté, Transfer Learning for the
     Detection and Diagnosis of Types of Pneumonia including Pneumonia Induced by COVID-19
     from Chest X-ray Images. Diagnostics 2021, 11, 1480.
     https://doi.org/10.3390/diagnostics11081480
[30] A.M. Alqudah, S. Qazan, I.S. Masad, Artificial Intelligence Framework for Efficient Detection
     and Classification of Pneumonia Using Chest Radiography Images. 2021, Journal of Medical and
     Biological Engineering, pp. 599–609.
[31] A. Manickam, J. Jiang, Y. Zhou, A. Sagar, R. Soundrapandiyan, R.D. Samuel, Automated
     pneumonia detection on chest X-ray images: A deep learning approach with different
     optimizers and transfer learning architectures. 2021.
     https://doi.org/10.1016/j.measurement.2021.109953
[32] O. Polat, Z. Dokur, T. Ölmez, Determination of Pneumonia in X-ray Chest Images by Using
     Convolutional Neural Network. 2021, journal of Electrical Engineering and Computer Sciences,
     pp. 1615–1627.
[33] D. Avola, A. Bacciu, L. Cinque, A. Fagioli, M. R. Marini, R. Taiello, Study on transfer learning
     capabilities for pneumonia classification in chest-x-rays images. Comput Methods Programs
     Biomed. 2022 Jun;221:106833. DOI: 10.1016/j.cmpb.2022.106833. Epub 2022 Apr 22. PMID:
     35537296; PMCID: PMC9033299.