Coronavirus (Covid-19) Classification Using CT Images by
Machine Learning Methods
Mücahid Barstuğan (a), Umut Özkaya (a), and Şaban Öztürk (b)
(a) Konya Technical University, Electrical and Electronics Engineering, Konya, 42250, Turkey
(b) Amasya University, Electrical and Electronics Engineering, Amasya, 05000, Turkey


                 Abstract
This study detected coronavirus disease (COVID-19) by applying machine learning methods. Coronavirus disease affects the lungs and can cause death. Detection was performed on chest computed tomography (CT) images, and training used 32x32 patches obtained from the CT images. The study includes three phases. The first phase classifies the patches with the SVM algorithm without any feature extraction. The second phase extracts features from the patches with the Grey Level Co-occurrence Matrix (GLCM), Grey Level Run Length Matrix (GLRLM), Grey Level Size Zone Matrix (GLSZM), Discrete Wavelet Transform (DWT), Fast Fourier Transform (FFT), and Discrete Cosine Transform (DCT) methods and classifies the extracted features. The third phase uses a Convolutional Neural Network (CNN) to classify the patches. 10-fold cross-validation is used in the classification process, and classification performance is measured with the sensitivity, specificity, accuracy, precision, and F-score metrics. The highest classification accuracy during training, 99.15%, was achieved by the CNN method. The structure with the highest classification accuracy was then used in the test stage, where it reached an 80.21% mean sensitivity rate, which measures COVID-19 detection performance, on 727 test images.

Keywords
Classification, coronavirus, COVID-19, CT images, deep learning, feature extraction, machine learning


1. Introduction

COVID-19 emerged at the end of 2019 in the Wuhan region of China. In its early phases, COVID-19 causes fever, cough, fatigue, and myalgia [1], and patients show abnormalities in their chest CT images. Respiratory problems, heart damage, and secondary infections have been observed as complications of the disease. The findings show that the COVID-19 virus spreads from person to person. Infected people can develop serious respiratory problems, and some need treatment in the intensive care unit. The CT images of infected people show that COVID-19 has characteristic findings. Clinical experts who know these characteristics therefore need lung CT images to diagnose COVID-19 in its early phase. Serial CT examinations help clinical experts understand the occurrence, development, and prognosis of the disease. CT imaging findings can be sorted into four stages: early, progressive, severe, and dissipative [2]. Chest CT imaging is one of the key elements in diagnosing suspected patients [3]. By January 12, 2021, a total of 91,636,996 cases had been diagnosed with COVID-19 infection; 65,524,142 patients had recovered, and 1,961,037 patients had died.

Proceedings of RTA-CSIT 2021, May 2021, Tirana, Albania
EMAIL: mbarstugan@ktun.edu.tr (A. 1); uozkaya@ktun.edu.tr (A. 2); saban.ozturk@amasya.edu.tr (A. 3)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)
The development of computer vision systems supports medical applications such as image quality enhancement, organ segmentation, and organ texture classification. The analysis of time series and tumor characteristics [4] and the segmentation and detection of tumors [5] are some of the machine learning applications in the biomedical image processing field.

The literature contains few detailed studies and datasets on coronavirus disease. Xu et al. [6] classified CT images into three classes: COVID-19, Influenza-A viral pneumonia, and healthy cases. They obtained the images from hospitals in the Zhejiang region of China. The dataset consisted of 618 images in total: 219 images from 110 patients with COVID-19, 224 images of 224 patients with Influenza-A viral pneumonia, and 175 images of 175 healthy people. Their study classified the images with a 3D deep learning model and achieved 87.6% overall classification accuracy. Shan et al. [7] implemented a deep learning-based system for segmenting and quantifying the infected regions, as well as the entire lung, on chest CT images. They used CT images of 249 COVID-19 patients for training and of 300 new COVID-19 patients for validation, and obtained a Dice similarity coefficient of 91.6%. Conventional delineation often takes 1 to 5 hours, whereas their proposed system reduced the delineation time to four minutes. Wang et al. [8] studied 453 CT images of pathogen-confirmed COVID-19 cases together with those of patients previously diagnosed with typical viral pneumonia. They built their algorithm with an Inception-based migration (transfer) learning model, which achieved 82.9% validation accuracy and 73.1% test accuracy. Song et al. [9] used chest CT images of 88 patients obtained from two hospitals in China and proposed a deep learning-based CT diagnosis system to identify COVID-19 patients, which achieved 94% classification accuracy. Sethy and Behera [10] proposed a method to detect infected patients from X-ray scans. Their dataset consisted of 100 X-ray images, which were classified with the SVM algorithm using deep features. The proposed ResNet50+SVM method achieved 91.41% on the MCC metric. Other studies used lung X-ray scans [11-14], and some studies [15, 16] used clinical blood test results.

This study created patches from 202 CT images and labeled them as infected or non-infected. The patches were sampled from lung and infected areas. Three different phases were used to classify the coronavirus images. The findings showed that the highest classification accuracy was achieved with automatic feature extraction on the images.

This paper is organized as follows. Section 2 analyses the images statistically and visually. Section 3 briefly explains the feature extraction and classification methods. Section 4 presents the classification results. Section 5 concludes the paper.

2. Material

The dataset was taken from [17], which provides two datasets labeled by experts. The first dataset has 100 CT images. The second dataset has 829 CT images obtained from nine patients and also contains non-COVID images. Table 1 presents the original datasets and the patch dataset created from them.

Table 1
The features of the patch dataset
Dataset     Images   COVID images   Training images   Test images   Infected training patches   Non-infected training patches
Dataset 1   100      100            70                30            28047                       8843
Dataset 2   829      372            132               697           31307                       25670
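The image and patch totals used later in the text (202 training images, 727 test images, 59354 infected and 34513 non-infected training patches) are the column sums of Table 1. The short Python check below only re-adds the numbers already in the table; it introduces no new data.

```python
# Per-dataset counts copied from Table 1.
table1 = {
    "Dataset 1": {"images": 100, "train_images": 70, "test_images": 30,
                  "infected_patches": 28047, "non_infected_patches": 8843},
    "Dataset 2": {"images": 829, "train_images": 132, "test_images": 697,
                  "infected_patches": 31307, "non_infected_patches": 25670},
}

# Every image of a dataset is either a training image or a test image.
for name, row in table1.items():
    assert row["train_images"] + row["test_images"] == row["images"], name

# Column totals over both datasets.
totals = {key: sum(row[key] for row in table1.values())
          for key in table1["Dataset 1"]}
print(totals)
# images 929, train_images 202, test_images 727,
# infected_patches 59354, non_infected_patches 34513
```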


202 images were used to obtain the training patches, and 727 images were used for the test stage. The images in the dataset were acquired with different CT tools, which makes the classification process difficult. In some images, the grey levels of infected regions are similar to those of non-infected regions. In addition, the grey-level variation changes dramatically from one CT tool to another. Figure 1 shows the infected areas in images acquired with different CT tools.
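The text does not specify how a 32x32 window is assigned to the infected or non-infected class. The NumPy sketch below shows one plausible sampling scheme, assuming a binary lung mask and a binary infection mask are available for each slice, a sliding-window stride of 16 pixels, and the hypothetical rule that a window is infected if at least half of its pixels lie in the infection mask and non-infected if it lies entirely inside the lung and contains no infected pixel. The function name `sample_patches`, the stride, and the 0.5 fraction are illustrative assumptions, not the authors' settings.

```python
import numpy as np

PATCH = 32  # patch side length used in the paper


def sample_patches(image, lung_mask, infection_mask, stride=16, min_infected=0.5):
    """Cut PATCH x PATCH windows from one CT slice and label them.

    Hypothetical labelling rule (not stated in the paper):
      label 1 (infected)     if >= min_infected of the window is infected,
      label 0 (non-infected) if the window lies fully inside the lung and
                             contains no infected pixel,
      otherwise the window is skipped.
    """
    patches, labels = [], []
    rows, cols = image.shape
    for r in range(0, rows - PATCH + 1, stride):
        for c in range(0, cols - PATCH + 1, stride):
            inf = infection_mask[r:r + PATCH, c:c + PATCH]
            lung = lung_mask[r:r + PATCH, c:c + PATCH]
            if inf.mean() >= min_infected:
                labels.append(1)
            elif lung.all() and not inf.any():
                labels.append(0)
            else:
                continue
            patches.append(image[r:r + PATCH, c:c + PATCH])
    return np.array(patches), np.array(labels)


# Synthetic example: a 128x128 "slice" with a square lung region
# and a 32x32 infected square inside it.
rng = np.random.default_rng(0)
ct_slice = rng.random((128, 128))
lung = np.zeros((128, 128), dtype=bool)
lung[16:112, 16:112] = True
infection = np.zeros((128, 128), dtype=bool)
infection[32:64, 32:64] = True

X, y = sample_patches(ct_slice, lung, infection)
print(X.shape, int(y.sum()), int((y == 0).sum()))  # (21, 32, 32) 5 16
```

Windows that only partly overlap an infected area are skipped in this sketch, which keeps both classes clean at the cost of discarding boundary regions.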
Figure 1: The sample images

3. Method

This study classifies the patches into two classes. The first class consists of patches sampled in non-infected areas, and the second class consists of patches sampled in infected areas. Figure 2 shows sample patches from the infected and non-infected classes.

Figure 2: Sample patches for infected and non-infected classes

This study consists of three phases. In the first phase, the patches were transformed into vectors, and the SVM classified the vectors. In the second phase, features were extracted with six methods: texture features were obtained from the Grey Level Co-occurrence Matrix (GLCM) [18-20], Grey Level Run Length Matrix (GLRLM) [21], and Grey Level Size Zone Matrix (GLSZM) [22], and First Order Statistics (FOS) features were obtained from the transform images of the Discrete Wavelet Transform (DWT) [23], Fast Fourier Transform (FFT) [24], and Discrete Cosine Transform (DCT) [25]. FOS features were also computed directly on the patches. The mean, variance, skewness, kurtosis, energy, and entropy values were extracted as the FOS features. The SVM [26] method classified all features, and 10-fold cross-validation was used during the classification process. In the third phase, a deep learning structure was used to classify the patches. Figure 3 shows the three phases of the classification process.

Figure 3: The classification processes for three phases

3.1. Convolutional neural networks

CNN is a biologically inspired variant of the multi-layer perceptron (MLP) whose origins lie in the work of Hubel and Wiesel on the mammalian visual cortex [27]. Fukushima introduced the Neocognitron, a precursor of the CNN, in 1980 [28], and LeCun et al. [29] trained a convolutional network with backpropagation in 1989.

Convolution layers give CNNs the ability to learn features automatically, which has led to their wide use in image classification, object detection, and visual tracking applications. Advances in hardware and parallel computing have intensified the use of CNNs [30]: training that once took months now takes several days. A CNN includes one or more convolution layers, pooling layers, ReLUs, and a softmax layer. The convolution layer reduces the number of learnable parameters, and therefore the computational cost, because each filter is small and its weights are shared across the image. The pooling layer reduces the size of the feature maps while retaining the most important features. The ReLU is an activation function that is generally applied after convolution or pooling layers. Finally, the softmax layer converts the network outputs into class probabilities. Figure 4 shows the CNN structure used in this study.
Figure 4: The CNN structure used in Phase 3
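Section 3.1 names the layer types of the Phase 3 network (convolution, ReLU, pooling, softmax) but not their sizes or the software framework. The NumPy forward pass below is a minimal sketch of such a classifier for one 32x32 patch, assuming 8 filters of size 3x3, 2x2 max pooling, and one fully connected layer feeding a two-class softmax. All sizes are hypothetical, the weights are random rather than trained, and backpropagation training is omitted. The hypothetical helper `classify_slice` applies the image-level rule of Section 4.2: the slice is tiled into non-overlapping 32x32 patches and called infected if at least `threshold` patches are predicted infected.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

# Hypothetical layer sizes: the text names the layer types but not their sizes.
N_FILTERS, K, PATCH = 8, 3, 32
POOLED = (PATCH - K + 1) // 2        # 32 -> 30 (valid conv) -> 15 (2x2 pool)
conv_w = rng.normal(0.0, 0.1, size=(N_FILTERS, K, K))
conv_b = np.zeros(N_FILTERS)
fc_w = rng.normal(0.0, 0.01, size=(N_FILTERS * POOLED * POOLED, 2))
fc_b = np.zeros(2)


def conv2d_valid(x, w, b):
    """'Valid' 2D convolution (cross-correlation, as in CNN libraries)."""
    windows = sliding_window_view(x, w.shape[1:])      # (H-K+1, W-K+1, K, K)
    return np.einsum("ijab,nab->nij", windows, w) + b[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)


def max_pool2x2(x):
    """2x2 max pooling with stride 2 on a (channels, H, W) array, H and W even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def softmax(z):
    e = np.exp(z - z.max())                            # shift for numerical stability
    return e / e.sum()


def forward(patch):
    """Probabilities [non-infected, infected] for one 32x32 patch."""
    h = max_pool2x2(relu(conv2d_valid(patch, conv_w, conv_b)))   # (8, 15, 15)
    return softmax(h.reshape(-1) @ fc_w + fc_b)


def classify_slice(image, threshold=1):
    """Tile the slice into non-overlapping 32x32 patches; the image is
    infected if at least `threshold` patches are predicted infected."""
    n_infected = 0
    rows, cols = image.shape
    for r in range(0, rows - PATCH + 1, PATCH):
        for c in range(0, cols - PATCH + 1, PATCH):
            probs = forward(image[r:r + PATCH, c:c + PATCH])
            n_infected += int(probs[1] > probs[0])
    return n_infected >= threshold, n_infected


probs = forward(rng.random((PATCH, PATCH)))
is_infected, n = classify_slice(rng.random((64, 64)))
```

A practical implementation would use a deep learning framework with automatic differentiation; the `einsum` formulation here only keeps the sketch dependency-light and readable.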
4. Experimental Results

This study presents coronavirus classification in three phases. Phase 1 classified the patches without feature extraction. Phase 2 applied feature extraction to all patches and classified the extracted features. Phase 3 used the CNN method to classify the patches. Five evaluation metrics were used: sensitivity (SEN), specificity (SPE), accuracy (ACC), precision (PRE), and F-score.

All experiments were run on a computer with an Intel Core i7-8700K CPU (3.7 GHz), 32 GB DDR4 RAM, and an NVIDIA GeForce GTX 1080 graphics card.

4.1. The classification results of the training process

The training set consists of 202 CT images, and the test set consists of 727 images. 59354 infected and 34513 non-infected patches were obtained from the 202 training images. These patches were classified in the three phases. Table 2 presents the classification results obtained.

Table 2
The classification results of Phase 1 and Phase 2
Evaluation metrics: mean (%) ± std
Phase     Feature extraction   Classifier structure   Feature number   SEN         SPE         ACC         PRE         F-score
Phase 1   None                 SVM                    1024             51.85±1.8   61.92±3.1   56.88±1.3   57.7±1.7    54.59±1.2
Phase 2   Manual               FOS-SVM                6                95.32±0.4   98.12±0.3   97.09±0.4   96.72±0.2   96.01±0.2
Phase 2   Manual               GLCM-SVM               19               13.22±0.6   100         68.09±0.2   100         23.35±0.8
Phase 2   Manual               GLRLM-SVM              7                44.85±0.4   93.72±0.4   75.75±0.3   80.62±1.1   57.63±0.4
Phase 2   Manual               GLSZM-SVM              13               2.92±0.2    100         64.31±0.1   100         5.7±0.3
Phase 2   Manual               DWT-FOS-SVM            24               65.25±0.6   92.5±0.4    82.48±0.3   83.5±0.7    73.25±0.4
Phase 2   Manual               FFT-FOS-SVM            6                65.83±1.1   90.6±0.4    81.5±0.4    80.3±0.7    72.3±
Phase 2   Manual               DCT-FOS-SVM            6                64.54±0.9   89.96±0.3   80.61±0.3   78.9±0.4    70.99±0.6
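The feature counts in Table 2 are consistent with Phase 1 using the 32 × 32 = 1024 raw pixel values and with applying the six FOS measures directly to a patch (6 features), to each of the four subbands of a one-level 2D DWT (4 × 6 = 24), and to the FFT and DCT coefficients (6 each). The NumPy sketch below builds these vectors under assumptions the text does not state: a Haar wavelet, the FFT magnitude, an orthonormal DCT-II, non-excess kurtosis, and energy and entropy computed from a 64-bin histogram. It illustrates the feature layout only and is not the authors' extraction code.

```python
import numpy as np


def fos(values, bins=64):
    """Six first-order statistics: mean, variance, skewness, kurtosis,
    histogram energy and histogram entropy (the bin count is an assumption)."""
    v = np.asarray(values, dtype=float).ravel()
    mean, var = v.mean(), v.var()
    std = np.sqrt(var)
    z = (v - mean) / std if std > 0 else np.zeros_like(v)
    skew, kurt = np.mean(z ** 3), np.mean(z ** 4)     # non-excess kurtosis
    hist, _ = np.histogram(v, bins=bins)
    p = hist / hist.sum()
    energy = np.sum(p ** 2)
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))
    return np.array([mean, var, skew, kurt, energy, entropy])


def haar_dwt2(x):
    """One-level orthonormal 2D Haar DWT: approximation + three detail subbands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return ((a + b + c + d) / 2, (a - b + c - d) / 2,
            (a + b - c - d) / 2, (a - b - c + d) / 2)


def dct2(x):
    """Orthonormal 2D DCT-II of a square array via the DCT matrix."""
    n = x.shape[0]
    k, i = np.arange(n)[:, None], np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m @ x @ m.T


def feature_sets(patch):
    """Feature vectors laid out like the Table 2 rows."""
    patch = np.asarray(patch, dtype=float)
    return {
        "Phase 1 (raw pixels)": patch.ravel(),                           # 1024
        "FOS": fos(patch),                                               # 6
        "DWT-FOS": np.concatenate([fos(s) for s in haar_dwt2(patch)]),   # 4 x 6 = 24
        "FFT-FOS": fos(np.abs(np.fft.fft2(patch))),                      # 6
        "DCT-FOS": fos(dct2(patch)),                                     # 6
    }


patch = np.random.default_rng(0).random((32, 32))
sizes = {name: vec.size for name, vec in feature_sets(patch).items()}
print(sizes)
```

Because the Haar DWT and the DCT used here are orthonormal, both preserve the total energy of the patch, which is a quick sanity check on the transforms.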


Table 2 shows that the classification accuracy in Phase 1 was 56.88%. In Phase 2, the best classification performance, 97.09% accuracy, was obtained with the FOS-SVM method using six features. The lowest performance was obtained with the GLSZM-SVM structure, which detected infected patches with a sensitivity of only 2.92%. The results show that feature extraction increases the classification performance. In Phase 3, a CNN structure consisting of a convolution layer, a ReLU layer, a pooling layer, and a softmax layer was used. Table 3 shows the highest results for Phase 3.

Table 3
The classification results for Phase 3
Feature extraction             Classifier structure   Accuracy (%)
Automatic feature extraction   CNN                    99.51
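The GLCM-SVM row of Table 2 relies on 19 features derived from the grey-level co-occurrence matrix [18-20]. As a minimal sketch of how such features are computed, the NumPy code below quantises a patch to 8 grey levels, builds a normalised, symmetric GLCM for one horizontal pixel offset, and derives five common Haralick-style measures: contrast, energy, homogeneity (here the inverse-difference form), correlation, and entropy. The offset, the number of levels, and this subset of measures are assumptions; the authors' exact 19-feature set is not listed in the text.

```python
import numpy as np


def glcm(patch, levels=8, offset=(0, 1)):
    """Normalised symmetric GLCM for one non-negative (row, col) offset."""
    x = np.asarray(patch, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:                                   # flat patch: a single grey level
        q = np.zeros(x.shape, dtype=int)
    else:                                          # quantise to 0 .. levels-1
        q = np.minimum((levels * (x - lo) / (hi - lo)).astype(int), levels - 1)
    dr, dc = offset
    rows, cols = q.shape
    ref = q[:rows - dr, :cols - dc]                # reference pixels
    nbr = q[dr:, dc:]                              # neighbours at the offset
    m = np.zeros((levels, levels))
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1)    # count grey-level pairs
    m = m + m.T                                    # count each pair in both directions
    return m / m.sum()


def glcm_features(p):
    """Contrast, energy (ASM), homogeneity, correlation and entropy of a GLCM."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    cov = np.sum(p * (i - mu_i) * (j - mu_j))
    correlation = cov / (sd_i * sd_j) if sd_i * sd_j > 0 else 0.0
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))
    return np.array([contrast, energy, homogeneity, correlation, entropy])


checker = np.indices((32, 32)).sum(axis=0) % 2     # 0/1 checkerboard patch
# contrast 49, energy 0.5, homogeneity 0.125, correlation -1, entropy 1
print(glcm_features(glcm(checker)))
```

On the checkerboard every horizontal neighbour pair has maximally different grey levels, so contrast is maximal and correlation is -1; a flat patch gives the opposite extreme (contrast 0, energy 1).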
Table 3 shows that the classification accuracy in Phase 3 during the training process was 99.15%. Tables 2 and 3 show that the highest classification performance was obtained by the CNN method. Phase 2 achieved a maximum classification accuracy of 97.09% with manual feature extraction methods, whereas Phase 3 achieved 99.15% classification accuracy by extracting features automatically.

4.2. The classification results of the test process

The trained structure with the highest classification accuracy was used to detect infected areas in images that were not used during the training stage. Figure 5 presents the test stage.

Figure 5: The test structure to diagnose the infected

The trained CNN structure classified non-overlapping 32x32 patches. During the test stage, the threshold value was set to 1: if at least one patch was classified as infected, the image was classified as infected, and if no patch was classified as infected, the image was classified as non-infected. When the threshold was set to 2, the classification performance decreased, because some images contain an infected area only one patch in size. With a threshold greater than 1, such images were classified as non-infected even though their single infected patch was detected, which reduced the classification performance. 727 test images were classified during the test process. Table 4 presents the mean results of the test process.

Table 4
The classification performance of the test process (evaluation metrics, %)
Method    SEN     SPE     ACC    PRE    F-score
Phase 3   80.21   99.47   98.3   90.8   85.18

Table 4 shows that the proposed method detects the infected images with an 80.21% mean sensitivity rate on 727 images. The non-infected patches are classified with a 99.47% mean specificity rate. The CNN structure reached 99.15% classification accuracy during training, whereas its sensitivity on the test images, which were not used for training, is 80.21%.

5. Conclusion

COVID-19 was first encountered in the Wuhan region of China and has been threatening public health, trade, and the world economy. The virus behaves partly like other viral pneumonias; therefore, its spreading rate made the situation difficult to bring under control. CT imaging findings of COVID-19 differ across clinical studies. Some findings, such as bronchiectasis, lesion swelling, and varying shadowing in CT images, make COVID-19 easier to diagnose. This study compared SVM classifiers based on manual feature extraction with a CNN that extracts features automatically, and found that the CNN performs better than the SVM-based methods. The coronavirus image set in this study contains different types of images acquired with different CT tools; therefore, several feature extraction methods and classifiers were evaluated to find the method that best separates the infected patches. Table 5 presents studies from the literature and their classification performance on different coronavirus datasets.
Table 5
The literature comparison
Study        Method                                   Dataset                    Number of classes   Performance (%)
[6]          3D deep learning                         CT, 618 images             3                   87.6
[8]          Migration learning                       CT, 453 images             2                   82.9
[10]         ResNet50+SVM                             X-ray, 100 images          2                   91.41
[13]         COVIDX-Net                               X-ray, 50 images           2                   90
[16]         XGBoost                                  Blood test, 404 samples    2                   97
[15]         Random Forest                            Blood test, 49 samples     2                   95.12
[30]         Feature Selection and Lasso Regression   Clinical data              2                   84.1
This study   CNN                                      CT, 727 images             2                   80.21


There are different ways to diagnose coronavirus. According to the studies in Table 5, CT imaging, X-ray imaging, blood tests, and clinical data are used to detect coronavirus, and these datasets have been examined with different artificial intelligence methods. This study used a CT dataset and a CNN structure. The literature does not offer enough chest CT images to train deep learning methods, so the patch classification method was used to overcome this lack of data. If the number of COVID-19 images increases and a more diverse dataset is created, larger deep learning models may be used to build a classifier with higher detection performance.

6. References

[1] C. Huang et al., "Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China," vol. 395, no. 10223, pp. 497-506, 2020.
[2] M. Li et al., "Coronavirus Disease (COVID-19): Spectrum of CT Findings and Temporal Progression of the Disease," 2020.
[3] L. Fan et al., "Progress and prospect on imaging diagnosis of COVID-19," pp. 1-10, 2020.
[4] P. Huang et al., "Added value of computer-aided CT image features for early lung cancer diagnosis with small pulmonary nodules: a matched case-control study," vol. 286, no. 1, pp. 286-295, 2018.
[5] A. Esteva et al., "Dermatologist-level classification of skin cancer with deep neural networks," vol. 542, no. 7639, pp. 115-118, 2017.
[6] X. Xu et al., "Deep learning system to screen coronavirus disease 2019 pneumonia," 2020.
[7] F. Shan et al., "Lung Infection Quantification of COVID-19 in CT Images with Deep Learning," 2020.
[8] S. Wang et al., "A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19)," 2020.
[9] Y. Song et al., "Deep learning Enables Accurate Diagnosis of Novel Coronavirus (COVID-19) with CT images," 2020.
[10] P. K. Sethy and S. K. Behera, "Detection of coronavirus Disease (COVID-19) based on Deep Features," 2020.
[11] I. D. Apostolopoulos and T. A. Mpesiana, "Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks," Physical and Engineering Sciences in Medicine, p. 1, 2020.
[12] B. Ghoshal and A. Tucker, "Estimating uncertainty and interpretability in deep learning for coronavirus (COVID-19) detection," arXiv preprint, 2020.
[13] E. E.-D. Hemdan, M. A. Shouman, and M. E. Karar, "COVIDX-Net: A Framework of Deep Learning Classifiers to Diagnose COVID-19 in X-Ray Images," arXiv preprint, 2020.
[14] A. Narin, C. Kaya, and Z. Pamuk, "Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks," arXiv preprint, 2020.
[15] J. Wu et al., "Rapid and accurate identification of COVID-19 infection through machine learning based on clinical available blood test results," 2020.
[16] L. Yan et al., "A machine learning-based model for survival prediction in patients with severe COVID-19 infection," 2020.
[17] MedSeg, http://medicalsegmentation.com/covid19/ (accessed 20.04.2020).
[18] D. A. Clausi, "An analysis of co-occurrence texture statistics as a function of grey level quantization," Canadian Journal of Remote Sensing, vol. 28, no. 1, pp. 45-62, 2002.
[19] R. M. Haralick, K. Shanmugam, and I. H. Dinstein, "Textural features for image classification," IEEE Transactions on Systems, Man, and Cybernetics, no. 6, pp. 610-621, 1973.
[20] L.-K. Soh and C. Tsatsoulis, "Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices," IEEE Transactions on Geoscience and Remote Sensing, vol. 37, no. 2, pp. 780-795, 1999.
[21] A. S. M. Sohail, P. Bhattacharya, S. P. Mudur, and S. Krishnamurthy, "Local relative GLRLM-based texture feature extraction for classifying ultrasound medical images," in 2011 24th Canadian Conference on Electrical and Computer Engineering (CCECE), IEEE, 2011, pp. 1092-1095.
[22] G. Thibault, J. Angulo, and F. Meyer, "Advanced statistical matrices for texture characterization: application to cell classification," IEEE Transactions on Biomedical Engineering, vol. 61, no. 3, pp. 630-637, 2013.
[23] M. J. Shensa, "The discrete wavelet transform: wedding the à trous and Mallat algorithms," IEEE Transactions on Signal Processing, vol. 40, no. 10, pp. 2464-2482, 1992.
[24] H. J. Nussbaumer, "The fast Fourier transform," in Fast Fourier Transform and Convolution Algorithms, Springer, 1981, pp. 80-111.
[25] G. Strang, "The discrete cosine transform," SIAM Review, vol. 41, no. 1, pp. 135-147, 1999.
[26] S. R. Kulkarni and G. Harman, "Statistical learning theory: a tutorial," Wiley Interdisciplinary Reviews: Computational Statistics, vol. 3, no. 6, pp. 543-556, 2011.
[27] D. H. Hubel and T. N. Wiesel, "Receptive fields and functional architecture of monkey striate cortex," The Journal of Physiology, vol. 195, no. 1, pp. 215-243, 1968.
[28] K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biological Cybernetics, vol. 36, no. 4, pp. 193-202, 1980, doi: 10.1007/BF00344251.
[29] Y. LeCun et al., "Backpropagation Applied to Handwritten Zip Code Recognition," Neural Computation, vol. 1, no. 4, pp. 541-551, 1989, doi: 10.1162/neco.1989.1.4.541.
[30] C. Feng et al., "A Novel Triage Tool of Artificial Intelligence Assisted Diagnosis Aid System for Suspected COVID-19 pneumonia In Fever Clinics," 2020.