Coronavirus (Covid-19) Classification Using CT Images by Machine Learning Methods

Mücahid Barstuğan (a), Umut Özkaya (a), and Şaban Öztürk (b)

(a) Konya Technical University, Electrical and Electronics Engineering, Konya, 42250, Turkey
(b) Amasya University, Electrical and Electronics Engineering, Amasya, 05000, Turkey

Abstract
This study detects Coronavirus disease (COVID-19) by using machine learning methods. Coronavirus disease occurs in the lungs and can cause death. The detection process was performed on chest Computed Tomography (CT) images. Training used 32x32 patches obtained from the CT images. The study includes three phases. The first phase classifies the patches with the SVM algorithm without any feature extraction. The second phase extracts features from the patches with the Grey Level Co-occurrence Matrix (GLCM), Grey Level Run Length Matrix (GLRLM), Grey Level Size Zone Matrix (GLSZM), Discrete Wavelet Transform (DWT), Fast Fourier Transform (FFT), and Discrete Cosine Transform (DCT) methods and classifies the extracted features. The third phase classifies the patches with a Convolutional Neural Network (CNN). 10-fold cross-validation is used in the classification process. The sensitivity, specificity, accuracy, precision, and F-score metrics measure the classification performance. The highest classification accuracy during training, 99.15%, was achieved by the CNN method. The classification structure with the highest accuracy was then used in the test stage and reached an 80.21% mean sensitivity rate, which is the COVID detection performance, on 727 test images.

Keywords
Classification, coronavirus, COVID-19, CT images, deep learning, feature extraction, machine learning

Proceedings of RTA-CSIT 2021, May 2021, Tirana, Albania
EMAIL: mbarstugan@ktun.edu.tr (A. 1); uozkaya@ktun.edu.tr (A. 2); saban.ozturk@amasya.edu.tr (A. 3)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

COVID-19 emerged at the end of 2019 in the Wuhan region of China. In its early phases the disease causes fever, cough, fatigue, and myalgia [1]. Patients show abnormal findings in their chest CT images. Respiratory problems, heart damage, and secondary infections were observed as complications of the disease. The findings show that the COVID-19 virus spreads from person to person. Infected people have serious respiratory problems and may need to be treated in the intensive care unit. CT images of infected people show that COVID-19 has its own characteristics. Clinical experts who know these characteristics therefore need lung CT images to diagnose COVID-19 in the early phase. Serial CT examinations help clinical experts to understand the occurrence, development, and prognosis of the disease. CT findings can be sorted into four stages: early stage, progressive stage, severe stage, and dissipative stage [2]. Chest CT imaging is one of the key elements in the diagnosis of suspected patients [3]. By January 12, 2021, a total of 91,636,996 cases had been diagnosed with COVID-19 infection; 65,524,142 patients had recovered, and 1,961,037 patients had died.

The development of computer vision systems supports medical applications such as image quality enhancement, organ segmentation, and organ texture classification. The analysis of time series and tumor characteristics [4] and the segmentation and detection [5] of tumor modules are some of the machine learning applications in the biomedical image processing field.

The literature does not yet contain a detailed study and dataset on coronavirus disease. Xu et al. [6] classified CT images into three classes: COVID-19, Influenza-A viral pneumonia, and healthy cases. They obtained the images from hospitals in the Zhejiang region of China. The dataset consisted of 618 images in total: 219 images from 110 patients with COVID-19, 224 images of 224 patients with Influenza-A viral pneumonia, and 175 images of 175 healthy people. Their study classified the images with a 3D deep learning model and achieved an overall classification accuracy of 87.6%. Shan et al. [7] implemented a deep learning-based system for segmenting and quantifying the infected regions as well as the entire lung on chest CT images. Their study used 249 COVID-19 patients and 300 new COVID-19 patients for validation, and obtained a Dice similarity coefficient of 91.6%. Regular delineation often takes 1 to 5 hours; their proposed system reduced the delineation time to four minutes. Wang et al. [8] studied 453 CT images of pathogen-confirmed COVID-19 cases along with cases previously diagnosed with typical viral pneumonia. Their study used the inception migration-learning model to create the algorithm and achieved 82.9% validation accuracy and 73.1% test accuracy. Song et al. [9] used chest CT images of 88 patients, obtained from two different hospitals in China. The study proposed a deep learning-based CT diagnosis system to identify COVID-19 patients, with 94% classification accuracy. Sethy and Behera [10] proposed a method to detect infected patients by using X-ray scans. The dataset consisted of 100 X-ray images. The method classified the X-ray images with the SVM algorithm by using deep features. The proposed ResNet50+SVM method has 91.41% classification performance on the MCC metric. Other studies also used lung X-ray scans [11-14], and some studies [15, 16] used clinical blood test results.

This study created patches from 202 CT images, and the samples were labeled as infected/non-infected. The patches were sampled on lung and infected areas. Three different phases were used to classify the coronavirus images. The findings showed that the highest classification accuracy was achieved with automatic feature extraction on the images.

This paper is organized as follows. Section 2 analyses the images statistically and visually. Section 3 briefly explains the feature extraction and classification methods. Section 4 presents the classification results. Section 5 concludes the paper.

2. Material

The dataset was taken from [17], which provides two datasets labeled by experts. The first dataset has 100 different CT images. The second dataset has 829 CT images, which were obtained from nine patients, and also contains non-covid images. Table 1 presents the original dataset and the patch dataset created from it.

Table 1
The features of patch dataset

| Dataset | Number of images | Number of covid images | Number of training images | Number of test images | Number of infected training patches | Number of non-infected training patches |
|---|---|---|---|---|---|---|
| Dataset 1 | 100 | 100 | 70 | 30 | 28047 | 8843 |
| Dataset 2 | 829 | 372 | 132 | 697 | 31307 | 25670 |

202 images were used to obtain the training patches, and 727 images were used for the test stage. The images in the dataset were acquired with different CT scanners, which makes the classification process difficult. In some images, the grey levels of infected regions are similar to those of non-infected regions. Also, the grey levels vary dramatically from one CT scanner to another. Figure 1 shows the infected areas in images that were acquired with different CT scanners.
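As a rough illustration of how 32x32 patches can be cut from a CT slice and labeled as infected or non-infected, the sketch below tiles an image on a non-overlapping grid. It is not the authors' sampling procedure: the function name `extract_patches`, the binary infection `mask`, and the rule "a patch is infected if any masked pixel falls inside it" are assumptions made for this example, and the paper only states explicitly that the test patches do not overlap.

```python
import numpy as np

PATCH = 32  # patch side length reported in the paper


def extract_patches(image, mask, patch=PATCH):
    """Tile a 2-D CT slice into non-overlapping patch x patch squares.

    `mask` is a hypothetical binary infection mask with the same shape as
    `image`. A tile gets label 1 (infected) if any mask pixel inside it is
    set, otherwise 0 (non-infected). Incomplete border tiles are skipped.
    """
    height, width = image.shape
    tiles, labels = [], []
    for row in range(0, height - patch + 1, patch):
        for col in range(0, width - patch + 1, patch):
            tiles.append(image[row:row + patch, col:col + patch])
            labels.append(int(mask[row:row + patch, col:col + patch].any()))
    return np.stack(tiles), np.array(labels)


if __name__ == "__main__":
    # Synthetic 64x96 "slice" with a single infected pixel.
    image = np.arange(64 * 96, dtype=float).reshape(64, 96)
    mask = np.zeros((64, 96), dtype=bool)
    mask[40, 70] = True
    tiles, labels = extract_patches(image, mask)
    print(tiles.shape, labels)
```

Flattening each tile with `tiles.reshape(len(tiles), -1)` gives 1024-element vectors, which matches the feature number listed for Phase 1.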
Figure 1: The sample images

3. Method

This study classifies the patches into two classes. The first class consists of the patches that were sampled in non-infected areas. The second class consists of the patches that were sampled in infected areas. Figure 2 shows sample patches of both classes.

Figure 2: Sample patches for infected and non-infected classes

This study consists of three phases. In the first phase, the patches were transformed into vectors and the SVM classified the vectors. In the second phase, six different feature extraction methods were used. The Grey Level Co-occurrence Matrix (GLCM) [18-20], Grey Level Run Length Matrix (GLRLM) [21], and Grey Level Size Zone Matrix (GLSZM) [22] methods extracted features from the patches. The First Order Statistics (FOS) features were obtained on the patches and on the transform images of the Discrete Wavelet Transform (DWT) [23], Fast Fourier Transform (FFT) [24], and Discrete Cosine Transform (DCT) [25]. The mean, variance, skewness, kurtosis, energy, and entropy values of the patches were extracted as the FOS features. The SVM [26] method classified all features. During the classification process, the 10-fold cross-validation method was used. In the third phase, a deep learning structure was used to classify the patches. Figure 3 shows the stages of the three phases of the classification process.

Figure 3: The classification processes for three phases

3.1. The convolutional neural networks

CNN is a biologically-inspired multi-layer perceptron (MLP) structure that originated from the work of Hubel and Wiesel on the mammalian visual cortex [27]. Fukushima [28] introduced the CNN in 1980, and LeCun et al. [29] advanced it in 1989 with a study on learning in deep networks. Its convolution layers and automatic learning capabilities have led to wide use in image classification, object detection, and visual tracking applications. Progress in hardware and parallelization has increased this intensive use of CNN [30]: training that used to take months now takes several days. The CNN structure includes one or more convolution layers, pooling layers, ReLUs, and softmax regression. The convolutional layer reduces the computational cost by reducing the number of image parameters. The pooling layer is often used to reduce the matrix size without losing important features. The ReLU is an activation function that generally follows convolution or pooling layers. Finally, the softmax layer is used for regression on the output values. Figure 4 shows the CNN structure used in this study.

Figure 4: The CNN structure used in Phase 3

4. Experimental Results

This study presents a coronavirus classification in three phases. Phase 1 classified the patches without feature extraction. Phase 2 applied the feature extraction process to all patches and classified the extracted features. Phase 3 used the CNN method to classify the patches. Five evaluation metrics measure the performance: sensitivity (SEN), specificity (SPE), accuracy (ACC), precision (PRE), and F-score. All experiments were run on a computer with an Intel Core i7-8700K CPU (3.7 GHz), 32 GB DDR4 RAM, and an NVIDIA GeForce GTX 1080 graphics card.

4.1. The classification results of the training process

The training dataset consists of 202 CT images, and the test set consists of 727 images. 59354 infected and 34513 non-infected patches were obtained from the 202 training images. These patches were classified in three phases. Table 2 presents the classification results obtained.
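As an illustration of the five evaluation metrics named in Section 4, the sketch below computes them from confusion-matrix counts using their standard definitions. The paper does not print these formulas, so the function name `classification_metrics`, the choice of "infected" as the positive class, and the example counts are assumptions for this example only.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return SEN, SPE, ACC, PRE, and F-score as fractions in [0, 1].

    tp/fn count infected samples predicted infected/non-infected;
    tn/fp count non-infected samples predicted non-infected/infected.
    Assumes every denominator is non-zero.
    """
    sen = tp / (tp + fn)                    # sensitivity (recall)
    spe = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)   # accuracy
    pre = tp / (tp + fp)                    # precision
    f_score = 2 * pre * sen / (pre + sen)   # harmonic mean of PRE and SEN
    return {"SEN": sen, "SPE": spe, "ACC": acc, "PRE": pre, "F-score": f_score}


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    for name, value in classification_metrics(tp=80, fp=10, tn=90, fn=20).items():
        print(f"{name}: {100 * value:.2f}%")
```

Because the F-score depends only on precision and sensitivity, a classifier that labels almost nothing as infected can reach 100% specificity and precision while its sensitivity and F-score stay very low.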
Table 2
The classification results for Phase 1 and Phase 2
Evaluation metrics are given as mean (%) ± std.

| Phase | Feature Extraction | Classifier Structure | Feature Number | SEN | SPE | ACC | PRE | F-score |
|---|---|---|---|---|---|---|---|---|
| Phase 1 | None | SVM | 1024 | 51.85±1.8 | 61.92±3.1 | 56.88±1.3 | 57.7±1.7 | 54.59±1.2 |
| Phase 2 | Manual Feature Extraction | FOS-SVM | 6 | 95.32±0.4 | 98.12±0.3 | 97.09±0.4 | 96.72±0.2 | 96.01±0.2 |
| Phase 2 | Manual Feature Extraction | GLCM-SVM | 19 | 13.22±0.6 | 100 | 68.09±0.2 | 100 | 23.35±0.8 |
| Phase 2 | Manual Feature Extraction | GLRLM-SVM | 7 | 44.85±0.4 | 93.72±0.4 | 75.75±0.3 | 80.62±1.1 | 57.63±0.4 |
| Phase 2 | Manual Feature Extraction | GLSZM-SVM | 13 | 2.92±0.2 | 100 | 64.31±0.1 | 100 | 5.7±0.3 |
| Phase 2 | Manual Feature Extraction | DWT-FOS-SVM | 24 | 65.25±0.6 | 92.5±0.4 | 82.48±0.3 | 83.5±0.7 | 73.25±0.4 |
| Phase 2 | Manual Feature Extraction | FFT-FOS-SVM | 6 | 65.83±1.1 | 90.6±0.4 | 81.5±0.4 | 80.3±0.7 | 72.3± |
| Phase 2 | Manual Feature Extraction | DCT-FOS-SVM | 6 | 64.54±0.9 | 89.96±0.3 | 80.61±0.3 | 78.9±0.4 | 70.99±0.6 |

Table 2 shows that the classification accuracy in Phase 1 was 56.88%. The best classification performance in Phase 2 was obtained with the FOS-SVM method, 97.09% with six features. The lowest Phase 2 performance was obtained with the GLSZM-SVM structure, which detected infected patches with a sensitivity rate of only 2.92%. The results show that extracting features increases the classification performance. In Phase 3, a CNN structure consisting of a convolution layer, a ReLU layer, a pooling layer, and a softmax layer was used. Table 3 shows the highest results for Phase 3.

Table 3
The classification results for Phase 3

| Feature Extraction | Classifier Structure | Accuracy (%) |
|---|---|---|
| Automatic Feature Extraction | CNN | 99.51 |

Table 3 shows that the classification accuracy was obtained as 99.15% in Phase 3 during the training process. Table 2 and Table 3 show that the highest classification performance was obtained by the CNN method. Phase 2 achieved a maximum classification accuracy of 97.09% with manual feature extraction methods. Phase 3 achieved 99.15% classification accuracy by extracting features automatically.

4.2. The classification results of the test process

This study used the trained structure with the highest classification accuracy to detect the infected areas in images that were not used during the training stage. Figure 5 presents the test stage.

Figure 5: The test structure to diagnose the infected

The trained CNN structure classified non-overlapping 32x32 patches of each test image. During the test stage, the threshold value was set to "1": if at least one patch was classified as infected, the image was classified as infected, and if no patch was classified as infected, the image was classified as non-infected. When the threshold value was set to "2", the classification performance decreased. The reason is that some images have an infected area of only one patch in size. With a threshold greater than "1", the image was classified as non-infected even when such an area was classified as infected, which reduced the classification performance. 727 test images were classified during the test process. Table 4 presents the mean results of the test process.

Table 4
The classification performance of test process

| Method | SEN (%) | SPE (%) | ACC (%) | PRE (%) | F-score (%) |
|---|---|---|---|---|---|
| Phase 3 | 80.21 | 99.47 | 98.3 | 90.8 | 85.18 |

Table 4 shows that the proposed method detects the infected images with an 80.21% mean sensitivity rate on 727 images. The non-infected patches are classified with a 99.47% mean specificity rate. The CNN structure has 99.15% classification accuracy during training; however, its sensitivity is 80.21% on the test images, which were not used for training.

5. Conclusion

COVID-19 was first encountered in the Wuhan region of China and has been threatening public health, trade, and the world economy. The virus behaves partially like other viral pneumonias, and its spreading rate has made the situation difficult to bring under control. CT imaging results of COVID-19 show findings that differ from those in other clinical studies. Some findings, such as bronchiectasis, lesion swelling, and differing shadows in CT images, make COVID-19 easier to diagnose. This study compared an SVM based on manual feature extraction with a CNN that extracts features automatically, and found that the CNN performs better than the SVM method. The coronavirus image set in this study contains different types of images, which were acquired with different CT scanners. Therefore, different feature extraction methods and classifiers were implemented to find the method that best separates the infected patches. Table 5 presents the literature studies and their classification performances on different coronavirus datasets.

Table 5
The literature comparison

| Study | Method | Dataset | Number of classes | Performance (%) |
|---|---|---|---|---|
| [6] | 3D-deep learning | CT, 618 images | 3 | 87.6 |
| [8] | Migration-learning | CT, 453 images | 2 | 82.9 |
| [10] | ResNet50+SVM | X-ray, 100 images | 2 | 91.41 |
| [13] | COVIDX-Net | X-ray, 50 images | 2 | 90 |
| [16] | XG-boost | Blood Test, 404 samples | 2 | 97 |
| [15] | Random Forest | Blood Test, 49 samples | 2 | 95.12 |
| [30] | Feature Selection and Lasso Regression | Clinical Data | 2 | 84.1 |
| This study | CNN | CT, 727 images | 2 | 80.21 |

There are different ways to diagnose coronavirus. According to the literature studies in Table 5, CT imaging, X-ray imaging, blood tests, and clinical data are used to detect the coronavirus. These datasets were examined with different artificial intelligence methods. This study used a CT dataset and a CNN structure. The literature does not contain enough chest CT images to train deep learning methods, so the patch classification method was used to overcome the lack of data. If the number of COVID-19 images increases and a dataset with data diversity is created, high dimensional deep learning methods may be used to create a classifier structure with higher detection performance.

6. References

[1] C. Huang et al., "Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China," The Lancet, vol. 395, no. 10223, pp. 497-506, 2020.
[2] M. Li et al., "Coronavirus Disease (COVID-19): Spectrum of CT Findings and Temporal Progression of the Disease," 2020.
[3] L. Fan et al., "Progress and prospect on imaging diagnosis of COVID-19," pp. 1-10, 2020.
[4] P. Huang et al., "Added value of computer-aided CT image features for early lung cancer diagnosis with small pulmonary nodules: a matched case-control study," Radiology, vol. 286, no. 1, pp. 286-295, 2018.
[5] A. Esteva et al., "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, no. 7639, pp. 115-118, 2017.
[6] X. Xu et al., "Deep learning system to screen coronavirus disease 2019 pneumonia," 2020.
[7] F. Shan et al., "Lung Infection Quantification of COVID-19 in CT Images with Deep Learning," 2020.
[8] S. Wang et al., "A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19)," 2020.
[9] Y. Song et al., "Deep learning Enables Accurate Diagnosis of Novel Coronavirus (COVID-19) with CT images," 2020.
[10] P. K. Sethy and S. K. Behera, "Detection of coronavirus Disease (COVID-19) based on Deep Features," 2020.
[11] I. D. Apostolopoulos and T. A. Mpesiana, "Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks," Physical and Engineering Sciences in Medicine, p. 1, 2020.
[12] B. Ghoshal and A. Tucker, "Estimating uncertainty and interpretability in deep learning for coronavirus (COVID-19) detection," arXiv preprint, 2020.
[13] E. E.-D. Hemdan, M. A. Shouman, and M. E. Karar, "COVIDX-Net: A Framework of Deep Learning Classifiers to Diagnose COVID-19 in X-Ray Images," arXiv preprint, 2020.
[14] A. Narin, C. Kaya, and Z. Pamuk, "Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks," arXiv preprint, 2020.
[15] J. Wu et al., "Rapid and accurate identification of COVID-19 infection through machine learning based on clinical available blood test results," 2020.
[16] L. Yan et al., "A machine learning-based model for survival prediction in patients with severe COVID-19 infection," 2020.
[17] MedSeg. http://medicalsegmentation.com/covid19/ (accessed 20.04, 2020).
[18] D. A. Clausi, "An analysis of co-occurrence texture statistics as a function of grey level quantization," Canadian Journal of Remote Sensing, vol. 28, no. 1, pp. 45-62, 2002.
[19] R. M. Haralick, K. Shanmugam, and I. H. Dinstein, "Textural features for image classification," IEEE Transactions on Systems, Man, and Cybernetics, no. 6, pp. 610-621, 1973.
[20] L.-K. Soh and C. Tsatsoulis, "Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices," IEEE Transactions on Geoscience and Remote Sensing, vol. 37, no. 2, pp. 780-795, 1999.
[21] A. S. M. Sohail, P. Bhattacharya, S. P. Mudur, and S. Krishnamurthy, "Local relative GLRLM-based texture feature extraction for classifying ultrasound medical images," in 2011 24th Canadian Conference on Electrical and Computer Engineering (CCECE), 2011: IEEE, pp. 001092-001095.
[22] G. Thibault, J. Angulo, and F. Meyer, "Advanced statistical matrices for texture characterization: application to cell classification," IEEE Transactions on Biomedical Engineering, vol. 61, no. 3, pp. 630-637, 2013.
[23] M. J. Shensa, "The discrete wavelet transform: wedding the a trous and Mallat algorithms," IEEE Transactions on Signal Processing, vol. 40, no. 10, pp. 2464-2482, 1992.
[24] H. J. Nussbaumer, "The fast Fourier transform," in Fast Fourier Transform and Convolution Algorithms: Springer, 1981, pp. 80-111.
[25] G. Strang, "The discrete cosine transform," SIAM Review, vol. 41, no. 1, pp. 135-147, 1999.
[26] S. R. Kulkarni and G. Harman, "Statistical learning theory: a tutorial," Wiley Interdisciplinary Reviews: Computational Statistics, vol. 3, no. 6, pp. 543-556, 2011.
[27] D. H. Hubel and T. N. Wiesel, "Receptive fields and functional architecture of monkey striate cortex," The Journal of Physiology, vol. 195, no. 1, pp. 215-243, 1968.
[28] K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biological Cybernetics, vol. 36, no. 4, pp. 193-202, 1980, doi: 10.1007/BF00344251.
[29] Y. LeCun et al., "Backpropagation Applied to Handwritten Zip Code Recognition," Neural Computation, vol. 1, no. 4, pp. 541-551, 1989, doi: 10.1162/neco.1989.1.4.541.
[30] C. Feng et al., "A Novel Triage Tool of Artificial Intelligence Assisted Diagnosis Aid System for Suspected COVID-19 pneumonia In Fever Clinics," 2020.