Research on CT Image Classification Algorithm of COVID-19 Based on Improved ResNet

Xipei Chen, Yicong Zhao, Yanqiu Wang, Bin Yang, Xiaofei Yan
College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China

Abstract
Distinguishing COVID-19 from other viral pneumonias helps doctors diagnose COVID-19 patients more accurately and quickly. To address the classification of CT images of patients with COVID-19, this paper builds on the traditional convolutional neural network classification model and proposes a CT image classification method based on an improved ResNet50 network. The method combines a multiscale feature fusion strategy with an improved attention mechanism that computes the correlation coefficients between feature points within a feature map, thereby enhancing the representational ability of the feature map. An analysis and comparison of the technical principles, classification accuracy, and other metrics shows that the improved algorithm has better adaptability and classification ability. Experiments show that, compared with the traditional classification model, the improved ResNet50 classification model improves accuracy, time complexity, and space complexity, reaching an accuracy of 90.1%.

Keywords
ResNet50 model; COVID-19; CT image classification

1 Introduction
At present, nucleic acid testing is the most common method for diagnosing COVID-19 [1], and reverse transcription polymerase chain reaction (RT-PCR) [2] has become the mainstream COVID-19 detection technology. RT-PCR can detect viral RNA in samples obtained from pharyngeal swabs, nasal swabs, sputum, bronchial lavage fluid, alveolar lavage fluid, and so on. However, various studies have shown that the accuracy of RT-PCR detection is relatively low, and multiple tests are usually required to obtain an accurate result.
Because of low sample quality and low viral load in the pharynx, nucleic acid testing of pharyngeal swabs is prone to false negatives, has a high retest rate, and involves a long wait for results [3]. CT imaging technology plays an important role in the detection of COVID-19, and chest CT is an effective tool that helps doctors diagnose COVID-19 quickly. However, because the lung features of patients in the early stage of infection are not obvious on CT images, inexperienced doctors may fail to identify the characteristic CT findings of COVID-19, which can lead to misdiagnosis. Using deep learning to analyze lung CT images can reveal many subtle features in the images and then give clear detection results. Deep learning can therefore be integrated into medical image processing and target analysis of CT images to accurately extract key lesion areas and texture features and to screen for the characteristic manifestations of COVID-19, such as ground-glass opacities, the crazy-paving pattern, and lung consolidation [4]. To address the classification of CT images of COVID-19, this paper takes ResNet50 with an improved attention mechanism as the training model and uses a Softmax classifier to build a classification model that assists clinicians in diagnosis and analysis, reducing their workload and pressure and improving their efficiency.

AIoTC2022@International Conference on Artificial Intelligence, Internet of Things and Cloud Computing Technology
EMAIL: Corresponding author: yanxf_sytu@163.com (Xiaofei Yan)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

2 Method

2.1 Data augmentation
Image data augmentation processes existing data so that it yields more value without adding new data [5].
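The flips, random rotation, and affine and color transformations described in this section can be sketched in NumPy as follows, assuming each preprocessed axial slice is a 2-D float array scaled to [0, 1]. The paper does not state its parameter ranges or the library it used, so the ranges, the nearest-neighbour sampling, and the function names `affine_warp` and `augment_slice` are all hypothetical. Because CT slices are single-channel, the color transformation is reduced here to brightness and contrast jitter; saturation and hue have no grayscale counterpart and are omitted.

```python
import numpy as np


def affine_warp(img, angle_deg, shear_deg, shift, scale):
    """Rotate, shear, translate and scale a 2-D slice about its centre.

    Uses inverse mapping with nearest-neighbour sampling; output pixels whose
    source position falls outside the slice are set to 0.
    """
    h, w = img.shape
    a, s = np.deg2rad(angle_deg), np.deg2rad(shear_deg)
    rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    shear = np.array([[1.0, np.tan(s)], [0.0, 1.0]])
    inv = np.linalg.inv(scale * (rot @ shear))    # maps output -> source coordinates
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Output (row, col) coordinates relative to the centre, with the shift undone.
    coords = np.stack([ys.ravel() - cy - shift[0], xs.ravel() - cx - shift[1]])
    src = inv @ coords
    sy = np.rint(src[0] + cy).astype(int)
    sx = np.rint(src[1] + cx).astype(int)
    valid = (sy >= 0) & (sy < h) & (sx >= 0) & (sx < w)
    out = np.zeros(h * w, dtype=img.dtype)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h, w)


def augment_slice(img, rng):
    """Randomly flip, affine-warp and intensity-jitter one slice scaled to [0, 1].

    All ranges below are hypothetical example values, not the paper's settings.
    """
    if rng.random() < 0.5:
        img = np.fliplr(img)                     # horizontal flip
    if rng.random() < 0.5:
        img = np.flipud(img)                     # vertical flip
    img = affine_warp(
        img,
        angle_deg=rng.uniform(-15.0, 15.0),      # random rotation, left or right
        shear_deg=rng.uniform(-5.0, 5.0),        # shear ("stagger") angle
        shift=rng.uniform(-10.0, 10.0, size=2),  # translation in pixels
        scale=rng.uniform(0.9, 1.1),             # scaling factor
    )
    # Grayscale stand-in for the color transformation: contrast then brightness.
    mean = img.mean()
    img = (img - mean) * rng.uniform(0.8, 1.2) + mean + rng.uniform(-0.1, 0.1)
    return np.clip(img, 0.0, 1.0)
```

A call such as `augment_slice(slice_01, np.random.default_rng(0))`, with `slice_01` a hypothetical 2-D array in [0, 1], returns a new slice of the same shape. With a 90-degree angle and no shear, shift, or scaling, `affine_warp` reproduces `np.rot90`, which is a quick check that the inverse mapping is oriented correctly.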
For the preprocessed axial CT slices, the augmentation methods used include flipping and rotation (such as horizontal or vertical flipping and rotation by a random angle) and image transformations (such as color transformation or affine transformation). Random rotation rotates the image left or right by an angle chosen randomly within a specified range; color transformation adjusts the image with brightness, contrast, saturation, or hue values chosen randomly within a certain range; affine transformation adjusts the image with rotation angles, shear angles, translation distances, scaling factors, and so on, chosen randomly within a certain range. Some augmented images are shown in Figure 1.

Figure 1. Examples of augmented images

2.2 Feature pyramid structure
In computer vision, multiscale object detection usually rescales the input image to several scales and uses each as input to generate features that combine information at different scales [6]. This approach represents the features of an image at various scales effectively, but it demands considerable computing power and memory. A feature pyramid network (FPN) instead builds, within a convolutional neural network, feature representations of different dimensions at every level, from bottom to top, for an input image of a single size. It can be applied to typical convolutional neural network models to generate feature maps with more effective representational ability; in essence, it is a way to strengthen the feature representation of the backbone network. In ResNet50, the outputs of conv2_3, conv3_4, conv4_6, and conv5_3 (the last residual block of each of the four stages) are used to build the FPN. The FPN structure is shown in Figure 2.

Figure 2.
FPN structure diagram of ResNet50

2.3 Improved attention mechanism
In computer vision, an attention mechanism computes the correlation between different pixels or pixel blocks in an image to obtain its salient feature information. In essence, it obtains a weight distribution over the image features, with the core purpose of extracting key information [7]. In this paper, the attention module is added before the output of a residual block. It computes the correlations between feature points of the feature map, so that the feature map passed to the next residual block contains correlations between long-distance features. Because the attention module keeps the feature-map size the same between its input and output, the parameter settings of the original network structure do not need to change. As shown in Figure 3, the improved attention module is added to the last residual block of conv3, conv4, and conv5, effectively combining the attention module with the residual network modules.

Figure 3. Residual structure with the improved attention mechanism

2.4 Loss function and classification function of the model
In multiclass medical image classification with neural networks, Softmax is generally used as the activation function of the output layer [8], and the categorical (multiclass) cross-entropy loss function is used, as defined in formula (1):

$$\mathrm{Loss} = -\sum_{i=1}^{K} y_i \log\big(S(l_i)\big) \qquad (1)$$

where $S(l_i)$ is the output of the $i$-th output-layer neuron after Softmax activation, $y_i$ is the $i$-th element of the one-hot encoded label, and the output layer contains $K$ neurons corresponding to the $K$ categories [9]. Softmax is defined in formula (2), where $z$ is a vector and $z_i$ and $z_j$ are its elements:

$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \qquad (2)$$

3 Experiments and results

3.1 Experimental environment
The experiments were run under the Windows Server 2019 operating system.
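Returning briefly to the method of Section 2, the feature fusion of Section 2.2, the attention module of Section 2.3, and formulas (1) and (2) of Section 2.4 can be sketched as a NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation: the paper does not give the exact form of its improved attention module, so a non-local block (scaled dot-product self-attention over all spatial positions, plus a residual connection) is assumed; the FPN is assumed to follow the standard top-down pathway with 1x1 lateral projections and nearest-neighbour upsampling, with the usual 3x3 smoothing convolutions omitted; and plain weight matrices stand in for learned 1x1 convolutions. The function names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Formula (2): exp(z_i) / sum_j exp(z_j), computed stably."""
    z = z - z.max(axis=axis, keepdims=True)  # shifting by the max leaves the result unchanged
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_entropy(logits, one_hot):
    """Formula (1): -sum_i y_i * log(S(l_i)), using a stable log-softmax."""
    z = logits - logits.max()
    log_s = z - np.log(np.exp(z).sum())
    return float(-np.sum(one_hot * log_s))


def fpn_top_down(features, lateral_ws):
    """Top-down FPN merge (assumed form).

    features: maps ordered fine to coarse (e.g. conv2_3, conv3_4, conv4_6,
    conv5_3 outputs), each (H, W, C_i) at half the previous resolution.
    lateral_ws: (C_i, D) matrices acting as 1x1 convolutions.
    Returns one (H, W, D) map per level, fine to coarse.
    """
    laterals = [f @ w for f, w in zip(features, lateral_ws)]
    merged = [laterals[-1]]                      # start from the coarsest level
    for lat in reversed(laterals[:-1]):
        up = merged[0].repeat(2, axis=0).repeat(2, axis=1)  # 2x nearest upsampling
        merged.insert(0, lat + up[: lat.shape[0], : lat.shape[1]])
    return merged


def non_local_attention(x, w_q, w_k, w_v, w_out):
    """Shape-preserving self-attention over all positions of x (assumed form).

    x: (H, W, C); w_q, w_k, w_v: (C, d); w_out: (d, C).
    """
    h, w, c = x.shape
    flat = x.reshape(h * w, c)                     # N = H * W feature points
    q, k, v = flat @ w_q, flat @ w_k, flat @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))  # (N, N) pairwise correlation weights
    out = flat + (attn @ v) @ w_out                # residual connection keeps the shape
    return out.reshape(h, w, c)
```

Because of the residual connection, setting `w_out` to zeros makes `non_local_attention` return its input unchanged. This illustrates one common way such a block can be inserted into a pretrained ResNet50 without disturbing the layers around it, consistent with the text's point that the module does not change the feature-map size.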
The deep learning framework is TensorFlow (GPU version) 2.4.0, and the development language is Python 3.7. The computer is an HP Z8 G4 workstation; the main configuration of the experimental environment is shown in Table 1.

Table 1. Experimental environment configuration
Item | Details
Processor | Intel Xeon(R) Gold 6226, 2.7 GHz
Hardware environment | NVIDIA RTX 2080 Ti GPU, 11 GB video memory
Operating system | Windows Server 2019
Deep learning framework | TensorFlow (GPU version) 2.4.0, Keras
General parallel computing architecture | CUDA 11.0

3.2 Experimental process
In the experiment, the validation loss stabilizes once the number of training epochs exceeds 35, so the number of training epochs is set to 35, as shown in Figure 4. Accuracy is the proportion of correct predictions among all predictions. In the figure, "acc" is the training accuracy and "val_acc" is the accuracy on the validation set; "loss" is the loss on the training set and "val_loss" is the loss on the validation set.

Figure 4. Model training curves: (a) accuracy trend; (b) loss trend

In this experiment, we collected about 3000 CT images from the Internet: 1350 CT images of COVID-19 infection, 300 CT images of other viral pneumonia, and 1350 CT images of healthy people. We presented these images to the model to test its robustness, made predictions on these randomly selected images, and recorded the prediction performance of the proposed model. Figure 5 shows examples of CT image classification.

Figure 5. CT image classification

The model correctly classified 2703 CT images: 1298 COVID-19 images, 184 viral pneumonia images, and 1221 normal images, giving an accuracy of 90.1%. The confusion matrix is shown in Figure 6.

Figure 6.
Confusion matrix of the model

3.3 Comparison with the original ResNet classification network
We also tested other neural network methods on the same data set. We used the same experimental setup for the baseline and the proposed method, trained and tested both on similar data sets, and then compared them in terms of speed, accuracy, and other performance indicators. In the speed test, the proposed method processes 12.75 FPS on the CPU and 39.56 FPS on the GPU, whereas the original ResNet50 model processes 7.89 FPS on the CPU and 23.52 FPS on the GPU, so the proposed method is faster than the original model. As shown in Table 2, the proposed model reaches an accuracy of 90.1% after 35 epochs, while the ResNet50 model reaches 87% after 35 epochs; the proposed model is therefore 3.1 percentage points more accurate than the classical ResNet classification model.

Table 2. Comparison of the improved ResNet50 and ResNet50
Model | CPU speed (FPS) | GPU speed (FPS) | Accuracy (%)
ResNet50 | 7.89 | 23.52 | 87
Improved ResNet50 | 12.75 | 39.56 | 90.1

4 Conclusions
Accurate and rapid detection of COVID-19 is a challenging diagnostic task. This paper first uses ResNet50 as a pretrained model to study the classification of CT images of COVID-19, and then adds feature fusion and an improved attention mechanism to the ResNet50 model. The improved ResNet50 is a lightweight and fast feature extraction model. The experimental results show that, compared with the ResNet50 model, the improved network structure trains quickly and reaches 90.1% accuracy, which can meet detection needs and reduce the pressure on medical workers to a certain extent.

5 Acknowledgments
This work is supported by the Shandong Provincial Natural Science Foundation, China (No. ZR2020QF110).

6 References
[1] WHO, Clinical management of severe acute respiratory infection when novel coronavirus (2019-nCoV) infection is suspected: interim guidance, https://apps.who.int/iris/handle/10665/330893, 2021.
[2] Li Shixue, Shan Ying, Review of research progress in COVID-19, Journal of Shandong University (Medical Edition), vol. 58 no. 3, pp. 19-25, 2020.
[3] Wang W, Xu Y, Gao R, et al., Detection of SARS-CoV-2 in different types of clinical specimens, JAMA: The Journal of the American Medical Association, vol. 323 no. 18, pp. 1843-1844, 2020.
[4] Tavare A N, Braddy A, Brill S, et al., Managing high clinical suspicion COVID-19 inpatients with negative RT-PCR: a pragmatic and limited role for thoracic CT, Thorax, vol. 75 no. 7, pp. 537-514, 2020.
[5] Lei J, Li J, Li X, et al., CT imaging of the 2019 novel coronavirus (2019-nCoV) pneumonia, Radiology, vol. 295 no. 1, p. 18, 2020.
[6] Pan Y, Guan H, Zhou S, et al., Initial CT findings and temporal changes in patients with the novel coronavirus pneumonia (2019-nCoV): a study of 63 patients in Wuhan, China, Eur Radiol, vol. 30 no. 6, pp. 3306-3309, 2020.
[7] He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al., Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 37 no. 9, pp. 1904-1919, 2014.
[8] https://zhuanlan.zhihu.com/p/353235794.
[9] Wang S H, Fernandes S, Zhu Z, et al., AVNC: attention-based VGG-style network for COVID-19 diagnosis by CBAM, IEEE Sensors Journal, vol. 99 no. 1, 2020.