Segmentation of Lungs, Lesions, and Lesion Types on Chest CT Scans of Patients with Covid-19

Daria Lashchenova1 [0000-0002-1894-9877], Alexander Gromov1,3 [0000-0001-9818-3770], Anton Konushin1,2 [0000-0002-6152-0021], and Anna Mesheryakova3 [0000-0002-2409-0018]

1 Lomonosov Moscow State University, Moscow, Russia
{daria.laschenova, alexander.gromov, anton.konushin}@graphics.cs.msu.ru
2 NRU Higher School of Economics, Moscow, Russia
3 Third Opinion Platform LLC, Moscow, Russia
{alexander.gromov, ceo}@3opinion.ai

Abstract. The covid-19 pandemic has quickly spread all over the world, overwhelming public healthcare systems in many countries. In this situation, the demand for automatic assistance systems that facilitate and accelerate a doctor's work has rapidly increased. Antibody tests were introduced for diagnosing covid-19, but physicians still need tools for quantifying disease severity, since the choice of treatment strongly depends on it. To estimate the severity of the disease, physicians use computed tomography (CT) scans, which provide information about lung lesions and their types; physicians use this information to determine the proper treatment. In this paper we attempted to build a system that uses patients' CT scans to segment the lungs and lesions, as well as specific types of lesions (namely, pulmonary consolidation and “crazy-paving”). Models for lung, lesion, consolidation, and “crazy-paving” segmentation achieved Dice coefficients of 0.96, 0.65, 0.48, and 0.45, respectively. We also show that removing images with inaccurate ground truth from the training subset can improve the quality of models trained on it.

Keywords: Covid-19, CT, Lesion segmentation, Lung segmentation, Deep learning

1 Introduction

Covid-19 is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that has a considerable mortality rate.
The rapid spread of this disease caused a pandemic, which overwhelmed healthcare systems in many countries. As of 1 July 2020, 10.5 million cases had been confirmed worldwide and 512 thousand patients had died, so the average mortality rate was about 5%. In Russia it was 1.5% (9.5 thousand deaths among 654 thousand cases).

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Supported by Third Opinion Platform, LLC. Publication supported by RFBR grant 20-01-00547.

Early diagnosis can reduce the duration and intensity of medical treatment. This can be achieved with computed tomography (CT), which is considered a primary tool for diagnosing covid-19 and evaluating disease progression. CT is a radiographic technique in which a three-dimensional image of a body structure is constructed by computer from a series of planar cross-sectional images made along an axis. CT provides radiologists with information about disease features, or radiographic findings, in the lungs. These features are examined to identify their type and volume (for example, the percentage of lung opacity), and this information is then used to select appropriate treatment for the patient and to monitor the course of the disease.

The primary CT findings of covid-19 have been reported in [10]. They include pulmonary consolidation, “ground-glass” opacity, and the “crazy-paving” pattern. Pulmonary consolidation is a region of normally compressible lung tissue filled with liquid instead of air.
“Ground-glass” opacity is a descriptive term referring to an area of increased attenuation in the lung on CT scans with preserved bronchial and vascular markings. The “crazy-paving” pattern refers to “ground-glass” opacity with superimposed interlobular and intralobular septal thickening. The lack of a reliable method for estimating lesion volume, together with the enormous number of CT scans awaiting analysis, increased the demand for automated assistance systems.

The aim of this study was to create a solution for segmenting the lungs and abnormal lung regions and for detecting regions with pulmonary consolidation and the “crazy-paving” pattern.

2 Related work

As covid-19 spread, computer vision researchers began searching for solutions that could help physicians diagnose their patients. Several research directions emerged. Before antibody tests came into mass use, some researchers tried to determine whether a patient had covid-19 using only medical images. Linda Wang [9] introduced COVID-Net, a neural network architecture that classifies chest X-ray images to determine whether a person is ill and to distinguish covid-19 from non-covid-19 pneumonia. Xuehai He [2] used CRNet [5] to detect covid-19 from CT scans. Xiaolong Qi [6] predicted the length of a patient's hospital stay from their CT scans alone. However, these studies cannot be used to track a patient's condition over time.

To achieve that, some researchers concentrated on segmentation, detecting the lungs and the areas with lesions. This information can then be used to measure the percentage of lung opacity and thereby quantify the severity of the disease. Segmentation can be performed on 2D horizontal slices of CT or on full 3D scans. Lu Huang [3] applied the U-net [7] architecture to horizontal slices of CT scans. Shuo Jin [4] used 3D UNet++ [11] on CT scans with varying slice thickness.
Fei Shan [8] proposed VB-Net, a 3D model for segmenting lesions, lung lobes, and lung segments, and trained it with a human-in-the-loop strategy.

3 Dataset

The dataset was provided by Third Opinion Platform [1]. It contained 529 studies of patients with and without covid-19. The studies were provided in DICOM format; intensities were not clipped to the display window defined by the DICOM window center (level) and window width parameters.

Fig. 1: Examples of horizontal slices of CT.

From each study, 10 to 20 horizontal slices were annotated, giving 10454 images in total. Radiologists outlined the lungs, regions of pulmonary consolidation, and regions of “ground-glass” opacity, including those with the “crazy-paving” pattern.

Fig. 2: Examples of horizontal slices of CT.

The dataset was split into training and testing subsets containing 85% and 15% of the studies, respectively.

Table 1: Dataset statistics.

Class              Number of images   Number of studies
                   with the class     with the class
lung               7873               528
lesion             5303               464
“ground-glass”     4990               461
consolidation      489                287
“crazy-paving”     1691               114

Radiologists used an annotation tool to draw polygons around regions of interest. Because they tend to draw areas with smoothed boundaries, whereas segmentation models produce more precise masks, scores close to the ideal values of the quality metrics should not be expected.

4 Metrics

Three metrics were used for model evaluation: mean AP over test studies, IoU, and the Dice coefficient. A model's recall is the number of true positive pixels divided by the number of all positive (ground-truth) pixels. A model's precision is the number of true positive pixels divided by the number of pixels predicted as positive. AP is the area under the precision-recall curve, where each point on the curve corresponds to the precision and recall obtained when the predicted probabilities are binarized at a particular threshold.
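The AP metric described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the paper does not say how pixels are grouped, so the sketch assumes that the pixels of all annotated slices of one study are pooled into a single ranking. It also assumes the step-wise area sum(R_n - R_{n-1}) * P_n (the convention used by scikit-learn's `average_precision_score`) rather than trapezoidal interpolation. The function names `average_precision` and `mean_ap_over_studies` are hypothetical.

```python
import numpy as np


def average_precision(probs, target):
    """Pixel-level AP: step-wise area under the precision-recall curve."""
    scores = np.asarray(probs, dtype=np.float64).ravel()
    labels = np.asarray(target, dtype=bool).ravel()
    n_pos = int(labels.sum())
    if n_pos == 0:
        return float("nan")  # AP is undefined when a study has no positive pixels
    # rank pixels by predicted probability, highest first
    order = np.argsort(-scores, kind="mergesort")
    scores, labels = scores[order], labels[order]
    # one threshold per distinct score: keep the last index of each run of ties
    last_of_run = np.flatnonzero(np.diff(scores))
    idx = np.r_[last_of_run, scores.size - 1]
    tp = np.cumsum(labels)[idx]   # true positives at each threshold
    fp = (idx + 1) - tp           # predicted positives minus true positives
    precision = tp / (tp + fp)
    recall = tp / n_pos
    prev_recall = np.r_[0.0, recall[:-1]]
    return float(np.sum((recall - prev_recall) * precision))


def mean_ap_over_studies(studies):
    """Average per-study AP; `studies` is an iterable of (probs, target) pairs."""
    aps = [average_precision(p, t) for p, t in studies]
    return float(np.nanmean(aps))  # studies without positive pixels are skipped
```

For example, `average_precision([0.9, 0.8, 0.7, 0.1], [1, 0, 1, 0])` ranks one negative pixel above the second positive one and gives 5/6. Grouping tied scores into one threshold keeps the result independent of the order in which tied pixels happen to be sorted.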
IoU measures how much the ground-truth and predicted areas overlap:

$$\mathrm{IoU} = \frac{TP}{FN + TP + FP} \qquad (1)$$

The Dice coefficient is often used to evaluate the quality of medical image segmentation:

$$\mathrm{Dice} = \frac{2TP}{FN + 2TP + FP} \qquad (2)$$

where TP is the number of true positive pixels, FN is the number of false negative pixels, and FP is the number of false positive pixels.

5 Proposed method

Because only 20% of the horizontal slices of each CT were annotated, a neural network for 2D image segmentation had to be used. In this work U-net [7] was used. It takes a horizontal slice of a lung CT and returns, for each pixel, the probabilities of belonging to each class.

First, each slice image is normalized:

$$I_{norm} = \frac{I_{orig} - \mu}{\sigma} \qquad (3)$$

where μ is the mean of the image and σ is its standard deviation. This was done to unify data from different CT scanners, which, owing to different settings, can produce values in different ranges. Then random spatial transformations (horizontal flipping, shifting, scaling, and rotation) were applied to every slice for augmentation.

Segmentation of lungs and regions with lesions. The first task was to segment lesions so that the percentage of lung opacity could be calculated. Diseased areas and lungs are segmented on each horizontal slice. Several losses were used to train U-net. When trained with binary cross entropy (BCE) loss or Dice loss, the models predicted two masks: the probability that a pixel belongs to the lung and the probability that it belongs to affected lung (either pulmonary consolidation or “ground-glass” opacity). Another model was trained with softmax cross entropy loss. It predicted 3 classes: background, healthy lung, and affected lung; that is, a background mask was added and the lung class was replaced by healthy lung. Table 2 shows the results for lung and lesion segmentation.

Table 2: Results for segmentation of lungs and lesions.
Experiment        mAP      IoU      Dice     mAP       IoU       Dice
                  (lung)   (lung)   (lung)   (lesion)  (lesion)  (lesion)
BCE loss          0.99591  0.94131  0.9694   0.784     0.51041   0.63645
Dice loss         0.99494  0.94038  0.96898  0.78404   0.52256   0.65046
Softmax CE loss   0.992    0.93689  0.96713  0.78494   0.50343   0.62957

For the lesion class, the model trained with Dice loss achieved the best IoU and Dice, while mAP was nearly identical for all three losses (0.784 to 0.785). Models trained with BCE loss and Dice loss showed similar results for the lung class.

Segmentation of regions with pulmonary consolidation and “crazy-paving”. The second task was to identify different types of pathologies. Experiments showed that samplers were necessary for this task: the majority of images contain neither consolidation nor “crazy-paving”, so without a sampler the model received an overwhelming number of negative examples and training was unstable.

Several models were trained. The first one (U4) predicted four binary masks for the following classes: lungs, consolidation, “ground-glass” opacity, and “crazy-paving”. During training, the batch size was a multiple of 4. In every group of four batch elements, the first was an image with consolidation, the second an image with “crazy-paving”, the third an image with “ground-glass” opacity, and the fourth an image with healthy lungs with probability 95% or an image without lungs (i.e. a slice from the top or bottom of the CT) with probability 5%.

The second model (C) predicted two binary masks: the lung and consolidation probabilities. A sampler was used so that each sample in a training batch contained consolidation with probability 95%, lungs without consolidation with probability 2.5%, and no lungs with probability 2.5%. The third model (CP) was similar to the second but made predictions for “crazy-paving”. All three models were trained with binary cross entropy loss, Dice loss, and softmax cross entropy loss. For training with softmax CE loss, the model predicted the background class and the unaffected lung class instead of the lung class.
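The two sampling schemes described above, and the class layout used with softmax CE loss, can be sketched with Python's `random` module and NumPy. This is a hypothetical illustration, not the authors' code: the paper does not describe how images were indexed, so the pool names (`consolidation`, `crazy_paving`, `ground_glass`, `healthy`, `no_lungs`, `lungs_without_consolidation`) and the dictionary layout are assumptions. In `softmax_ce_target`, giving the pathology mask precedence over the lung mask is also an assumption.

```python
import random

import numpy as np


def u4_batch(pools, batch_size, rng):
    """Draw image indices for one U4 training batch (hypothetical pool layout)."""
    if batch_size % 4 != 0:
        raise ValueError("batch_size must be a multiple of 4")
    batch = []
    for _ in range(batch_size // 4):
        batch.append(rng.choice(pools["consolidation"]))  # slot 1: consolidation
        batch.append(rng.choice(pools["crazy_paving"]))   # slot 2: crazy-paving
        batch.append(rng.choice(pools["ground_glass"]))   # slot 3: ground-glass
        # slot 4: healthy lungs with p = 0.95, slice without lungs with p = 0.05
        key = "healthy" if rng.random() < 0.95 else "no_lungs"
        batch.append(rng.choice(pools[key]))
    return batch


def c_sample(pools, rng):
    """Draw one image for model C: consolidation 95%, lungs only 2.5%, no lungs 2.5%."""
    u = rng.random()
    if u < 0.95:
        key = "consolidation"
    elif u < 0.975:
        key = "lungs_without_consolidation"
    else:
        key = "no_lungs"
    return rng.choice(pools[key])


def softmax_ce_target(lung_mask, pathology_mask):
    """Binary masks -> labels: 0 background, 1 unaffected lung, 2 pathology."""
    lung = np.asarray(lung_mask, dtype=bool)
    pathology = np.asarray(pathology_mask, dtype=bool)
    target = np.zeros(lung.shape, dtype=np.int64)
    target[lung] = 1
    target[pathology] = 2  # assumption: pathology pixels override the lung label
    return target
```

Because the category of the fourth U4 slot is drawn independently for each group of four, the 95/5 split holds in expectation rather than exactly within every batch. The same `softmax_ce_target` helper would cover both the lesion model (pathology = affected lung) and the C model (pathology = consolidation).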
Table 3: Results for consolidation segmentation.

Experiment       mAP      IoU      Dice
U4 BCE           0.51862  0.42743  0.48593
U4 Softmax CE    0.5064   0.40548  0.47363
C BCE            0.54077  0.2652   0.33648
C Dice           0.49091  0.25771  0.32907
C Softmax        0.52354  0.24553  0.31573

Table 4: Results for “crazy-paving” segmentation.

Experiment       mAP      IoU      Dice
U4 BCE           0.3785   0.42743  0.45657
U4 Softmax CE    0.36422  0.42337  0.44795
CP BCE           0.386    0.2231   0.25564
CP Dice          0.3333   0.17061  0.20176
CP Softmax       0.3724   0.1869   0.21889

Tables 3 and 4 show the results for consolidation and “crazy-paving”. The best IoU and Dice scores for both classes were achieved by the unified four-class model (U4) trained with binary cross entropy loss; in terms of mAP, the single-class models trained with BCE loss (C BCE and CP BCE) scored slightly higher.

6 Analysis of results

Confusion matrices of the U4 BCE model were calculated for the train and test subsets. As Fig. 4 shows, on both subsets the model confuses lesion classes with healthy lungs. This may be explained by the fact that some annotators tend to draw inaccurate masks, while models produce precise results. Fig. 4 also shows that the model confuses “crazy-paving” with “ground-glass” opacity that is not “crazy-paving”, most strongly on the test subset, and that on both subsets it confuses consolidation with “ground-glass” opacity. This could be explained by the presence of ambiguous examples in the dataset and/or noisy annotation; examples are presented in Fig. 5.

Fig. 3: Examples of the U4 model's output. Red is used for consolidation, blue for “crazy-paving”, and green for “ground-glass” opacity. Left: original image; middle: ground-truth annotation; right: output of the U4 BCE model.

Fig. 4: Confusion matrices of the U4 BCE model for the train and test subsets (rows: ground truth; columns: model predictions; each row sums to 1). B refers to background, L to healthy lungs, G to “ground-glass”, CP to “crazy-paving”, and C to consolidation.

Train set:
        B      L      G      CP     C
B     0.996  0.004  0.000  0.000  0.000
L     0.025  0.962  0.010  0.001  0.001
G     0.010  0.325  0.583  0.030  0.051
CP    0.022  0.206  0.019  0.751  0.002
C     0.039  0.121  0.169  0.011  0.661

Test set:
        B      L      G      CP     C
B     0.997  0.003  0.000  0.000  0.000
L     0.028  0.952  0.016  0.001  0.002
G     0.009  0.367  0.539  0.034  0.051
CP    0.012  0.098  0.393  0.403  0.094
C     0.074  0.138  0.271  0.061  0.456

To check the latter hypothesis, three models (U4 BCE, C BCE, C Dice) were applied to the train dataset. Images from the train dataset were blocklisted if the IoU of the consolidation class was lower than 0.2 for any of those models. A new model, C BCE (*), was then trained on the reduced dataset. As Table 5 shows, even this rough cleaning of the dataset improved the result of training: IoU rose from 0.2652 to 0.30107 and Dice from 0.33648 to 0.37059. A “crazy-paving” model and a unified model were not trained on the reduced dataset, because too few examples of “crazy-paving” remained after the reduction.

Table 5: Results for consolidation segmentation after dataset reduction.

Experiment       mAP      IoU      Dice
C BCE            0.54077  0.2652   0.33648
C BCE (*)        0.54998  0.30107  0.37059

Fig. 5: Examples of inaccurate annotation.

7 Conclusion

In this study we presented a solution for segmentation of lungs, lesions, pulmonary consolidation, and “crazy-paving” with reasonable quality. Experiments showed that binary cross entropy loss works better for training models that distinguish different types of pathologies, whereas for lesion segmentation it gives slightly worse results than Dice loss; for lung segmentation the two losses perform almost identically. We also showed that using noisy data during training can decrease the quality of a model, and that discarding such data from the training subset can improve it.

References

1. “Third Opinion Platform” Limited Liability Company, https://thirdopinion.ai/
2. He, X., Yang, X., Zhang, S., Zhao, J., Zhang, Y., Xing, E., Xie, P.: Sample-efficient deep learning for covid-19 diagnosis based on ct scans. medRxiv (2020)
3. Huang, L., Han, R., Ai, T., Yu, P., Kang, H., Tao, Q., Xia, L.: Serial quantitative chest ct assessment of covid-19: Deep-learning approach. Radiology: Cardiothoracic Imaging 2(2), e200075 (2020)
4. Jin, S., Wang, B., Xu, H., Luo, C., Wei, L., Zhao, W., Hou, X., Ma, W., Xu, Z., Zheng, Z., et al.: Ai-assisted ct imaging analysis for covid-19 screening: Building and deploying a medical ai system in four weeks. medRxiv (2020)
5. Liu, W., Zhang, C., Lin, G., Liu, F.: Crnet: Cross-reference networks for few-shot segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4165–4173 (2020)
6. Qi, X., Jiang, Z., Yu, Q., Shao, C., Zhang, H., Yue, H., Ma, B., Wang, Y., Liu, C., Meng, X., et al.: Machine learning-based ct radiomics model for predicting hospital stay in patients with pneumonia associated with sars-cov-2 infection: A multicenter study. medRxiv (2020)
7. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. pp. 234–241. Springer (2015)
8. Shan, F., Gao, Y., Wang, J., Shi, W., Shi, N., Han, M., Xue, Z., Shi, Y.: Lung infection quantification of covid-19 in ct images with deep learning. arXiv preprint arXiv:2003.04655 (2020)
9. Wang, L., Wong, A.: Covid-net: A tailored deep convolutional neural network design for detection of covid-19 cases from chest x-ray images. arXiv preprint arXiv:2003.09871 (2020)
10. Zheng, C.: Time course of lung changes at chest ct during recovery from coronavirus disease 2019 (covid-19). Radiology 295, 715–721 (2020)
11. Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., Liang, J.: Unet++: A nested u-net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. pp. 3–11. Springer (2018)