Region Proposal Network for Lung Nodule Detection and Segmentation

Mohammad Hesam Hesamian1, Wenjing Jia, Xiangjian He, Paul Kennedy

1 School of Electrical and Data Engineering, University of Technology Sydney. Email: mh.hesamian@gmail.com
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. Lung nodule detection and segmentation play a critical role in detecting and determining the stage of lung cancer. This paper proposes a two-stage segmentation method capable of improving the accuracy of detecting and segmenting lung nodules in 2D CT images. The first stage of our approach proposes multiple regions potentially containing the tumour, and the second stage performs pixel-level segmentation within the resultant regions. Moreover, we propose an adaptive weighting loss to effectively address the issue of class imbalance in lung CT image segmentation. We evaluate our proposed solution on the widely adopted LIDC benchmark dataset and achieve a promising average DSC of 92.78%, which puts our method among the top lung nodule segmentation methods.

Key words: Nodule segmentation, Deep learning, Region proposal network

1 Introduction

Lung cancer, as one of the deadliest cancers, is responsible for a major share of cancer deaths worldwide [16]. Early detection and accurate classification of lung nodules play a significant role in increasing the survival rate of patients. Manual detection and segmentation of lung nodules is a challenging task that requires much time and proficiency, and yet various types of errors may still occur. With the growing availability of medical images such as computed tomography (CT) scans, automatic nodule detection and segmentation has become a reliable tool to help radiologists in their demanding task of lung image analysis.

Recently, convolutional neural networks (CNNs) have shown the capability to effectively extract image features for successful pattern detection and segmentation in a variety of settings, from scene analysis to medical images [18, 3, 4]. Similarly, deep learning approaches have been used for various tasks of medical image analysis, including organ detection, lesion classification and tumour segmentation [13, 6, 7]. Among these applications, lung nodule segmentation is known to be a challenging task due to the heterogeneous appearance of lung tumours and the great similarity between tumour and non-tumour substances in the lung area.

Another severe challenge in medical image segmentation is dealing with class imbalance [20]. In a fully annotated CT image of the lung, the area occupied by tumour is much smaller than the rest of the lung, owing to the sparse distribution of pulmonary nodules. This issue becomes more severe in semantic segmentation, where each pixel is considered one sample: the number of samples (pixels) belonging to the tumour is then far lower than in the rest of the lung area. The class imbalance issue affects fully convolutional networks [23] more than others. Moreover, the ratio of tumour pixels to background pixels varies significantly from sample to sample. To address this problem, we propose an adaptive weighted loss that alleviates the class imbalance issue at the sample level.

The main contributions of this paper are threefold. First, we propose a two-stage network that removes the dependency on dense sliding-window search for accurate segmentation of lung nodules. Second, we propose an adaptive weighted loss function that addresses the class imbalance issue and improves segmentation accuracy. Third, we perform a far-distant transfer learning strategy in which the weights are transferred from a general object detection model.
2 Related works

Two major categories of studies have explored lung image segmentation: whole lung segmentation and lung tumour segmentation. Whole lung segmentation aims to distinguish the border of the entire lung from the other elements appearing in a chest CT image [2, 5]. It helps to determine the size and shape of the lung, and it is also often used as the first stage of lung tumour segmentation, with the purpose of reducing the false positives caused by non-lung areas of the CT image.

Recently, many deep-learning-based CAD systems have been proposed for automatic lung cancer detection. For example, ZNET [24] employed the U-Net fully convolutional network architecture for candidate selection on axial slices. For the subsequent false positive reduction, three orthogonal slices of each candidate were fed to the same wide residual network. Wang et al. [22] proposed a model that captures a set of nodule-sensitive features from CT images: the 2D branch of the model learns multi-scale 2D features from 2D patches, while in the 3D branch a novel central pooling layer helps the model select the features around the target voxels effectively. In another study, a region CNN (R-CNN) was proposed for lung nodule segmentation from 3D lung patches [23]. In this model, Deep Active Self-paced Learning (DASL) was introduced to reduce the dependency of the network on fully annotated data; it utilizes unannotated samples by taking into account both the knowledge available before training and the knowledge gained during training. Jiang et al. [12] proposed a residually connected multiple-resolution network able to combine features from inputs of various resolutions simultaneously. Images of different resolutions were passed through two separate residual networks, and the extracted features were refined and concatenated. This technique helped them to improve localization and to counter the negative effect of repeated pooling operations.

Generally, 3D models have a huge demand for memory and a high processing cost. Due to these limitations, the algorithms that can be applied to 3D image analysis are restricted. Moreover, CT scans usually have different slice thicknesses, which are not recommended to be treated uniformly in a 3D model [24]. In contrast, 2D models require fewer resources for training and are not affected by slice thickness.

Among two-stage detection methods, region proposal networks (RPNs) have attracted much attention. The RPN was first introduced for general object detection tasks by [19]. In that structure, a classifier determines the probability of a target object being present at each anchor, and a regressor refines the coordinates of the proposal. The use of multiple candidate anchors and the sharing of features make the RPN efficient in both time and detection, and the bounding-box regression enables it to produce more precise proposals. The RPN has seen many successful variations and applications [17, 15], yet it has not been adequately explored for the lung nodule segmentation task.

Given the shortcomings of the current methods mentioned above and the great potential of RPNs, we were motivated to design and develop a model that combines the benefits of two-stage detection and segmentation models with higher segmentation accuracy.

3 The Proposed Method

Several image modalities can be used for lung abnormality detection, such as PET, SPECT, X-Ray, MRI and CT. PET and SPECT are mainly utilized for metabolism characterization; they can therefore unveil functional abnormalities in an organ. CT, X-Ray and MRI are structural image modalities, able to provide anatomical information about the organ. The simplicity and lower price of CT have made it the primary image modality for cancer detection and screening [25]. For this study, we used the benchmark LIDC-IDRI dataset [1], consisting of 1024 chest CT scans. Each case is associated with an XML file denoting the boundary pixels of the tumour, marked by four radiologists. For this study, the nodules with size > 3 mm are selected, as sketched below. The scans are broken down into slices and converted to JPG format at the original size.
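The LIDC annotation schema is vendor-specific and not reproduced in this paper, so the following is only a minimal sketch of the > 3 mm filtering step under assumed, simplified tag names (`nodule`, `size_mm`, `edgeMap`, `x`, `y`) that do not match the real LIDC XML.

```python
# Minimal sketch of the >3 mm nodule filtering step described above.
# The tag names (nodule, size_mm, edgeMap, x, y) are simplified placeholders,
# NOT the actual LIDC-IDRI XML schema.
import xml.etree.ElementTree as ET

def load_nodule_contours(xml_path, min_size_mm=3.0):
    """Return boundary pixel coordinates of nodules larger than min_size_mm."""
    root = ET.parse(xml_path).getroot()
    contours = []
    for nodule in root.iter("nodule"):
        size = float(nodule.findtext("size_mm", default="0"))
        if size <= min_size_mm:
            continue  # the study keeps only nodules larger than 3 mm
        points = [(int(p.findtext("x")), int(p.findtext("y")))
                  for p in nodule.iter("edgeMap")]
        contours.append(points)
    return contours
```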
3.1 Network Structure

There are existing two-stage nodule detection systems in which the first stage detects nodule candidates and the second stage reduces false positives [24]. In our approach, inspired by developments in general object detection [8], we instead propose a two-stage method capable of both nodule detection and segmentation. The model segments the elements of a CT scan into two classes, tumour and background. The general building blocks of the network are presented in Fig. 1.

Figure 1. The proposed network model: Feature Extractor, RPN, ROI Pooling, Segmentation.

The first section of the network is the feature extractor, in which general feature maps are created. ResNet101 [9] is used as the backbone of the system; its residual connections allow deeper structures to be used without facing the vanishing gradient problem.

The extracted feature maps are then passed to the RPN. In this way, the convolution layers are reused, which saves a great deal of computation. This module produces several candidate regions containing the potential malignant tumours. Since our model has only two classes, tumour and background, we limit the number of proposed regions to 300. Having a limited number of regions helps to reduce the possibility of false positives and speeds up the training process. Each candidate bounding box is then given to the ROI pooling module. At this stage, the network evaluates the proposed regions by their intersection over union (IoU) value. The IoU threshold is set to 0.5 to reduce false positives: a fixed number of ROIs (50) with IoU above 0.5 are passed to the segmentation module for the semantic segmentation task, as sketched below. If no ROI satisfies the condition, the input sample is considered a negative sample. The last block of the network produces the segmentation mask for each proposed ROI.
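The ROI selection step can be sketched as follows. The box format (x1, y1, x2, y2) and the function names are our own illustrative choices, not the authors' released code; during training, each proposal is scored against the ground-truth nodule box.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def select_rois(proposals, gt_box, iou_thresh=0.5, max_rois=50):
    """Keep at most max_rois proposals whose IoU with the ground truth
    exceeds iou_thresh; an empty result marks a negative sample."""
    scored = [(iou(p, gt_box), p) for p in proposals]
    kept = [p for s, p in sorted(scored, key=lambda t: -t[0]) if s > iou_thresh]
    return kept[:max_rois]
```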
3.2 Adaptive Loss Coefficients

One of the main reasons for the accuracy drop in nodule segmentation is the class imbalance of the training samples, due to which the model does not learn equally from all classes during training. Therefore, one of the main challenges in medical image analysis is how to effectively modify the model to overcome the class imbalance issue and maximize the model's learning capability.

In the segmentation section of the proposed model, samples are segmented into the two classes of background and tumour. Thus, we apply a binary cross-entropy function as the segmentation loss, shown in Eq. 1:

L(S) = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_n \log \hat{y}_n + (1 - y_n) \log(1 - \hat{y}_n) \right]   (1)

In this equation, y and ŷ represent the ground truth and the predicted value of each pixel, and N is the total number of pixels in the given sample S.

The dynamic loss coefficients γ1 and γ2 are then calculated for each given ROI, representing the proportions of tumour and background pixels. These coefficients compensate for the imbalance between the numbers of pixels in the two classes. Since in our model each pixel is considered one sample, the loss is calculated at the pixel level. Therefore, the final loss function of Eq. 2 is formed by applying the dynamic loss coefficients to Eq. 1. The loss is calculated for each pixel p_i, where p_i denotes the i-th pixel of a given flattened ROI:

\mathrm{Loss}(p_i) = \begin{cases} \gamma_1 \left[ -y(p_i) \log \hat{y}(p_i) \right], & \text{if } p_i \in \text{tumour} \\ \gamma_2 \left[ -(1 - y(p_i)) \log(1 - \hat{y}(p_i)) \right], & \text{if } p_i \in \text{bg} \end{cases}   (2)

The proposed weighted loss updates its coefficients for each ROI proposed by the ROI selection module. This addresses the class imbalance more accurately, because the proportion of tumour to background varies significantly from sample to sample. Common weighted losses, such as the one proposed in [11], apply a fixed set of coefficients to all samples, calculated by counting and averaging the tumour and background pixels over the entire dataset. Applying a fixed set of weights to all samples does not seem to be an optimal solution, whereas the adaptive weighted loss allows the network to address the class imbalance more effectively within each individual sample, as sketched below.
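A minimal sketch of Eq. 2 in Keras/TensorFlow follows. The paper does not specify how γ1 and γ2 are normalised, so the inverse-class-frequency form used here is an assumption.

```python
import tensorflow as tf

def adaptive_weighted_bce(y_true, y_pred, eps=1e-7):
    """Per-ROI weighted binary cross-entropy (Eq. 2).
    y_true / y_pred are flattened pixel tensors for one ROI."""
    n = tf.cast(tf.size(y_true), tf.float32)
    n_tumour = tf.reduce_sum(y_true)                # tumour pixels in this ROI
    n_bg = n - n_tumour
    # Inverse-frequency coefficients, recomputed for every ROI
    # (this normalisation is our assumption, not spelled out in the paper).
    gamma1 = n / (2.0 * tf.maximum(n_tumour, 1.0))  # weight for tumour pixels
    gamma2 = n / (2.0 * tf.maximum(n_bg, 1.0))      # weight for background pixels
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    loss = -(gamma1 * y_true * tf.math.log(y_pred)
             + gamma2 * (1.0 - y_true) * tf.math.log(1.0 - y_pred))
    return tf.reduce_mean(loss)
```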
4 Experiments and Discussion

In this section, we experimentally evaluate our solution on the widely used LIDC benchmark dataset [1]. Our model is implemented with the Keras library, and all experiments are conducted on Linux (RHEL 7.0) with an Nvidia Quadro P5000 GPU with 16 GB of memory.

4.1 Data preparation

We evaluated our solution on the publicly available LIDC dataset [1], which contains 1024 lung scans, each annotated by at least four radiologists. All data are used for training and testing of the network with a 5-fold cross-validation approach.

Like most other medical datasets, our data is imbalanced, with the tumour class heavily under-represented. Training a model on such data leads to learning more features from one class while ignoring the other. As a further technique to combat this issue, we adopt a patch-wise training strategy, extracting patches around the tumour and using them as training samples. According to nodule size distribution statistics, 85% of nodules can be covered by a patch of 30 × 30 voxels, and 99% by a patch of 40 × 40 voxels [24]. Hence, we extract patches of 76 × 76 pixels to ensure that almost all tumours are fully covered. We then heavily augment the training data by flipping, rotating, zooming and shrinking the input samples, as sketched below. Data augmentation is reported to improve the performance and robustness of CNNs [10]; here it serves the two purposes of avoiding overfitting and extending the size of the training dataset.
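A sketch of the patch extraction and augmentation pipeline is given below; the augmentation ranges (rotation angle, zoom factors) are not reported in the paper and are illustrative only.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def extract_patch(ct_slice, center, size=76):
    """Crop a size x size patch centred on the annotated nodule.
    Assumes the centre lies far enough from the slice border."""
    half = size // 2
    r, c = center
    return ct_slice[r - half:r + half, c - half:c + half]

# Flip / rotate / zoom (shrink and enlarge) augmentation, mirroring the
# operations listed above; the numeric ranges are our illustrative choices.
augmenter = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=20,        # degrees
    zoom_range=(0.8, 1.2),    # shrinking and zooming in
)
```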
4.2 Training

We employ a two-step full network adaptation strategy to transfer the weights from a far-distant source. In this process, the weights are initialized from a model pre-trained on the COCO dataset [14] for the general object detection task. Transferring the weights from such a model and using them for the very different task of medical object segmentation is called 'far transfer learning'. Generally, transfer learning is proven to alleviate overfitting on small training datasets and to improve the convergence speed of training. In theory, transfer learning performs better when the tasks of the source and target models are more similar [10], and some therefore believe that far transfer learning may not produce good results [21]. Our results, however, demonstrate that far transfer learning combined with a careful fine-tuning strategy can deliver competitive results.

As part of our weight transfer, the weights of the feature extractor layers are initialized from the pre-trained model, and the last layers are initialized randomly. During the first stage of training, all network weights except those of the feature extractor are fixed; at this stage, only the feature extractor is trained on the input data for a couple of epochs. The rest of the network weights are injected at the second stage of training, after which all network layers are trained together (see the sketch after this subsection).

5-fold cross-validation is employed to validate the test results. The dataset is divided into five equal subsets; each time, four of them are used for training and the remaining one for testing. The LIDC dataset does not come with separate training and test sets, so we divided the data into training and test sets ourselves for the purpose of 5-fold cross-validation. Moreover, a portion of the training data (10%) is held out for validation at the end of each epoch. This ensures that every sample is used at least once for training and testing.
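The two-stage schedule can be expressed in Keras roughly as follows. The `backbone` layer-name prefix, the optimiser and the epoch counts are assumptions; the loss function passed in would be, for example, the adaptive loss sketched in Section 3.2.

```python
def two_stage_training(model, train_data, val_data, loss_fn):
    """Two-stage adaptation: first fine-tune the COCO-initialised feature
    extractor, then train the whole network jointly."""
    # Stage 1: freeze everything except the feature extractor
    # (layer names starting with 'backbone' are a hypothetical convention).
    for layer in model.layers:
        layer.trainable = layer.name.startswith("backbone")
    model.compile(optimizer="sgd", loss=loss_fn)
    model.fit(train_data, validation_data=val_data, epochs=2)

    # Stage 2: inject the remaining weights into training by unfreezing
    # all layers and training the full network together.
    for layer in model.layers:
        layer.trainable = True
    model.compile(optimizer="sgd", loss=loss_fn)
    model.fit(train_data, validation_data=val_data, epochs=20)
```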
4.3 Experiment results

Fig. 2 shows qualitative results of tumours detected and segmented for various tumour types and sizes.

Figure 2. Visualization of the network output. The green colour is the ground truth marked by the radiologists; the yellow is the prediction generated by the proposed model. Best viewed in colour.

To highlight the performance of our proposed model, we quantitatively evaluate it on two tasks, i.e., detection and segmentation, respectively.

4.3.1 Results of tumour detection

Our proposed method performs the detection task through a segmentation approach, which differentiates our model from pure detection models. For the detection part, we measure the detection accuracy of the tumours under IoU thresholds of 0.3, 0.4, 0.5 and 0.6. Table 1 shows the results of tumour detection for three tumour size categories; the various IoU thresholds allow the detection performance to be analysed clearly. For instance, in the case of IoU = 0.5, a tumour is considered correctly detected if the IoU of the prediction mask with the ground truth exceeds 0.5. As expected, detection accuracy increases markedly with tumour size. The results in Table 1 also highlight the strong accuracy of the proposed model in detecting small tumours.

Table 1. The detection rate (%) of nodules of various sizes, obtained with our proposed approach.

Size                          IoU=0.3        IoU=0.4        IoU=0.5        IoU=0.6
Tumour size < 10 mm           97.54 ± 0.56   95.81 ± 0.67   92.32 ± 1.07   84.76 ± 1.55
10 mm < Tumour size < 30 mm   98.13 ± 0.72   97.56 ± 0.64   95.57 ± 1.06   92.04 ± 0.95
Tumour size > 30 mm           95.91 ± 4.56   94.25 ± 4.21   93.26 ± 4.46   90.16 ± 7.95
Average                       97.21 ± 1.95   95.87 ± 1.84   93.73 ± 2.19   88.99 ± 3.78

4.3.2 Results of tumour segmentation

To evaluate the segmentation performance, we measure the Dice similarity coefficient (DSC), recall (sensitivity) and precision (positive predictive value, PPV), defined in Eqs. 3, 4 and 5, respectively:

\mathrm{DSC} = \frac{2TP}{2TP + FP + FN}   (3)

\mathrm{Recall} = \frac{TP}{TP + FN}   (4)

\mathrm{Precision} = \frac{TP}{TP + FP}   (5)

Here TP and FP refer to true positives and false positives, while TN and FN denote true negatives and false negatives. A sketch of how these metrics are computed from binary masks follows.
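Given binary prediction and ground-truth masks, the metrics of Eqs. 3-5 can be computed as in the sketch below.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Compute DSC (Eq. 3), recall (Eq. 4) and precision (Eq. 5)
    from binary prediction and ground-truth masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()     # true positives
    fp = np.logical_and(pred, ~gt).sum()    # false positives
    fn = np.logical_and(~pred, gt).sum()    # false negatives
    dsc = 2 * tp / (2 * tp + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return dsc, recall, precision
```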
Table 2. Quantitative evaluation of the achieved results and comparison. Models marked with * use 3D processing.

Methods                 Dice (%)        Recall (%)      Precision (%)
Nodule R-CNN* [23]      64 ± 0.44       -               -
Hesamian et al. [11]    81.24 ± 1.24    -               79.75 ± 4.08
CF-CNN* [22]            80.47 ± 10.76   92.75 ± 12.83   75.84 ± 13.14
Jiang et al. [12]       68 ± 0.23       85 ± 0.13       67 ± 0.22
Proposed method         92.78 ± 0.1     92.31 ± 0.27    93.17 ± 0.18

The results presented in Table 2 show that the proposed method performs the challenging task of lung nodule segmentation with higher accuracy than the state-of-the-art methods. The Dice score of our segmentation model is almost 10% higher than those of the listed methods, and its precision is likewise much higher. In the case of recall, our result is slightly (less than one percent) lower than that of CF-CNN, but CF-CNN produced a wider standard deviation, which shows that our model was more stable across all input cases.

4.3.3 Discussion

The main reason for this improvement is the structure of the network. In the proposed structure, the first stage extracts some ROIs as potential tumours; the most tumour-like ROIs are then selected through the ROI pooling module, and the second stage performs segmentation on the selected ROIs. This two-step strategy eliminates the need to scan the entire input image with a sliding window that produces a prediction at every position. By performing these two steps, many of the nodule-like patterns that may confuse the model are eliminated; therefore, the accuracy of segmentation improves by focusing only on the more relevant tumour features. Moreover, training and testing speed increases, as there is no full search of the input via a sliding window.

The second reason for the improved accuracy is the application of the novel adaptive loss in training the segmentation module. This adaptive loss addresses the class imbalance issue of medical images within each individual sample; by applying a dynamic pair of class weight coefficients, the model can derive more balanced features from each sample.

The transfer learning strategy used for training helped the model to be more stable and prevented overfitting. Moreover, the application of transfer learning could be one of the reasons for achieving a smaller standard deviation compared with other studies.

5 Conclusion

We have proposed a two-stage framework for accurate segmentation of lung nodules. The first stage of the network provides potential nodule areas, and the second stage accurately segments the selected ROIs. To effectively address the class imbalance issue of small-organ segmentation, we proposed an adaptive weighted loss function whose weight coefficients are calculated per sample. This approach leads to the extraction of more accurate features and, therefore, more precise segmentation of the input. The model was tested on the publicly available LIDC dataset and delivered an average detection accuracy of 93.73% (IoU = 0.5) and an average Dice score of 92.78% for the segmentation part. Finally, the achieved results demonstrate that far-distant transfer learning with careful weight initialization can deliver competitive results.

REFERENCES

[1] Samuel G Armato III, Geoffrey McLennan, Luc Bidaut, Michael F McNitt-Gray, Charles R Meyer, Anthony P Reeves, Binsheng Zhao, Denise R Aberle, Claudia I Henschke, Eric A Hoffman, et al., 'The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans', Medical Physics, 38(2), 915-931, (2011).
[2] Geng Chen, Dehui Xiang, Bin Zhang, Haihong Tian, Xiaoling Yang, Fei Shi, Weifang Zhu, Bei Tian, and Xinjian Chen, 'Automatic pathological lung segmentation in low-dose CT image using eigenspace sparse shape composition', IEEE Transactions on Medical Imaging, (2019).
[3] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille, 'DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs', IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 834-848, (2018).
[4] Noel CF Codella, David Gutman, M Emre Celebi, Brian Helba, Michael A Marchetti, Stephen W Dusza, Aadi Kalloo, Konstantinos Liopyris, Nabin Mishra, Harald Kittler, et al., 'Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC)', in Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on, pp. 168-172. IEEE, (2018).
[5] Yu Gordienko, Peng Gang, Jiang Hui, Wei Zeng, Yu Kochura, O Alienin, O Rokovyi, and S Stirenko, 'Deep learning with lung segmentation and bone shadow exclusion techniques for chest X-ray analysis of lung cancer', in International Conference on Theory and Applications of Fuzzy Systems and Soft Computing, pp. 638-647. Springer, (2018).
[6] Nazia Hameed, Antesar M Shabut, Miltu K Ghosh, and M Alamgir Hossain, 'Multi-class multi-level classification algorithm for skin lesions classification using machine learning techniques', Expert Systems with Applications, 141, 112961, (2020).
[7] Sardar Hamidian, Berkman Sahiner, Nicholas Petrick, and Aria Pezeshk, '3D convolutional neural network for automatic detection of lung nodules in chest CT', in Medical Imaging 2017: Computer-Aided Diagnosis, volume 10134, p. 1013409. International Society for Optics and Photonics, (2017).
[8] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick, 'Mask R-CNN', in Proceedings of the IEEE International Conference on Computer Vision, pp. 2961-2969, (2017).
[9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, 'Deep residual learning for image recognition', in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, (2016).
[10] Mohammad Hesam Hesamian, Wenjing Jia, Xiangjian He, and Paul Kennedy, 'Deep learning techniques for medical image segmentation: Achievements and challenges', Journal of Digital Imaging, 1-15, (2019).
[11] Mohammad Hesam Hesamian, Wenjing Jia, Xiangjian He, and Paul J Kennedy, 'Atrous convolution for binary semantic segmentation of lung nodule', in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1015-1019. IEEE, (2019).
[12] Jue Jiang, Yu-Chi Hu, Chia-Ju Liu, Darragh Halpenny, Matthew D Hellmann, Joseph O Deasy, Gig Mageras, and Harini Veeraraghavan, 'Multiple resolution residually connected feature streams for automatic lung tumor segmentation from CT images', IEEE Transactions on Medical Imaging, 38(1), 134-144, (2018).
[13] Yang Lei, Tonghe Wang, Sibo Tian, Xue Dong, Ashesh B Jani, David Schuster, Walter J Curran, Pretesh Patel, Tian Liu, and Xiaofeng Yang, 'Male pelvic multi-organ segmentation aided by CBCT-based synthetic MRI', Physics in Medicine & Biology, 65(3), 035013, (2020).
[14] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick, 'Microsoft COCO: Common objects in context', in European Conference on Computer Vision, pp. 740-755. Springer, (2014).
[15] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg, 'SSD: Single shot multibox detector', in European Conference on Computer Vision, pp. 21-37. Springer, (2016).
[16] Kimberly D Miller, Ann Goding Sauer, Ana P Ortiz, Stacey A Fedewa, Paulo S Pinheiro, Guillermo Tortolero-Luna, Dinorah Martinez-Tyson, Ahmedin Jemal, and Rebecca L Siegel, 'Cancer statistics for Hispanics/Latinos, 2018', CA: A Cancer Journal for Clinicians, 68(6), 425-445, (2018).
[17] Joseph Redmon and Ali Farhadi, 'YOLO9000: Better, faster, stronger', in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263-7271, (2017).
[18] Joseph Redmon and Ali Farhadi, 'YOLOv3: An incremental improvement', arXiv preprint arXiv:1804.02767, (2018).
[19] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, 'Faster R-CNN: Towards real-time object detection with region proposal networks', in Advances in Neural Information Processing Systems, pp. 91-99, (2015).
[20] Skylar Stolte and Ruogu Fang, 'A survey on medical image analysis in diabetic retinopathy', Medical Image Analysis, 101742, (2020).
[21] Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri, 'Deep end2end voxel2voxel prediction', in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 17-24, (2016).
[22] Shuo Wang, Mu Zhou, Zaiyi Liu, Zhenyu Liu, Dongsheng Gu, Yali Zang, Di Dong, Olivier Gevaert, and Jie Tian, 'Central focused convolutional neural networks: Developing a data-driven model for lung nodule segmentation', Medical Image Analysis, 40, 172-183, (2017).
[23] Wenzhe Wang, Yifei Lu, Bian Wu, Tingting Chen, Danny Z Chen, and Jian Wu, 'Deep active self-paced learning for accurate pulmonary nodule segmentation', in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 723-731. Springer, (2018).
[24] Hongtao Xie, Dongbao Yang, Nannan Sun, Zhineng Chen, and Yongdong Zhang, 'Automated pulmonary nodule detection in CT images using deep convolutional neural networks', Pattern Recognition, 85, 109-119, (2019).
[25] Junjie Zhang, Yong Xia, Hengfei Cui, and Yanning Zhang, 'Pulmonary nodule detection in medical images: a survey', Biomedical Signal Processing and Control, 43, 138-147, (2018).