=Paper=
{{Paper
|id=Vol-2886/paper1
|storemode=property
|title=Multi-Center Polyp Segmentation with Double Encoder-Decoder Networks
|pdfUrl=https://ceur-ws.org/Vol-2886/paper1.pdf
|volume=Vol-2886
|authors=Adrian Galdran,Gustavo Carneiro,Miguel A. González Ballester
|dblpUrl=https://dblp.org/rec/conf/isbi/GaldranCB21
}}
==Multi-Center Polyp Segmentation with Double Encoder-Decoder Networks==
Multi-Center Polyp Segmentation with Double Encoder-Decoder Networks

Adrian Galdran (a), Gustavo Carneiro (b) and Miguel A. González Ballester (c,d)

(a) Bournemouth University, Bournemouth, United Kingdom
(b) University of Adelaide, Adelaide, Australia
(c) BCN MedTech, Dept. of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
(d) Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain

Abstract

Polyps are among the earliest signs of Colorectal Cancer, with their detection and segmentation representing a key milestone for automatic colonoscopy analysis. This work describes our solution to the EndoCV 2021 challenge, within the sub-track of polyp segmentation. We build on our recently developed framework of pretrained double encoder-decoder networks, which has achieved state-of-the-art results for this task, but we enhance the training process to account for the high variability and heterogeneity of the data provided in this competition. Specifically, since the available data comes from six different centers, it contains highly variable resolutions and image appearances. Therefore, we introduce a center-sampling training procedure by which the origin of each image is taken into account when deciding which images should be sampled for training. We also increase the representation capability of the encoder in our architecture, in order to provide a more powerful encoding step that can better capture the more complex information present in the data. Experimental results are promising and validate our approach for the segmentation of polyps in highly heterogeneous data scenarios.

Keywords: Polyp segmentation, multi-center data, double encoder-decoders

1. Introduction

Colorectal Cancer (CRC) is among the most concerning diseases affecting the human gastrointestinal tract, representing the second most common cancer type in women and the third most common in men [1]. CRC treatment begins with colorectal lesion detection, which is typically performed during colonoscopic screenings. In these procedures, a flexible tube equipped with a camera is introduced through the rectum to look for such lesions throughout the colon. Early detection of CRC is known to substantially increase survival rates. Unfortunately, it is estimated that around 6-27% of pathologies are missed during a colonoscopic examination [2]. Colonoscopic image analysis and decision support systems have shown great promise in improving examination effectiveness and decreasing the number of missed lesions [3].

3rd International Workshop and Challenge on Computer Vision in Endoscopy (EndoCV2021), in conjunction with the 18th IEEE International Symposium on Biomedical Imaging (ISBI 2021), April 13th, 2021, Nice, France. agaldran@bournemouth.ac.uk (A. Galdran); gustavo.carneiro@adelaide.edu (G. Carneiro); ma.gonzalez@upf.edu (M. A. González Ballester). © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Figure 1: Polyp visual aspects have a wide variability in terms of shape, appearance, and boundaries. In this figure, polyps are sampled from different databases, acquired in different centers and with different equipment: (a) Kvasir-Seg [11], (b) CVC-ColonDB [12], (c) ETIS [13].
Gastrointestinal polyps are among the most relevant pathologies to be found in colonoscopies, since they are one of the main early signs of CRC [4]. However, their correct identification and accurate segmentation are challenging tasks for both clinicians and computational techniques, due to their wide and highly variable range of shapes and visual appearances, as illustrated in Fig. 1. For this reason, computer-aided polyp detection has been extensively explored in recent years as a supplementary tool for colonoscopic procedures to improve detection rates, enable early treatment, and increase survival rates.

Polyp segmentation is often approached by means of encoder-decoder convolutional neural networks. In [5], an encoder-decoder network containing multi-resolution, multi-classification, and fusion sub-networks was introduced, whereas [6] explored several combinations of different encoder and decoder architectures. In [7], an architecture with a shared encoder and two inter-depending decoders was proposed to model polyp areas and boundaries respectively, and in [8] ensembles of instance-segmentation architectures were studied. More recently, in [9] the authors proposed parallel reverse attention layers to model the relationship between polyp areas and their boundaries. A recent review of different approaches to polyp segmentation (and detection) on gastroendoscopic images can be found in [10].

This work describes our solution to the EndoCV 2021 challenge on the polyp segmentation track [14]. The proposed approach is based on our recently introduced solution for polyp segmentation [15], consisting of a cascaded double encoder-decoder Convolutional Neural Network, which achieved the first position in the EndoTect Challenge [16]. We improve upon our previous approach by 1) increasing the representation capability of the pre-trained encoder, and 2) adopting a multi-site sampling scheme to better capture the varying nature of endoscopic data during training. Our approach is straightforward to implement, yet it delivers outstanding performance for the task of polyp segmentation. Our experimental analysis, even if limited because the final results of the competition had not been released at the time of writing, demonstrates that the proposed technique is highly effective and can reliably generate accurate polyp segmentations on endoscopic images of highly varying visual aspect.

Figure 2: Double encoder-decoder networks for polyp segmentation. The first encoder-decoder $U^{(1)}$ generates an initial attempt to segment the polyp, which is supplied to the second encoder-decoder $U^{(2)}$ together with the original image $x$. This guides the learning of $U^{(2)}$ towards more challenging regions within $x$.

2. Methodology

2.1. Double Encoder-Decoder Networks

Dense semantic segmentation tasks are nowadays typically approached with encoder-decoder networks [17] equipped with skip connections that produce pixel-wise probabilities. The encoder acts as a feature extractor, downsampling spatial resolutions while increasing the number of channels by learning convolutional filters. The decoder then upsamples this representation back to the original input size. Double encoder-decoders are a direct extension of encoder-decoder architectures in which two encoder-decoder networks are sequentially combined [18].

Let $x$ be an input RGB image, $U^{(1)}$ the first network, and $U^{(2)}$ the second network. In a double encoder-decoder, the output $U^{(1)}(x)$ of the first network is fed to the second network together with $x$, behaving like an attention map that allows $U^{(2)}$ to focus on the most interesting areas of the image:

$U(x) = U^{(2)}(x, U^{(1)}(x)), \qquad (1)$

where $x$ and $U^{(1)}(x)$ are stacked so that the input to $U^{(2)}$ has four channels, as illustrated in Fig. 2. In this work we employ the same structure in both $U^{(1)}$ and $U^{(2)}$: we select a Feature Pyramid Network architecture as the decoder [19]. In addition, in order to take into account the more complex data characteristics in this challenge, we increase the encoder capability (as compared to [15]) by leveraging the more powerful DPN92 architecture instead of the DPN68 CNN [20]. Note also that during training we apply pixel-wise supervision on both $U^{(1)}$ and $U^{(2)}$, by computing the Cross-Entropy loss not only between $U^{(2)}(x)$ and the corresponding manual segmentation $y$, but also between $U^{(1)}(x)$ and $y$.
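As an illustration of Eq. (1), the following is a minimal PyTorch sketch of a double encoder-decoder, not the authors' released implementation: the wrapper class, the sigmoid applied to the first output before stacking, and the commented construction of the two FPN/DPN-92 sub-networks via segmentation_models_pytorch are assumptions made for this example.

```python
# A minimal sketch (not the authors' released code) of the double encoder-decoder of Eq. (1).
import torch
import torch.nn as nn

class DoubleEncoderDecoder(nn.Module):
    """Cascade of two encoder-decoders: the second one receives the image plus
    the first prediction as a fourth channel, as in Eq. (1) and Fig. 2."""

    def __init__(self, net1: nn.Module, net2: nn.Module):
        super().__init__()
        self.net1 = net1   # expects a 3-channel RGB input
        self.net2 = net2   # expects a 4-channel input (RGB + first prediction)

    def forward(self, x):
        y1 = self.net1(x)                              # first segmentation attempt (logits)
        x2 = torch.cat([x, torch.sigmoid(y1)], dim=1)  # stack image and attention-like map
        y2 = self.net2(x2)                             # refined segmentation (logits)
        return y1, y2

def segmentation_loss(y1, y2, target):
    # pixel-wise supervision on both outputs, as described in Section 2.1
    # (binary cross-entropy here, since polyp masks are binary)
    bce = nn.BCEWithLogitsLoss()
    return bce(y1, target) + bce(y2, target)

# Hypothetical construction of the two FPN/DPN-92 sub-networks with the
# segmentation_models_pytorch package (an assumption, not the authors' exact setup):
# import segmentation_models_pytorch as smp
# model = DoubleEncoderDecoder(smp.FPN('dpn92', in_channels=3, classes=1),
#                              smp.FPN('dpn92', in_channels=4, classes=1))
```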
2.2. Multi-Center Sampling

The database of segmented polyps provided for this competition is highly heterogeneous, with samples collected from six different centers. This leads to a widely variable training set containing images of varying resolutions, visual quality, and even diverse annotation styles.

Figure 3: Multi-center training data sampling strategy. Instead of regular data sampling, in which each image in the training set is shown to the model based on a uniform probability distribution (resulting in center-imbalanced batches, bottom left), in this work we implement a center-based sampling technique in which each mini-batch of images contains a proportionate representation of images from the different centers in which the data was collected (center-balanced batches, bottom right).

In this work, we attempt to facilitate the training of the CNN described in the previous section on such an irregular dataset by considering the origin of each sample (its center) when designing our training sampling approach. Modified sampling strategies are typically used in classification problems with high class imbalance during training, the most typical scheme being oversampling of under-represented categories. In our case, we consider the set of different centers provided by the organization, $c_1, \ldots, c_6$, as our categories. We denote our training set as $D = \{(x_i, y_i, c_i),\ i = 1, \ldots, N\}$, where $x_i$ is an image containing a polyp, $y_i$ its manual segmentation, and $c_i \in \{1, \ldots, 6\}$ its original center. Each class/center $j$ contains $N_j$ examples, so that $\sum_{j=1}^{6} N_j = N$. With this notation, most data sampling strategies can be described with a single equation as follows:

$p_j = \frac{N_j^q}{\sum_{k=1}^{6} N_k^q}, \qquad (2)$

where $p_j$ is the probability of sampling an image from center $j$ during training. By specifying $q = 1$, we define a sampling scheme akin to selecting examples with a probability equal to the frequency of their center in the training set (conventional sampling), while setting $q = 0$ leads to a uniform probability $p_j = 1/6$ of sampling from each center, that is, oversampling of minority centers in order to supply our CNN with mini-batches containing representative images from all sites. The latter is the sampling strategy implemented in this work.
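The center-balanced sampling of Eq. (2) can be approximated with a weighted random sampler, as in the following sketch (not the authors' code): the helper center_balanced_sampler and the assumption that the center index of every training image is available as a list are illustrative choices. With q = 0 each center is drawn with probability 1/6, while q = 1 reduces to conventional frequency-based sampling.

```python
# Minimal sketch of Eq. (2) as a per-image sampling weight; with q = 0 every center
# is sampled with probability 1/6 regardless of how many images it contributes.
from collections import Counter
from torch.utils.data import DataLoader, WeightedRandomSampler

def center_balanced_sampler(centers, q=0.0):
    # `centers` is assumed to be a list with the center index (0..5) of each training image
    counts = Counter(centers)                                    # N_j for each center j
    denom = sum(n ** q for n in counts.values())                 # denominator of Eq. (2)
    p_center = {j: (n ** q) / denom for j, n in counts.items()}  # p_j
    # per-image weight: probability of its center divided by that center's size,
    # so that images from small centers are drawn more often
    weights = [p_center[c] / counts[c] for c in centers]
    return WeightedRandomSampler(weights, num_samples=len(centers), replacement=True)

# Hypothetical usage with a dataset that exposes the center of each sample:
# sampler = center_balanced_sampler(train_dataset.centers)          # q = 0 by default
# loader = DataLoader(train_dataset, batch_size=4, sampler=sampler)
```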
2.3. Training Details

Our models are trained on the training data (from five different centers) provided by the EndoCV2021 challenge organisers [14]. We minimized the cross-entropy loss using Stochastic Gradient Descent with a batch size of 4 and a learning rate of 0.01, which is cyclically decayed following a cosine law from its initial value to 1e-8 during 25 epochs, which defines a training cycle. We repeat this process for 20 cycles, resetting the learning rate at the start of each cycle. Images are re-sampled to 640 × 512, and during training they are augmented with standard operations (random rotations, vertical/horizontal flipping, contrast/saturation/brightness changes). The mean Dice score is monitored on a separate validation set and the best model is kept for testing purposes. At test time, we generate four different versions of each image by horizontal/vertical flipping, predict on each of them, and average the results.
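For concreteness, the sketch below illustrates the cyclic cosine learning-rate schedule and the flip-based test-time augmentation described above. It assumes the double encoder-decoder returns the pair of outputs from the earlier sketch; data augmentation and validation-based model selection are omitted, and all names are placeholders rather than the authors' code.

```python
# Minimal sketch (not the authors' code) of the training schedule and flip-based
# test-time augmentation of Section 2.3. Data loaders are assumed to yield
# (image, binary mask) batches, and `model` to return (y1, y2) logits.
import torch
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

def train(model, train_loader, device='cuda', cycles=20, epochs_per_cycle=25):
    model.to(device)
    optimizer = SGD(model.parameters(), lr=0.01)
    # cosine decay from 0.01 down to 1e-8 over each 25-epoch cycle, restarted every cycle
    scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=epochs_per_cycle, eta_min=1e-8)
    bce = nn.BCEWithLogitsLoss()
    for epoch in range(cycles * epochs_per_cycle):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            y1, y2 = model(x)                   # both segmentation attempts
            loss = bce(y1, y) + bce(y2, y)      # pixel-wise supervision on both outputs
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()                        # one scheduler step per epoch
    return model

@torch.no_grad()
def predict_tta(model, x):
    # predict on the four flipped versions of x, undo each flip, and average
    model.eval()
    flips = [lambda t: t,
             lambda t: torch.flip(t, dims=[-1]),       # horizontal flip
             lambda t: torch.flip(t, dims=[-2]),       # vertical flip
             lambda t: torch.flip(t, dims=[-2, -1])]   # both
    preds = [f(torch.sigmoid(model(f(x))[1])) for f in flips]
    return torch.stack(preds).mean(dim=0)
```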
3. Experimental Results

At the time of writing, final results of this challenge have not been released yet. An offline validation phase on unseen images (a subset of the final test set) was run by the organization so that participants could perform model selection. This allows us to compare internally the performance of different versions of our approach¹.

¹ Details on performance analysis metrics for this problem can be found in [21].

Table 1: Performance comparison for polyp segmentation of different double encoder-decoder networks on the hidden validation set, in terms of mean and standard deviation of the Dice score.

                    FPN/ResNet34    FPN/DPN68      FPN/DPN92
Dice (Mean ± Std)   79.12 ± 4.32    81.69 ± 2.34   81.81 ± 1.19

Table 1 shows the performance of the system described in the previous sections when using three different double encoder-decoder networks, all of them trained with the center-sampling approach. It can be appreciated that increasing the complexity of the encoder correlates with greater performance in terms of average Dice score. In addition, we can also observe a substantial decrease in the standard deviation measured across centers when the more powerful DPN92 encoder architecture is employed, as compared to the smaller DPN68 or the simpler ResNet34, highlighting the relevance of this design decision.

4. Conclusions

In this work, we have detailed our solution for the EndoCV 2021 challenge on the polyp segmentation track. The proposed approach consists of a double encoder-decoder network, enhanced with an improved encoder architecture and a training data sampling strategy specifically designed to deal with the multi-site nature of the data associated with this competition. The limited experimental results show that our method achieves a consistently high Dice score with a remarkably low standard deviation, which indicates that it is suitable for polyp segmentation on endoscopic images and has enough generalization capability to perform well on images collected from different centers.

Acknowledgments

Adrian Galdran was funded by a Marie Skłodowska-Curie Global Fellowship (No 892297).

References

[1] F. A. Haggar, R. P. Boushey, Colorectal Cancer Epidemiology: Incidence, Mortality, Survival, and Risk Factors, Clinics in Colon and Rectal Surgery 22 (2009) 191–197.
[2] S. B. Ahn, D. S. Han, J. H. Bae, T. J. Byun, J. P. Kim, C. S. Eun, The Miss Rate for Colorectal Adenoma Determined by Quality-Adjusted, Back-to-Back Colonoscopies, Gut and Liver 6 (2012) 64–70.
[3] T. K. Lui, C. K. Hui, V. W. Tsui, K. S. Cheung, M. K. Ko, D. C. C. Foo, L. Y. Mak, C. K. Yeung, T. H. Lui, S. Y. Wong, W. K. Leung, New insights on missed colonic lesions during colonoscopy through artificial intelligence-assisted real-time detection (with video), Gastrointestinal Endoscopy (2020).
[4] D. Vázquez, J. Bernal, F. J. Sánchez, G. Fernández-Esparrach, A. M. López, A. Romero, M. Drozdzal, A. Courville, A benchmark for endoluminal scene segmentation of colonoscopy images, Journal of Healthcare Engineering 2017 (2017).
[5] Y. Guo, J. Bernal, B. J. Matuszewski, Polyp Segmentation with Fully Convolutional Deep Neural Networks - Extended Evaluation Study, Journal of Imaging 6 (2020) 69.
[6] L. F. Sánchez-Peralta, A. Picón, J. A. Antequera-Barroso, J. F. Ortega-Morán, F. M. Sánchez-Margallo, J. B. Pagador, Eigenloss: Combined PCA-Based Loss Function for Polyp Segmentation, Mathematics 8 (2020) 1316.
[7] Y. Fang, C. Chen, Y. Yuan, K.-y. Tong, Selective Feature Aggregation Network with Area-Boundary Constraints for Polyp Segmentation, in: Medical Image Computing and Computer Assisted Intervention - MICCAI 2019, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2019, pp. 302–310.
[8] J. Kang, J. Gwak, Ensemble of Instance Segmentation Models for Polyp Segmentation in Colonoscopy Images, IEEE Access 7 (2019) 26440–26447.
[9] D.-P. Fan, G.-P. Ji, T. Zhou, G. Chen, H. Fu, J. Shen, L. Shao, PraNet: Parallel Reverse Attention Network for Polyp Segmentation, in: Medical Image Computing and Computer Assisted Intervention - MICCAI 2020, 2020, pp. 263–273.
[10] S. Ali, M. Dmitrieva, N. Ghatwary, S. Bano, G. Polat, A. Temizel, A. Krenzer, A. Hekalo, Y. B. Guo, B. Matuszewski, M. Gridach, I. Voiculescu, V. Yoganand, A. Chavan, A. Raj, N. T. Nguyen, D. Q. Tran, L. D. Huynh, N. Boutry, S. Rezvy, H. Chen, Y. H. Choi, A. Subramanian, V. Balasubramanian, X. W. Gao, H. Hu, Y. Liao, D. Stoyanov, C. Daul, S. Realdon, R. Cannizzaro, D. Lamarque, T. Tran-Nguyen, A. Bailey, B. Braden, J. E. East, J. Rittscher, Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy, Medical Image Analysis 70 (2021) 102002. doi:10.1016/j.media.2021.102002.
[11] D. Jha, P. H. Smedsrud, M. A. Riegler, P. Halvorsen, T. de Lange, D. Johansen, H. D. Johansen, Kvasir-SEG: A segmented polyp dataset, in: International Conference on Multimedia Modeling, 2020, pp. 451–462.
[12] J. Bernal, J. Sánchez, F. Vilariño, Towards automatic polyp detection with a polyp appearance model, Pattern Recognition 45 (2012) 3166–3182.
[13] J. Silva, A. Histace, O. Romain, X. Dray, B. Granado, Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer, International Journal of Computer Assisted Radiology and Surgery 9 (2014) 283–293.
[14] S. Ali, D. Jha, N. Ghatwary, S. Realdon, R. Cannizzaro, M. A. Riegler, P. Halvorsen, C. Daul, J. Rittscher, O. E. Salem, D. Lamarque, T. de Lange, J. E. East, PolypGen: A multi-center polyp detection and segmentation dataset for generalisability assessment, arXiv (2021).
[15] A. Galdran, G. Carneiro, M. A. González Ballester, Double Encoder-Decoder Networks for Gastrointestinal Polyp Segmentation, in: Pattern Recognition. ICPR International Workshops and Challenges, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2021, pp. 293–307. doi:10.1007/978-3-030-68763-2_22.
[16] S. A. Hicks, D. Jha, V. Thambawita, P. Halvorsen, H. L. Hammer, M. A. Riegler, The EndoTect 2020 Challenge: Evaluation and Comparison of Classification, Segmentation and Inference Time for Endoscopy, in: Pattern Recognition. ICPR International Workshops and Challenges, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2021, pp. 263–274. doi:10.1007/978-3-030-68793-9_18.
[17] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation, in: Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2015, pp. 234–241.
[18] A. Galdran, A. Anjos, J. Dolz, H. Chakor, H. Lombaert, I. B. Ayed, The Little W-Net That Could: State-of-the-Art Retinal Vessel Segmentation with Minimalistic Models, arXiv:2009.01907 (2020).
[19] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature Pyramid Networks for Object Detection, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 936–944. ISSN: 1063-6919.
[20] Y. Chen, J. Li, H. Xiao, X. Jin, S. Yan, J. Feng, Dual path networks, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, Curran Associates Inc., Red Hook, NY, USA, 2017, pp. 4470–4478.
[21] S. Ali, F. Zhou, B. Braden, A. Bailey, S. Yang, G. Cheng, P. Zhang, X. Li, M. Kayser, R. D. Soberanis-Mukul, S. Albarqouni, X. Wang, C. Wang, S. Watanabe, I. Oksuz, Q. Ning, S. Yang, M. A. Khan, X. W. Gao, S. Realdon, M. Loshchenov, J. A. Schnabel, J. E. East, G. Wagnieres, V. B. Loschenov, E. Grisan, C. Daul, W. Blondel, J. Rittscher, An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy, Scientific Reports 10 (2020) 2748. doi:10.1038/s41598-020-59413-5.