=Paper=
{{Paper
|id=Vol-2882/MediaEval_20_paper_75
|storemode=property
|title=Automatic Polyp Segmentation Using U-Net-ResNet50
|pdfUrl=https://ceur-ws.org/Vol-2882/paper75.pdf
|volume=Vol-2882
|authors=Saruar Alam,Nikhil Kumar Tomar,Aarati Thakur,Debesh Jha,Ashish Rauniyar
|dblpUrl=https://dblp.org/rec/conf/mediaeval/AlamTTJR20
}}
==Automatic Polyp Segmentation Using U-Net-ResNet50==
Automatic Polyp Segmentation Using U-Net-ResNet50
Saruar Alam1, Nikhil Kumar Tomar2, Aarati Thakur3, Debesh Jha2,4, Ashish Rauniyar5,6
1 University of Bergen, Norway
2 SimulaMet, Norway
3 Nepal Medical College, Kathmandu University, Nepal
4 UiT The Arctic University of Norway
5 University of Oslo, Norway
6 Oslo Metropolitan University, Norway
saruar.alam@uib.no, nikhilroxtomar@gmail.com, absolute2iti@gmail.com, debesh@simula.no, ashish@oslomet.no
ABSTRACT
Polyps are the precursors to colorectal cancer, which is considered one of the leading causes of cancer-related deaths worldwide. Colonoscopy is the standard procedure for the identification, localization, and removal of colorectal polyps. Due to variability in shape and size and their similarity to the surrounding tissue, colorectal polyps are often missed by clinicians during colonoscopy. With the use of an automatic, accurate, and fast polyp segmentation method during the colonoscopy, many colorectal polyps can be easily detected and removed. The “Medico Automatic Polyp Segmentation Challenge” provides an opportunity to study polyp segmentation and build an efficient and accurate segmentation algorithm. We use the U-Net with a pre-trained ResNet50 as the encoder for polyp segmentation. The model is trained on the Kvasir-SEG dataset provided for the challenge and tested on the organizers’ dataset, achieving a dice coefficient of 0.8154, Jaccard of 0.7396, recall of 0.8533, precision of 0.8532, accuracy of 0.9506, and F2 score of 0.8272, demonstrating the generalization ability of our model.

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval’20, 14-15 December 2020, Online

1 INTRODUCTION
Identification and removal of polyps during colonoscopy have become a standard procedure. It is often challenging to detect polyps, as they are hard to differentiate from the surrounding normal tissue. These polyps are usually covered with stool, mucosa, and other materials that can obscure the correct diagnosis. This is especially true for small, flat, and sessile polyps, which are often not visible during colonoscopy. This drives the miss rate of polyps up to 25% [8] and increases the risk of colorectal cancer in the affected patient. A 1% increase in the adenoma detection rate leads to a 3% decrease in the risk of colorectal cancer [3]. Recently, deep learning techniques have been developed to overcome these challenges and improve polyp detection accuracy during colonoscopy. Deep-learning-based polyp segmentation methods have been successfully applied to automatic polyp detection in real time.

Automatic polyp segmentation plays an important role in the identification and localization of polyps in the affected regions. It helps in analyzing images or even video frames and classifying each pixel as a polyp or non-polyp instance. This allows clinicians to identify polyps in the affected region more easily, quickly, and accurately. Automated polyp segmentation can also support the development of a Computer-Aided Diagnosis (CADx) system specially designed for colonoscopy procedures.

The “Medico Automatic Polyp Segmentation Challenge” [6] consists of two tasks: the “Polyp segmentation task” and the “Algorithm efficiency task”. We have submitted our model to task 1 only.

2 RELATED WORKS
For semantic segmentation tasks, encoder-decoder networks such as FCN [9] and U-Net [10] are generally preferred over other approaches. U-Net and its variants are used for both natural and biomedical image segmentation. In general, the encoder uses multiple convolutions to learn and capture essential semantic features ranging from low-level to high-level, and the decoder upscales them again. These upscaled features are then concatenated with the features from the encoder via skip connections and passed through convolution layers to generate the final output in the form of a binary mask.

The encoder acts as a feature extractor, while the decoder uses the features extracted from the input to produce the desired segmentation mask. The encoder can be replaced by a pre-trained network such as VGG16 [12] or VGG19 [12]. These pre-trained networks are already trained on the ImageNet [11] dataset and have the necessary feature extraction capabilities. Architectures like SegNet [2] and TernausNet [5] use a pre-trained VGG16 and VGG11, respectively, for the segmentation task.

With the success of the residual network [4], ResNet50 has become one of the most commonly used architectures for transfer learning. The residual block uses two 3 × 3 convolutional layers and an identity mapping. Each convolutional layer is followed by a batch normalization layer and a Rectified Linear Unit (ReLU) activation function. The identity mapping is the shortcut connection linking the input and output of the convolutional layers. It helps in building deeper neural networks by eliminating the problem of vanishing and exploding gradients.
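The residual computation described above can be illustrated with a minimal, self-contained sketch (our own illustration, not the authors' code): two plain Python functions stand in for the 3 × 3 convolution + batch-normalization layers, and the `+ x` term is the identity shortcut.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, t) for t in v]

def residual_block(x, f1, f2):
    """y = ReLU(f2(ReLU(f1(x))) + x).

    f1 and f2 are toy stand-ins for the two 3x3 conv + batch-norm
    layers; the `+ x` term is the identity shortcut that lets the
    signal (and its gradient) flow past the transforms unchanged.
    """
    out = relu(f1(x))                              # first conv + BN + ReLU
    out = f2(out)                                  # second conv + BN
    return relu([o + xi for o, xi in zip(out, x)]) # add shortcut, then ReLU

# Even if both transforms collapse to zero, the block still passes
# its input through via the shortcut.
zero = lambda v: [0.0] * len(v)
print(residual_block([1.0, 2.0], zero, zero))  # -> [1.0, 2.0]
```

This degenerate `zero` case is exactly why identity mappings ease optimization: a residual block can always fall back to the identity function, so stacking more blocks cannot make the network strictly worse.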
Figure 1: The proposed U-Net-ResNet50 architecture
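The shape bookkeeping behind the encoder-decoder design in Figure 1 can be traced in a short, illustrative sketch (our own, not the authors' implementation): the encoder halves the spatial resolution at each stage, and each decoder step doubles it back and must match the saved encoder map before concatenating it via the skip connection.

```python
def unet_shape_trace(size, stages=4):
    """Trace square feature-map sizes through a U-Net-style encoder-decoder.

    The encoder halves the resolution at every stage (saving each map
    for its skip connection); the decoder's transpose convolutions
    double it back, and each upscaled map must match the saved encoder
    map before the two are concatenated.
    """
    saved = []
    s = size
    for _ in range(stages):
        saved.append(s)        # feature map kept for the skip connection
        s //= 2                # stride-2 downsampling in the encoder
    for skip in reversed(saved):
        s *= 2                 # transpose convolution doubles the size
        assert s == skip, "skip shapes must match for concatenation"
        # concatenation with the encoder map + two 3x3 convs happen here
    return s                   # back at the input resolution

print(unet_shape_trace(256))   # -> 256
```

The trace makes explicit why skip connections impose matching spatial shapes: a mismatch anywhere would make the channel-wise concatenation in the decoder impossible.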
3 APPROACH
Figure 1 shows an overview of the proposed U-Net-ResNet50 architecture. It is an encoder-decoder architecture in which a ResNet50 trained on the ImageNet dataset [11] is used as the encoder. The use of a pre-trained encoder helps the model converge easily. The input image is fed into the pre-trained ResNet50 encoder, whose main components are a series of residual blocks. These residual blocks help the encoder extract the important features from the input image, which are then passed to the decoder. The decoder starts with a transpose convolution that upscales the incoming feature maps to the desired shape. Next, these upscaled feature maps are concatenated with the feature maps of matching shape from the pre-trained encoder via skip connections. The skip connections give the decoder access to the low-level semantic information from the encoder, which allows it to generate the desired feature maps. This is followed by two 3 × 3 convolution layers, each followed by a batch normalization layer and a ReLU non-linearity. The last decoder block's output is passed to a 1 × 1 convolution layer and then to a sigmoid activation function, finally generating the desired binary mask.

The FastAI (version 2.0) library [1] is used to train and evaluate our model. We employed resizing, flipping, rotation, zooming, lighting changes, warping, and intensity normalization based on the ImageNet dataset to augment the input images for training. The model uses the Adam optimizer with an initial learning rate of 10^-2 and cross-entropy loss as its loss function. We employed the one-cycle policy, in which the learning rate changes during training to achieve super-convergence [13]. We ran just 50 epochs of training, and the model converged.

4 RESULTS AND ANALYSIS
The Medico Automatic Polyp Segmentation challenge [6] provides an opportunity to study the potential and challenges of automated polyp segmentation. This study aims at building a model that performs well on the organizers' dataset while training on a separate Kvasir-SEG dataset [7].

Table 1: Quantitative results on the Kvasir-SEG and test set (challenge) datasets for Task 1.

Dataset      Jaccard  DSC     Recall  Prec.   Acc.    F2
Kvasir-SEG   0.7871   0.8926  0.8433  0.9207  0.9639  0.8585
Test Set     0.7396   0.8154  0.8533  0.8532  0.9506  0.8272

Table 1 shows the overall results of the U-Net-ResNet50 architecture on the Kvasir-SEG test dataset and the organizers' test dataset provided for the final evaluation of the model. The Jaccard index, Sørensen-Dice coefficient (DSC), recall, precision (Prec.), accuracy (Acc.), and the F2 score are used as the evaluation metrics. Our trained U-Net-ResNet50 model achieved a dice coefficient of 0.8154, Jaccard of 0.7396, recall of 0.8533, precision of 0.8532, accuracy of 0.9506, and F2 score of 0.8272 on the organizers' test dataset, as shown in Table 1. These results demonstrate the generalization ability of our model. Moreover, Table 1 also shows that the recall on the organizers' test dataset is 1.00% higher than on the Kvasir-SEG test dataset. This indicates that the model is not overfitting.

5 CONCLUSION & FUTURE WORK
With our U-Net-ResNet50, we achieved competitive performance on the organizers' dataset with a dice coefficient of 0.8154. By replacing the U-Net encoder with a pre-trained ResNet50 and employing a one-cycle policy during training, we were able to converge the model in a short time. Because the encoder weights are not initialized from scratch, the training time is reduced. This is an important step towards faster convergence, which is useful when the availability of high-performance computing resources is limited.

In the future, we would like to experiment with more than one pre-trained encoder by fusing their feature maps and using them for training our model.

ACKNOWLEDGMENTS
The computations in this paper were performed on the equipment provided by the Experimental Infrastructure for Exploration of Exascale Computing (eX3), which is financially supported by the Research Council of Norway under contract 270053. The authors would also like to thank the machine learning group of the Mohn Medical Imaging and Visualization (MMIV) Centre, Norway, for providing the computing infrastructure for the experiments.
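For reference, the metrics reported in Table 1 follow the standard definitions from the pixel-wise confusion counts. The sketch below is our own illustration of those definitions (not the challenge organizers' evaluation code) and assumes masks flattened to equal-length lists of 0/1 labels with at least one positive pixel in each.

```python
def segmentation_metrics(pred, truth):
    """Jaccard, Dice (DSC), recall, precision, accuracy and F2 from two
    equal-length lists of 0/1 pixel labels (pred vs. ground truth)."""
    tp = sum(p and t for p, t in zip(pred, truth))            # true positives
    fp = sum(p and not t for p, t in zip(pred, truth))        # false positives
    fn = sum((not p) and t for p, t in zip(pred, truth))      # false negatives
    tn = sum((not p) and (not t) for p, t in zip(pred, truth))  # true negatives
    jaccard = tp / (tp + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # F-beta with beta = 2 weights recall twice as heavily as precision
    f2 = 5 * precision * recall / (4 * precision + recall)
    return dict(jaccard=jaccard, dice=dice, recall=recall,
                precision=precision, accuracy=accuracy, f2=f2)

m = segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0])
print(m["dice"], m["jaccard"])  # -> 0.5 0.3333333333333333
```

Note the Dice/Jaccard relationship, DSC = 2J / (1 + J), which the Table 1 rows satisfy up to rounding.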
REFERENCES
[1] 2020. FastAI Library. (2020). https://docs.fast.ai/.
[2] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. 2017. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 12 (2017), 2481–2495.
[3] Douglas A Corley, Christopher D Jensen, Amy R Marks, Wei K Zhao, Jeffrey K Lee, Chyke A Doubeni, Ann G Zauber, Jolanda de Boer, Bruce H Fireman, Joanne E Schottinger, and others. 2014. Adenoma detection rate and risk of colorectal cancer and death. New England Journal of Medicine 370, 14 (2014), 1298–1306.
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
[5] Vladimir Iglovikov and Alexey Shvets. 2018. TernausNet: U-Net with VGG11 encoder pre-trained on ImageNet for image segmentation. arXiv preprint arXiv:1801.05746 (2018).
[6] Debesh Jha, Steven A. Hicks, Krister Emanuelsen, Håvard D. Johansen, Dag Johansen, Thomas de Lange, Michael A. Riegler, and Pål Halvorsen. 2020. Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation.
[7] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Pål Halvorsen, Thomas de Lange, Dag Johansen, and Håvard D Johansen. 2020. Kvasir-SEG: A segmented polyp dataset. In Proc. of International Conference on Multimedia Modeling (MMM). 451–462.
[8] Sheila Kumar, Nirav Thosani, Uri Ladabaum, Shai Friedland, Ann M Chen, Rajan Kochar, and Subhas Banerjee. 2017. Adenoma miss rates associated with a 3-minute versus 6-minute colonoscopy withdrawal time: a prospective, randomized trial. Gastrointestinal Endoscopy 85, 6 (2017), 1273–1280.
[9] Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3431–3440.
[10] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 234–241.
[11] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, and others. 2015. ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115, 3 (2015), 211–252.
[12] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[13] Leslie N Smith and Nicholay Topin. 2019. Super-convergence: Very fast training of neural networks using large learning rates. In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, Vol. 11006. International Society for Optics and Photonics, 1100612.