=Paper=
{{Paper
|id=Vol-3181/paper71
|storemode=property
|title=HCMUS at MediaEval2021: PointRend with Attention Fusion Refinement for Polyps Segmentation
|pdfUrl=https://ceur-ws.org/Vol-3181/paper71.pdf
|volume=Vol-3181
|authors=E-Ro Nguyen,Hai-Dang Nguyen,Minh-Triet Tran
|dblpUrl=https://dblp.org/rec/conf/mediaeval/NguyenNT21
}}
==HCMUS at MediaEval2021: PointRend with Attention Fusion Refinement for Polyps Segmentation==
E-Ro Nguyen 1,3, Hai-Dang Nguyen 1,3, Minh-Triet Tran 1,2,3
1 University of Science, VNU-HCM, 2 John von Neumann Institute, VNU-HCM
3 Vietnam National University, Ho Chi Minh City, Vietnam
{nero,nhdang}@selab.hcmus.edu.vn
tmtriet@fit.hcmus.edu.vn
Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). MediaEval'21, December 13-15 2021, Online.

ABSTRACT

The Medico task in MediaEval 2021 explores the challenge of building accurate and high-performance algorithms to detect all types of polyps in endoscopic images. This paper introduces our approach for the automatic segmentation of polyp images. We employ ResNeXt as an encoder backbone with a U-Net decoder. Further, adding PointRend and Attention Fusion Refinement to the network improves our segmentation performance. The experimental results show the efficiency of the proposed method, which achieves a Jaccard index of 0.7572, an accuracy of 0.9634, and a Dice score of 0.8326.
1 INTRODUCTION

The Medico: Transparency in Medical Image Segmentation 2021 task [6] aims to develop automatic systems for segmenting polyps in endoscopy images that are transparent and explainable, reducing the chance that diagnosticians overlook a polyp during a colonoscopy. A modified version of the segmentation part of HyperKvasir [2] is provided: more than 1000 training polyp images with corresponding masks labeled by medical experts, and 200 testing polyp images, challenging the participants to deliver robust, transparent, and efficient algorithms for polyp segmentation.

In recent years, deep learning-based methods [1, 3, 4] for automatic polyp segmentation have achieved considerable success. In particular, attention strategies [3] effectively improve polyp detection and segmentation performance. Nevertheless, challenges remain, including the variety of polyp appearances (size, texture, and color). Moreover, the boundary between a polyp and its neighboring regions is usually blurred and hard to segment.

In this paper, we propose PRAFNet, an accurate and real-time framework that combines PointRend with Attention Fusion Refinement for polyp segmentation. Fig. 1 shows an overview of the proposed framework. PRAFNet utilizes Attention Fusion Refinement to decode effective high-level semantic features, and the PointRend [8] module to generate high-quality polyp segmentations from colonoscopy images. The following section introduces our approach and elaborates on the details of our network.
2 APPROACH

2.1 Attention Fusion Refinement

Current popular medical image segmentation networks usually rely on a U-Net architecture (e.g., U-Net [9], U-Net++ [13], ResUNet++ [7]). These models are essentially encoder-decoder frameworks that aggregate all multi-level features with a simple decoder, which does not leverage these features effectively. Woo et al. introduce the Convolutional Block Attention Module (CBAM) [11], which applies attention-based feature refinement with two distinct modules, channel and spatial, to learn what and where to emphasize or suppress, refining intermediate features effectively.

We propose an Attention Fusion Refinement (AFR) module to better aggregate high-level features and focus on important regions, combining high-level features with upsampled features and using CBAM as its core module. More specifically, for an input image, five levels of features {f_i, i = 1, ..., 5} are extracted from a ResNeXt [5, 12] backbone network. We introduce a new decoder component, AFR, that aggregates each high-level feature with the upsampled features. As shown in Fig. 1, an AFR module takes a high-level feature f_i together with the previous upsampled feature u_(i+1) and produces the upsampled feature u_i.
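To make one AFR stage concrete, the following is a minimal PyTorch sketch assuming the layout shown in Fig. 1 (up-sampling, concatenation, residual block, CBAM, residual block). The channel widths, the CBAM reduction ratio, and the residual block internals are our assumptions, not values given in the paper.

```python
# A hypothetical sketch of one AFR stage: Up-sample -> Concat -> Res Block -> CBAM -> Res Block.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch))
    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        return x * torch.sigmoid(avg + mx)[:, :, None, None]

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
    def forward(self, x):
        # Pool along the channel axis, then learn a 2-D attention map.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(s))

class CBAM(nn.Module):
    """Convolutional Block Attention Module [11]: channel then spatial attention."""
    def __init__(self, ch):
        super().__init__()
        self.ca, self.sa = ChannelAttention(ch), SpatialAttention()
    def forward(self, x):
        return self.sa(self.ca(x))

class ResBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch))
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
    def forward(self, x):
        return F.relu(self.body(x) + self.skip(x))

class AFR(nn.Module):
    def __init__(self, skip_ch, up_ch, out_ch):
        super().__init__()
        self.res1 = ResBlock(skip_ch + up_ch, out_ch)
        self.cbam = CBAM(out_ch)
        self.res2 = ResBlock(out_ch, out_ch)
    def forward(self, f_i, u_next):
        # Up-sample u_(i+1) to the resolution of the encoder feature f_i.
        u = F.interpolate(u_next, size=f_i.shape[-2:], mode="bilinear", align_corners=False)
        x = torch.cat([f_i, u], dim=1)              # fuse the skip and decoder paths
        return self.res2(self.cbam(self.res1(x)))   # the upsampled feature u_i
```

Placing CBAM between the two residual blocks lets the attention act on the already-fused features, which matches the stated goal of emphasizing important regions after aggregation.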
2.2 PointRend

The U-Net [9, 13] model gives decent accuracy. However, it still has drawbacks, such as confusing classes whose features are barely distinguishable and failing to predict precise boundaries. We use the PointRend [8] module to address these drawbacks. PointRend constructs point-wise features at selected points by concatenating two kinds of features: fine-grained features, to render fine segmentation details, and coarse prediction features, to gain more contextual and semantic information. We use the features f_2 as our fine-grained features and select the top K = 3136 uncertain points in each subdivision step. In general, the uncertain points are located near class boundaries, so PointRend helps refine the polyp boundary effectively. As shown in Fig. 1, we use two subdivision steps of PointRend to obtain the final segmentation, which has the same size as the input image. We plot the uncertain points used in PointRend as blue dots in the coarse predictions p_2 and p_1.
2.3 Training strategy

We apply the Bootstrapped Cross Entropy loss to prevent the models from overfitting on simple pixels and force them to focus on more challenging cases. With the Bootstrapped Cross Entropy, we calculate the loss only over the top K percent of pixels with the largest losses at each step of the training process. We also add a "warm-up" period to the loss with K = 100, so that the network first learns to adapt to the easy regions, and then transition to the harder areas by gradually decaying K to 15 in a polynomial manner.
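A minimal sketch of this loss is given below, assuming binary segmentation (sigmoid logits); the decay exponent and the per-step schedule granularity are our assumptions, since the paper only states that K decays from 100 to 15 polynomially.

```python
# Hypothetical sketch of Bootstrapped Cross Entropy with a polynomially
# decaying top-K percentage (K: 100 -> 15 over training), as described above.
import torch
import torch.nn.functional as F

def top_k_percent(step: int, total_steps: int, k_start=100.0, k_end=15.0, power=0.9):
    # Polynomial decay from k_start to k_end; K = 100 acts as a warm-up
    # (all pixels contribute) before hard-pixel mining kicks in.
    t = min(step / total_steps, 1.0)
    return k_end + (k_start - k_end) * (1.0 - t) ** power

def bootstrapped_bce(logits, targets, step, total_steps):
    # logits, targets: (N, 1, H, W); keep only the hardest K% pixel losses.
    loss = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    loss = loss.flatten(1)                       # (N, H*W) per-pixel losses
    k = top_k_percent(step, total_steps)
    n_keep = max(1, int(loss.shape[1] * k / 100.0))
    hardest, _ = loss.topk(n_keep, dim=1)        # largest per-pixel losses
    return hardest.mean()
```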
[Figure 1 (architecture diagram): a ResNeXt encoder extracts features f_1 to f_5; Attention Fusion Refinement modules (Up-sampling, Concat, Res Block, CBAM, Res Block, Max-Pooling) decode them into upsampled features u_i; two PointRend subdivision steps refine the coarse predictions p_2, p_1 into the final prediction at input resolution. Legend: CBAM = Convolutional Block Attention Module, Res Block = Residual Block.]

Figure 1: Overview of our proposed method PRAFNet, which consists of three attention fusion refinement (AFR) modules with two adaptive subdivision steps of the PointRend module. Please refer to Section 2 for more details.
3 RESULTS AND ANALYSIS

We performed experiments on six different settings for two tasks. Method 1 uses U-Net with a ResNeXt50 [12] backbone as a baseline model. Method 2 extends Method 1 with PointRend. Method 3 extends Method 2 with Attention Fusion Refinement. Method 4 uses ResNeXt101 as the backbone with the same settings as Method 3. Method 5 uses EfficientNetB6 [10] as the backbone with the same settings as Method 3. Method 6 ensembles the results of Methods 3, 4, and 5.

For task 1, we submit five runs, Method 2 to Method 6. For task 2, we submit two runs: the first run uses Method 4, and the second run uses Method 1 as the lightweight architecture.

Tables 1 and 2 show our results on task 1 and task 2, respectively. Method 2 is slightly better than Method 1 on all metrics, which shows that PointRend helps improve the results. In Method 3, we add AFR, and the results again improve over Method 2. With a stronger backbone (ResNeXt101 instead of ResNeXt50), Method 4 improves the results further, with a Jaccard index of 0.7441. Method 5, with an EfficientNetB6 backbone, is better than Method 4 on several metrics except precision. In Method 6, we ensemble Methods 3, 4, and 5 to achieve our best result in this task, with a Jaccard index of 0.7572.

In task 2, although Method 1 is about 1.6 times faster than Method 4 (76.38 vs. 47.86 FPS), Method 4 has higher accuracy while still running in real time (about 48 FPS).

Method   Acc      Jaccard   Dice     F1       R        P
2        0.9580   0.7252    0.8059   0.8059   0.7942   0.8871
3        0.9595   0.7283    0.8093   0.8093   0.7941   0.8831
4        0.9608   0.7441    0.8188   0.8188   0.8110   0.8741
5        0.9613   0.7497    0.8290   0.8290   0.8352   0.8639
6        0.9634   0.7572    0.8326   0.8326   0.8153   0.8956

Table 1: Medico polyp segmentation task 1 results. Acc denotes accuracy; R and P denote recall and precision, respectively.

Method   FPS      Accuracy   Jaccard   Dice     F1
1        76.38    0.9580     0.7210    0.8054   0.8054
4        47.86    0.9608     0.7441    0.8188   0.8188

Table 2: Medico polyp segmentation task 2 results.
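The paper does not specify the ensembling scheme used in Method 6. A common choice, shown here only as a hypothetical sketch, is to average the members' predicted probability maps before thresholding.

```python
# Hypothetical ensembling sketch for Method 6: average the sigmoid
# probability maps of Methods 3, 4, and 5, then threshold at 0.5.
# The averaging scheme is our assumption; the paper does not specify it.
import torch

def ensemble_masks(logits_list, threshold=0.5):
    # logits_list: per-model (N, 1, H, W) logits at the same resolution.
    probs = torch.stack([torch.sigmoid(l) for l in logits_list]).mean(dim=0)
    return (probs > threshold).float()
```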
4 CONCLUSION

This paper presents a fast and accurate method for automatic polyp segmentation. The proposed methods use an encoder-decoder architecture: ResNeXt serves as the encoder backbone with a U-Net decoder, and PointRend and Attention Fusion Refinement are applied to improve the segmentation result. PointRend helps refine the uncertain points, especially in the boundary regions, while Attention Fusion Refinement enhances the fusion between high-level features and upsampled features in the decoder. In the future, we plan to apply stronger architectures such as ResUNet++ or PraNet to our work and further improve the results.

ACKNOWLEDGMENTS

This work was funded by Gia Lam Urban Development and Investment Company Limited, Vingroup, and supported by the Vingroup Innovation Foundation (VINIF) under project code VINIF.2019.DA19.
REFERENCES
[1] Mojtaba Akbari, Majid Mohrekesh, Ebrahim Nasr-Esfahani, S.M. Reza
Soroushmehr, Nader Karimi, Shadrokh Samavi, and Kayvan Najarian.
2018. Polyp Segmentation in Colonoscopy Images Using Fully Con-
volutional Network. In 2018 40th Annual International Conference of
the IEEE Engineering in Medicine and Biology Society (EMBC). 69–72.
https://doi.org/10.1109/EMBC.2018.8512197
[2] Hanna Borgli, Vajira Thambawita, Pia H Smedsrud, Steven Hicks,
Debesh Jha, Sigrun L Eskeland, Kristin Ranheim Randel, Konstantin
Pogorelov, Mathias Lux, Duc Tien Dang Nguyen, Dag Johansen,
Carsten Griwodz, HΓ₯kon K Stensland, Enrique Garcia-Ceja, Peter T
Schmidt, Hugo L Hammer, Michael A Riegler, PΓ₯l Halvorsen, and
Thomas de Lange. 2020. HyperKvasir, a comprehensive multi-class
image and video dataset for gastrointestinal endoscopy. Scientific Data
7, 1 (2020), 283. https://doi.org/10.1038/s41597-020-00622-y
[3] Deng-Ping Fan, Ge-Peng Ji, Tao Zhou, Geng Chen, Huazhu Fu, Jianbing
Shen, and Ling Shao. 2020. PraNet: Parallel Reverse Attention Network
for Polyp Segmentation. (2020). arXiv:eess.IV/2006.11392
[4] Yuqi Fang, Cheng Chen, Yixuan Yuan, and Kai-yu Tong. 2019. Selective
Feature Aggregation Network with Area-Boundary Constraints for Polyp
Segmentation. 302β310. https://doi.org/10.1007/978-3-030-32239-7_34
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.
2015. Deep Residual Learning for Image Recognition. (2015).
arXiv:cs.CV/1512.03385
[6] Steven Hicks, Debesh Jha, Vajira Thambawita, Hugo Hammer, Thomas
de Lange, Sravanthi Parasa, Michael Riegler, and PΓ₯l Halvorsen. 2021.
Medico Multimedia Task at MediaEval 2021: Transparency in Medical
Image Segmentation. In Proceedings of MediaEval 2021 CEUR Work-
shop.
[7] Debesh Jha, Pia H. Smedsrud, Michael A. Riegler, Dag Johansen,
Thomas De Lange, PΓ₯l Halvorsen, and HΓ₯vard D. Johansen. 2019. Re-
sUNet++: An Advanced Architecture for Medical Image Segmentation.
In 2019 IEEE International Symposium on Multimedia (ISM). 225β2255.
https://doi.org/10.1109/ISM46123.2019.00049
[8] Alexander Kirillov, Yuxin Wu, Kaiming He, and Ross Girshick.
2020. PointRend: Image Segmentation as Rendering. (2020).
arXiv:cs.CV/1912.08193
[9] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net:
Convolutional Networks for Biomedical Image Segmentation. (2015).
arXiv:cs.CV/1505.04597
[10] Mingxing Tan and Quoc V. Le. 2020. EfficientNet: Rethink-
ing Model Scaling for Convolutional Neural Networks. (2020).
arXiv:cs.LG/1905.11946
[11] Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon.
2018. CBAM: Convolutional Block Attention Module. (2018).
arXiv:cs.CV/1807.06521
[12] Saining Xie, Ross Girshick, Piotr DollΓ‘r, Zhuowen Tu, and Kaiming
He. 2017. Aggregated Residual Transformations for Deep Neural
Networks. (2017). arXiv:cs.CV/1611.05431
[13] Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh,
and Jianming Liang. 2018. UNet++: A Nested U-Net Architecture for
Medical Image Segmentation. (2018). arXiv:cs.CV/1807.10165