Real-Time Polyp Segmentation Using U-Net with IoU Loss

George Batchkala¹, Sharib Ali²
¹Department of Computer Science, University of Oxford, Oxford, UK
²Institute of Biomedical Engineering, Department of Engineering Science, Oxford, UK
george.batchkala@gmail.com, sharib.ali@eng.ox.ac.uk

ABSTRACT
Colorectal cancer is the third leading cause of cancer deaths worldwide. While automated segmentation methods can help detect polyps and consequently improve their surgical removal, the clinical usability of these methods requires a trade-off between accuracy and speed. In this work, we exploit the traditional U-Net method and compare different segmentation-loss functions. Our results demonstrate that IoU loss yields improved segmentation performance (nearly 3% improvement on Dice) for real-time polyp segmentation.

1 INTRODUCTION
Colorectal cancer (CRC) is the third most commonly diagnosed malignancy and the third leading cause of cancer-related deaths worldwide [4]. Colorectal polyps are abnormal protrusions from the mucosa that are usually identified during a standard medical procedure referred to as colonoscopy; the associated malignancy is classified through histopathological examination [13]. Patients with conventional adenomas or serrated polyps are advised to undergo polypectomy, a non-invasive surgical procedure usually performed during colonoscopy surveillance to prevent CRC [7]. While detection and segmentation of polyps are critical, missed detections and inaccurate removal of polyps can lead to a subsequent risk of CRC. Due to advancements in hardware and algorithmic revolutions such as deep learning, building accurate real-time systems is now possible. However, a trade-off between accuracy and speed remains vital for the use of automated systems during CRC surveillance and surgical removal of polyps.

The Medico automatic polyp segmentation challenge¹ held in 2020 aims to address the automated delineation of polyps and to evaluate the capability of the built models for real-time performance, which directly affects the clinical utility of the methods. We participated in both the polyp segmentation and algorithm efficiency sub-tasks of this challenge. To this end, we investigated the U-Net architecture [12], which has been successful and widely used for semantic segmentation. In this paper, we present a U-Net-based deep learning architecture and evaluate it using different loss functions and data augmentation strategies for polyp segmentation.

2 RELATED WORK
In the past, several biomedical challenges related to endoscopy data have been organised [1, 2, 5, 6]. These challenges curate endoscopy video frames and provide them to computational scientists to benchmark their methods. Among these challenges, the very first challenge on polyp segmentation² was introduced in 2015 with comprehensive single-image and video data. This dataset has been widely used by researchers. The GIANA dataset³ was introduced in 2017 with an added detection task [6]. The Kvasir-SEG dataset [10], released in 2020, contains 1000 pairs of colonoscopy images and their ground-truth segmentation masks⁴. Similarly, the multi-class endoscopy disease detection and segmentation challenge [3] includes polyps as one of its five disease categories. A comprehensive comparison of deep learning methods on this dataset can be found in [1]. Likewise, [8] provides an extensive comparison of state-of-the-art methods on the Kvasir-SEG dataset.

3 APPROACH
U-Net [12] is an established encoder-decoder architecture with skip-connections. Classically, binary cross-entropy (BCE) is used for binary segmentation tasks [8, 12]. While preserving the standard U-Net design, we used an intersection-over-union loss L_IoU and experimented with a combination of the BCE and IoU losses. To boost performance on this dataset, we also added augmentation techniques comprising random rotations (up to 180 degrees in each direction) and random horizontal flips (with probability 0.5), followed by cropping to return the rotated images to their original sizes. Here, we directly used the negative of the IoU instead of the classically used 1 − IoU, as shown in Eq. 1, where M_P and M_GT are the predicted and ground-truth masks, respectively.

    L_IoU = − |M_P ∩ M_GT| / |M_P ∪ M_GT|    (1)

During the training stage, the IoU loss already showed convergence at 55 epochs, providing a validation IoU value of over 70% (see Figure 1).

Figure 1: IoU loss computation for training and validation. The red line shows the stopping criterion reached at the 54th epoch (counting from 0) with a validation IoU of 0.703.

4 EXPERIMENTS
4.1 Dataset and set-up
We split the 1000 training images provided by the organisers [9] into 88% for training and 12% for validation. The resolution of the images varies from 332 × 487 to 1920 × 1072 pixels, so we resized all images to 256 × 256 pixels for training purposes. A hidden test dataset that included an additional 160 images was also provided.

We used the Adam optimiser [11] to minimise our loss function, with a learning rate of 1e−4 and the default weight decay of 1e−8. For each experiment, we trained our network for 100 epochs with early stopping (patience 10) and a batch size of 20. The implemented code is available at https://github.com/GeorgeBatch/kvasir-seg. We implemented the network architecture in PyTorch (1.7.0) and ran the computations on a Tesla V100 32GB GPU.

4.2 Evaluation metrics
We used standard computer vision metrics for evaluating the semantic segmentation of polyps: intersection-over-union (IoU), Dice similarity coefficient (DSC), recall (Rec.), precision (Prec.), overall accuracy (Acc.) and F2 score (F2). Additionally, we demonstrated the real-time applicability of our approach using frames-per-second (FPS) measurements.

4.3 Results
Table 1 shows the quantitative results of the U-Net model with different loss functions and augmentation. It can be observed that using the IoU loss as the minimisation objective is better than using the BCE loss or the combined (IoU + BCE) loss. Furthermore, using the IoU loss with data augmentation results in the best DSC of 0.7868, the best IoU of 0.7005, and the best trade-off between precision (0.8435) and recall (0.8307). It is worth noting that our method with the IoU loss has the highest FPS on our hardware, at over 240.

Table 1: Results on the validation subset of the provided Kvasir-SEG training dataset

Model               IoU     DSC     Rec.    Prec.   Acc.    F2      FPS
U-Net + IoU loss    0.6761  0.7703  0.8360  0.7967  0.9346  0.7772  241
  + BCE loss        0.6639  0.7556  0.8373  0.7769  0.9304  0.7714  221
  + BCE + IoU loss  0.6497  0.7415  0.8275  0.7745  0.9298  0.7509  223
  + IoU loss, aug   0.7005  0.7868  0.8307  0.8435  0.9391  0.7820  252
  + IoU loss, subm  0.6928  0.7821  0.8686  0.7895  0.9391  0.8026  243

Table 2 presents the results of our method on the unseen test dataset provided by the challenge organisers. We achieved a DSC of 0.7328 and a precision of 0.8229. Again, it can be observed that our method has an FPS of 197, which is sufficient for use in clinical practice: in general, with the available high-definition colonoscopy equipment, the required rate is below 100 FPS.

Table 2: Results on the previously unseen test dataset (provided by the organisers)

Model               mIoU    DSC     Rec.    Prec.   Acc.    F2      FPS
U-Net + IoU loss    0.6351  0.7328  0.7500  0.8229  0.9422  0.7361  197

Figure 2 shows polyps of different sizes present at various locations in the colon. Additionally, there are different textures present on these protrusions. We see that our method is able to segment these polyps accurately, with an IoU of nearly 0.95. Figure 3 presents predicted masks from our trained network on the unseen test dataset, for which no ground truth was provided. Visual inspection suggests that our method is able to segment the most protruded polyps accurately. However, the method confuses the large polyp structure with the colon folds.

Figure 2: Quantitative results on the validation set.

Figure 3: Qualitative results on the (unseen) test set.

5 CONCLUSION
We have presented different loss combinations and showed that using the widely used U-Net with an IoU loss results in a decent segmentation performance on the Kvasir-SEG dataset. Additionally, our method provides strong clinical applicability due to its real-time capability. In future, we will work on improving the segmentation accuracy using attention mechanisms and applying shape-context information to boost performance.

ACKNOWLEDGMENTS
G.B. is funded by a full Health Data Science Studentship through Professor Fergus Gleeson's A2 research funds and S.A. is supported by the Oxford NIHR BRC.

Footnotes:
1 https://multimediaeval.github.io/editions/2020/tasks/medico/
2 https://polyp.grand-challenge.org/Home/
3 https://giana.grand-challenge.org
4 https://datasets.simula.no/kvasir-seg/

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). MediaEval'20, December 14-15 2020, Online.

REFERENCES
[1] Sharib Ali, Mariia Dmitrieva, Noha M. Ghatwary, Sophia Bano, Gorkem Polat, Alptekin Temizel, and others. 2020. A translational pathway of deep learning methods in GastroIntestinal Endoscopy. CoRR abs/2010.06034 (2020). https://arxiv.org/abs/2010.06034
[2] Sharib Ali, Felix Zhou, Barbara Braden, and others. 2020.
An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy. Scientific Reports 10 (2020), 2748.
[3] Sharib Ali et al. 2020. Endoscopy disease detection challenge 2020. arXiv preprint arXiv:2003.03376 (2020). https://arxiv.org/abs/2003.03376
[4] Melina Arnold, Mónica S Sierra, Mathieu Laversanne, Isabelle Soerjomataram, Ahmedin Jemal, and Freddie Bray. 2017. Global patterns and trends in colorectal cancer incidence and mortality. Gut 66, 4 (2017), 683–691. https://doi.org/10.1136/gutjnl-2015-310912
[5] Jorge Bernal and others. 2017. Comparative validation of polyp detection methods in video colonoscopy: results from the MICCAI 2015 endoscopic vision challenge. IEEE Trans. Med. Imag. 36, 6 (2017), 1231–1249.
[6] Jorge Bernal and others. 2018. Polyp detection benchmark in colonoscopy videos using GTCreator: A novel fully configurable tool for easy and fast annotation of image databases. In Proc. Comput. Assist. Radiol. Surg. (CARS).
[7] Xiaosheng He, Dong Hang, Kana Wu, Jennifer Nayor, David A Drew, Edward L Giovannucci, Shuji Ogino, Andrew T Chan, and Mingyang Song. 2020. Long-term Risk of Colorectal Cancer After Removal of Conventional Adenomas and Serrated Polyps. Gastroenterology 158, 4 (March 2020), 852–861.e4. https://doi.org/10.1053/j.gastro.2019.06.039
[8] Debesh Jha, Sharib Ali, Håvard D. Johansen, Dag D. Johansen, Jens Rittscher, Michael A. Riegler, and Pål Halvorsen. 2020. Real-Time Polyp Detection, Localisation and Segmentation in Colonoscopy Using Deep Learning. CoRR abs/2011.07631 (2020). https://arxiv.org/abs/2011.07631
[9] Debesh Jha, Steven A. Hicks, Krister Emanuelsen, Håvard Johansen, Dag Johansen, Thomas de Lange, Michael A. Riegler, and Pål Halvorsen. 2020. Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation. In Proc. of the MediaEval 2020 Workshop.
[10] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Pål Halvorsen, Thomas de Lange, Dag Johansen, and Håvard D Johansen. 2020. Kvasir-SEG: A segmented polyp dataset. In International Conference on Multimedia Modeling. Springer, 451–462.
[11] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1412.6980
[12] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI. 234–241.
[13] J. G. Williams, R. D. Pullan, J. Hill, P. G. Horgan, E. Salmo, G. N. Buchanan, S. Rasheed, S. G. McGee, and N. Haboubi. 2013. Management of the malignant colorectal polyp: ACPGBI position statement. Colorectal Disease 15, s2 (2013), 1–38. https://doi.org/10.1111/codi.12262
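Appendix: illustrative code sketches. The loss in Eq. 1 is the negative of the IoU. The sketch below is not the authors' implementation (which is in PyTorch at the linked repository); it is a minimal pure-Python illustration of the arithmetic, using the common differentiable "soft" IoU in which the set intersection and union are replaced by elementwise products and sums over mask probabilities. The function name and the epsilon smoothing term are our own assumptions.

```python
# Soft IoU loss illustrating Eq. 1: L_IoU = -|M_P ∩ M_GT| / |M_P ∪ M_GT|.
# For soft masks, intersection ≈ sum(p * g) and union ≈ sum(p + g - p * g).

def soft_iou_loss(pred, target, eps=1e-6):
    """pred: predicted mask probabilities in [0, 1]; target: binary ground truth.

    Returns the negative soft IoU, so minimising the loss maximises overlap.
    Note that -IoU differs from the classical 1 - IoU only by a constant,
    so both share the same gradients and the same minimiser.
    """
    intersection = sum(p * g for p, g in zip(pred, target))
    union = sum(p + g - p * g for p, g in zip(pred, target))
    return -(intersection + eps) / (union + eps)

# A perfect prediction drives the loss towards -1; disjoint masks towards 0.
perfect = soft_iou_loss([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])
disjoint = soft_iou_loss([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

The constant-shift equivalence between −IoU and 1 − IoU is why the paper can use the former directly as the training objective.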
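The 88%/12% train/validation split of the 1000 provided images (Section 4.1) can be sketched as follows. This is a hypothetical helper for illustration only; the seed, function name, and use of a shuffled split are our assumptions, not details taken from the authors' code.

```python
import random

def train_val_split(items, val_fraction=0.12, seed=42):
    """Shuffle and split a dataset into train/validation subsets.

    Mirrors the paper's 88%/12% split of the 1000 Kvasir-SEG training
    images; the seed and helper are illustrative, not the authors' own.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# Splitting 1000 image ids yields 880 for training and 120 for validation.
train_ids, val_ids = train_val_split(range(1000))
```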
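The FPS figures in Tables 1 and 2 amount to dividing a frame count by wall-clock inference time. A minimal sketch of such a benchmark is shown below; `identity_model` is a stand-in for the trained U-Net forward pass (an assumption for illustration, since the real measurement runs the network on a GPU).

```python
import time

def measure_fps(model, frames):
    """Average frames-per-second of `model` over a list of input frames."""
    start = time.perf_counter()
    for frame in frames:
        model(frame)  # one forward pass per frame
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Stand-in model: real measurements would call the trained network instead.
identity_model = lambda frame: frame
fps = measure_fps(identity_model, [[0.0]] * 100)
```

In practice one would add warm-up iterations and GPU synchronisation before timing; this sketch only shows the shape of the computation.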