Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation

Debesh Jha (1,2), Steven A. Hicks (1,3), Krister Emanuelsen (1), Håvard Johansen (2), Dag Johansen (2), Thomas de Lange (4,5,6), Michael A. Riegler (1), Pål Halvorsen (1,3)

(1) SimulaMet, Norway
(2) UiT The Arctic University of Norway
(3) Oslo Metropolitan University, Norway
(4) Augere Medical AS, Norway
(5) Sahlgrenska University Hospital, Sweden
(6) Bærum Hospital, Norway


ABSTRACT
Colorectal cancer is the third most common cause of cancer worldwide. According to Global cancer statistics 2018, the incidence of colorectal cancer is increasing in both developing and developed countries. Early detection of colon anomalies such as polyps is important for cancer prevention, and automatic polyp segmentation can play a crucial role in this. Despite recent advances in early detection and treatment options, the estimated polyp miss rate is still around 20%. Support from an automated computer-aided diagnosis system could be one potential solution for overlooked polyps. Such detection systems can support low-cost design solutions and save doctors time, which they could, for example, use to perform more patient examinations. In this paper, we introduce the 2020 Medico challenge, provide some information on related work and the dataset, describe the task and evaluation metrics, and discuss the necessity of organizing the Medico challenge.

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval’20, December 14-15 2020, Online

1 INTRODUCTION
The goal of the Medico automatic polyp segmentation challenge is the benchmarking of polyp segmentation algorithms on new test images for automatic polyp segmentation that can detect and mask out polyps (including irregular, small or flat polyps) with high accuracy. The main goal of the challenge is to benchmark different computer vision and machine learning algorithms on the same dataset, which could promote the development of novel methods that are potentially useful in clinical settings. Moreover, we emphasize robustness and generalization of the methods to address the limitations related to data availability and method comparison. The detailed challenge description can be found at https://multimediaeval.github.io/editions/2020/tasks/medico/.

After three years of organizing the Medico Multimedia Task [6, 17, 18], we present the fourth iteration in the series. Whereas last year's task focused on assessing human semen quality [6], this year we build on the 2017 [18] and 2018 [17] challenges of automatically detecting anomalies in video and image data from the gastrointestinal (GI) tract. We introduce a new task for automatic polyp segmentation. In the prior GI challenges, we classified the images into various classes. In this challenge, we are interested in identifying each pixel of the lesions in the provided polyp images.

The task is important because colorectal cancer (CRC) is the third leading cause of cancer and the fourth most prevalent cancer in terms of incidence globally [2]. Regular screening through colonoscopy is a prerequisite for early cancer detection and prevention of CRC. Despite the success of colonoscopy examinations, the estimated polyp miss rate is still around 20% [12], and there is large inter-observer variability [13]. An automated computer-aided diagnosis (CADx) system that detects and highlights polyps could be of great help in improving the average endoscopist's performance.

Figure 1: Polyps and corresponding masks from Kvasir-SEG

In recent years, convolutional neural networks (CNNs) have advanced medical image segmentation algorithms. However, it is essential to understand the strengths and weaknesses of the different approaches via performance comparison on a common dataset. There are a large number of available studies on automatic polyp segmentation [3–5, 8, 9, 11, 14, 20]. However, most of these studies were performed on restricted datasets, which makes benchmarking, algorithm development and reproducing results difficult. Our challenge uses the publicly available Kvasir-SEG dataset [10]. The entire Kvasir-SEG dataset is used for training, and an additional, unseen test dataset is used for benchmarking the algorithms.

In summary, the Medico 2020 challenge can support building future systems and foster open, comparable and reproducible results. The objective of the task is to find efficient solutions for automatic polyp segmentation, both in terms of pixel-wise accuracy and processing speed.

For the clinical translation of technologies, it is essential to design methods on multi-centered and multi-modal datasets. We have recently released several gastrointestinal endoscopy [1, 15, 16], wireless capsule endoscopy [19], endoscopic instrument [7], and polyp datasets [10]. Thus, we have put in significant effort to address the challenges related to the lack of publicly available datasets in the field of GI endoscopy.

2 DATASET
The Kvasir-SEG [10] training dataset can be downloaded from https://datasets.simula.no/kvasir-seg/. It contains 1,000 polyp images and
their corresponding ground truth masks, as shown in Figure 1. The dataset was collected from real routine clinical examinations at Bærum Hospital in Norway by expert gastroenterologists. The resolution of the images varies from 332 × 487 to 1920 × 1072 pixels. Some of the images contain a green thumbnail in the lower-left corner showing the scope position marking from the ScopeGuide (Olympus) (see Figure 2). We annotated a separate dataset consisting of 160 new polyp images and use the resulting dataset as the test set to benchmark the participants’ approaches. Figure 2 shows some examples of test images used in the challenge.

Figure 2: Example polyps from the test images

3 TASK DESCRIPTION
The participants are invited to submit their solutions for the following two tasks: segmentation and efficiency (speed).

3.1 The automatic polyp segmentation task
This task invites participants to develop new algorithms for segmentation of polyps. The main focus is to develop a system that is efficient in terms of diagnostic ability and processing speed and that accurately segments the maximum polyp area in a frame from the provided colonoscopic images.

There are several ways to evaluate the segmentation accuracy. The metrics most commonly used by the wider medical imaging community are the Dice similarity coefficient (DSC), or overlap index, and the mean Intersection over Union (mIoU), also known as the Jaccard index. In clinical applications, gastroenterologists are interested in extracting pixel-wise detail from potential lesions. Metrics such as DSC and mIoU are used to compare the pixel-wise similarity between the predicted segmentation maps and the original ground truth of the lesions.

The DSC is a metric for comparing the similarity between two given samples. If tp, tn, fp, and fn represent the number of true positive, true negative, false positive and false negative per-pixel predictions for an image, respectively, then the DSC is given as

\[ \mathrm{DSC} = \frac{2 \cdot tp}{2 \cdot tp + fp + fn} \]

Furthermore, the IoU is defined as the ratio of the intersection of the predicted and ground truth regions to their union. The mean IoU computes the IoU of each semantic class of an image and calculates the mean over all classes. The IoU is defined as:

\[ \mathrm{IoU} = \frac{tp}{tp + fp + fn} \]

Moreover, in the polyp image segmentation task (i.e., a binary segmentation task), precision (positive predictive value) shows over-segmentation, and recall (true positive rate) shows under-segmentation. Over-segmentation means that the predicted mask covers more area than the ground truth in some part of the frame. Under-segmentation implies that the algorithm has predicted less polyp content in some portion of the image compared to its corresponding ground truth. We also encourage participants to calculate precision and recall, which are given by:

\[ \mathrm{Precision} = \frac{tp}{tp + fp}, \qquad \mathrm{Recall} = \frac{tp}{tp + fn}. \]

The main metric for evaluation and ranking of the teams is mIoU. There is a direct correlation between mIoU and DSC. Therefore, we have only used one metric. If teams have the same mIoU value, they will be further ranked by the higher DSC value. For the evaluation, we ask the participants to submit the predicted masks in a zip file. The resolution of the predicted masks must be equal to that of the test images.

3.2 The algorithm speed efficiency task
Real-time polyp detection is required for live patient examinations in the clinic. It can draw the gastroenterologist's attention to the region of interest. Thus, we also ask participants to participate in the efficiency task. The algorithm efficiency task is similar to the previous task, but it puts a stronger emphasis on the algorithm’s speed in terms of frames per second.

Submissions for this task will be evaluated based on both the algorithm’s speed and segmentation performance. The segmentation performance (the segmentation accuracy) will be measured using the same mIoU metric as described above for the first task, whereas speed will be measured in frames per second (FPS) according to the following formula:

\[ \mathrm{FPS} = \frac{\#\mathit{frames}}{\mathit{sec}} \]

For this task, we require participants to submit their proposed algorithm as part of a Docker image so that we can evaluate it on our hardware. We evaluate the performance of the algorithm on an Nvidia GeForce GTX 1080 system. For the team ranking, we set a certain mIoU as the threshold for considering a submission a valid efficient segmentation solution and rank according to FPS.

4 DISCUSSION AND OUTLOOK
Currently, there is a growing interest in the development of CADx systems that could act as a second observer and digital assistant for endoscopists. Algorithmic benchmarking is an efficient approach to analyze the results of different methods. A comparison of different approaches can help us identify challenging cases in the data. We can then divide the image frames into simple, moderate, and challenging images. Later on, we can aim to develop models on the challenging images that are usually missed during a routine examination in order to design better CADx systems. We hope that this approach will help us design better performing algorithms/models that may increase the efficiency of the health system.
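As an illustration of the evaluation metrics defined in Section 3.1, the following minimal Python sketch computes the per-pixel counts tp, tn, fp and fn for one predicted mask and derives DSC, IoU, precision and recall from them. It is not the organizers' evaluation code. The function names, the representation of a mask as rows of 0/1 values, the choice to score an empty denominator as 1.0, and the averaging of per-image IoU over the test set are all assumptions made for this example.

```python
from typing import Dict, Iterable, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixels, 1 = polyp (assumed layout)


def confusion_counts(pred: Mask, truth: Mask) -> Dict[str, int]:
    """Count per-pixel tp, tn, fp and fn for one prediction/ground-truth pair."""
    same_shape = len(pred) == len(truth) and all(
        len(p) == len(t) for p, t in zip(pred, truth)
    )
    if not same_shape:
        # The task requires predicted masks at the resolution of the test image.
        raise ValueError("predicted mask and ground truth differ in resolution")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                counts["tp"] += 1
            elif p:
                counts["fp"] += 1
            elif t:
                counts["fn"] += 1
            else:
                counts["tn"] += 1
    return counts


def safe_div(num: float, den: float) -> float:
    # Assumption: a zero denominator (metric undefined) is scored as 1.0.
    return num / den if den else 1.0


def segmentation_scores(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Apply the DSC, IoU, precision and recall formulas from Section 3.1."""
    c = confusion_counts(pred, truth)
    tp, fp, fn = c["tp"], c["fp"], c["fn"]
    return {
        "dsc": safe_div(2 * tp, 2 * tp + fp + fn),
        "iou": safe_div(tp, tp + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
    }


def mean_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Average the per-image IoU over (prediction, ground truth) pairs (assumed)."""
    ious = [segmentation_scores(pred, truth)["iou"] for pred, truth in pairs]
    if not ious:
        raise ValueError("no image pairs given")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy 2 x 3 masks, invented for illustration only.
    pred = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    print(confusion_counts(pred, truth))
    print(segmentation_scores(pred, truth))
```

For the toy masks above, tp = 2, fp = 1 and fn = 1, so IoU = 2/4 = 0.5 and DSC = 4/6 = 2/3. For a single image the two formulas are linked by DSC = 2 · IoU / (1 + IoU), which is one way to read the statement that the two metrics are directly correlated.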
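The ranking rules of Sections 3.1 and 3.2 and the FPS formula can be sketched in a few lines of Python. This is a hedged illustration, not the organizers' pipeline: the function names, the dictionary keys (`team`, `miou`, `dsc`, `fps`), the use of `>=` for the threshold test, and the example threshold are assumptions, since the text only speaks of "a certain mIoU" without giving a value.

```python
import time
from typing import Callable, Dict, List, Sequence


def measure_fps(segment: Callable[[object], object], frames: Sequence[object]) -> float:
    """Return #frames / sec for running `segment` once on every frame."""
    start = time.perf_counter()
    for frame in frames:
        segment(frame)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from the timer on extremely short runs.
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def rank_segmentation(results: List[Dict]) -> List[Dict]:
    """Rank by mIoU (higher first); equal mIoU is resolved by the higher DSC."""
    return sorted(results, key=lambda r: (r["miou"], r["dsc"]), reverse=True)


def rank_efficiency(results: List[Dict], miou_threshold: float) -> List[Dict]:
    """Keep submissions that reach the mIoU threshold, then rank them by FPS."""
    valid = [r for r in results if r["miou"] >= miou_threshold]
    return sorted(valid, key=lambda r: r["fps"], reverse=True)


if __name__ == "__main__":
    # Hypothetical submissions, invented purely to exercise the functions.
    submissions = [
        {"team": "A", "miou": 0.80, "dsc": 0.85, "fps": 30.0},
        {"team": "B", "miou": 0.80, "dsc": 0.88, "fps": 80.0},
        {"team": "C", "miou": 0.40, "dsc": 0.55, "fps": 200.0},
    ]
    print([r["team"] for r in rank_segmentation(submissions)])
    # 0.5 is a placeholder threshold, not a value taken from the task description.
    print([r["team"] for r in rank_efficiency(submissions, miou_threshold=0.5)])
    print(measure_fps(lambda frame: frame, list(range(1000))))
```

In this sketch the timing loop only covers the call to the segmentation function. Whether image loading and mask writing count towards the measured time is not specified in the task description, so that choice is left open here.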


REFERENCES
 [1] Hanna Borgli, Vajira Thambawita, Pia H Smedsrud, Steven Hicks, Debesh Jha, Sigrun L Eskeland, Kristin Ranheim Randel, Konstantin Pogorelov, Mathias Lux, Duc Tien Dang Nguyen, et al. 2020. HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Scientific Data 7, 1 (2020), 1–14.
 [2] Freddie Bray, Jacques Ferlay, Isabelle Soerjomataram, Rebecca L Siegel, Lindsey A Torre, and Ahmedin Jemal. 2018. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: a cancer journal for clinicians 68, 6 (2018), 394–424.
 [3] Deng-Ping Fan, Ge-Peng Ji, Tao Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, and Ling Shao. 2020. PraNet: Parallel reverse attention network for polyp segmentation. arXiv preprint arXiv:2006.11392 (2020).
 [4] Yunbo Guo, Jorge Bernal, and Bogdan J Matuszewski. 2020. Polyp Segmentation with Fully Convolutional Deep Neural Networks—Extended Evaluation Study. Journal of Imaging 6, 7 (2020), 69.
 [5] Yun Bo Guo and Bogdan Matuszewski. 2019. GIANA Polyp Segmentation with Fully Convolutional Dilation Neural Networks. In Proc. of International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. 632–641.
 [6] Steven Hicks, Michael Riegler, Pia Smedsrud, Trine B Haugen, Kristin Ranheim Randel, Konstantin Pogorelov, Håkon Kvale Stensland, Duc-Tien Dang-Nguyen, Mathias Lux, Andreas Petlund, et al. 2019. ACM Multimedia BioMedia 2019 Grand Challenge Overview. In Proc. of the ACM International Conference on Multimedia. 2563–2567.
 [7] Debesh Jha, Sharib Ali, Krister Emanuelsen, Steven Hicks, Vajira Thambawita, Enrique Garcia-Ceja, Michael A Riegler, Thomas de Lange, Peter T. Schmidt, Håvard Johansen, Dag Johansen, and Pål Halvorsen. 2021. Kvasir-Instrument: Diagnostic and Therapeutic Tool Segmentation Dataset in Gastrointestinal Endoscopy. In Proc. of International Conference on Multimedia Modeling.
 [8] Debesh Jha, Sharib Ali, Håvard D. Johansen, Dag Johansen, Jens Rittscher, Michael A. Riegler, and Pål Halvorsen. 2020. Real-Time Polyp Detection, Localisation and Segmentation in Colonoscopy Using Deep Learning. arXiv preprint arXiv:2006.11392 (2020).
 [9] Debesh Jha, Michael A Riegler, Dag Johansen, Pål Halvorsen, and Håvard D Johansen. 2020. DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation. In Proc. of International Symposium on Computer-Based Medical Systems. 558–564.
[10] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Pål Halvorsen, Thomas de Lange, Dag Johansen, and Håvard D Johansen. 2020. Kvasir-SEG: A segmented polyp dataset. In Proc. of International Conference on Multimedia Modeling. 451–462.
[11] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Dag Johansen, Thomas De Lange, Pål Halvorsen, and Håvard D Johansen. 2019. ResUNet++: An Advanced Architecture for Medical Image Segmentation. In Proc. of International Symposium on Multimedia. 225–230.
[12] Michal F Kaminski, Jaroslaw Regula, Ewa Kraszewska, Marcin Polkowski, Urszula Wojciechowska, Joanna Didkowska, Maria Zwierko, Maciej Rupinski, Marek P Nowacki, and Eugeniusz Butruk. 2010. Quality indicators for colonoscopy and the risk of interval cancer. New England Journal of Medicine 362, 19 (2010), 1795–1803.
[13] Nadim Mahmud, Jonah Cohen, Kleovoulos Tsourides, and Tyler M Berzin. 2015. Computer vision and augmented reality in gastrointestinal endoscopy. Gastroenterology report 3, 3 (2015), 179–184.
[14] Tanvir Mahmud, Bishmoy Paul, and Shaikh Anowarul Fattah. 2020. PolypSegNet: A Modified Encoder-Decoder Architecture for Automated Polyp Segmentation from Colonoscopy Images. Computers in Biology and Medicine (2020), 104119.
[15] Konstantin Pogorelov, Kristin Ranheim Randel, Thomas de Lange, Sigrun Losada Eskeland, Carsten Griwodz, Dag Johansen, Concetto Spampinato, Mario Taschwer, Mathias Lux, Peter Thelin Schmidt, et al. 2017. Nerthus: A Bowel Preparation Quality Video Dataset. In Proceedings of the ACM on Multimedia Systems Conference. 170–174.
[16] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato, Duc-Tien Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt, et al. 2017. Kvasir: A multi-class image dataset for computer aided gastrointestinal disease detection. In Proc. of the ACM on Multimedia Systems Conference. 164–169.
[17] Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Steven Hicks, Kristin Ranheim Randel, Duc Tien Dang Nguyen, Mathias Lux, Olga Ostroukhova, and Thomas de Lange. 2018. Medico multimedia task at MediaEval 2018. In Proc. of MediaEval 2018 CEUR Workshop.
[18] Michael Riegler, Konstantin Pogorelov, Pål Halvorsen, Carsten Griwodz, Thomas Lange, Kristin Randel, Sigrun Eskeland, Duc Tien Dang Nguyen, Mathias Lux, and Concetto Spampinato. 2017. Multimedia for medicine: the medico task at MediaEval 2017. In Proc. CEUR Worksh. Multim. Bench. Worksh.
[19] Pia H Smedsrud, Henrik L Gjestang, Oda O Nedrejord, Espen Næss, Vajira Thambawita, Steven Hicks, Hanna Borgli, Debesh Jha, Tor Jan Derek Berstad, Sigrun L Eskeland, et al. 2020. Kvasir-Capsule, a video capsule endoscopy dataset. (2020).
[20] Pu Wang, Xiao Xiao, Jeremy R Glissen Brown, Tyler M Berzin, Mengtian Tu, Fei Xiong, Xiao Hu, Peixi Liu, Yan Song, Di Zhang, et al. 2018. Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy. Nature Biomedical Engineering 2, 10 (2018), 741–748.