Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation

Debesh Jha1,2, Steven A. Hicks1,3, Krister Emanuelsen1, Håvard Johansen2, Dag Johansen2, Thomas de Lange4,5,6, Michael A. Riegler1, Pål Halvorsen1,3

1 SimulaMet, Norway
2 UiT The Arctic University of Norway
3 Oslo Metropolitan University, Norway
4 Augere Medical AS, Norway
5 Sahlgrenska University Hospital, Sweden
6 Bærum Hospital, Norway

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). MediaEval'20, December 14-15 2020, Online

ABSTRACT

Colorectal cancer is the third most common cancer worldwide. According to Global cancer statistics 2018, the incidence of colorectal cancer is increasing in both developing and developed countries. Early detection of colon anomalies such as polyps is important for cancer prevention, and automatic polyp segmentation can play a crucial role in this. Despite recent advances in early detection and treatment options, the estimated polyp miss rate is still around 20%. Support from an automated computer-aided diagnosis system could be one solution for the overlooked polyps. Such detection systems can support low-cost solutions and save doctors time, which they could use, for example, to perform more patient examinations. In this paper, we introduce the 2020 Medico challenge, provide some information on related work and the dataset, describe the task and evaluation metrics, and discuss the necessity of organizing the Medico challenge.

1 INTRODUCTION

The goal of the Medico automatic polyp segmentation challenge is to benchmark polyp segmentation algorithms on new test images, looking for methods that can detect and mask out polyps (including irregular, small or flat polyps) with high accuracy. Benchmarking different computer vision and machine learning algorithms on the same dataset could promote the development of novel methods that are potentially useful in clinical settings. Moreover, we emphasize the robustness and generalization of the methods in order to address the limitations related to data availability and method comparison. The detailed challenge description can be found at https://multimediaeval.github.io/editions/2020/tasks/medico/.

After three years of organizing the Medico Multimedia Task [6, 17, 18], we present the fourth iteration in the series. After focusing on assessing human semen quality last year [6], this year we build on the 2017 [18] and 2018 [17] challenges of automatically detecting anomalies in video and image data from the gastrointestinal (GI) tract. We introduce a new task for automatic polyp segmentation. In the prior GI challenges, we classified the images into various classes. In this challenge, we are interested in identifying each pixel of the lesions in the provided polyp images.

The task is important because colorectal cancer (CRC) is the third leading cause of cancer and the fourth most prevalent cancer in terms of incidence globally [2]. Regular screening through colonoscopy is a prerequisite for early cancer detection and prevention of CRC. Despite the achievements of colonoscopy examinations, the estimated polyp miss rate is still around 20% [12], and there are large inter-observer variabilities [13]. An automated computer-aided diagnosis (CADx) system that detects and highlights polyps could be of great help in improving the average endoscopist's performance.

In recent years, convolutional neural networks (CNNs) have advanced medical image segmentation algorithms. However, it is essential to understand the strengths and weaknesses of the different approaches via performance comparison on a common dataset. There are a large number of available studies on automatic polyp segmentation [3–5, 8, 9, 11, 14, 20]. However, most of these studies are performed on restricted datasets, which makes benchmarking, algorithm development and reproducing results difficult. Our challenge uses the publicly available Kvasir-SEG dataset [10]. The entire Kvasir-SEG dataset is used for training, and an additional, unseen test dataset is used for benchmarking the algorithms.

In summary, the Medico 2020 challenge can support building future systems and foster open, comparable and reproducible results. The objective of the task is to find efficient solutions for automatic polyp segmentation, both in terms of pixel-wise accuracy and processing speed.

For the clinical translation of technologies, it is essential to design methods on multi-centered and multi-modal datasets. We have recently released several gastrointestinal endoscopy [1, 15, 16], wireless capsule endoscopy [19], endoscopic instrument [7], and polyp datasets [10]. Thus, we have put significant effort into addressing the lack of publicly available datasets in the field of GI endoscopy.

2 DATASET

The Kvasir-SEG [10] training dataset can be downloaded from https://datasets.simula.no/kvasir-seg/. It contains 1,000 polyp images and their corresponding ground truth masks, as shown in Figure 1. The dataset was collected from real routine clinical examinations at Bærum Hospital in Norway by expert gastroenterologists. The resolution of the images varies from 332 × 487 to 1920 × 1072 pixels. Some of the images contain a green thumbnail in the lower-left corner showing the scope position marking from the ScopeGuide (Olympus) (see Figure 2). We annotated a separate dataset consisting of 160 new polyp images and use it as the test set to benchmark the participants' approaches. Figure 2 shows some examples of test images used in the challenge.

Figure 1: Polyps and corresponding masks from Kvasir-SEG

Figure 2: Example polyps from the test images

3 TASK DESCRIPTION

The participants are invited to submit their solutions for the following two tasks: segmentation and efficiency (speed).

3.1 The automatic polyp segmentation task

This task invites participants to develop new algorithms for segmentation of polyps. The main focus is to develop a system that is efficient in terms of diagnostic ability and processing speed and that accurately segments the maximum polyp area in a frame from the provided colonoscopic images.

There are several ways to evaluate segmentation accuracy. The metrics most commonly used by the wider medical imaging community are the Dice similarity coefficient (DSC), or overlap index, and the mean Intersection over Union (mIoU), also known as the Jaccard index. In clinical applications, gastroenterologists are interested in extracting pixel-wise detail from the potential lesions. Metrics such as DSC and mIoU are used to compare the pixel-wise similarity between the predicted segmentation maps and the original ground truth of the lesions.

The DSC is a metric for comparing the similarity between two given samples. If tp, tn, fp, and fn represent the number of true positive, true negative, false positive and false negative per-pixel predictions for an image, respectively, then the DSC is given as

$$\mathrm{DSC} = \frac{2 \cdot tp}{2 \cdot tp + fp + fn}$$

Furthermore, the IoU is defined as the ratio of the intersection of the predicted and ground truth regions to their union. The mean IoU computes the IoU of each semantic class of an image and takes the mean over the classes. The IoU is defined as:

$$\mathrm{IoU} = \frac{tp}{tp + fp + fn}$$

Moreover, in the polyp image segmentation task (i.e., a binary segmentation task), precision (positive predictive value) reflects over-segmentation, and recall (true positive rate) reflects under-segmentation. Over-segmentation means that the predicted mask covers more area than the ground truth in some part of the frame. Under-segmentation means that the algorithm has predicted less polyp content in some portion of the image than is present in the corresponding ground truth. We also encourage participants to calculate precision and recall, which are given by:

$$\mathrm{Precision} = \frac{tp}{tp + fp}$$

$$\mathrm{Recall} = \frac{tp}{tp + fn}$$

The main metric for evaluation and ranking of the teams is mIoU. There is a direct correlation between mIoU and DSC. Therefore, we use only one metric for ranking. If teams have the same mIoU value, they will be further ranked by the higher value of the DSC. For the evaluation, we ask the participants to submit the predicted masks in a zip file. The resolution of the predicted masks must be equal to that of the test images.

3.2 The algorithm speed efficiency task

Real-time polyp detection is required for live patient examinations in the clinic. It can draw the gastroenterologist's attention to the region of interest. Thus, we also ask participants to take part in the efficiency task. The algorithm efficiency task is similar to the previous task, but it puts a stronger emphasis on the algorithm's speed in terms of frames-per-second.

Submissions for this task will be evaluated based on both the algorithm's speed and its segmentation performance. The segmentation performance (the segmentation accuracy) will be measured using the same mIoU metric as described above for the first task, whereas speed will be measured in frames-per-second (FPS) according to the following formula:

$$\mathrm{FPS} = \frac{\#\mathrm{frames}}{\mathrm{sec}}$$

For this task, we require participants to submit their proposed algorithm as part of a Docker image so that we can evaluate it on our hardware. We evaluate the performance of the algorithm on an Nvidia GeForce GTX 1080 system. For the team ranking, we set a certain mIoU as the threshold for considering a submission a valid efficient segmentation solution and rank the valid submissions according to FPS.

4 DISCUSSION AND OUTLOOK

Currently, there is a growing interest in the development of CADx systems that could act as a second observer and digital assistant for endoscopists. Algorithmic benchmarking is an efficient approach to analyzing the results of different methods. A comparison of different approaches can help us to identify challenging cases in the data. We can then divide the image frames into simple, moderate, and challenging images. Later on, we can aim to develop models on the challenging images, which are usually missed during a routine examination, in order to design better CADx systems. We hope that this approach will help us to design better performing algorithms and models that may increase the efficiency of the health system.

REFERENCES

[1] Hanna Borgli, Vajira Thambawita, Pia H Smedsrud, Steven Hicks, Debesh Jha, Sigrun L Eskeland, Kristin Ranheim Randel, Konstantin Pogorelov, Mathias Lux, Duc Tien Dang Nguyen, et al. 2020. HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Scientific Data 7, 1 (2020), 1–14.
[2] Freddie Bray, Jacques Ferlay, Isabelle Soerjomataram, Rebecca L Siegel, Lindsey A Torre, and Ahmedin Jemal. 2018. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: a cancer journal for clinicians 68, 6 (2018), 394–424.
[3] Deng-Ping Fan, Ge-Peng Ji, Tao Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, and Ling Shao. 2020. Pranet: Parallel reverse attention network for polyp segmentation. arXiv preprint arXiv:2006.11392 (2020).
[4] Yunbo Guo, Jorge Bernal, and Bogdan J Matuszewski. 2020. Polyp Segmentation with Fully Convolutional Deep Neural Networks - Extended Evaluation Study. Journal of Imaging 6, 7 (2020), 69.
[5] Yun Bo Guo and Bogdan Matuszewski. 2019. GIANA Polyp Segmentation with Fully Convolutional Dilation Neural Networks. In Proc. of International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. 632–641.
[6] Steven Hicks, Michael Riegler, Pia Smedsrud, Trine B Haugen, Kristin Ranheim Randel, Konstantin Pogorelov, Håkon Kvale Stensland, Duc-Tien Dang-Nguyen, Mathias Lux, Andreas Petlund, et al. 2019. ACM Multimedia BioMedia 2019 Grand Challenge Overview. In Proc. of the ACM International Conference on Multimedia. 2563–2567.
[7] Debesh Jha, Sharib Ali, Krister Emanuelsen, Steven Hicks, Vajira Thambawita, Enrique Garcia-Ceja, Michael A Riegler, Thomas de Lange, Peter T. Schmidt, Håvard Johansen, Dag Johansen, and Pål Halvorsen. 2021. Kvasir-Instrument: Diagnostic and Therapeutic Tool Segmentation Dataset in Gastrointestinal Endoscopy. In Proc. of International Conference on Multimedia Modeling.
[8] Debesh Jha, Sharib Ali, Håvard D. Johansen, Dag Johansen, Jens Rittscher, Michael A. Riegler, and Pål Halvorsen. 2020. Real-Time Polyp Detection, Localisation and Segmentation in Colonoscopy Using Deep Learning. arXiv preprint arXiv:2006.11392 (2020).
[9] Debesh Jha, Michael A Riegler, Dag Johansen, Pål Halvorsen, and Håvard D Johansen. 2020. DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation. In Proc. of International Symposium on Computer-Based Medical Systems. 558–564.
[10] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Pål Halvorsen, Thomas de Lange, Dag Johansen, and Håvard D Johansen. 2020. Kvasir-SEG: A segmented polyp dataset. In Proc. of International Conference on Multimedia Modeling. 451–462.
[11] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Dag Johansen, Thomas De Lange, Pål Halvorsen, and Håvard D Johansen. 2019. ResUNet++: An Advanced Architecture for Medical Image Segmentation. In Proc. of International Symposium on Multimedia. 225–230.
[12] Michal F Kaminski, Jaroslaw Regula, Ewa Kraszewska, Marcin Polkowski, Urszula Wojciechowska, Joanna Didkowska, Maria Zwierko, Maciej Rupinski, Marek P Nowacki, and Eugeniusz Butruk. 2010. Quality indicators for colonoscopy and the risk of interval cancer. New England Journal of Medicine 362, 19 (2010), 1795–1803.
[13] Nadim Mahmud, Jonah Cohen, Kleovoulos Tsourides, and Tyler M Berzin. 2015. Computer vision and augmented reality in gastrointestinal endoscopy. Gastroenterology report 3, 3 (2015), 179–184.
[14] Tanvir Mahmud, Bishmoy Paul, and Shaikh Anowarul Fattah. 2020. PolypSegNet: A Modified Encoder-Decoder Architecture for Automated Polyp Segmentation from Colonoscopy Images. Computers in Biology and Medicine (2020), 104119.
[15] Konstantin Pogorelov, Kristin Ranheim Randel, Thomas de Lange, Sigrun Losada Eskeland, Carsten Griwodz, Dag Johansen, Concetto Spampinato, Mario Taschwer, Mathias Lux, Peter Thelin Schmidt, et al. 2017. Nerthus: A Bowel Preparation Quality Video Dataset. In Proceedings of the ACM on Multimedia Systems Conference. 170–174.
[16] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato, Duc-Tien Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt, et al. 2017. Kvasir: A multi-class image dataset for computer aided gastrointestinal disease detection. In Proc. of the ACM on Multimedia Systems Conference. 164–169.
[17] Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Steven Hicks, Kristin Ranheim Randel, Duc Tien Dang Nguyen, Mathias Lux, Olga Ostroukhova, and Thomas de Lange. 2018. Medico multimedia task at MediaEval 2018. In Proc. of MediaEval 2018 CEUR Workshop.
[18] Michael Riegler, Konstantin Pogorelov, Pål Halvorsen, Carsten Griwodz, Thomas Lange, Kristin Randel, Sigrun Eskeland, Duc Tien Dang Nguyen, Mathias Lux, and Concetto Spampinato. 2017. Multimedia for medicine: the medico task at MediaEval 2017. In Proc. CEUR Worksh. Multim. Bench. Worksh.
[19] Pia H Smedsrud, Henrik L Gjestang, Oda O Nedrejord, Espen Næss, Vajira Thambawita, Steven Hicks, Hanna Borgli, Debesh Jha, Tor Jan Derek Berstad, Sigrun L Eskeland, et al. 2020. Kvasir-Capsule, a video capsule endoscopy dataset. (2020).
[20] Pu Wang, Xiao Xiao, Jeremy R Glissen Brown, Tyler M Berzin, Mengtian Tu, Fei Xiong, Xiao Hu, Peixi Liu, Yan Song, Di Zhang, et al. 2018. Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy. Nature biomedical engineering 2, 10 (2018), 741–748.
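
The evaluation metrics and ranking rules of the task description can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the organizers' evaluation code. The function names, the list-of-lists mask format, the team records, the score of 1.0 for a zero denominator, and the inclusive mIoU threshold are all hypothetical choices made here for the example. For a single image, the two overlap formulas imply DSC = 2 · IoU / (1 + IoU), which is one way to see why the two metrics move together.

```python
from typing import Dict, List, Sequence

Mask = Sequence[Sequence[int]]  # binary mask: 1 = polyp pixel, 0 = background


def segmentation_scores(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Per-image DSC, IoU, precision, and recall from per-pixel counts."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("predicted mask and ground truth must have the same resolution")
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # polyp pixel correctly predicted
            elif p:
                fp += 1  # predicted polyp, actually background
            elif t:
                fn += 1  # polyp pixel that was missed

    def ratio(num: int, den: int) -> float:
        # Assumption: a zero denominator scores 1.0; the task text does not define this case.
        return num / den if den else 1.0

    return {
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


def frames_per_second(num_frames: int, elapsed_seconds: float) -> float:
    """FPS = number of processed frames divided by elapsed seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_frames / elapsed_seconds


def rank_segmentation(teams: Dict[str, Dict[str, float]]) -> List[str]:
    """Rank by mIoU (higher first); ties are broken by the higher DSC."""
    return sorted(teams, key=lambda name: (-teams[name]["miou"], -teams[name]["dsc"]))


def rank_efficiency(teams: Dict[str, Dict[str, float]], miou_threshold: float) -> List[str]:
    """Keep teams that reach the mIoU threshold, then rank by FPS (higher first)."""
    # Assumption: the threshold is inclusive; its actual value is not given in the text.
    valid = [name for name in teams if teams[name]["miou"] >= miou_threshold]
    return sorted(valid, key=lambda name: -teams[name]["fps"])


if __name__ == "__main__":
    # Toy 2 x 3 masks: tp = 2, fp = 1, fn = 1.
    pred = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    print(segmentation_scores(pred, truth))

    # Hypothetical team results, for illustration only.
    teams = {
        "team_a": {"miou": 0.80, "dsc": 0.88, "fps": 30.0},
        "team_b": {"miou": 0.80, "dsc": 0.87, "fps": 90.0},
        "team_c": {"miou": 0.60, "dsc": 0.70, "fps": 120.0},
    }
    print(rank_segmentation(teams))
    print(rank_efficiency(teams, miou_threshold=0.75))
```

In the toy example, the prediction over-segments one pixel and misses one pixel, so precision and recall are both 2/3 while IoU is 0.5. In the efficiency ranking, the fastest hypothetical team is excluded because it falls below the chosen mIoU threshold.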