=Paper=
{{Paper
|id=Vol-3181/paper1
|storemode=property
|title=Medico Multimedia Task at MediaEval 2021: Transparency in Medical Image
Segmentation
|pdfUrl=https://ceur-ws.org/Vol-3181/paper1.pdf
|volume=Vol-3181
|authors=Steven Hicks,Debesh Jha,Vajira Thambawita,Hugo Hammer,Thomas de Lange,Sravanthi Parasa,Michael Riegler,Pål Halvorsen
|dblpUrl=https://dblp.org/rec/conf/mediaeval/HicksJTHLPRH21
}}
==Medico Multimedia Task at MediaEval 2021: Transparency in Medical Image Segmentation==
Steven A. Hicks1,2, Debesh Jha1,3, Vajira Thambawita1,2, Hugo L. Hammer2, Thomas de Lange4,5, Sravanthi Parasa6, Michael A. Riegler1, and Pål Halvorsen1,2
1 SimulaMet, Norway
2 Oslo Metropolitan University, Norway
3 UiT The Arctic University of Norway, Norway
4 Sahlgrenska University Hospital Mölndal, Sweden
5 University of Gothenburg, Sweden
6 Swedish Medical Center, USA
ABSTRACT
The Medico Multimedia Task focuses on providing multimedia researchers with the opportunity to contribute to different areas of medicine using multimedia data to solve several subtasks. This year, the task focuses on transparency within machine learning-based medical segmentation systems, where the use case is gastrointestinal endoscopy. In this paper, we motivate the organization of this task, describe the development and test dataset, and present the evaluation process used to assess the participants’ submissions.

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval’21, December 13-15 2021, Online

1 INTRODUCTION

Finding and removing colon polyps is an essential step in preventing colorectal cancer. Current procedures have a high miss rate, whereas computer-aided diagnosis systems can reduce the probability that diagnosticians overlook a polyp during a colonoscopy. As machine learning becomes more common in high-risk fields like medicine, the need for transparent systems becomes more critical. In this case, transparency is defined as giving as much detail as possible on the different parts that make up a machine learning pipeline, including everything from data collection to final prediction. This task focuses on high-performing, efficient, and transparent algorithms for polyp segmentation.

The Medico Multimedia Task is held for the fifth time at the MediaEval benchmark. We continue the tradition of using medical data to develop machine learning models that solve real-world issues in medicine [2, 4–6]. Like last year, we use the gastrointestinal tract as the medical use case, where automatic polyp segmentation is the primary focus. However, this year, we have more training data and add a task that focuses on transparency in the submitted solutions. The task is of interest to researchers working with multimedia segmentation, deep learning (semantic segmentation), computer vision, and trustable and transparent AI systems.

2 DATASET DETAILS

The provided dataset is based on HyperKvasir [1] (see footnote 1), which is currently the largest public gastrointestinal dataset. We combined the segmentation part of HyperKvasir with additional images and masks to create the development and testing datasets for this task. The development dataset contains 1,360 images of polyps and corresponding image masks, while the test dataset consists of 200 image pairs collected from the same distribution as the development dataset. The additional images added to the development dataset were collected from the testing datasets that were used in two previous tasks, namely EndoTect [3] and last year’s Medico [4]. The ground truth for the provided dataset was created by an experienced computer scientist and then verified by an expert gastroenterologist with over ten years of experience. Example images and corresponding segmentation masks can be seen in Figure 1.

Figure 1: Examples taken from the development part of the polyp segmentation dataset HyperKvasir. Note that the images have been resized from their original image dimensions.

3 TASK DESCRIPTIONS AND EVALUATION

The 2021 edition of the Medico Multimedia Task provides three different subtasks, namely the polyp segmentation task, the efficient segmentation task, and the transparent machine learning systems task. The polyp segmentation subtask is the only subtask that is required; the other two are optional. Each subtask allows a total of five submissions.

1 https://datasets.simula.no/hyper-kvasir

3.1 Subtask 1: Polyp Segmentation

The polyp segmentation task targets high-performing polyp segmentation systems. Using the provided development dataset, participants are asked to develop models that automatically segment colon polyps in a given image. A submission to this task should be a zip file containing a predicted segmentation mask in the .png file format for each image in the testing dataset. Each predicted mask should use the same resolution as the input image and have the same filename.

Submissions will be evaluated based on the correctness of the predicted masks using various segmentation metrics like pixel accuracy, precision, recall, Sørensen–Dice coefficient (Dice), and Intersection over Union (IoU). The primary metric used to rank the submissions will be IoU. The participants will receive a .csv file containing the evaluation metrics for each run.

3.2 Subtask 2: Efficient Segmentation

The efficient segmentation task aims for efficient segmentation systems that still obtain a satisfactory prediction accuracy. Model efficiency is measured in the number of frames that a model can process per second. The motivation behind this is the need for real-time detection systems used during live endoscopy procedures. For the system to be considered real-time, it should be able to process at least 30 frames per second. To participate in this subtask, participants must use the development dataset to train a polyp segmentation model. Furthermore, this task requires the participants to submit a Docker image of their implementation to be evaluated on the organizers’ hardware. The Docker submission should generate a .csv submission file that contains the name of the segmented image and the time (in seconds) used to perform the segmentation. A detailed description of the preparation and submission requirements of the Docker image is available on the official GitHub repository (see footnote 2).

Models will be evaluated based on the performance metrics used to evaluate the polyp segmentation task and the number of frames that can be segmented per second. Submissions will be ranked based on a balanced metric between predictive performance and speed. All submissions are evaluated on what can be considered consumer-grade hardware, that is, a computer running Arch Linux with an Intel Core i9-10900K processor, an Nvidia GeForce RTX 3090 graphics processing unit (GPU), and 32 gigabytes of RAM.

3.3 Subtask 3: Transparent Machine Learning Systems

The goal of the transparent machine learning system task is to promote more transparency in medical applications of machine learning. The motivation behind this task is rooted in a general lack of transparency in medical machine learning research [7]. A lot of work is published using private data, closed-source implementations, and lackluster evaluations, making the systems not very reproducible or transparent. We leave it to the participants to determine what makes a machine learning system transparent. Still, some ideas include failure analysis, ablation studies, model explanations, open and commented source code, and detailed implementation descriptions.

Submissions to this task will be evaluated by a committee comprised of at least three computer scientists and expert gastroenterologists who are familiar with AI. The committee will evaluate the submissions from different perspectives. For example, the medical doctors will look at the system from a clinical point of view, assessing transparency based on how it can be used in the clinic. The computer scientists will look at the technical transparency of the submissions, like source code descriptions and the clarity of the implementation. Each team that submits to this task will receive a report on the level of transparency determined by the evaluation committee.

4 DISCUSSION AND OUTLOOK

Automatic segmentation of polyps in the gastrointestinal tract is highly requested by medical doctors working in the field. Being more transparent about the work that goes into developing these methods would not only help doctors make more informed decisions on what systems should be used, but can also aid in further development by future researchers. We hope that this task will encourage the multimedia community to aid in the development of computer-assisted finding segmentation, and further motivate the use of transparent and open implementations of machine learning systems in medicine.

REFERENCES

[1] Hanna Borgli, Vajira Thambawita, Pia H Smedsrud, Steven Hicks, Debesh Jha, Sigrun L Eskeland, Kristin Ranheim Randel, Konstantin Pogorelov, Mathias Lux, Duc Tien Dang Nguyen, Dag Johansen, Carsten Griwodz, Håkon K Stensland, Enrique Garcia-Ceja, Peter T Schmidt, Hugo L Hammer, Michael A Riegler, Pål Halvorsen, and Thomas de Lange. 2020. HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Scientific Data 7, 1 (2020), 283. https://doi.org/10.1038/s41597-020-00622-y
[2] Steven Hicks, Pål Halvorsen, Trine B. Haugen, Jorunn M. Andersen, Oliwia Witczak, Konstantin Pogorelov, Hugo L. Hammer, Duc Tien Dang Nguyen, Mathias Lux, and Michael Riegler. 2019. Medico multimedia task at MediaEval 2019. In Proc. of MediaEval 2019 CEUR Workshop.
[3] Steven A. Hicks, Debesh Jha, Vajira Thambawita, Pål Halvorsen, Hugo L. Hammer, and Michael A. Riegler. 2021. The EndoTect 2020 Challenge: Evaluation and Comparison of Classification, Segmentation and Inference Time for Endoscopy. In Pattern Recognition. ICPR International Workshops and Challenges, Alberto Del Bimbo, Rita Cucchiara, Stan Sclaroff, Giovanni Maria Farinella, Tao Mei, Marco Bertini, Hugo Jair Escalante, and Roberto Vezzani (Eds.). Springer International Publishing, Cham, 263–274.
[4] Debesh Jha, Steven Hicks, Krister Emanuelsen, Håvard Johansen, Dag Johansen, Thomas de Lange, Michael Riegler, and Pål Halvorsen. 2020. Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation. In Proc. of MediaEval 2020 CEUR Workshop.
[5] Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Steven Hicks, Kristin Ranheim Randel, Duc Tien Dang Nguyen, Mathias Lux, Olga Ostroukhova, and Thomas de Lange. 2018. Medico multimedia task at MediaEval 2018. In Proc. of MediaEval 2018 CEUR Workshop.
[6] Michael Riegler, Konstantin Pogorelov, Pål Halvorsen, Carsten Griwodz, Thomas Lange, Kristin Randel, Sigrun Eskeland, Duc Tien Dang Nguyen, Mathias Lux, and Concetto Spampinato. 2017. Multimedia for medicine: the medico task at MediaEval 2017. In Proc. CEUR Worksh. Multim. Bench. Worksh.
[7] Hannah Stower. 2020. Transparency in medical AI. Nature Medicine (2020). https://doi.org/10.1038/s41591-020-01147-y

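
The segmentation metrics named for Subtask 1 (Section 3.1) can all be derived from the pixel-wise confusion counts of a predicted mask against its ground-truth mask. The following is a minimal sketch of that computation, not the organizers' evaluation code. The function name `segmentation_metrics`, the nested-list mask representation, and the convention of scoring an empty denominator as 1.0 are assumptions made for illustration.

```python
from typing import Dict, List

Mask = List[List[int]]  # binary mask: 1 = polyp pixel, 0 = background


def segmentation_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Compute pixel-level metrics for one predicted/ground-truth mask pair."""
    same_shape = len(pred) == len(truth) and all(
        len(p) == len(t) for p, t in zip(pred, truth)
    )
    if not same_shape:
        raise ValueError("masks must have the same resolution")

    # Count true/false positives and negatives over all pixels.
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Assumed convention: an empty denominator (for example, both
        # masks contain no polyp pixels) counts as a perfect score.
        return numerator / denominator if denominator else 1.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
    }


# Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
metrics = segmentation_metrics(pred, truth)
print(metrics["iou"], metrics["dice"])
```

On the toy masks the IoU is 2 / (2 + 1 + 1) = 0.5 and the Dice coefficient is 4 / (4 + 1 + 1), roughly 0.667, which shows that Dice is never lower than IoU for the same prediction.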
2 https://github.com/multimediaeval/2021-Medico-Multimedia
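
For Subtask 2 (Section 3.2), each submission reports the time in seconds spent on every segmented image, and efficiency is expressed as frames per second with 30 as the real-time threshold. The sketch below illustrates one way such a timing file could be produced; it is not the official Docker entry point. The function `time_segmentation`, the placeholder model `dummy_segment`, the file names, and the CSV column names `filename` and `seconds` are all hypothetical; the required layout is described in the official repository.

```python
import csv
import io
import time
from typing import Callable, Iterable, TextIO


def time_segmentation(
    segment: Callable[[str], object],
    filenames: Iterable[str],
    out: TextIO,
) -> float:
    """Time `segment` on each file, write one CSV row per file, return FPS."""
    writer = csv.writer(out)
    writer.writerow(["filename", "seconds"])  # assumed column names
    total_seconds = 0.0
    count = 0
    for name in filenames:
        start = time.perf_counter()
        segment(name)  # run the model on a single image
        elapsed = time.perf_counter() - start
        writer.writerow([name, f"{elapsed:.6f}"])
        total_seconds += elapsed
        count += 1
    # Frames per second over the whole run.
    return count / total_seconds if total_seconds > 0 else float("inf")


def dummy_segment(name: str) -> None:
    """Hypothetical stand-in for a real model's inference call."""
    time.sleep(0.001)


buffer = io.StringIO()
fps = time_segmentation(dummy_segment, ["a.jpg", "b.jpg"], buffer)
is_real_time = fps >= 30  # threshold stated in the task description
print(buffer.getvalue())
```

When the model runs on a GPU, the framework usually executes asynchronously, so the device should be synchronized before the timer is stopped; otherwise the recorded times can understate the real cost.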