<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multimedia for Medicine: The Medico Task at MediaEval 2017</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael Riegler</string-name>
          <email>michael@simula.no</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantin Pogorelov</string-name>
          <email>konstantin@simula.no</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pål Halvorsen</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kristin Ranheim Randel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sigrun Losada Eskeland</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Duc-Tien Dang-Nguyen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mathias Lux</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carsten Griwodz</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Concetto Spampinato</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas de Lange</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cancer Registry of Norway</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dublin City University</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Simula Research Laboratory</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Catania</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Klagenfurt</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>University of Oslo</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>Vestre Viken Hospital Trust</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>The Multimedia for Medicine Medico Task, running for the first time as part of MediaEval 2017, focuses on detecting abnormalities, diseases and anatomical landmarks in images captured by medical devices in the gastrointestinal tract. The task characteristics are described, including the use case and its challenges, the dataset with ground truth, the required participant runs and the evaluation metrics.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
The Medico task tackles the challenge of predicting diseases based
on multimedia data collected in hospitals, with the additional
requirements to use as little training data as possible, to perform the
analysis efficiently in terms of processing time, and to generate
automatic text reports (summaries) of the findings. The task differs from
well-known medical imaging tasks such as the ImageCLEF medical tasks
(http://www.imageclef.org/) [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ] in that it (i) uses only
multimedia data (videos and images) and no medical imaging data
(CT scans, etc.), (ii) asks participants to use as little training data as
possible, and (iii) also evaluates the approaches with respect to processing time.
Furthermore, the automatically generated reports are a novel part of
the task, but since they are very hard to evaluate, this subtask is
experimental this year.
      </p>
      <p>
It is a typical assumption that visual analysis, as already
provided by the computer vision and medical image processing
communities today, is sufficient to solve healthcare multimedia
challenges [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Although we concede that these methods are indeed
essential contributors to promising approaches, we have come to
the understanding that analysing images and videos alone does
not solve the challenges in medical fields such as endoscopy or
ultrasound, because of the task complexity and the needs of both
medical experts and patients. Neither does it make serious use of
the multitude of additional information sources including sensors
and temporal information [
        <xref ref-type="bibr" rid="ref3 ref8 ref9">3, 8, 9</xref>
        ].
      </p>
      <p>
The Medico task is designed to help improve the health care
system through the application of multimedia research knowledge
and methods, to reach the next level of computer- and
multimedia-assisted diagnosis, detection and interpretation of abnormalities.
This is useful in multiple scenarios. For example, in some areas of
the human body, such as the gastrointestinal (GI) tract, the
detection of abnormalities and diseases in early stages can significantly
improve the chance of successful treatment and survival. Through
endoscopic examinations (insertion of a camera in the
gastrointestinal tract), diseases can be detected visually, even before they become
symptomatic. This is particularly the case for colorectal cancer (in
the large bowel) or its cancer precursors (colorectal polyps), which
can be detected through colonoscopy or capsule endoscopy. The
challenge is, however, that both medical experts and machines
currently fail to detect all polyps [
        <xref ref-type="bibr" rid="ref6">6</xref>
]. Moreover, previous research in
this area, in computer vision and medical imaging, has created visual
augmentations of the interior of the body. To automatically detect
and locate abnormalities, visual representations alone are not sufficient.
There is a need for image and video processing, analysis, and
information search and retrieval, in combination with other sensor data and
assistance from medical experts, and it all needs to be integrated [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
Here, participants are asked to look beyond computer vision
and medical imaging to show the potential of multimedia research
going far beyond well-known scenarios like the analysis of content on
YouTube and Flickr. For this detection task, we provide Kvasir, a
large public dataset [
        <xref ref-type="bibr" rid="ref4">4</xref>
] containing videos and images from the GI
tract showing different diseases and anatomical landmarks. The
ground truth is provided by medical experts (specialists in GI
endoscopy) who annotated the dataset, and the data is split into training
and test data. Based on this, the participants are asked to solve
four subtasks, of which the first two are mandatory and the last two
are optional: (i) classify diseases with as few images in the training
dataset as possible; (ii) solve the classification problem in a fast
and efficient way; (iii) run the second task on the same hardware
(supported platforms are Linux, macOS and Windows); and (iv)
automatically create a text report for a medical doctor for three video
cases. The task can be addressed by leveraging techniques
from multiple multimedia-related disciplines, including (but not
limited to) machine learning (classification), multimedia content
analysis and multimodal fusion.
      </p>
    </sec>
    <sec id="sec-2">
      <title>DATASET DETAILS</title>
      <p>
The Kvasir dataset [
        <xref ref-type="bibr" rid="ref4">4</xref>
] consists of 8,000 GI tract images that are
annotated and verified by medical doctors (experienced endoscopists)
to form the ground truth. It includes 8 classes showing anatomical
landmarks, pathological findings or endoscopic procedures in the GI
tract, i.e., 1,000 images for each class, split into 500 for training and
500 for testing. The anatomical landmarks are Z-line, pylorus and
cecum, while the pathological findings include esophagitis, polyps
and ulcerative colitis. In addition, we provide two sets of images
related to the removal of polyps, the dyed and lifted polyps and the dyed
resection margins. The dataset consists of images with different
resolutions from 720x576 up to 1920x1072 pixels, organized
into separate folders named according to the content. Some of the
included images have a green sub-picture
illustrating the position and configuration of the endoscope
inside the bowel, delivered by an electromagnetic imaging system
(ScopeGuide, Olympus Europe). This sub-picture may support the
interpretation of the image. As mentioned before, the whole dataset
is split into two equally sized development and test datasets. Both
datasets consist of 4,000 images, 500
images for each class, stored in two archives: an images archive and
a features archive. In the development dataset, the images are stored in
separate folders named according to the classes the
images belong to. In the test dataset, all the images are stored in one
folder. The image files are encoded using JPEG compression. The
encoding settings can vary across the dataset and reflect the a
priori unknown endoscopic equipment settings. Furthermore, the
features archive contains the extracted visual feature descriptors
for all the images in the images archive. The extracted visual
features are global image features, i.e., JCD, Tamura, ColorLayout,
EdgeHistogram, AutoColorCorrelogram and PHOG. Each feature
vector consists of a number of floating point values, and the size of
the vector depends on the feature:
168 (JCD), 18 (Tamura), 33 (ColorLayout), 80 (EdgeHistogram),
256 (AutoColorCorrelogram) and 630 (PHOG) [
        <xref ref-type="bibr" rid="ref2">2</xref>
]. The extracted
visual features are stored in separate folders and text files named
according to the name and path of the corresponding image
files. Each file consists of six lines, one per feature; a line
consists of a feature name separated from the feature vector by a
colon, and each feature vector consists of the corresponding number of
floating point values separated by commas. The extension of the
extracted visual feature files is ".features".
      </p>
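The feature-file layout described above (six lines, each a feature name, a colon, and comma-separated floats) can be read in a few lines. The following is a minimal Python sketch based on that description, not official task code; the file path is whatever the caller supplies:

```python
def parse_features_file(path):
    """Parse a Kvasir ".features" file into a dict mapping each
    feature name (e.g. "JCD", "Tamura") to a list of floats.

    Expected layout: one feature per line, formatted as
    "FeatureName:v1,v2,...,vN".
    """
    features = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            name, _, values = line.partition(":")
            features[name] = [float(v) for v in values.split(",")]
    return features
```

A quick sanity check after parsing is to verify the vector lengths against the sizes listed above (e.g. 168 for JCD, 630 for PHOG).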
<p>For the automatic report generation, we use three videos
depicting diseases or findings that can be found in the Kvasir dataset. The
goal is to generate reports describing the three videos for medical
experts, with fully automatic report generation in mind.</p>
    </sec>
    <sec id="sec-3">
      <title>EVALUATION METRICS AND TASKS</title>
<p>For the evaluation of detection accuracy, we use several standard
metrics (more detailed descriptions are given on the task web-page). True
positive is the number of correctly identified positive samples. True
negative is the number of correctly identified negative samples.
False positive is the number of negative samples wrongly identified
as positive. False negative is the number of positive samples wrongly
identified as negative.
Recall (frequently called sensitivity) is the ratio of samples
correctly identified as positive among all existing positive samples.
Precision is the ratio of samples correctly identified
as positive among all returned samples. Specificity is the
ratio of negatives correctly identified as negatives.
Accuracy is the percentage of correctly identified true and false samples.
The Matthews correlation coefficient (MCC) takes into account true and
false positives and negatives, and is a balanced measure even if the
classes are of very different sizes. The F1 score measures a test’s
accuracy by calculating the harmonic mean of precision and recall.
We also evaluate the amount of training data used to
achieve good results and the speed (processing performance) of the
classification. For the evaluation, the participants must submit one
run for each of the required subtasks defined below. Additionally,
they can optionally submit three more runs for any of the described
subtasks, i.e., participants can submit up to five runs in total.
Required subtasks. The detection subtask is a
multi-class classification of diseases in the GI tract. Participants have
to use the visual information in the provided dataset, and the goal
is to maximize the algorithm’s performance in terms of detection
accuracy, where the amount of training data is also taken into account.
Detection is evaluated based on the metrics above (all should be
reported), but the ranking is made using the MCC and the amount of
training data used. The official metric is a multi-class generalization
of the MCC. This generalization is called the RK statistic (for K
different classes) and is defined in terms of a K × K confusion matrix.
The RK statistic is in essence a correlation coefficient between the
observed and predicted classifications (for K different
classes); it returns a value between −1 and +1. A coefficient of +1
represents a perfect prediction, 0 corresponds to no better than
random prediction, and a value &lt; 0 indicates disagreement between
prediction and observation (lower negative values correspond
to stronger disagreement). The minimum negative value of the
RK statistic is between −1 and 0, depending on the true distribution;
the maximum value is always +1.</p>
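All of the metrics above can be derived from confusion counts. As a minimal illustration (not the official evaluation code), the following Python sketch computes the binary metrics from the four counts and the RK statistic from a K × K confusion matrix, following Gorodkin’s formulation of the multi-class MCC:

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Standard binary metrics from confusion counts."""
    recall = tp / (tp + fn)            # sensitivity
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision,
            "specificity": specificity, "accuracy": accuracy, "f1": f1}

def rk_statistic(conf):
    """Gorodkin's R_K statistic (multi-class MCC) for a K x K confusion
    matrix, with rows as true classes and columns as predicted classes.
    Returns a value in [-1, +1]; 0 is returned when it is undefined."""
    K = len(conf)
    s = sum(sum(row) for row in conf)              # total samples
    c = sum(conf[k][k] for k in range(K))          # correct predictions
    t = [sum(row) for row in conf]                 # true count per class
    p = [sum(conf[i][j] for i in range(K)) for j in range(K)]  # predicted
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(x * x for x in p)) *
                    (s * s - sum(x * x for x in t)))
    return num / den if den else 0.0
```

For K = 2 the RK statistic reduces to the ordinary binary MCC, which is one reason it is a natural choice for the multi-class ranking.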
      <p>The eficient detection subtask addresses the speed of the
classification. The classification of diseases has to be achieved as fast
as possible in terms of data processing using any computation
speed-up techniques. The goal is to find a balance between the
algorithm’s performance in terms of detection accuracy and the
performance in terms of data processing speed, while keeping in
mind that the problem area requires real-time processing while
lacking data. For the evaluation, the processing time weighted by
the detection accuracy.</p>
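The exact weighting formula is not fixed here. Purely as a hypothetical illustration of such a combined score (not the official scoring), one could divide a quality measure such as the MCC by the per-image processing time, so that higher accuracy and lower latency both raise the score:

```python
def time_weighted_score(mcc, seconds_per_image):
    """Hypothetical example only: combines detection quality (MCC) and
    speed (seconds per image) into one score. The actual weighting used
    by the task is not specified in this paper."""
    if seconds_per_image <= 0:
        raise ValueError("processing time must be positive")
    return mcc / seconds_per_image
```

Under this sketch, halving the per-image processing time at equal MCC doubles the score, which captures the intended trade-off between accuracy and speed.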
<p>Optional subtasks. The efficient detection on the same hardware
subtask is the same as the efficient detection subtask above, but all
submitted solutions are tested on the same hardware: the
organizers run the code provided by the participants on identical machines,
and the evaluation is again based on the processing time weighted
by the detection accuracy.</p>
<p>The experimental report generation subtask asks the participants
to automatically create a text report for a medical doctor describing
the detection results for three video cases. A definition of what
a text report is, what it should contain (a list of requirements) and
a description of what the medical experts do with the report is
provided. The assessment then follows the list of requirements, and
the report is assessed manually by two of our medical partners in
terms of its usefulness in the medical context and whether it satisfies
existing demands for the documentation of endoscopic procedures.</p>
    </sec>
    <sec id="sec-4">
      <title>DISCUSSION AND OUTLOOK</title>
<p>The task itself can be seen as very challenging: hard to solve and
hard to evaluate. With its novel use case, we hope to motivate
many researchers to look into the field of medical
multimedia. Performing research that can have societal impact will be an
important part of multimedia research in the future. We hope that
the Medico task can help to raise awareness of the topic and also
provide an interesting and meaningful use case for researchers.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Bogdan</given-names>
            <surname>Ionescu</surname>
          </string-name>
          , Henning Müller, Mauricio Villegas, Helbert Arenas, Giulia Boato,
          <string-name>
            <surname>Duc-Tien</surname>
Dang-Nguyen, Yashin Dicente Cid, Carsten Eickhoff, Alba Garcia Seco de Herrera, Cathal Gurrin, Bayzidul Islam, Vassili Kovalev, Vitali Liauchuk, Josiane Mothe, Luca Piras,
            <given-names>Michael</given-names>
          </string-name>
          <string-name>
            <surname>Riegler</surname>
            , and
            <given-names>Immanuel</given-names>
          </string-name>
          <string-name>
            <surname>Schwall</surname>
          </string-name>
          .
          <year>2017</year>
          . Overview of ImageCLEF 2017:
          <article-title>Information extraction from images</article-title>
          .
          <source>In Experimental IR Meets Multilinguality, Multimodality, and Interaction 8th International Conference of the CLEF Association, CLEF 2017 (LNCS 10439)</source>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Mathias</given-names>
            <surname>Lux and Savvas A Chatzichristofis</surname>
          </string-name>
          .
          <year>2008</year>
          .
<article-title>LIRE: Lucene Image Retrieval: An Extensible Java CBIR Library</article-title>
          .
          <source>In Proceedings of the 16th ACM international conference on Multimedia. ACM</source>
          ,
          <volume>1085</volume>
          -
          <fpage>1088</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          , Sigrun Losada Eskeland, Thomas de Lange, Carsten Griwodz, Kristin Ranheim Randel, Håkon Kvale Stensland,
          <string-name>
            <surname>Duc-Tien</surname>
            Dang-Nguyen, Concetto Spampinato, Dag Johansen,
            <given-names>Michael</given-names>
          </string-name>
          <string-name>
            <surname>Riegler</surname>
          </string-name>
          , and others.
          <year>2017</year>
          .
          <article-title>A holistic multimedia system for gastrointestinal tract disease detection</article-title>
          .
          <source>In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSYS)</source>
          .
          <source>ACM</source>
          ,
          <volume>112</volume>
          -
          <fpage>123</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          , Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato,
          <string-name>
            <surname>Duc-Tien</surname>
            Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt,
            <given-names>Michael</given-names>
          </string-name>
          <string-name>
            <surname>Riegler</surname>
            , and
            <given-names>Pål</given-names>
          </string-name>
          <string-name>
            <surname>Halvorsen</surname>
          </string-name>
          .
          <year>2017</year>
          .
<article-title>Kvasir: A Multi-Class Image Dataset for Computer-Aided Gastrointestinal Disease Detection</article-title>
          .
          <source>In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSYS)</source>
          .
          <source>ACM</source>
          ,
          <volume>164</volume>
          -
          <fpage>169</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          , Michael Riegler, Pål Halvorsen, Peter Thelin Schmidt, Carsten Griwodz, Dag Johansen, Sigrun Losada Eskeland, and Thomas de Lange.
          <year>2016</year>
          .
          <article-title>GPU-accelerated real-time gastrointestinal diseases detection</article-title>
          .
<source>In Proceedings of the IEEE International Symposium on Computer-Based Medical Systems (CBMS)</source>
          . IEEE,
          <fpage>185</fpage>
          -
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Riegler</surname>
          </string-name>
, Mathias Lux, Carsten Griwodz, Concetto Spampinato, Thomas de Lange, Sigrun L Eskeland, Konstantin Pogorelov, Wallapak Tavanapong,
          <string-name>
            <surname>Peter T Schmidt</surname>
            , Cathal Gurrin, Dag Johansen, Håvard Johansen, and
            <given-names>Pål</given-names>
          </string-name>
          <string-name>
            <surname>Halvorsen</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Multimedia and Medicine: Teammates for better disease detection and survival</article-title>
          .
          <source>In Proceedings of the 2016 ACM Multimedia Conference (ACM MM). ACM</source>
          ,
          <volume>968</volume>
          -
          <fpage>977</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Mauricio</given-names>
            <surname>Villegas</surname>
          </string-name>
          , Henning Müller, Alba García Seco de Herrera, Roger Schaer, Stefano Bromuri, Andrew Gilbert, Luca Piras, Josiah Wang, Fei Yan, Arnau Ramisa, and others. 2016.
          <article-title>General overview of imageCLEF at the CLEF 2016 labs</article-title>
          .
<source>In Proceedings of the International Conference of the Cross-Language Evaluation Forum for European Languages (LNCS 9822)</source>
          . Springer,
          <fpage>267</fpage>
          -
          <lpage>285</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Yi</given-names>
            <surname>Wang</surname>
          </string-name>
          , Wallapak Tavanapong, Johnny Wong, JungHwan Oh, and
          <string-name>
            <surname>Piet C De Groen</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Computer-aided detection of retroflexion in colonoscopy</article-title>
          .
<source>In Proceedings of the 24th International Symposium on Computer-Based Medical Systems (CBMS)</source>
          .
          <source>IEEE</source>
          , 1-
          <fpage>6</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Yi</given-names>
            <surname>Wang</surname>
          </string-name>
          , Wallapak Tavanapong, Johnny Wong, Jung Hwan Oh, and
          <string-name>
            <surname>Piet C De Groen</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Polyp-alert: Near real-time feedback during colonoscopy</article-title>
          .
          <source>Computer methods and programs in biomedicine 120</source>
          ,
          <issue>3</issue>
          (
          <year>2015</year>
          ),
          <fpage>164</fpage>
          -
          <lpage>179</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>