<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transfer learning with prioritized classification and training dataset equalization for medical objects detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olga Ostroukhova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantin Pogorelov</string-name>
          <email>konstantin@simula.no</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Riegler</string-name>
          <email>michael@simula.no</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Duc-Tien Dang-Nguyen</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pål Halvorsen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Research Institute of Multiprocessor Computation Systems n.a. A.V. Kalyaev</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Simula Metropolitan Center for Digital Engineering</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Simula Research Laboratory</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Bergen</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Oslo</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>29</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>This paper presents the method proposed by the organizing team (SIMULA) for the MediaEval 2018 Multimedia for Medicine: the Medico Task. We used a recent transfer-learning-based image classification methodology and focused on how easily multi-class image classifiers can be implemented in general, and how their classification performance can be improved without redesigning the deep neural network model. The goal was both to provide a baseline for the Medico task and to show the performance of out-of-the-box classifiers in a medical use-case scenario.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        This paper provides a detailed description of the methods proposed
by team SIMULA for the MediaEval 2018 Multimedia for Medicine Medico Task [<xref ref-type="bibr" rid="ref11">11</xref>].
The main goal of the task is to perform medical image classification,
and the use-case scenario is gastrointestinal endoscopy. The 2018
version of the task is designed as a sixteen-class classification
problem. Compared to the 2017 version, which was limited to eight
classes [<xref ref-type="bibr" rid="ref9">9</xref>], the current version comes with several additional
challenges, such as an imbalanced number of samples per class, to
make it more realistic [<xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>]. In the previous year of the task,
participants proposed different methods ranging from simple
handcrafted features to deep neural networks [<xref ref-type="bibr" rid="ref10 ref12 ref3 ref4 ref5 ref6">3–6, 10, 12</xref>]. For our
approach, we use a convolutional neural network (CNN) in combination
with transfer learning. To cope with the imbalanced dataset and the
different importance of the classes, we perform training dataset
equalization and prioritized classification.
      </p>
    </sec>
    <sec id="sec-2">
      <title>PROPOSED APPROACH</title>
      <p>As the organizing team of the Medico task, our aim was not to
achieve the best possible classification performance. Instead, we
decided to check how low the entry threshold is for the medical image
classification and corresponding lesion detection challenge. To this
end, and also to provide a baseline for the competing teams, we
adopted a recent transfer-learning-based image classification
methodology and checked how well we are able to (i) easily implement
a multi-class image classifier and (ii) improve the classification
performance without redesigning the deep neural network model.</p>
      <p>
        Thus, as the basic classification algorithm, we used a CNN
architecture and a transfer-learning-based classifier that we
previously introduced for medical image classification [<xref ref-type="bibr" rid="ref7">7</xref>].
This approach is based on the Inception v3 architecture [<xref ref-type="bibr" rid="ref13">13</xref>]. To
achieve the highest possible performance on the limited development
set provided, we used a model pre-trained on the ImageNet
dataset [<xref ref-type="bibr" rid="ref1">1</xref>]. We retrained the model using the method described
in [<xref ref-type="bibr" rid="ref2">2</xref>]. We kept all the basic convolutional layers of the network
and retrained only the two top fully connected (FC) layers after
randomly initializing their weights. The FC layers were retrained
using the RMSprop optimizer [<xref ref-type="bibr" rid="ref14">14</xref>], which adapts the effective
learning rate of each parameter during the training process. We did
not apply any additional enhancement or pre-processing to the images
provided in the datasets. To increase the number of training samples,
we applied several augmentation operations to the images in the
training set: horizontal and vertical flipping and a change of
brightness in the interval of ±20%.
      </p>
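      <p>The augmentation step described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation used for the submitted runs: the function name <monospace>augment</monospace>, the 0.5 flip probability, and the multiplicative brightness model are assumptions; only the operations themselves (horizontal and vertical flipping and a brightness change within ±20%) come from the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return one randomly augmented copy of an RGB image (H x W x 3, uint8).

    Sketch only: random horizontal/vertical flips plus a brightness change
    within +/-20%. The 0.5 flip probability and the multiplicative
    brightness model are assumptions, not taken from the paper.
    """
    out = image.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1]          # horizontal flip (mirror columns)
    if rng.random() < 0.5:
        out = out[::-1, :]          # vertical flip (mirror rows)
    factor = rng.uniform(0.8, 1.2)  # brightness change in [-20%, +20%)
    out = np.clip(out * factor, 0, 255)
    return out.astype(np.uint8)


rng = np.random.default_rng(seed=0)
img = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)  # toy image
aug = augment(img, rng)
print(aug.shape, aug.dtype)  # (4, 6, 3) uint8
```
]]></preformat>
      <p>In a training loop, such a function would typically be applied on the fly to each training image, so that every epoch sees slightly different variants of the same samples.</p>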
      <p>Initial experiments showed that the pre-trained Inception v3
model efficiently extracts high-level features from the given medical
images and converges quickly during retraining, with sufficient
resulting classification performance (see Section 3). However,
because the training dataset is heavily imbalanced, the detection
performance for some classes was not good enough despite the training
data augmentation. To address this issue, we implemented an additional
balancing procedure that equalizes the training set by randomly
duplicating training samples of under-represented classes, such as
instruments and blurry images. This nearly doubled the number of
training samples and allowed for better classification performance
for the classes with few provided images.</p>
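      <p>A minimal Python sketch of the equalization-by-duplication idea follows. It is illustrative only: the function name <monospace>equalize_by_duplication</monospace>, the choice to top every class up to the size of the largest class, and the toy file names and labels are all hypothetical, since the text only states that samples of under-filled classes were randomly duplicated.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random
from collections import defaultdict


def equalize_by_duplication(samples, labels, seed=0):
    """Oversample under-filled classes by randomly duplicating their samples.

    Assumption of this sketch: every class is topped up to the size of the
    largest class. The exact target count is not stated in the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Keep all originals, then add random duplicates up to the target.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy training set with an imbalanced label distribution.
files = ["a1", "a2", "a3", "a4", "b1", "c1", "c2"]
classes = ["polyps", "polyps", "polyps", "polyps",
           "instruments", "blurry", "blurry"]
eq_files, eq_classes = equalize_by_duplication(files, classes)
print(sorted((c, eq_classes.count(c)) for c in set(eq_classes)))
# [('blurry', 4), ('instruments', 4), ('polyps', 4)]
```
]]></preformat>
      <p>Duplication of this kind should be applied to the training split only, so that identical copies of an image never end up on both sides of a training and validation split.</p>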
      <p>
        An additional post-processing step on the classifier output was
implemented to account for the different importance of the classes,
as stated in the task dataset description [<xref ref-type="bibr" rid="ref11">11</xref>]. Specifically, we
performed a prioritized selection of the output class for each image
based on the model’s probability output: from the array of classes
sorted in order of their importance, we selected the first class whose
detection probability exceeded a set threshold.
      </p>
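      <p>The prioritized selector can be sketched in a few lines of Python. The function name, the toy probabilities, and the priority order are hypothetical, and the fallback to the plain maximum-probability class when no class exceeds the threshold is an assumption of this sketch, since the text does not specify what happens in that case.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def prioritized_class(probs, priority_order, threshold):
    """Select the output class for one image using class priorities.

    probs: per-class probabilities from the classifier.
    priority_order: class indices sorted from most to least important.
    Returns the first class in priority order whose probability exceeds
    the threshold. If none does, this sketch falls back to the argmax
    (an assumption; the fallback is not specified in the paper).
    """
    for cls in priority_order:
        if probs[cls] > threshold:
            return int(cls)
    return int(np.argmax(probs))


probs = np.array([0.05, 0.55, 0.40])  # toy 3-class softmax output
priority = [2, 1, 0]                  # hypothetical: class 2 most important
print(prioritized_class(probs, priority, threshold=0.3))   # 2
print(prioritized_class(probs, priority, threshold=0.5))   # 1
print(prioritized_class(probs, priority, threshold=0.75))  # 1 (argmax fallback)
```
]]></preformat>
      <p>One property of this sketch is worth noting: when the class probabilities sum to one, as with a softmax output, and the threshold is 0.5 or higher, at most one class can exceed the threshold, and that class is necessarily the most probable one. With the argmax fallback assumed here, such a threshold therefore always reproduces the plain maximum-probability selector.</p>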
    </sec>
    <sec id="sec-3">
      <title>RESULTS AND ANALYSIS</title>
      <p>For the official task submission, two separate models were used,
trained on different datasets. The first model was trained on a
training set created from the development set using the data
augmentation procedure described in Section 2. This model was used to
process the task’s test set, and its classification output was
post-processed using the prioritized classification selector with four
different probability thresholds ranging from 0.75 to 0.1, resulting in
runs #2 to #5. For run #1, we used the maximum-probability selector
without class prioritization. The results of the first model were
submitted as the speed runs. The second model was trained using the
equalized training set, and the five runs generated from it with the
same rules were submitted as the detection runs.</p>
      <p>The official evaluation results for all the runs are shown in
Table 1. As can be seen, all the runs significantly outperform the
ZeroR and Random baselines and show good classification performance.
All the runs that use the equalized training set have slightly better
classification performance. Surprisingly, the introduced prioritized
classification method did not improve detection performance, neither
for the original nor for the equalized training set. With the
threshold of 0.75, the classification performance is equal to that of
the non-prioritized runs. This suggests that the trained classifier
already performs as well as it can, and that additional
re-classification using the class priorities does not help for this
particular dataset. However, it may still be of interest for bigger
datasets or a higher number of classes. The best performing run was
detection run #1, generated using the equalized training set and the
non-prioritized classifier, with a classification performance of 0.854
for the <italic>R<sub>k</sub></italic> statistic (the Matthews correlation coefficient, MCC, for
<italic>k</italic> different classes). The confusion matrix for this run is depicted in
Table 2, and the class imbalance and the corresponding training and
classification challenges can be easily observed. The most challenging
class was Instruments, mostly because of the different shapes,
positions, and visibilities of the instruments in the images. There
were also a number of misclassification cases for the Dyed classes as
well as for the Esophagitis and Normal Z-line classes.</p>
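      <p>For reference, the <italic>R<sub>k</sub></italic> statistic can be computed directly from a confusion matrix. The Python sketch below uses the standard multi-class generalization of the MCC, which, as far as we know, is also the definition used by scikit-learn's <monospace>matthews_corrcoef</monospace> for more than two classes; the function name and the toy matrices are illustrative and do not come from the official evaluation.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def rk_statistic(confusion):
    """Multi-class Matthews correlation coefficient (R_k).

    confusion[i][j] counts samples of true class i predicted as class j.
    R_k = (c*s - sum_k p_k*t_k)
          / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    """
    cm = np.asarray(confusion, dtype=np.float64)
    s = cm.sum()            # total number of samples
    c = np.trace(cm)        # number of correctly classified samples
    t = cm.sum(axis=1)      # t_k: how often class k truly occurs (row sums)
    p = cm.sum(axis=0)      # p_k: how often class k was predicted (column sums)
    numerator = c * s - np.dot(p, t)
    denominator = np.sqrt((s**2 - np.dot(p, p)) * (s**2 - np.dot(t, t)))
    # Degenerate case (e.g. all predictions in one class): return 0 here.
    return 0.0 if denominator == 0 else float(numerator / denominator)


perfect = np.diag([5, 3, 2])            # toy 3-class matrix, all correct
print(rk_statistic(perfect))            # 1.0
binary = [[5, 1], [2, 4]]               # toy 2-class matrix
print(round(rk_statistic(binary), 4))   # 0.5071, same as the binary MCC
```
]]></preformat>
      <p>For two classes the expression reduces to the familiar binary MCC, which is a convenient sanity check for any implementation.</p>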
      <p>With respect to processing speed, the proposed classifier can
process approximately 43 frames per second on a GPU-enabled
consumer-grade personal computer, regardless of whether the
post-processing class prioritization is enabled or disabled.</p>
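      <p>A throughput of 43 frames per second corresponds to roughly 23 ms per frame (1000 / 43 ≈ 23.3 ms). One simple way to obtain such a figure is to time a classification callable over a sequence of frames, as in the hypothetical Python helper below; <monospace>measure_fps</monospace>, the warm-up length, and the dummy classifier in the usage example are assumptions and not part of the described system.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import time


def measure_fps(classify, frames, warmup=5):
    """Estimate throughput (frames per second) of a per-frame classifier.

    `classify` is any callable taking one frame. The first `warmup` frames
    are processed once beforehand and excluded from timing, because initial
    calls often include one-off costs such as model or GPU initialization.
    """
    for frame in frames[:warmup]:
        classify(frame)
    start = time.perf_counter()
    for frame in frames:
        classify(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


# Dummy stand-in for a real model: sums the values of a tiny "frame".
dummy_frames = [[1, 2, 3]] * 200
fps = measure_fps(lambda frame: sum(frame), dummy_frames)
print(f"{fps:.0f} frames per second (dummy classifier)")
```
]]></preformat>
      <p>With GPU frameworks that execute work asynchronously, the timer should only be read after the device has finished processing; otherwise the measured rate can be overestimated.</p>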
    </sec>
    <sec id="sec-4">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>In this paper, we presented an out-of-the-box solution using a
modern pre-trained CNN for the task of medical image classification.
The goal was to provide a baseline for the task and to show the
performance of basic methods without any modification of the deep
architecture. The best run achieved a Matthews correlation coefficient
for <italic>k</italic> different classes of 0.854 at a speed of 43 frames per second,
which is already quite a good result for an out-of-the-box method.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Jia</given-names>
            <surname>Deng</surname>
          </string-name>
          , Wei Dong, Richard Socher,
          <string-name>
            <given-names>Li-Jia</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kai</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Li</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>ImageNet: A large-scale hierarchical image database</article-title>
          .
          <source>In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009)</source>
          . IEEE,
          <fpage>248</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Donahue</surname>
          </string-name>
          , Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and
          <string-name>
            <given-names>Trevor</given-names>
            <surname>Darrell</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition</article-title>
          .
          <source>In Proc. of ICML</source>
          , Vol.
          <volume>32</volume>
          .
          <fpage>647</fpage>
          -
          <lpage>655</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Yang</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhonglei</given-names>
            <surname>Gu</surname>
          </string-name>
          , and William K. Cheung.
          <year>2017</year>
          .
          <article-title>HKBU at MediaEval 2017 Medico: Medical multimedia task</article-title>
          .
          <source>In Working Notes Proceedings of the MediaEval 2017 Workshop (MediaEval</source>
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Syed Sadiq Ali</given-names>
            <surname>Naqvi</surname>
          </string-name>
          , Shees Nadeem, Muhammad Zaid, and Muhammad Atif Tahir.
          <year>2017</year>
          .
          <article-title>Ensemble of Texture Features for Finding Abnormalities in the Gastro-Intestinal Tract</article-title>
          .
          <source>In Working Notes Proceedings of the MediaEval 2017 Workshop (MediaEval</source>
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Petscharnig</surname>
          </string-name>
          and
          <string-name>
            <given-names>Klaus</given-names>
            <surname>Schöffmann</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Learning laparoscopic video shot classification for gynecological surgery</article-title>
          .
          <source>An International Journal of Multimedia Tools and Applications</source>
          <volume>77</volume>
          ,
          <issue>7</issue>
          (
          <year>2018</year>
          ),
          <fpage>8061</fpage>
          -
          <lpage>8079</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Petscharnig</surname>
          </string-name>
          , Klaus Schöffmann, and
          <string-name>
            <given-names>Mathias</given-names>
            <surname>Lux</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>An Inception-like CNN Architecture for GI Disease and Anatomical Landmark Classification</article-title>
          .
          <source>In Working Notes Proceedings of the MediaEval 2017 Workshop (MediaEval</source>
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          , Sigrun Losada Eskeland, Thomas de Lange, Carsten Griwodz, Kristin Ranheim Randel, Håkon Kvale Stensland, Duc-Tien Dang-Nguyen, Concetto Spampinato, Dag Johansen,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Riegler</surname>
          </string-name>
          , and others.
          <year>2017</year>
          .
          <article-title>A holistic multimedia system for gastrointestinal tract disease detection</article-title>
          .
          <source>In Proceedings of the 8th ACM on Multimedia Systems Conference</source>
          . ACM,
          <fpage>112</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          , Kristin Ranheim Randel, Thomas de Lange, Sigrun Losada Eskeland, Carsten Griwodz, Dag Johansen, Concetto Spampinato, Mario Taschwer, Mathias Lux, Peter Thelin Schmidt,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Riegler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Pål</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Nerthus: A Bowel Preparation Quality Video Dataset</article-title>
          .
          <source>In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSYS)</source>
          . ACM,
          <fpage>170</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          , Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato, Duc-Tien Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt, and others.
          <year>2017</year>
          .
          <article-title>Kvasir: A multi-class image dataset for computer aided gastrointestinal disease detection</article-title>
          .
          <source>In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSYS)</source>
          . ACM,
          <fpage>164</fpage>
          -
          <lpage>169</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          , Michael Riegler, Pål Halvorsen, Carsten Griwodz, Thomas de Lange, Kristin Ranheim Randel, Sigrun Eskeland, Duc-Tien Dang-Nguyen, Olga Ostroukhova, and others.
          <year>2017</year>
          .
          <article-title>A comparison of deep learning with global features for gastrointestinal disease detection</article-title>
          .
          <source>In Working Notes Proceedings of the MediaEval 2017 Workshop (MediaEval</source>
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          , Michael Riegler, Pål Halvorsen, Thomas de Lange, Kristin Ranheim Randel, Duc-Tien Dang-Nguyen, Mathias Lux, and Olga Ostroukhova.
          .
          <year>2018</year>
          .
          <article-title>Medico Multimedia Task at MediaEval 2018</article-title>
          .
          <source>In Working Notes Proceedings of the MediaEval 2018 Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Riegler</surname>
          </string-name>
          , Konstantin Pogorelov, Pål Halvorsen, Carsten Griwodz, Thomas de Lange, Kristin Ranheim Randel, Sigrun Eskeland, Duc-Tien Dang-Nguyen, Mathias Lux, and others.
          <year>2017</year>
          .
          <article-title>Multimedia for Medicine: The Medico Task at MediaEval 2017</article-title>
          .
          <source>In Working Notes Proceedings of the MediaEval 2017 Workshop (MediaEval</source>
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Szegedy</surname>
          </string-name>
          , Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and
          <string-name>
            <given-names>Zbigniew</given-names>
            <surname>Wojna</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Rethinking the inception architecture for computer vision</article-title>
          .
          <source>arXiv preprint arXiv:1512.00567</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Tijmen</given-names>
            <surname>Tieleman</surname>
          </string-name>
          and
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude</article-title>
          .
          <source>COURSERA: Neural Networks for Machine Learning</source>
          <volume>4</volume>
          ,
          <issue>2</issue>
          (2012).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>