<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Multi-Class Artefact Detection in Video Endoscopy via Convolutional Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>
            <given-names>Mohammad Azam</given-names>
            <surname>Khan</surname>
          </string-name>
          <xref ref-type="aff" rid="aff0"/>
        </contrib>
        <contrib contrib-type="author">
          <string-name>
            <given-names>Jaegul</given-names>
            <surname>Choo</surname>
          </string-name>
          <xref ref-type="aff" rid="aff0"/>
        </contrib>
        <aff id="aff0">
          <institution>Department of Computer Science and Engineering, Korea University</institution>
          , Seoul,
          <country country="KR">South Korea</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes our approach to EAD2019: multi-class artefact detection in video endoscopy. We optimized a RetinaNet network, based on the focal loss for dense object detection and pretrained on the ImageNet dataset, and applied several data augmentation and hyperparameter tuning strategies, obtaining a weighted final score of 0.2880 for the multi-class artefact detection task and a mean average precision (mAP) score of 0.2187 with a deviation of 0.0770 for the multi-class artefact generalisation task. In addition, we developed a U-Net-based convolutional neural network (CNN) for the multi-class artefact region segmentation task and achieved a final score of 0.4320 on the online test set of the competition.</p>
      </abstract>
      <kwd-group>
        <kwd>Endoscopic artefact</kwd>
        <kwd>Video endoscopy</kwd>
        <kwd>Artefact generalisation</kwd>
        <kwd>Convolutional neural networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Endoscopic Artefact Detection (EAD) [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] is a core challenge in facilitating the diagnosis and treatment of diseases in hollow organs. The challenge highlights the growing application of artificial intelligence (AI) in general, and of deep learning (DL) techniques in particular, to the early detection of numerous cancers, therapeutic procedures, and minimally invasive surgery. To this end, the organizers focused on three sub-tasks over the EAD dataset [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]: multi-class artefact detection, region segmentation, and detection generalisation.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. OUR APPROACH</title>
      <p>
        For multi-class artefact detection and generalisation tasks, our
solution is based on keras-retinanet [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] which is basically an
implementation of a popular dense object detection method
called RetinaNet [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] using open-source framework Keras [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
with Tensorflow1 back-end. The RetinaNet is a single-stage
convolutional neural network detection architecture, which
was really appealing to us for its training simplicity.
Overall detection pipeline for two tasks is shown in Figure 1.
      </p>
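      <p>As an illustrative sketch (not our exact training code), a RetinaNet model with an ImageNet-pretrained ResNet backbone can be assembled and compiled with the focal loss via keras-retinanet roughly as follows; the seven-class setting and the learning rate are assumptions made for illustration.</p>
      <preformat>
# A minimal sketch of building RetinaNet with keras-retinanet (v0.5.0);
# the class count and learning rate below are illustrative assumptions.
from keras.optimizers import Adam
from keras_retinanet import losses
from keras_retinanet.models import backbone

b = backbone('resnet101')                  # ResNet-101 feature extractor
model = b.retinanet(num_classes=7)         # one output per artefact class
model.load_weights(b.download_imagenet(),  # ImageNet-pretrained weights
                   by_name=True, skip_mismatch=True)
model.compile(
    loss={
        'regression': losses.smooth_l1(),   # smooth L1 for box regression
        'classification': losses.focal(),   # focal loss for dense detection
    },
    optimizer=Adam(lr=1e-5),
)
      </preformat>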
    </sec>
    <sec id="sec-3">
      <title>2.1. Multi-class artefact detection and generalisation</title>
      <p>For the multi-class artefact detection task, we first preprocessed the dataset (resizing the images to 768 × 1024 pixels) and applied several standard data augmentation techniques, including rotation, translation, scaling, and horizontal flipping. We optimized the network with a ResNet-101 backbone pretrained on ImageNet. As a post-processing step, we applied non-maximum suppression to eliminate overlapping bounding boxes from the predictions.</p>
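      <p>For concreteness, a minimal version of this post-processing step might look like the following sketch; the IoU threshold is an illustrative default rather than our tuned value.</p>
      <preformat>
# A greedy non-maximum suppression sketch over [x1, y1, x2, y2] boxes;
# the 0.5 IoU threshold is an illustrative default.
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept after greedy NMS."""
    order = np.argsort(scores)[::-1]       # highest-scoring boxes first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # intersection of the kept box with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # drop remaining boxes that overlap the kept box too strongly
        order = rest[np.less_equal(iou, iou_threshold)]
    return keep
      </preformat>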
      <p>The third task of the challenge was the multi-class artefact generalisation task. It is often crucial for algorithms to avoid biases induced by a specific training dataset. Hence, in line with the organizers&#8217; motivation, we further tuned the network used for the artefact detection task above. Our main intuition was to develop a more generalized model that could be used across different endoscopic datasets.</p>
      <sec id="sec-3-1">
        <title>Dataset Test (Online)</title>
      </sec>
      <sec id="sec-3-2">
        <title>Dataset Validation Online</title>
      </sec>
      <sec id="sec-3-3">
        <title>Score</title>
        <p>0:4926
0:2880</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>2.2. Multi-class artefact region segmentation</title>
      <p>
        The second task of the challenge was multi-class artefact region segmentation. We used an encoder-decoder architecture called U-Net, which was designed for biomedical image segmentation [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The encoder path identifies the contents of the image, while the decoder path localizes where those contents are. Importantly, a U-Net outputs an image with the same spatial dimensions as the input, but with a single channel. Unfortunately, we were not able to run extensive experiments for this task.
      </p>
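      <p>A minimal U-Net-style model in Keras, assuming an illustrative input size and filter configuration rather than our exact architecture, might look as follows.</p>
      <preformat>
# A compact U-Net-style encoder-decoder sketch in Keras; the input size,
# depth, and filter counts are illustrative assumptions.
from keras.layers import Concatenate, Conv2D, Input, MaxPooling2D, UpSampling2D
from keras.models import Model

def conv_block(x, filters):
    x = Conv2D(filters, 3, padding='same', activation='relu')(x)
    return Conv2D(filters, 3, padding='same', activation='relu')(x)

def build_unet(input_shape=(256, 256, 3)):
    inputs = Input(input_shape)
    # encoder: identify what is in the image
    c1 = conv_block(inputs, 32)
    c2 = conv_block(MaxPooling2D()(c1), 64)
    c3 = conv_block(MaxPooling2D()(c2), 128)
    # decoder: localize the contents via skip connections
    d2 = conv_block(Concatenate()([UpSampling2D()(c3), c2]), 64)
    d1 = conv_block(Concatenate()([UpSampling2D()(d2), c1]), 32)
    # a single-channel mask with the same spatial size as the input
    outputs = Conv2D(1, 1, activation='sigmoid')(d1)
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer='adam', loss='binary_crossentropy')
      </preformat>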
    </sec>
    <sec id="sec-5">
      <title>3. RESULTS</title>
      <p>The model performance on the multi-class artefact detection task is shown in Table 2, and Table 3 shows the overall performance on the multi-class artefact generalisation task. As explained in Section 2.2, our experiments for the region segmentation task were limited; the corresponding model performance on its final test set is shown in Table 1.</p>
    </sec>
    <sec id="sec-6">
      <title>4. DISCUSSION</title>
      <p>In the beginning, when only the phase 1 dataset was available, we developed our models using 3-fold cross-validation, and they worked relatively well. Later, when the phase 2 dataset was released, we incorporated the additional data into our models using 5-fold cross-validation. However, our models performed somewhat worse. After careful analysis, we found that the dataset provided in the second phase is more diverse than the first, and we were not able to fully manage this diversity.</p>
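      <p>As a sketch of this setup, a 5-fold split over the training images can be produced with scikit-learn as follows; the file layout in the glob pattern is hypothetical, not the challenge&#8217;s actual structure.</p>
      <preformat>
# An illustrative 5-fold cross-validation split; 'data/phase2/*.jpg'
# is a hypothetical layout, not the challenge's actual file structure.
import glob
import numpy as np
from sklearn.model_selection import KFold

images = np.array(sorted(glob.glob('data/phase2/*.jpg')))
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kfold.split(images)):
    train_files, val_files = images[train_idx], images[val_idx]
    # train one model per fold and track the validation mAP of each
    print(fold, len(train_files), len(val_files))
      </preformat>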
      <p>Overall, we noticed a significant gap between our local validation scores and the leaderboard scores. We therefore reviewed the annotations more carefully and found some unusual cases in the training dataset, where nearly identical bounding boxes carried more than one class. It is understandable that some bounding boxes for different artefact classes overlap; however, this was not consistent across all bounding boxes of the different (or same) classes. An example is shown in Figure 2 (overlapping bounding boxes are marked with yellow circles).</p>
      <p>The competition was an exciting and educational experience in solving a problem in a real-life setting. We thank the organizers for all their hard work in organizing the competition and annotating the datasets; large medical image datasets of sufficient size and quality for this purpose are rare.</p>
    </sec>
    <sec id="sec-7">
      <title>5. CONCLUSION</title>
      <p>
        Motivated by the no new-net approach [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we wanted to demonstrate the effectiveness of well-trained state-of-the-art networks in the context of the three tasks of the EAD 2019 challenge. While many researchers are currently besting each other with minor modifications of existing networks, we instead focused on the training process. The detection of specific artefacts, the precise boundary delineation of detected artefacts, and, finally, detection generalisation independent of specific data types and sources would all mark critical steps forward for this domain.
      </p>
    </sec>
    <sec id="sec-8">
      <title>6. REFERENCES</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Sharib</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Felix</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Christian Daul, Barbara Braden, Adam Bailey, Stefano Realdon, James East, Georges Wagnières, Victor Loschenov, Enrico Grisan, Walter Blondel, and Jens Rittscher, “
          <article-title>Endoscopy artifact detection (EAD 2019) challenge dataset</article-title>
          ,”
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Sharib</given-names>
            <surname>Ali</surname>
          </string-name>
          , Felix Zhou, Adam Bailey, Barbara Braden, James East, Xin Lu, and Jens Rittscher, “
          <article-title>A deep learning framework for quality assessment and restoration in video endoscopy</article-title>
          ,”
          <source>CoRR</source>
          , vol. abs/1904.07073,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Hans</given-names>
            <surname>Gaiser</surname>
          </string-name>
          et al., “
          <source>fizyr/keras-retinanet: 0.5.0</source>
          ,”
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Tsung-Yi</given-names>
            <surname>Lin</surname>
          </string-name>
          , Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár, “
          <article-title>Focal loss for dense object detection</article-title>
          ,” in
          <source>2017 IEEE International Conference on Computer Vision (ICCV)</source>
          , Oct.
          <year>2017</year>
          , IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>François</given-names>
            <surname>Chollet</surname>
          </string-name>
          et al., “
          <source>Keras</source>
          ,” https://github.com/fchollet/keras,
          <year>2015</year>
          , GitHub.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Olaf</given-names>
            <surname>Ronneberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Fischer</surname>
          </string-name>
          , and Thomas Brox, “
          <article-title>U-Net: Convolutional networks for biomedical image segmentation</article-title>
          ,” in
          <source>Lecture Notes in Computer Science</source>
          , pp.
          <fpage>234</fpage>
          –
          <lpage>241</lpage>
          . Springer International Publishing,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Fabian</given-names>
            <surname>Isensee</surname>
          </string-name>
          , Philipp Kickingereder, Wolfgang Wick,
          <string-name>
            <given-names>Martin</given-names>
            <surname>Bendszus</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Klaus H.</given-names>
            <surname>Maier-Hein</surname>
          </string-name>
          , “
          <article-title>No new-net</article-title>
          ,” in
          <source>Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries</source>
          , pp.
          <fpage>234</fpage>
          –
          <lpage>244</lpage>
          . Springer International Publishing,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>