<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transfer of Knowledge: Fine-tuning for Polyp Segmentation with Attention</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rabindra Khadka</string-name>
          <aff>SimulaMet, Norway</aff>
          <email>rabindra.khadka@ymail.com</email>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This paper describes how transferring prior knowledge, combined with attention mechanisms, can be applied effectively to segmentation tasks. A UNet model pretrained on a brain MRI dataset was fine-tuned on the polyp dataset, and an attention mechanism was integrated to focus on relevant regions of the input images. The architecture is evaluated on 200 validation images using intersection over union and the Dice score between the ground truth and the predicted region. The model demonstrates promising results with computational efficiency.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Early detection of polyps is vital to reducing deaths from colorectal cancer (CRC),
one of the most common types of cancer reported
in the world [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Colonoscopy is performed to detect
polyps in the gastrointestinal tract, which helps doctors
intervene in time, before the polyps become malignant. Acknowledging
the significance of accurate segmentation techniques in a
clinical setting, MediaEval Challenge 2020 organized the automatic
polyp segmentation challenge to develop systems that efficiently
detect and segment colon polyps [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Machine learning algorithms such as deep learning models for
semantic image segmentation have recently shown promising results
in medical settings [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ]. Such models can therefore assist doctors
during endoscopy by drawing their attention to
potential polyp regions that could otherwise be
missed or incorrectly passed over. In this work, a knowledge transfer
approach was adopted using a pre-trained UNet guided by an
attention mechanism [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The pre-trained network was coupled with
attention and then fine-tuned to achieve faster convergence and a
good validation score.
      </p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        Traditionally, hand-crafted techniques that extract polyp
features such as color, shape, and appearance have been used to train
a classifier to distinguish polyps from their background [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. With the
advent of deep learning models, polyp segmentation has
been approached by learning from polyp images and their masks. ResUNet++ [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
which is based on the deep residual UNet (ResUNet) structure, shows
promising results for polyp segmentation. Other work includes
PraNet [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a more complex architecture that takes into account the
relationship between the polyp area and its boundary and uses a
reverse attention mechanism to model the boundaries of the polyps.
It showed improved results on various datasets.
      </p>
      <p>
        Attention mechanisms have been commonly used in the NLP domain.
There are two types of trainable attention mechanism, namely hard
attention [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and soft attention [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Self-attention mechanisms have also
been proposed; they eliminate the dependency on external
gating information and have shown better performance.
A self-attention mechanism has been used with UNet for the segmentation of
medical images (pancreas, abdominal), showing promising
state-of-the-art results across different datasets [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>APPROACH</title>
    </sec>
    <sec id="sec-4">
      <title>Data</title>
      <p>
        Data used for this challenge was the Kvasir-SEG [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] dataset, which
consists of 1,000 polyp images and their respective ground truth. The
dataset was divided into training and validation sets in a ratio of
80:20. The test dataset consisted of 160 images without ground truth.
      </p>
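      <p>As an illustration of the 80:20 division described above, the following minimal Python sketch shuffles a list of image names and splits it into training and validation sets. The file names, the fixed random seed, and the helper name split_train_val are assumptions made for this example and are not taken from the original implementation.</p>
      <preformat><![CDATA[
```python
import random


def split_train_val(items, val_fraction=0.2, seed=0):
    """Shuffle a copy of `items` and split it into (train, val) lists."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible (seed is an assumption).
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names standing in for the 1,000 Kvasir-SEG images.
image_names = [f"image_{i:04d}.jpg" for i in range(1000)]
train_names, val_names = split_train_val(image_names)
print(len(train_names), len(val_names))
```
]]></preformat>
      <p>With 1,000 images and a validation fraction of 0.2, this yields 800 training and 200 validation images, which matches the 200 validation images used for evaluation.</p>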
    </sec>
    <sec id="sec-5">
      <title>Data processing</title>
      <p>
        Images were normalized to the range [-1, 1] and resized to 256 ×
256. Augmentation techniques were also applied randomly to the images
before feeding them into the model, namely
horizontal flipping, vertical flipping, mirroring, rotation (-5 to 5 degrees), elastic
transformation, channel shifting, and solar flares.
      </p>
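      <p>The normalization and flipping steps can be sketched as follows. This is a simplified illustration on nested Python lists, not the original pipeline: it assumes 8-bit input pixels in [0, 255], a flip probability of 0.5, and hypothetical helper names. Resizing and the remaining augmentations are omitted.</p>
      <preformat><![CDATA[
```python
import random


def normalize(image):
    """Map 8-bit pixel values in [0, 255] to floats in [-1, 1]."""
    return [[pixel / 127.5 - 1.0 for pixel in row] for row in image]


def horizontal_flip(image):
    """Reverse each row (left-right flip)."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Reverse the order of the rows (top-bottom flip)."""
    return image[::-1]


def augment(image, mask, rng, p=0.5):
    """Apply the same random flips to an image and its mask.

    The probability p=0.5 is an assumption for this sketch.
    """
    if rng.random() < p:
        image, mask = horizontal_flip(image), horizontal_flip(mask)
    if rng.random() < p:
        image, mask = vertical_flip(image), vertical_flip(mask)
    return image, mask


rng = random.Random(0)
image = [[0, 128], [255, 64]]  # toy 2 x 2 single-channel image
mask = [[0, 1], [1, 0]]        # toy binary ground truth mask
aug_image, aug_mask = augment(normalize(image), mask, rng)
```
]]></preformat>
      <p>The image and its mask share the same random draws, so a geometric transformation applied to one is always applied to the other.</p>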
    </sec>
    <sec id="sec-6">
      <title>Model Architecture</title>
      <p>
        UNet is known for its efficient performance on image segmentation
tasks [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Its architecture consists of multi-stage cascaded
convolutional neural networks, which help to extract the region of
interest and make dense predictions. Although UNet has good
representational power, it uses compute resources redundantly
because it repeatedly extracts low-level features. To
overcome this drawback, an attention mechanism can be
integrated with the UNet architecture. This improves the
model’s sensitivity to the region of interest and suppresses feature
responses from irrelevant regions of the image. Soft additive
attention has shown better performance than multiplicative
attention [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
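      <p>For reference, the additive attention gate of Attention U-Net [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] can be written as below. The notation follows that paper rather than this one and is reproduced here as an assumption about the form of the gate: x_i denotes the skip-connection features at pixel i, g_i the gating signal, W_x, W_g and psi learned linear maps, b_g and b_psi biases, sigma_1 a ReLU, and sigma_2 a sigmoid.</p>
      <p>
        <disp-formula>
          <tex-math><![CDATA[
q_i = \psi^{T} \sigma_1\left( W_x^{T} x_i + W_g^{T} g_i + b_g \right) + b_\psi,
\qquad
\alpha_i = \sigma_2(q_i),
\qquad
\hat{x}_i = \alpha_i \, x_i
          ]]></tex-math>
        </disp-formula>
      </p>
      <p>The coefficient alpha_i lies in [0, 1], so multiplying it with the skip-connection features scales down responses from regions the gate considers irrelevant.</p>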
      <p>
        In this work, the notion of knowledge transfer was the
key motivation for choosing a simple pre-trained model. The
additive soft attention mechanism was integrated with the
pretrained UNet architecture. The chosen model had been
pre-trained on a brain MRI dataset [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The schema of the attention UNet
architecture can be seen in Figure 1. The key benefit of this attention
UNet structure compared to multi-stage CNNs is that it does not
require training multiple models to deal with object localization,
and thus reduces the number of model parameters. As seen in Figure 1,
additive attention is applied to obtain the gating coefficient.
A compound loss was used during training, comprising
both the Dice loss (<inline-formula><tex-math>\mathcal{L}_{DL}</tex-math></inline-formula>) and the binary cross-entropy loss (<inline-formula><tex-math>\mathcal{L}_{BCE}</tex-math></inline-formula>):
<inline-formula><tex-math>\mathcal{L} = \mathcal{L}_{DL} + \mathcal{L}_{BCE} + \lVert w \rVert_2^2</tex-math></inline-formula>, where <inline-formula><tex-math>w</tex-math></inline-formula> denotes the model weights. L2 regularization was thus applied while
optimizing the loss.
      </p>
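      <p>
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math>\mathcal{L}_{DL} = 1 - \frac{2 \sum y \hat{y}}{\sum y^{2} + \sum \hat{y}^{2} + \epsilon}</tex-math>
        </disp-formula>
        <disp-formula id="eq2">
          <label>(2)</label>
          <tex-math>\mathcal{L}_{BCE}(y, \hat{y}) = -\left( y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right)</tex-math>
        </disp-formula>
where <inline-formula><tex-math>y</tex-math></inline-formula> and <inline-formula><tex-math>\hat{y}</tex-math></inline-formula> are the ground truth value and predicted value,
respectively, and <inline-formula><tex-math>\epsilon</tex-math></inline-formula> is a small smoothing constant.
      </p>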
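      <p>A compound loss of this kind, Dice loss plus binary cross-entropy plus an L2 penalty on the weights, can be sketched in plain Python as follows. This is an illustrative sketch on flat lists of per-pixel values, not the original PyTorch code. The smoothing constant eps, the averaging of the cross-entropy over pixels, and the penalty coefficient l2_coeff are assumptions introduced for the example.</p>
      <preformat><![CDATA[
```python
import math


def dice_loss(y_true, y_pred, eps=1e-7):
    """Dice loss: 1 - 2*sum(y*p) / (sum(y^2) + sum(p^2) + eps)."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    denom = sum(t * t for t in y_true) + sum(p * p for p in y_pred) + eps
    return 1.0 - 2.0 * intersection / denom


def bce_loss(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy, averaged over pixels (averaging is an assumption)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(y_true)


def compound_loss(y_true, y_pred, weights=(), l2_coeff=0.0):
    """Dice + BCE + l2_coeff * ||w||^2 (l2_coeff is a hypothetical knob)."""
    penalty = l2_coeff * sum(w * w for w in weights)
    return dice_loss(y_true, y_pred) + bce_loss(y_true, y_pred) + penalty


y_true = [1.0, 1.0, 0.0, 0.0]  # toy ground truth mask, flattened
y_pred = [0.9, 0.6, 0.2, 0.1]  # toy predicted probabilities
print(compound_loss(y_true, y_pred))
```
]]></preformat>
      <p>In practice the L2 term is often supplied through the optimizer's weight decay setting instead of being added to the loss explicitly; which variant was used here is not stated.</p>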
    </sec>
    <sec id="sec-7">
      <title>Implementation details</title>
      <p>The model was implemented using PyTorch. It was initially trained
on a single GPU on Google Colab. Gradient updates were
performed with a batch size of 16. An initial learning rate of 1e-4
was used with the Adam optimizer. The learning rate was monitored
and reduced when the validation score plateaued. L2 regularization
and early stopping were used to prevent overfitting. To
evaluate the segmentation, the mean Dice coefficient (DSC)
and mean intersection over union (IOU) were computed on the
validation set.</p>
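      <p>The two evaluation metrics can be computed per image and then averaged, as in the following minimal sketch. It operates on flattened masks stored as Python lists. The binarization threshold of 0.5, the smoothing constant eps, and the function names are assumptions for this example and not details of the original evaluation code.</p>
      <preformat><![CDATA[
```python
def dice_and_iou(y_true, y_pred, threshold=0.5, eps=1e-7):
    """Return (Dice, IoU) for one flattened ground truth / prediction pair."""
    pred_bin = [1 if p >= threshold else 0 for p in y_pred]
    true_bin = [1 if t >= 0.5 else 0 for t in y_true]
    intersection = sum(t * p for t, p in zip(true_bin, pred_bin))
    total = sum(true_bin) + sum(pred_bin)
    union = total - intersection
    # eps keeps both ratios defined when the two masks are empty.
    dice = (2 * intersection + eps) / (total + eps)
    iou = (intersection + eps) / (union + eps)
    return dice, iou


def mean_scores(pairs):
    """Average Dice and IoU over an iterable of (y_true, y_pred) pairs."""
    scores = [dice_and_iou(t, p) for t, p in pairs]
    n = len(scores)
    return sum(d for d, _ in scores) / n, sum(i for _, i in scores) / n


# Toy example: one of two predicted pixels overlaps the two true pixels.
truth = [1, 1, 0, 0]
pred = [0.9, 0.2, 0.8, 0.1]
print(dice_and_iou(truth, pred))
```
]]></preformat>
      <p>For the toy pair the intersection is 1 pixel, the two masks contain 4 foreground pixels in total, and their union has 3 pixels, so the Dice score is about 0.5 and the IoU about 0.33.</p>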
      <p>The model produced a good validation result but did
not generalize as well as expected on the unseen test data.
However, the model was able to converge in 50 epochs. It also achieved
a very high processing speed, which suggests that
the adopted model can yield smooth real-time results. There is
room for improving the model’s predictions through
hyperparameter search and regularization techniques such as weight
averaging.</p>
      <p>In this work, fine-tuning a pretrained UNet model with an
attention mechanism has shown promising results. This approach
removes the requirement for an external object localization model
and thus makes the model much simpler, which yielded a high
processing speed. Fine-tuning the pretrained model helped it converge
faster without requiring a large number of training examples. This
indicates the importance and power of transferring prior
knowledge from one domain to another. As reported in recent literature,
attention mechanisms have been a crucial factor in enhancing the
performance of various models. Lastly, we note that this notion of
knowledge transfer has been adopted successfully by meta-learning
algorithms that learn from various tasks. This can serve well
when solving problems in medical settings, where labeled data is scarce
and different kinds of data shift have an impact.</p>
    </sec>
    <sec id="sec-8">
      <title>ACKNOWLEDGMENT</title>
      <p>The research has benefited from the Experimental Infrastructure for
Exploration of Exascale Computing (eX3), which is financially
supported by the Research Council of Norway under contract 270053.</p>
      <p>Repository: https://github.com/IamRabin/MediaevalChallenge2020</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>arXiv preprint arXiv:1409.0473</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
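          <string-name><given-names>Deng-Ping</given-names> <surname>Fan</surname></string-name>,
          <string-name><given-names>Ge-Peng</given-names> <surname>Ji</surname></string-name>,
          <string-name><given-names>Tao</given-names> <surname>Zhou</surname></string-name>,
          <string-name><given-names>Geng</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Huazhu</given-names> <surname>Fu</surname></string-name>,
          <string-name><given-names>Jianbing</given-names> <surname>Shen</surname></string-name>, and
          <string-name><given-names>Ling</given-names> <surname>Shao</surname></string-name>
          .
          <year>2020</year>
          .
          <article-title>PraNet: Parallel Reverse Attention Network for Polyp Segmentation</article-title>
          .
          <source>MICCAI</source>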
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>Debesh</given-names> <surname>Jha</surname></string-name>,
          <string-name><given-names>Steven A.</given-names> <surname>Hicks</surname></string-name>,
          <string-name><given-names>Krister</given-names> <surname>Emanuelsen</surname></string-name>,
          <string-name><given-names>Håvard D.</given-names> <surname>Johansen</surname></string-name>,
          <string-name><given-names>Dag</given-names> <surname>Johansen</surname></string-name>,
          <string-name><given-names>Thomas</given-names> <surname>de Lange</surname></string-name>,
          <string-name><given-names>Michael A.</given-names> <surname>Riegler</surname></string-name>, and
          <string-name><given-names>Pål</given-names> <surname>Halvorsen</surname></string-name>
          .
          <year>2020</year>
          .
          <article-title>Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation</article-title>
          .
          <source>In Proc. of MediaEval 2020 CEUR Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.H.</given-names>
            <surname>Smedsrud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Johansen</surname>
          </string-name>
          , T. De Lange,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Johansen</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>ResUNet++: An advanced architecture for medical image segmentation</article-title>
          .
          <source>IEEE ISM 9</source>
          (
          <issue>2</issue>
          ) (
          <year>2019</year>
          ),
          <fpage>225</fpage>
          -
          <lpage>2255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>Debesh</given-names> <surname>Jha</surname></string-name>,
          <string-name><given-names>Pia H</given-names> <surname>Smedsrud</surname></string-name>,
          <string-name><given-names>Michael A</given-names> <surname>Riegler</surname></string-name>,
          <string-name><given-names>Pål</given-names> <surname>Halvorsen</surname></string-name>,
          <string-name><given-names>Thomas</given-names> <surname>de Lange</surname></string-name>,
          <string-name><given-names>Dag</given-names> <surname>Johansen</surname></string-name>, and
          <string-name><given-names>Håvard D</given-names> <surname>Johansen</surname></string-name>
          .
          <year>2020</year>
          .
          <article-title>Kvasir-SEG: A Segmented Polyp Dataset</article-title>
          .
          <source>In Proc. of International Conference on Multimedia Modeling (MMM)</source>
          .
          <fpage>451</fpage>
          -
          <lpage>462</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Mateuszbuda</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>U-NET FOR BRAIN MRI</article-title>
          . (
          <year>2015</year>
          ). https://pytorch.org/hub/mateuszbuda_brain-segmentation-pytorch_unet/
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>Minh-Thang</given-names> <surname>Luong</surname></string-name>,
          <string-name><given-names>Hieu</given-names> <surname>Pham</surname></string-name>, and
          <string-name><given-names>Christopher D.</given-names> <surname>Manning</surname></string-name>
          .
          <year>2015</year>
          .
          <article-title>Effective Approaches to Attention-based Neural Machine Translation</article-title>
          .
          <source>arXiv:1508.04025</source>
          (
          <year>2015</year>
          ). https://arxiv.org/abs/1508.04025
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Mnih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Heess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Graves</surname>
          </string-name>
          , et al.
          <year>2014</year>
          .
          <article-title>Recurrent models of visual attention</article-title>
          .
          <source>Advances in neural information processing systems</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
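          <string-name><given-names>Ozan</given-names> <surname>Oktay</surname></string-name>,
          <string-name><given-names>Jo</given-names> <surname>Schlemper</surname></string-name>,
          <string-name><given-names>Loic</given-names> <surname>Le Folgoc</surname></string-name>,
          <string-name><given-names>Matthew</given-names> <surname>Lee</surname></string-name>,
          <string-name><given-names>Mattias</given-names> <surname>Heinrich</surname></string-name>,
          <string-name><given-names>Kazunari</given-names> <surname>Misawa</surname></string-name>,
          <string-name><given-names>Kensaku</given-names> <surname>Mori</surname></string-name>,
          <string-name><given-names>Steven</given-names> <surname>McDonagh</surname></string-name>,
          <string-name><given-names>Nils Y</given-names> <surname>Hammerla</surname></string-name>,
          <string-name><given-names>Bernhard</given-names> <surname>Kainz</surname></string-name>,
          <string-name><given-names>Ben</given-names> <surname>Glocker</surname></string-name>, and
          <string-name><given-names>Daniel</given-names> <surname>Rueckert</surname></string-name>
          .
          <year>2018</year>
          .
          <article-title>Attention U-Net: Learning Where to Look for the Pancreas</article-title>
          .
          <source>arXiv:1804.03999</source>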
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
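          <string-name><given-names>O.</given-names> <surname>Ronneberger</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Fischer</surname></string-name>, and
          <string-name><given-names>T.</given-names> <surname>Brox</surname></string-name>
          .
          <year>2015</year>
          .
          <article-title>U-Net: Convolutional networks for biomedical image segmentation</article-title>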
          .
          <source>MICCAI</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Histace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Romain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dray</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Granado</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Toward embedded detection of polyps in wce images for early diagnosis of colorectal cancer</article-title>
          .
          <source>International Journal of Computer Assisted Radiology and Surgery</source>
          <volume>9</volume>
          (
          <issue>2</issue>
          ) (
          <year>2014</year>
          ),
          <fpage>283</fpage>
          -
          <lpage>293</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
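          <string-name><given-names>N.</given-names> <surname>Tajbakhsh</surname></string-name>,
          <string-name><given-names>S.R.</given-names> <surname>Gurudu</surname></string-name>, and
          <string-name><given-names>J.</given-names> <surname>Liang</surname></string-name>
          .
          <article-title>Automated polyp detection in colonoscopy</article-title>
          .
          <source>IEEE</source>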
          <volume>35</volume>
          (
          <issue>2</issue>
          ) (????),
          <fpage>630</fpage>
          -
          <lpage>644</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><given-names>Varun</given-names> <surname>Gulshan</surname></string-name>,
          <string-name><given-names>Lily</given-names> <surname>Peng</surname></string-name>, and
          <string-name><given-names>Marc</given-names> <surname>Coram</surname></string-name>
          .
          <year>2016</year>
          .
          <article-title>Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs</article-title>
          .
          <source>JAMA</source>
          (
          <year>2016</year>
          ). https://jamanetwork.com/journals/jama/fullarticle/2588763
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
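          <string-name><given-names>Zhiqiong</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>Ge</given-names> <surname>Yu</surname></string-name>,
          <string-name><given-names>Yan</given-names> <surname>Kang</surname></string-name>,
          <string-name><given-names>Yingjie</given-names> <surname>Zhao</surname></string-name>, and
          <string-name><given-names>Qixun</given-names> <surname>Qu</surname></string-name>
          .
          <year>2012</year>
          .
          <article-title>Breast tumor detection in digital mammography based on extreme learning machine</article-title>
          . (
          <year>2012</year>
          ). http://faculty.neu.edu.cn/bmie/wangzq/image/lunwen/17.pdf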
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>