<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep Neural Network and U-Net for Polyps Segmentation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Quoc-Huy Trinh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minh-Van Nguyen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thiet-Gia Huynh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minh-Triet Tran</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Information Technology, University of Science</institution>
          ,
          <addr-line>VNU-HCM</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>John von Neumann Institute</institution>
          ,
          <addr-line>VNU-HCM</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Vietnam National University</institution>
          ,
          <addr-line>Ho Chi Minh city</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>The Medico: Multimedia Task 2020 focuses on developing an efficient and accurate computer-aided diagnosis system for automatic segmentation [3]. We participate in Task 1, the polyp segmentation task, which is to develop algorithms for segmenting polyps on a comprehensive dataset. For this task, we propose methods that combine Residual modules, Inception modules, and an adaptive convolutional neural network with the U-Net model, alongside PraNet, for semantic segmentation of various types of polyps in endoscopic images. We select 5 runs with different architectures and parameters. Our methods show promising accuracy and efficiency across multiple experiments, and our team ranks in the top 4 with a Jaccard index of 0.765.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>The goal of the Medico automatic polyp segmentation challenge is to
evaluate various methods for automatic polyp segmentation that can
be used to detect and mask out various types of polyps (including
irregular, small, or flat polyps) with high accuracy.</p>
      <p>
        In this challenge, our goal is to segment the masks of all types
of polyps in the dataset. We consider five solutions, corresponding
to our five submitted runs. In our first approach, we adopt a simple
U-Net architecture [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to parse the masks of polyps. Second, we replace
the regular ReLU with Leaky ReLU to deal with dead neurons [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
Third, to further boost the result, we design an Inception module to
extract better features [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Fourth, we use a pretrained model with
the ResNet50 backbone to build ResUNet, which yields better
results. Last, we employ PraNet, a parallel reverse attention network
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], for polyp segmentation in colonoscopy images.
      </p>
    </sec>
    <sec id="sec-2">
      <title>METHODS</title>
      <p>We submit five runs for this task. In each run, we modify and
improve different architectures to enhance accuracy and segmentation
speed. Initially, we choose U-Net as the base model to establish a
baseline on the test set. Because its evaluation scores are low,
we modify U-Net to use Leaky ReLU, but the results
get lower. Next, we replace the simple convolution block with
more complex blocks, the Inception block and the Residual block, to
improve the accuracy. However, the results are not higher. So we
choose PraNet, which gives the results we expected.</p>
    </sec>
    <sec id="sec-3">
      <title>U-Net Architecture</title>
      <p>
        In the first run, to get an overview of the segmentation problem
in this task, we use U-Net, which is widely used in medical image
segmentation [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], to segment polyp masks, and we modify it to get
a better baseline result. Our overall architecture, which is as
simple as the standard U-Net, consists of convolutional encoding
and decoding units that take an image as input and produce
segmentation feature maps with the respective pixel classes [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>In our model, we keep the encoding step, which includes 5x5 and
3x3 convolutional blocks and a 1x1 convolutional block with ReLU.
However, in this run, to fit the model, we use only one pooling
layer, together with a small learning rate and batch size.</p>
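      <p>As a rough illustration of the encoder-decoder flow described above, the following pure-Python sketch traces a single-channel feature map through one 2x2 max-pooling step, one nearest-neighbour upsampling step, and a skip concatenation. It is a minimal sketch and not the implementation used in the run; the function names (max_pool_2x2, upsample_2x, skip_concat) and the toy 4x4 input are assumptions made for this example, and the learned convolutions are omitted.</p>
      <preformat><![CDATA[
```python
def max_pool_2x2(fmap):
    """Downsample a 2D feature map (list of rows) by taking the max of each 2x2 block."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, w - 1, 2)]
        for r in range(0, h - 1, 2)
    ]


def upsample_2x(fmap):
    """Nearest-neighbour upsampling: repeat every value twice along both axes."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def skip_concat(encoder_fmap, decoder_fmap):
    """Stack encoder and decoder maps as two channels (the U-Net skip connection)."""
    assert len(encoder_fmap) == len(decoder_fmap)
    assert len(encoder_fmap[0]) == len(decoder_fmap[0])
    return [encoder_fmap, decoder_fmap]


# Toy 4x4 input standing in for an encoder feature map (hypothetical values).
x = [
    [1, 2, 0, 1],
    [3, 4, 1, 0],
    [0, 1, 5, 6],
    [1, 0, 7, 8],
]
pooled = max_pool_2x2(x)            # 2x2 map after the single pooling layer
restored = upsample_2x(pooled)      # back to 4x4 on the decoder side
merged = skip_concat(x, restored)   # two channels, each 4x4
```
]]></preformat>
      <p>With a single pooling layer, the decoder needs only one upsampling step to return to the input resolution, which keeps the skip connection shapes easy to match.</p>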
    </sec>
    <sec id="sec-4">
      <title>Leaky ReLU</title>
      <p>In the second run, to improve the U-Net result, we break down
and modify the ReLU layer. During training, ReLU can
cause so-called dead neurons: units whose output stays at zero and
which therefore stop receiving gradient updates. To overcome this
defect, we substitute ReLU with another activation
function, Leaky ReLU. We replace all ReLU activation functions
with our custom Leaky ReLU blocks while preserving the remaining
convolutional blocks and layers. Moreover, we also add a Leaky ReLU
after each convolutional block to evaluate the extracted features.</p>
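      <p>The difference between the two activations can be sketched in a few lines. This is a minimal scalar example, not the custom block used in the run; the negative slope of 0.01 is an assumed value chosen for illustration, since the slope is not reported here.</p>
      <preformat><![CDATA[
```python
def relu(x):
    """Standard ReLU: negative inputs are clamped to zero."""
    return x if x > 0 else 0.0


def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: negative inputs are scaled by a small slope (0.01 is an assumption)."""
    return x if x > 0 else negative_slope * x


def relu_grad(x):
    """Derivative of ReLU: zero for negative inputs, so no update flows back."""
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x, negative_slope=0.01):
    """Derivative of Leaky ReLU: small but non-zero for negative inputs."""
    return 1.0 if x > 0 else negative_slope


for value in (-2.0, 3.0):
    print(value, relu(value), leaky_relu(value), relu_grad(value), leaky_relu_grad(value))
```
]]></preformat>
      <p>For a negative input, the ReLU gradient is exactly zero, whereas the Leaky ReLU gradient equals the slope, so the unit can still be updated.</p>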
    </sec>
    <sec id="sec-5">
      <title>ResUNet</title>
      <p>
        In the third run, we aim to extract more features to improve the
model. Adding more standard convolutional blocks to this
structure easily leads to vanishing gradients. To achieve consistent
training as the depth of the network increases and to prevent
this issue, we replace the building blocks of the U-Net
architecture with modified residual blocks of convolutional layers [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Our solution is therefore ResUNet, which avoids the issue and improves
our training process. The ResUNet architecture combines the
strengths of residual units and feature concatenation, which help to
ease the training of networks and facilitate information
propagation. Dilated convolution is a powerful tool that can enlarge the
receptive field of feature points without reducing the resolution of
the feature maps [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
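      <p>The statement about dilated convolution can be checked on a one-dimensional toy signal. The sketch below is illustrative only: dilated_conv1d and effective_kernel_size are hypothetical helper names, the kernel is assumed to have odd length, and zero padding is used so that the output keeps the input length.</p>
      <preformat><![CDATA[
```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Zero-padded 1D cross-correlation with a dilated kernel (odd kernel length assumed)."""
    k = len(kernel)
    span = (k - 1) * dilation      # distance between the first and last kernel tap
    pad = span // 2                # zero padding added on each side
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def effective_kernel_size(kernel_size, dilation):
    """Number of input positions covered by one application of the dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


impulse = [0, 0, 0, 1, 0, 0, 0]
dense = dilated_conv1d(impulse, [1, 1, 1], dilation=1)    # 3 taps cover 3 positions
dilated = dilated_conv1d(impulse, [1, 1, 1], dilation=2)  # 3 taps cover 5 positions
print(dense, dilated, effective_kernel_size(3, 2))
```
]]></preformat>
      <p>Both outputs have the same length as the input, but with a dilation of 2 the same three weights respond to positions up to two steps away from the impulse.</p>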
    </sec>
    <sec id="sec-6">
      <title>Inception Module</title>
      <p>We combine the Inception module with U-Net. The main idea of
the Inception architecture is based on finding out how an optimal
local sparse structure in a convolutional vision network can be
approximated and covered by readily available dense components.</p>
      <p>One of the main beneficial aspects of this architecture is that it
allows for increasing the number of units at each stage significantly
without an uncontrolled blow-up in computational complexity. The
ubiquitous use of dimension reduction allows for shielding a large
number of input filters of the last step to the next layer, first reducing
their dimension before convolving over them with a large patch size.</p>
      <p>
        Another practically useful aspect of this design is that it aligns
with the intuition that visual information should be processed at
various scales and then aggregated so that the next stage can
abstract features from different scales simultaneously. The efficient
allocation of computational resources allows both the width of
each step and the total number of steps to be increased without
running into computational difficulties [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. We therefore add the Inception module
to our model. However, for compatibility with our U-Net
architecture, we change the order of the convolution block and the Inception
module, and we add a reasonable number of convolution blocks to extract more
features while avoiding vanishing gradients.
      </p>
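      <p>The benefit of the dimension reduction described above can be made concrete by counting weights. The following sketch compares a direct 5x5 convolution with a 1x1 reduction followed by a 5x5 convolution. The channel counts (192 input, 16 reduced, 32 output) are illustrative assumptions and are not the settings of our model; biases are ignored.</p>
      <preformat><![CDATA[
```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in a k x k convolution layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


# Illustrative channel counts (assumed for this example only).
in_ch, reduced_ch, out_ch = 192, 16, 32

# 5x5 convolution applied directly to all input channels.
direct = conv_weights(5, in_ch, out_ch)

# 1x1 convolution reduces the channels first, then the 5x5 convolution runs on fewer channels.
reduced = conv_weights(1, in_ch, reduced_ch) + conv_weights(5, reduced_ch, out_ch)

print(direct, reduced, direct / reduced)
```
]]></preformat>
      <p>Under these assumed sizes, the reduced branch needs 15872 weights instead of 153600, which is why several such branches at different scales can be run side by side.</p>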
    </sec>
    <sec id="sec-7">
      <title>PraNet</title>
      <p>
        In the last run, we use PraNet, a parallel reverse attention
network for accurate polyp segmentation in colonoscopy images that also
has the highest benchmark result for medical image segmentation on
Kvasir-SEG [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], to compare its results with those of our own models.
      </p>
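      <p>To give an intuition for the term reverse attention, the sketch below weights each feature value by one minus the sigmoid of a coarse prediction at the same pixel, so that regions the coarse map already marks confidently as polyp are suppressed and the remaining regions are emphasized. This reflects our reading of [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] reduced to a single channel; it is an assumption-laden toy example, not the PraNet implementation, and the function names are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def reverse_attention(features, coarse_logits):
    """Scale each feature by (1 - sigmoid(coarse logit)) at the same position."""
    return [
        [f * (1.0 - sigmoid(s)) for f, s in zip(f_row, s_row)]
        for f_row, s_row in zip(features, coarse_logits)
    ]


# One row of unit features and coarse logits from "background" to "confident polyp".
features = [[1.0, 1.0, 1.0]]
coarse_logits = [[-4.0, 0.0, 4.0]]
attended = reverse_attention(features, coarse_logits)
print(attended)
```
]]></preformat>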
    </sec>
    <sec id="sec-8">
      <title>DATASET AND EXPERIMENTAL RESULTS</title>
    </sec>
    <sec id="sec-9">
      <title>Datasets and Processing</title>
      <p>
        Kvasir-SEG [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is a dataset of 1000 endoscopic images of polyps
with their corresponding masks. This dataset can be downloaded
from https://datasets.simula.no/kvasir-seg/.
      </p>
      <p>
        Data preparation: We use RetinaNet [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to create bounding boxes that
cover the polyps in the endoscopic images. After generating the
bounding boxes, we check the results manually and create a tool to crop
the polyps from the endoscopic images and their masks in the
dataset. This preprocessing step helps us double the quality of the
images while keeping the resolution of the training and validation
sets.
      </p>
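      <p>The cropping step can be sketched as follows. The example assumes that a detector has already produced a box in (top, left, bottom, right) pixel coordinates with exclusive end indices; the box values, the toy image, and the helper names crop and crop_pair are hypothetical and do not describe our actual tool.</p>
      <preformat><![CDATA[
```python
def crop(grid, box):
    """Crop a 2D grid (list of rows) to box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in grid[top:bottom]]


def crop_pair(image, mask, box):
    """Apply the same box to an image and its mask so that they stay aligned."""
    return crop(image, box), crop(mask, box)


# Toy 4x5 grayscale image and its binary polyp mask (hypothetical values).
image = [
    [10, 11, 12, 13, 14],
    [20, 21, 22, 23, 24],
    [30, 31, 32, 33, 34],
    [40, 41, 42, 43, 44],
]
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = (1, 1, 3, 4)  # assumed detector output covering the polyp
image_crop, mask_crop = crop_pair(image, mask, box)
print(image_crop, mask_crop)
```
]]></preformat>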
      <p>Data augmentation: To increase the amount of data and
improve the training process, we use data augmentation with two
methods: rotation and zooming.</p>
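      <p>For segmentation, each augmentation has to be applied identically to an image and its mask. The sketch below shows the two operations on a small binary mask. It is a simplified example: the rotation is fixed at 90 degrees, the zoom uses an integer factor with nearest-neighbour resampling, and the grid size is assumed to be divisible by that factor. The angles and zoom ranges used in our runs are not specified here.</p>
      <preformat><![CDATA[
```python
def rotate_90(grid):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def center_zoom(grid, factor):
    """Zoom into the centre by an integer factor, keeping the output size.

    Uses nearest-neighbour resampling; height and width are assumed to be
    divisible by the factor.
    """
    h, w = len(grid), len(grid[0])
    crop_h, crop_w = h // factor, w // factor
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    return [
        [grid[top + r // factor][left + c // factor] for c in range(w)]
        for r in range(h)
    ]


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
rotated = rotate_90(mask)
zoomed = center_zoom(mask, 2)
print(rotated, zoomed)
```
]]></preformat>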
    </sec>
    <sec id="sec-10">
      <title>Results</title>
      <p>Evaluation is performed on the test set of the MediaEval Medico task, which
includes 160 endoscopic images of polyps. Table 1 shows
that Run 1 is better than Run 2 and Run 3 is better than Run 4.
Run 5 is the best model overall. After replacing ReLU with Leaky ReLU,
the accuracy of Run 2 is not better than that of Run 1, although its recall is
slightly higher. Moreover, among the 5 runs, ResUNet obtains a result of 90.1,
which is close to the highest overall result of 94.6.</p>
      <p>Table 1. Run ID: Run 1, Run 2, Run 3, Run 4, Run 5.</p>
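      <p>For reference, the metrics mentioned in this paper (Jaccard index, recall, and accuracy) can be computed from a predicted and a ground-truth binary mask as in the sketch below. The masks are toy values chosen for the example and are unrelated to the reported scores; the helper names are hypothetical, and the official evaluation script of the task may differ in details such as averaging.</p>
      <preformat><![CDATA[
```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives over two binary masks of equal shape."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def jaccard(pred, truth):
    """Intersection over union of the predicted and true polyp pixels."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    union = tp + fp + fn
    return tp / union if union else 1.0


def recall(pred, truth):
    """Fraction of true polyp pixels that were predicted as polyp."""
    tp, _, fn, _ = confusion_counts(pred, truth)
    return tp / (tp + fn) if (tp + fn) else 1.0


def accuracy(pred, truth):
    """Fraction of all pixels that were classified correctly."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return (tp + tn) / (tp + fp + fn + tn)


truth = [[0, 1, 1], [0, 1, 0]]
pred = [[0, 1, 0], [1, 1, 0]]
print(jaccard(pred, truth), recall(pred, truth), accuracy(pred, truth))
```
]]></preformat>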
    </sec>
    <sec id="sec-11">
      <title>CONCLUSION</title>
      <p>Currently, there is a growing interest in the development of
computer-aided diagnosis (CADx) systems that could act as a second observer
and digital assistant for endoscopists. Algorithmic
benchmarking is an efficient approach to analyzing such methods. We can try more new methods
with different approaches and improve the data preparation process
to get better results.</p>
      <p>In general, PraNet still gets the highest accuracy score.
However, its computational cost remains high, while U-Net and ResUNet
have an advantage in speed with only slightly different results.
Moreover, data preparation is the most impactful step for achieving
higher accuracy and evaluation metrics. Therefore, we propose
using an Inception-ResNet or DenseNet-169 feature extractor combined
with U-Net to improve accuracy on the test set.</p>
    </sec>
    <sec id="sec-12">
      <title>ACKNOWLEDGMENT</title>
      <p>This research is funded by Vietnam National University Ho Chi Minh
City (VNU-HCM) under grant number D2020-42-01 on “Artificial
Intelligence and Extended Reality for Medical Diagnosis and
Treatment Assistance”.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>Deng-Ping</given-names> <surname>Fan</surname></string-name>,
          <string-name><given-names>Ge-Peng</given-names> <surname>Ji</surname></string-name>,
          <string-name><given-names>Tao</given-names> <surname>Zhou</surname></string-name>,
          <string-name><given-names>Geng</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>Huazhu</given-names> <surname>Fu</surname></string-name>,
          <string-name><given-names>Jianbing</given-names> <surname>Shen</surname></string-name>,
          and
          <string-name><given-names>Ling</given-names> <surname>Shao</surname></string-name>.
          <year>2020</year>.
          <article-title>PraNet: Parallel Reverse Attention Network for Polyp Segmentation</article-title>.
          <source>arXiv:2006.11392 [eess.IV]</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>K.</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Ren</surname></string-name>,
          and
          <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name>.
          <year>2016</year>.
          <article-title>Identity mappings in deep residual networks</article-title>.
          <source>arXiv preprint arXiv:1603.05027</source>
          (2016).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>Debesh</given-names> <surname>Jha</surname></string-name>,
          <string-name><given-names>Steven A.</given-names> <surname>Hicks</surname></string-name>,
          <string-name><given-names>Krister</given-names> <surname>Emanuelsen</surname></string-name>,
          <string-name><given-names>Håvard</given-names> <surname>Johansen</surname></string-name>,
          <string-name><given-names>Dag</given-names> <surname>Johansen</surname></string-name>,
          <string-name><given-names>Thomas</given-names> <surname>de Lange</surname></string-name>,
          <string-name><given-names>Michael A.</given-names> <surname>Riegler</surname></string-name>,
          and
          <string-name><given-names>Pål</given-names> <surname>Halvorsen</surname></string-name>.
          <year>2020</year>.
          <article-title>Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation</article-title>.
          <source>In Proc. of the MediaEval 2020 Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>Debesh</given-names> <surname>Jha</surname></string-name>,
          <string-name><given-names>Pia H</given-names> <surname>Smedsrud</surname></string-name>,
          <string-name><given-names>Michael A</given-names> <surname>Riegler</surname></string-name>,
          <string-name><given-names>Pål</given-names> <surname>Halvorsen</surname></string-name>,
          <string-name><given-names>Thomas</given-names> <surname>de Lange</surname></string-name>,
          <string-name><given-names>Dag</given-names> <surname>Johansen</surname></string-name>,
          and
          <string-name><given-names>Håvard D</given-names> <surname>Johansen</surname></string-name>.
          <year>2020</year>.
          <article-title>Kvasir-SEG: A segmented polyp dataset</article-title>.
          <source>In Proc. of International Conference on Multimedia Modeling</source>.
          <fpage>451</fpage>-<lpage>462</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>Debesh</given-names> <surname>Jha</surname></string-name>,
          <string-name><given-names>Pia H</given-names> <surname>Smedsrud</surname></string-name>,
          <string-name><given-names>Michael A</given-names> <surname>Riegler</surname></string-name>,
          <string-name><given-names>Dag</given-names> <surname>Johansen</surname></string-name>,
          <string-name><given-names>Thomas</given-names> <surname>De Lange</surname></string-name>,
          <string-name><given-names>Pål</given-names> <surname>Halvorsen</surname></string-name>,
          and
          <string-name><given-names>Håvard D</given-names> <surname>Johansen</surname></string-name>.
          <year>2019</year>.
          <article-title>ResUNet++: An Advanced Architecture for Medical Image Segmentation</article-title>.
          <source>In Proc. of International Symposium on Multimedia</source>.
          <fpage>225</fpage>-<lpage>230</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
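          [6]
          <string-name><given-names>Tsung-Yi</given-names> <surname>Lin</surname></string-name>,
          <string-name><given-names>Priya</given-names> <surname>Goyal</surname></string-name>,
          <string-name><given-names>Ross</given-names> <surname>Girshick</surname></string-name>,
          <string-name><given-names>Kaiming</given-names> <surname>He</surname></string-name>,
          and
          <string-name><given-names>Piotr</given-names> <surname>Dollár</surname></string-name>.
          <year>2018</year>.
          <article-title>Focal Loss for Dense Object Detection</article-title>.
          <source>arXiv:1708.02002 [cs.CV]</source>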
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Cao</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>D-Resunet: Resunet and Dilated Convolution for High Resolution Satellite Imagery Road Extraction</article-title>
          .
          <fpage>3927</fpage>
          -
          <lpage>3930</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>Md Zahangir</given-names> <surname>Alom</surname></string-name>,
          <string-name><given-names>Mahmudul</given-names> <surname>Hasan</surname></string-name>,
          <string-name><given-names>Chris</given-names> <surname>Yakopcic</surname></string-name>,
          <string-name><given-names>Tarek M.</given-names> <surname>Taha</surname></string-name>,
          and
          <string-name><given-names>Vijayan K.</given-names> <surname>Asari</surname></string-name>.
          <year>2018</year>.
          <article-title>Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation</article-title>.
          <source>arXiv preprint arXiv:1802.06955</source>
          (2018).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>O.</given-names> <surname>Ronneberger</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Fischer</surname></string-name>,
          and
          <string-name><given-names>T.</given-names> <surname>Brox</surname></string-name>.
          <year>2015</year>.
          <article-title>U-Net: Convolutional networks for biomedical image segmentation</article-title>.
          <source>In International Conference on Medical Image Computing and Computer-Assisted Intervention</source>.
          <fpage>234</fpage>-<lpage>241</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
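          [10]
          <string-name><given-names>Christian</given-names> <surname>Szegedy</surname></string-name>,
          <string-name><given-names>Wei</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>Yangqing</given-names> <surname>Jia</surname></string-name>,
          <string-name><given-names>Pierre</given-names> <surname>Sermanet</surname></string-name>,
          <string-name><given-names>Scott</given-names> <surname>Reed</surname></string-name>,
          <string-name><given-names>Dragomir</given-names> <surname>Anguelov</surname></string-name>,
          <string-name><given-names>Dumitru</given-names> <surname>Erhan</surname></string-name>,
          <string-name><given-names>Vincent</given-names> <surname>Vanhoucke</surname></string-name>,
          and
          <string-name><given-names>Andrew</given-names> <surname>Rabinovich</surname></string-name>.
          <year>2014</year>.
          <article-title>Going Deeper with Convolutions</article-title>.
          <source>arXiv:1409.4842 [cs.CV]</source>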
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Shi</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Dilated convolution neural network with LeakyReLU for environmental sound classification</article-title>
          .
          <source>In 2017 22nd International Conference on Digital Signal Processing (DSP)</source>
          .
          <fpage>1</fpage>-<lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>