<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Efficient Supervision Net: Polyp Segmentation Using EfficientNet and Attention Unit</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sabari Nathan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Suganya Ramamoorthy</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Couger Inc</institution>, <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Thiagarajar College of Engineering</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>Colorectal cancer is the third most common cancer worldwide, and identifying it in its early stages has been a challenging problem. Motivated by this, the main objective of this paper is to develop a multi-supervision net algorithm for segmenting polyps on a comprehensive dataset. We use the Medico polyp challenge dataset, which consists of 1000 segmented polyp images from the gastrointestinal tract. We propose EfficientNet B4 as the pre-trained backbone of the multi-supervision net, and the model is trained with multiple output layers. We present quantitative results on the colorectal dataset to evaluate the performance and achieve good results on all performance metrics. The experimental results show that the proposed model is robust and segments polyps accurately on a comprehensive dataset across different metrics such as dice coefficient, recall, precision, and F2.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        (1) A novel Multi-Supervision Net architecture is proposed,
i.e., the model is trained with multiple output layers.
(2) We evaluated the proposed architecture on a challenging
polyp segmentation dataset.
(3) We used EfficientNetB4 as the backbone of the proposed
architecture.
(4) We achieved good experimental results in terms of accuracy,
F1 score, and loss.
      </p>
    </sec>
    <sec id="sec-2">
      <title>METHODS</title>
      <p>
        To deal with the challenge of the robust Medico automatic polyp
segmentation task, we propose a Multi-Supervision Net
architecture. The detailed architecture is presented in Figure 3. In the
following subsections, we discuss the details of each module
used in the proposed architecture.
Our proposed polyp segmentation method uses a deep
convolutional network to learn a mapping from the input polyp
images to their segmentation masks. The overall block diagram of the proposed architecture is
shown in Figure 3; it mainly consists of five components: (1) a
convolutional layer, (2) an efficient layer [6], (3) an encoder block with
EfficientNet B4, (4) a decoder block combining dense blocks and
Concurrent Spatial and Channel Attention (CSCA) [5] blocks, and (5) a
Convolutional Block Attention Module (CBAM) [7]. In this architecture, the
input image (332×487 pixels) is resized to 384×256, divided by 255,
and passed into the convolutional layer. The proposed network was
inspired by the multilevel Hyper Vision Net [1] and has the properties
of an encoder-decoder structure with supervision layers. The encoder
block uses EfficientNet B4 as the backbone of the architecture. The
decoder block consists of a combination of dense blocks [2] and
Concurrent Spatial and Channel Attention (CSCA) blocks. The
encoder and decoder are connected by a Convolutional Block Attention Module
(CBAM) block. All the decoder outputs are supervised, i.e., each
individual decoder output is upsampled through an output layer and
supervised by the loss function, and all the upsampled outputs are also
concatenated and fed into CBAM. In total, six outputs are obtained
from our proposed architecture. For upsampling, we use
convolution transpose layers. CBAM is an effective attention module
for feed-forward convolutional neural networks. CBAM has two
sequential sub-modules: a channel attention module and a spatial
attention module. The intermediate feature map is adaptively refined
through CBAM at every convolutional block of the deep network. The
CBAM block is utilized to generate spatial attention on top of the channel
attention of the encoder output and the decoder output. The output
of this attention is added to the input image. Equation 1 and
Equation 2 give the details of the mathematical operations.
      </p>
      <p>M_c(F) = σ(W_1(W_0(AvgPool(F))) + W_1(W_0(MaxPool(F))))   (1)</p>
      <p>M_s(F) = σ(f([AvgPool(F) ∥ MaxPool(F)]))   (2)</p>
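<p>The two sequential CBAM sub-modules described above, channel attention followed by spatial attention, can be sketched in NumPy. This is a minimal illustration of the idea, not the trained network's implementation; the shared-MLP weights and the 1×1 mixing that stands in for CBAM's 7×7 spatial convolution are simplifying assumptions.</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w0, w1):
    """Channel attention: sigmoid(MLP(avg-pooled F) + MLP(max-pooled F)).

    feat: (H, W, C) feature map; w0 (C, C//r) and w1 (C//r, C) are the
    shared-MLP weights. Returns a per-channel weight vector of shape (C,)."""
    avg = feat.mean(axis=(0, 1))   # global average pooling -> (C,)
    mx = feat.max(axis=(0, 1))     # global max pooling -> (C,)
    mlp = lambda v: np.maximum(v @ w0, 0.0) @ w1   # two-layer MLP with ReLU
    return sigmoid(mlp(avg) + mlp(mx))

def spatial_attention(feat, k_avg, k_max):
    """Spatial attention: sigmoid of a mix of channel-wise avg and max maps.

    The 7x7 convolution of the original CBAM is replaced here by a 1x1 mix
    (weights k_avg, k_max) to keep the sketch short."""
    return sigmoid(k_avg * feat.mean(axis=2) + k_max * feat.max(axis=2))

def cbam(feat, w0, w1, k_avg=0.5, k_max=0.5):
    """Apply the two sequential sub-modules: channel, then spatial attention."""
    f1 = feat * channel_attention(feat, w0, w1)            # broadcast over H, W
    return f1 * spatial_attention(f1, k_avg, k_max)[..., None]

rng = np.random.default_rng(0)
F = rng.standard_normal((8, 8, 16))      # toy feature map
w0 = 0.1 * rng.standard_normal((16, 4))  # reduction ratio r = 4
w1 = 0.1 * rng.standard_normal((4, 16))
out = cbam(F, w0, w1)
print(out.shape)  # (8, 8, 16): same shape, adaptively re-weighted
```

Because both attention maps pass through a sigmoid, every element of the refined feature map is a shrunk copy of the input element, which is what "adaptively refined" means here.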
    </sec>
    <sec id="sec-3">
      <title>Decoder part</title>
      <p>The output of the CSCA block was passed into CBAM. The CBAM
shares its processing results with another dense block. From here, the
algorithm follows a bottom-up approach: the dense block output is added
to the output of the 5th efficient layer and forwarded to the
consecutive dense and CSCA blocks. The CSCA block is connected to
average pooling and a convolutional layer, and the result of pooling and
convolution is fed into an output layer. Next, the result of the CSCA block
is upsampled and concatenated with the 4th efficient layer output, and the
remaining operations are the same as in the previous layer. After repeating
these operations five times, we concatenate every "average pooling
+ conv layer" result and pass the output into CBAM and another
concatenation layer. Finally, the concatenated features are sent to the output layer,
which produces the final result for the given image. The loss
function is a combination of categorical cross-entropy and dice loss.</p>
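<p>The combined loss mentioned above, categorical cross-entropy plus dice loss, can be sketched as follows. This is an illustrative NumPy version under the usual definitions of the two terms, not the exact training code.</p>

```python
import numpy as np

def dice_loss(y_true, y_pred, eps=1e-7):
    """Soft dice loss: 1 - 2|A.B| / (|A| + |B|)."""
    inter = np.sum(y_true * y_pred)
    return 1.0 - (2.0 * inter + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy over pixels; the last axis holds the class scores."""
    return -np.mean(np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=-1))

def combined_loss(y_true, y_pred):
    return categorical_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)

# 2x2 toy "image", one-hot background/polyp labels on the last axis
y_true = np.array([[[1, 0], [0, 1]],
                   [[0, 1], [1, 0]]], dtype=float)
y_pred = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.3, 0.7], [0.6, 0.4]]])
print(combined_loss(y_true, y_pred))   # > 0 for an imperfect prediction
print(combined_loss(y_true, y_true))   # ~0 for a perfect prediction
```

The cross-entropy term drives per-pixel class scores toward the labels, while the dice term directly rewards overlap between the predicted and ground-truth masks, which helps when the polyp occupies few pixels.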
    </sec>
    <sec id="sec-4">
      <title>RESULTS</title>
    </sec>
    <sec id="sec-5">
      <title>Dataset</title>
      <p>The Kvasir-SEG dataset [4] contains 1,000 polyp images and their
corresponding ground truth masks, as shown in Figure 1. The dataset
was collected from real routine clinical examinations at Baerum
Hospital in Norway by expert gastroenterologists. The resolution
of the images varies from 332 × 487 to 1920 × 1072 pixels. Some of the
images contain a green thumbnail in the lower-left corner
showing the scope position marking from the ScopeGuide
(Olympus); refer to Figure 2. The Medico 2020 team annotated
another separate dataset and used it as the test set to benchmark our
proposed approach. Figure 2 shows some examples of test images
used in the challenge.</p>
    </sec>
    <sec id="sec-6">
      <title>Training</title>
      <p>The proposed architecture has been trained and validated with the
Medico automatic polyp segmentation challenges Task 1: Polyp
Segmentation dataset. The proposed network is inspired by the
Hyper Vision Net [4]. The shared dataset consists of 1000 segmented
polyp images from the gastrointestinal tract images. We randomly
split it into 70 percentage for training and 30 percentage for
validation from the whole dataset. We used the Adam optimizer with
an initial learning rate of 0.001. The learning rate was decreased to
0.00001 while we trained our models for 500 epochs. The proposed
network was trained with the IntelCore i7 processor, GeForce GTX
We divided the image by 255 to normalize it between 0-1. We applied
data augmentation techniques such as HorizontalFlip, VerticalFlip,
Blur (limit = 3), and Rotate(-10, 10) to increase the image count.
A mixture of two diferent loss functions is used to supervise the
network model outputs: categorical cross-entropy and dice loss.
In this work, Adam optimizer is applied, which perfectly revise
network weights in an iterative approach in training data. We used
Adam optimizer with a learning rate of 0.001 to 0.00001 and 500
epochs for training the model. Our Loss function is defined by
categorical Cross entropy and dice loss.
4</p>
    </sec>
    <sec id="sec-7">
      <title>DISCUSSION</title>
      <p>Currently, there is a growing interest in the development of
computer-aided diagnosis (CADx) systems that could act as a second observer
and digital assistant for endoscopists. Algorithmic benchmarking
is an efficient approach to analyze the results of different methods.
The task uses mean Intersection over Union (mIoU), or the Jaccard
index, as an evaluation metric, which is a standard metric for
medical segmentation tasks. Our proposed algorithm achieves a
Jaccard value of 0.777 in run 1 and run 2. The other evaluation metrics,
such as dice coefficient, precision, recall, F2, and frames per second
(FPS), also show the effectiveness of our method in a comprehensive evaluation. In
the challenge overview paper [3], the organizers calculate
metrics such as the Dice coefficient, mIoU, recall, precision,
overlap, F2, and FPS for the method submitted by each team, as presented
in Table 1.</p>
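<p>The Jaccard index and dice coefficient used in this evaluation can be computed on binary masks as follows; a minimal NumPy sketch under the standard definitions, not the challenge's official evaluation code.</p>

```python
import numpy as np

def jaccard(pred, gt, eps=1e-7):
    """Jaccard index (IoU): |A ∩ B| / |A ∪ B| on binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)

def dice(pred, gt, eps=1e-7):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

pred = np.array([[1, 1, 0],
                 [0, 1, 0]], dtype=bool)
gt   = np.array([[1, 0, 0],
                 [0, 1, 1]], dtype=bool)
print(jaccard(pred, gt))  # 2 common pixels / 4 in the union = 0.5
print(dice(pred, gt))     # 2*2 / (3 + 3) = 0.667
```

Dice always equals or exceeds IoU on the same masks (dice = 2·IoU / (1 + IoU)), which is why both are reported for a comprehensive comparison.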
    </sec>
    <sec id="sec-8">
      <title>CONCLUSION</title>
      <p>We have presented a novel Multi-Supervision Net with
EfficientNetB4 as the architecture's backbone to improve image
segmentation accuracy under different factors. We accomplished
this by training a multilevel attention network on images from
the Medico challenge 2020 polyp segmentation dataset. Moreover,
we introduce a CSCA block in the decoder to improve segmentation quality.
As a major contribution, CBAM enhances the overall mechanism
and utilizes significant features from the encoder block. Extensive
evaluations and comparisons with previous state-of-the-art approaches
show that we achieve good performance in qualitative and
quantitative experiments, proving the efficacy of the proposed
architecture.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>D.</given-names> <surname>Sabarinathan</surname></string-name>,
          <string-name><given-names>M. Parisa</given-names> <surname>Beham</surname></string-name>, and
          <string-name><given-names>S. M. Md.</given-names> <surname>Mansoor Roomi</surname></string-name>.
          <year>2019</year>.
          <article-title>Hyper Vision Net: Kidney Tumor Segmentation Using Coordinate Convolutional Layer and Attention Unit</article-title>.
          arXiv preprint arXiv:1908.03339.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>Gao</given-names> <surname>Huang</surname></string-name>,
          Zhuang Liu, Laurens van der Maaten, and
          <string-name><given-names>Kilian Q.</given-names> <surname>Weinberger</surname></string-name>.
          <year>2017</year>.
          <article-title>Densely Connected Convolutional Networks</article-title>.
          arXiv:1608.06993.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>Debesh</given-names> <surname>Jha</surname></string-name>,
          Steven A. Hicks, Krister Emanuelsen, Håvard Johansen, Dag Johansen, Thomas de Lange,
          Michael A. Riegler, and
          <string-name><given-names>Pål</given-names> <surname>Halvorsen</surname></string-name>.
          <year>2020</year>.
          <article-title>Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation</article-title>.
          <source>In Proc. of the MediaEval 2020 Workshop</source>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>Debesh</given-names> <surname>Jha</surname></string-name>,
          Pia H. Smedsrud, Michael A. Riegler, Pål Halvorsen, Thomas de Lange, Dag Johansen, and
          <string-name><given-names>Håvard D.</given-names> <surname>Johansen</surname></string-name>.
          <year>2020</year>.
          <article-title>Kvasir-SEG: A segmented polyp dataset</article-title>.
          <source>In Proc. of International Conference on Multimedia Modeling</source>.
          <fpage>451</fpage>-<lpage>462</lpage>.
        </mixed-citation>
      </ref>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>Abhijit</given-names> <surname>Guha Roy</surname></string-name>,
          Nassir Navab, and
          <string-name><given-names>Christian</given-names> <surname>Wachinger</surname></string-name>.
          <year>2018</year>.
          <article-title>Concurrent Spatial and Channel Squeeze &amp; Excitation in Fully Convolutional Networks</article-title>.
          arXiv:1803.02579.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>Mingxing</given-names> <surname>Tan</surname></string-name>
          and
          <string-name><given-names>Quoc V.</given-names> <surname>Le</surname></string-name>.
          <year>2019</year>.
          <article-title>EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks</article-title>.
          arXiv:1905.11946.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>Sanghyun</given-names> <surname>Woo</surname></string-name> et al.
          <year>2018</year>.
          <article-title>CBAM: Convolutional Block Attention Module</article-title>.
          <source>In Proceedings of the European Conference on Computer Vision (ECCV)</source>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>