<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bigger Networks are not Always Better: Deep Convolutional Neural Networks for Automated Polyp Segmentation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Adrian Krenzer</string-name>
          <email>adrian.krenzer@uni-wuerzburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frank Puppe</string-name>
          <email>frank.puppe@uni-wuerzburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Julius-Maximilian University of Würzburg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This paper presents our team's (AI-JMU) approach to the Medico automated polyp segmentation challenge. We consider deep convolutional neural networks to be well suited for this task. To determine the best architecture, we test and compare state of the art backbones and two different heads. Finally, we achieve a Jaccard index of 73.74% on the challenge's test set. We further demonstrate that bigger networks do not always perform better, although growing network size always increases the computational complexity.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Worldwide, colorectal cancer (CRC) represents the third most
commonly diagnosed cancer [<xref ref-type="bibr" rid="ref17 ref6">6, 17</xref>].
According to Herszenyi and Tulassay [<xref ref-type="bibr" rid="ref10">10</xref>],
CRC accounts for 9% of all cancer incidence globally and is the fourth
leading cause of cancer death worldwide [<xref ref-type="bibr" rid="ref17 ref6">6, 17</xref>].
To detect potentially cancerous tissue early, physicians conduct a
so-called colonoscopy. During this procedure, the physician searches
for polyps inside the colon in order to remove them. Polyps are
abnormally growing tissues that usually look like small, flat bumps or
tiny fungal stems. Due to this aberrant cell growth, they can
eventually become malignant or cancerous. Nevertheless, even the best
physicians risk overlooking a polyp. Missed polyps are not removed and
can therefore have fatal consequences. Automated detection and
segmentation of polyps is the task of the Medico challenge
[<xref ref-type="bibr" rid="ref12">12</xref>]. This challenge is
special because no training data may be used other than the 1000
provided polyp images of Jha et al.
[<xref ref-type="bibr" rid="ref13">13</xref>]. In this paper, we
present our challenge results and explain how we select the networks
for our final predictions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        In the domain of object segmentation with deep learning, there
are two general state of the art approaches: Fully convolutional
networks [
        <xref ref-type="bibr" rid="ref16 ref21 ref7">7, 16, 21</xref>
        ] and encoder-decoder architectures [
        <xref ref-type="bibr" rid="ref1 ref24 ref5">1, 5, 24</xref>
        ].
Some state of the art polyp segmentation methods include
encoder-decoder architectures. However, due to the high computational
complexity of those models, polyp segmentation research focuses
mostly on fully convolutional architectures to enable real-time
segmentation systems [
        <xref ref-type="bibr" rid="ref11 ref28">11, 28</xref>
        ]. We consider our approaches to
belong to the field of fully convolutional networks. The chosen
models are based on our previous study [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], which we advance
for this challenge by: Focusing exclusively on polyp segmentation,
testing a new state of the art backbone in polyp segmentation [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]
and comparing different architectures comprehensively.
      </p>
    </sec>
    <sec id="sec-3">
      <title>APPROACH</title>
      <p>
        This section focuses on our approaches for the Medico automated
polyp segmentation tasks. We train all our models on an Nvidia
Turing RTX 8000 GPU. For this challenge, deep CNNs are well
suited as they provide very stable outcomes in multi-class
segmentation tasks like the COCO challenge [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Since both bounding
boxes and segmentation masks are available in the dataset, we
choose networks that can handle both inputs. Therefore we select
the Mask R-CNN [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and the Cascade Mask R-CNN [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We build
both architectures based on two-stage object detection models
using Faster R-CNN [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Here, a region proposal network first
suggests candidate bounding boxes (Regions of Interest, RoI) before
the final prediction is made. An additional branch is
added that predicts segmentation masks, where the suggested
RoIs enhance the segmentation mask predictions. The Cascade Mask
R-CNN uses an extended framework defined by a
cascade-like composition of several Mask R-CNNs with shared weights
on the backbones. We train both the Cascade Mask R-CNN and
Mask R-CNN with the open-source Detectron2 framework [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
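      <p>
        As an illustration, the following minimal sketch configures a
Mask R-CNN in Detectron2. The config file names come from the public
Detectron2 model zoo; the dataset names polyp_train and polyp_val are
hypothetical placeholders, and the exact settings of our models may
differ.
      </p>
      <preformat>
# Minimal Detectron2 setup: Mask R-CNN with a ResNet-50 backbone,
# pre-trained on COCO. For the Cascade variant, the model zoo offers
# e.g. "Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml".
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1     # a single "polyp" class
cfg.DATASETS.TRAIN = ("polyp_train",)   # hypothetical dataset names
cfg.DATASETS.TEST = ("polyp_val",)
      </preformat>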
      <p>
We select these types of models for two reasons: First,
the availability of both bounding boxes and segmentation masks for
training purposes allows us to maximize the Mask R-CNN
performance, because RoI and segmentation are closely related. Second,
because the masks of polyps included in the Kvasir-SEG dataset
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] often vary significantly in size and shape, we desire a network
that is unaffected by those variations and determines a pixel-wise
mask of the polyp. Although the task is semantic segmentation, we
treat it as instance segmentation with a single
instance per image per class. Therefore, we alter the ground truth
bounding boxes in our data to include only one instance instead of
multiple instances, as sketched below.
      </p>
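      <p>
        A minimal sketch of this ground-truth adjustment, assuming boxes
in [x1, y1, x2, y2] format; merge_boxes is a hypothetical helper, not
part of our released code.
      </p>
      <preformat>
import numpy as np

def merge_boxes(boxes):
    """Collapse several [x1, y1, x2, y2] ground-truth boxes into one
    enclosing box, so that each image holds a single instance."""
    b = np.asarray(boxes, dtype=float)
    return [b[:, 0].min(), b[:, 1].min(), b[:, 2].max(), b[:, 3].max()]

# Example: merge_boxes([[10, 10, 50, 40], [30, 20, 80, 90]])
# returns [10.0, 10.0, 80.0, 90.0]
      </preformat>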
      <p>
        We test the Cascade Mask R-CNN and Mask R-CNN with ResNet
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] as well as the new ResNeSt [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] backbone. The latter adds a
split attention block to the ResNet backbone and reconfigures the
ResNet structure. This block and structure enable the network to
share attention across feature-map groups. This might offer some
benefits to the polyp segmentation task. Additionally, we vary the
depth of both backbones, with depths of 50 and 101 for ResNet
as well as 50, 101, and 200 for ResNeSt. The backbones we use
consist of CNN classifiers pre-trained using the ImageNet-1k dataset
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The whole architecture is pre-trained on the COCO dataset
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Consequently, we use transfer learning to compensate for
the small size of the training dataset. We train networks with the
Detectron2 framework [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and a fork of the Detectron2 framework
published by Zhang et al. [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. Both provide a wide range of
pre-trained object detection and segmentation models. Prior to the
actual processing, we convert our data to the COCO dataset format.
Afterward, the required image preprocessing steps, i.e. padding,
resizing, rescaling the pixel values, etc., are automatically performed
within the framework.
      </p>
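      <p>
        As an illustration of the conversion step, COCO-format
annotations can be registered with Detectron2 as sketched below; the
file and folder names are hypothetical placeholders.
      </p>
      <preformat>
from detectron2.data.datasets import register_coco_instances

# Register training and validation splits in COCO format
# (hypothetical paths).
register_coco_instances(
    "polyp_train", {}, "annotations/train.json", "images/train")
register_coco_instances(
    "polyp_val", {}, "annotations/val.json", "images/val")
      </preformat>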
      <p>
We define the total loss as the sum of the classification,
box-regression, and mask losses, L = L_cls + L_box + L_mask [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], where L_mask is the
binary cross-entropy for autonomous segmentation of all masks. The
training of all models uses stochastic gradient descent
with a learning rate of 0.00025 and a batch size of 16. Every model
trains for up to 80000 iterations, maintaining checkpoints every
300 iterations. Afterward, we adopt the checkpoint with the
lowest validation loss for the final outcome. Additionally, we utilize
random horizontal flipping, vertical flipping, and random resizing
as data augmentation while retaining aspect ratio to diminish the
generalization error.
      </p>
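      <p>
        A sketch of these training settings, continuing the cfg object
from the sketch above. Selecting the checkpoint with the lowest
validation loss requires an additional evaluation hook that is not
shown here.
      </p>
      <preformat>
from detectron2.engine import DefaultTrainer

cfg.SOLVER.BASE_LR = 0.00025        # SGD learning rate
cfg.SOLVER.IMS_PER_BATCH = 16       # batch size
cfg.SOLVER.MAX_ITER = 80000         # up to 80000 iterations
cfg.SOLVER.CHECKPOINT_PERIOD = 300  # checkpoint every 300 iterations
cfg.OUTPUT_DIR = "./output"         # checkpoints are written here

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
      </preformat>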
    </sec>
    <sec id="sec-4">
      <title>RESULTS AND ANALYSIS</title>
      <p>
We evaluate the models on our validation dataset, which is a subset
of the Kvasir-SEG data [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. For the evaluation we consider quality
and speed. For quality we compute the Dice coefficient, intersection
over union (IoU), and accuracy (Acc). For speed we specify frames
per second (FPS). All our validations are carried out using an Nvidia
V100 GPU within the cloud solution of Google Colab [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Table 1
depicts our results. While Cascade Mask R-CNN outperforms Mask
R-CNN in every quality metric, Mask R-CNN computes faster. However,
the architectures' speed shows a clear pattern: the
Mask R-CNN using the smallest backbone (lowest computational
complexity) is the fastest, and Cascade Mask R-CNN (highest
computational complexity) with the largest backbone is the slowest.
Comparing the ResNet and ResNeSt backbones: Using the ResNeSt
backbone results in higher scores in all metrics. Nevertheless, the
ResNeSt backbone increases the computational complexity and
therefore decreases FPS. Concerning the depth of the network:
Changing the depth from 50 to 101 increases the quality of the
results. This might suggest that a deeper backbone always results in
better quality. However, our results show that a larger backbone does
not always yield better quality, while it always diminishes the speed
due to higher computational complexity, in our case dropping FPS
down to 2.9 for ResNeSt200. We evaluate the ResNeSt200 backbone only with
the Cascade Mask R-CNN because there are no pre-trained weights
available for the Mask R-CNN version.
      </p>
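      <p>
        For reference, the quality metrics can be computed from binary
masks as in the following plain NumPy sketch; this is an illustration,
not the challenge's official evaluation code.
      </p>
      <preformat>
import numpy as np

def iou(pred, gt):
    """Intersection over union (Jaccard index) of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def dice(pred, gt):
    """Dice coefficient: twice the overlap divided by the total area."""
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2 * inter / total if total else 1.0
      </preformat>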
      <p>Overall, Cascade Mask R-CNN with a ResNeSt101 backbone
provides the best quality results. Therefore, we consider this backbone
for the quality task of the Medico challenge. For the efficacy task of
the challenge we choose the Cascade Mask R-CNN with ResNeSt50
backbone. It is faster and less taxing on memory than ResNeSt101
while still maintaining high-quality results. Our challenge score
for the quality task is an IoU of 0.737. For the efficacy task, our
result is an IoU of 0.721 while computing at 3.36 FPS on an
Nvidia GTX 1080. To qualitatively demonstrate a set of our results,
we depict the four best and worst classified images of our validation
set in figure 1. The algorithm performs best on big, unconcealed
polyps. Nevertheless, small polyps, as shown in the first three
images of figure 1, are harder to segment. In addition, concealment of
the polyp by a tool, as in the last image of figure 1, prevents the
algorithm from detecting the polyp.</p>
    </sec>
    <sec id="sec-5">
      <title>CONCLUSION AND OUTLOOK</title>
      <p>In summary, our results suggest that using a deeper neural network,
extending it with another backbone, or adding a computationally
more expensive architecture like Cascade Mask R-CNN often leads to
higher quality segmentations. Nevertheless, increasing network
size is not always beneficial. Moreover, we demonstrate that the
ResNeSt101 backbone combined with the Cascade Mask R-CNN
structure is the best segmentation algorithm among our examples.</p>
      <p>
        Further research could extend our architectures and compare
them with other state of the art segmentation models such as
DeepLabv3+ [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], HRNet [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], and MRFM [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Those three
architectures and the proposed architecture are currently the best
performing architectures on object segmentation benchmarks [
        <xref ref-type="bibr" rid="ref18 ref22 ref26 ref27">18, 22, 26,
27</xref>
        ]. Especially promising is the speed and quality trade-off of using
HarDNet [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and BiSeNet [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] for further evaluations.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Vijay</given-names>
            <surname>Badrinarayanan</surname>
          </string-name>
          , Alex Kendall, and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Cipolla</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>SegNet: A deep convolutional encoder-decoder architecture for image segmentation</article-title>
          .
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>39</volume>
          , 12 (
          <year>2017</year>
          ),
          <fpage>2481</fpage>
          -
          <lpage>2495</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Ekaba</given-names>
            <surname>Bisong</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Google Colaboratory</article-title>
          .
          <source>In Building Machine Learning and Deep Learning Models on Google Cloud Platform</source>
          . Springer,
          <fpage>59</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Zhaowei</given-names>
            <surname>Cai</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nuno</given-names>
            <surname>Vasconcelos</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Cascade R-CNN: High Quality Object Detection and Instance Segmentation</article-title>
          . CoRR abs/1906.09756 (
          <year>2019</year>
          ). arXiv:1906.09756 http://arxiv.org/abs/1906.09756
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Ping</given-names>
            <surname>Chao</surname>
          </string-name>
          ,
          <string-name>
            <surname>Chao-Yang</surname>
            <given-names>Kao</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu-Shan</surname>
            <given-names>Ruan</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chien-Hsiang Huang</surname>
          </string-name>
          , and
          <string-name>
            <surname>Youn-Long Lin</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>HarDNet: A Low Memory Traffic Network</article-title>
          . CoRR abs/1909.00948 (
          <year>2019</year>
          ). arXiv:1909.00948 http://arxiv.org/abs/1909.00948
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Liang-Chieh</surname>
            <given-names>Chen</given-names>
          </string-name>
          , Yukun Zhu, George Papandreou, Florian Schrof, and
          <string-name>
            <given-names>Hartwig</given-names>
            <surname>Adam</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Encoder-decoder with atrous separable convolution for semantic image segmentation</article-title>
          .
          <source>In Proceedings of the European conference on computer vision (ECCV)</source>
          .
          <fpage>801</fpage>
          -
          <lpage>818</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Pasqualino</given-names>
            <surname>Favoriti</surname>
          </string-name>
          , Gabriele Carbone, Marco Greco, Felice Pirozzi, Rafaele Emmanuele Maria Pirozzi, and
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Corcione</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Worldwide burden of colorectal cancer: a review</article-title>
          .
          <source>Updates in surgery</source>
          <volume>68</volume>
          ,
          <issue>1</issue>
          (
          <year>2016</year>
          ),
          <fpage>7</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Chaoyi</given-names>
            <surname>Han</surname>
          </string-name>
          , Yiping Duan, Xiaoming Tao, and
          <string-name>
            <given-names>Jianhua</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Dense convolutional networks for semantic segmentation</article-title>
          .
          <source>IEEE Access</source>
          <volume>7</volume>
          (
          <year>2019</year>
          ),
          <fpage>43369</fpage>
          -
          <lpage>43382</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Kaiming</given-names>
            <surname>He</surname>
          </string-name>
          , Georgia Gkioxari, Piotr Dollár, and
          <string-name>
            <surname>Ross</surname>
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Girshick</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Mask R-CNN</article-title>
          .
          <source>CoRR abs/1703.06870</source>
          (
          <year>2017</year>
          ). arXiv:1703.06870 http://arxiv.org/abs/1703.06870
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Kaiming</given-names>
            <surname>He</surname>
          </string-name>
          , Xiangyu Zhang, Shaoqing Ren, and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Deep Residual Learning for Image Recognition</article-title>
          .
          <source>CoRR abs/1512.03385</source>
          (
          <year>2015</year>
          ). arXiv:1512.03385 http://arxiv.org/abs/1512.03385
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Laszlo</given-names>
            <surname>Herszenyi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Zsolt</given-names>
            <surname>Tulassay</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Epidemiology of gastrointestinal and liver tumors</article-title>
          .
          <source>Eur Rev Med Pharmacol Sci</source>
          <volume>14</volume>
          ,
          <issue>4</issue>
          (
          <year>2010</year>
          ),
          <fpage>249</fpage>
          -
          <lpage>258</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Debesh</surname>
            <given-names>Jha</given-names>
          </string-name>
          , Sharib Ali,
          <string-name>
            <surname>Håvard D Johansen</surname>
          </string-name>
          ,
          <string-name>
            <surname>Dag D Johansen</surname>
            ,
            <given-names>Jens</given-names>
          </string-name>
          <string-name>
            <surname>Rittscher</surname>
          </string-name>
          ,
          Michael A. Riegler,
          and
          <string-name>
            <given-names>Pål</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Real-Time Polyp Detection, Localisation and Segmentation in Colonoscopy Using Deep Learning</article-title>
          . arXiv preprint arXiv:2011.07631 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Debesh</surname>
            <given-names>Jha</given-names>
          </string-name>
          , Steven A.
          <string-name>
            <surname>Hicks</surname>
            , Krister Emanuelsen,
            <given-names>Håvard D.</given-names>
          </string-name>
          <string-name>
            <surname>Johansen</surname>
          </string-name>
          , Dag Johansen, Thomas de Lange, Michael A.
          <string-name>
            <surname>Riegler</surname>
            , and
            <given-names>Pål</given-names>
          </string-name>
          <string-name>
            <surname>Halvorsen</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Debesh</surname>
            <given-names>Jha</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pia H Smedsrud</surname>
          </string-name>
          ,
          Michael A. Riegler, Pål Halvorsen
          , Thomas de Lange, Dag Johansen, and
          <string-name>
            <given-names>Håvard D</given-names>
            <surname>Johansen</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Kvasir-SEG: A Segmented Polyp Dataset</article-title>
          .
          <source>In Proc. of International Conference on Multimedia Modeling (MMM)</source>
          .
          <fpage>451</fpage>
          -
          <lpage>462</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Adrian</given-names>
            <surname>Krenzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hekalo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Puppe</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Endoscopic Detection And Segmentation Of Gastroenterological Diseases With Deep Convolutional Neural Networks</article-title>
          . In EndoCV@ISBI.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Tsung-Yi Lin</surname>
            ,
            <given-names>Michael</given-names>
          </string-name>
          <string-name>
            <surname>Maire</surname>
            , Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and
            <given-names>C Lawrence</given-names>
          </string-name>
          <string-name>
            <surname>Zitnick</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Microsoft coco: Common objects in context</article-title>
          .
          <source>In European conference on computer vision</source>
          . Springer,
          <fpage>740</fpage>
          -
          <lpage>755</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Jonathan</surname>
            <given-names>Long</given-names>
          </string-name>
          , Evan Shelhamer, and
          <string-name>
            <given-names>Trevor</given-names>
            <surname>Darrell</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Fully convolutional networks for semantic segmentation</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          .
          <fpage>3431</fpage>
          -
          <lpage>3440</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Marmot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T</given-names>
            <surname>Atinmo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T</given-names>
            <surname>Byers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T</given-names>
            <surname>Hirohata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Jackson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W</given-names>
            <surname>James</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L</given-names>
            <surname>Kolonel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S</given-names>
            <surname>Kumanyika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C</given-names>
            <surname>Leitzmann</surname>
          </string-name>
          , and others.
          <year>2007</year>
          .
          <article-title>Food, nutrition, physical activity, and the prevention of cancer: a global perspective</article-title>
          . (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Heungmin</surname>
            <given-names>Oh</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Minjung</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hyungtae</given-names>
            <surname>Kim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Joonki</given-names>
            <surname>Paik</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Metadata Extraction Using DeepLab V3 and Probabilistic Latent Semantic Analysis for Intelligent Visual Surveillance Systems</article-title>
          .
          <source>In 2020 IEEE International Conference on Consumer Electronics (ICCE)</source>
          .
          <source>IEEE</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Shaoqing</surname>
            <given-names>Ren</given-names>
          </string-name>
          , Kaiming He,
          <string-name>
            <surname>Ross B. Girshick</surname>
            , and
            <given-names>Jian</given-names>
          </string-name>
          <string-name>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks</article-title>
          .
          <source>CoRR abs/1506.01497</source>
          (
          <year>2015</year>
          ). arXiv:1506.01497 http://arxiv.org/abs/1506.01497
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Olga</surname>
            <given-names>Russakovsky</given-names>
          </string-name>
          , Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein,
          <string-name>
            <surname>Alexander C. Berg</surname>
          </string-name>
          , and
          <string-name>
            <surname>Fei-Fei Li</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>ImageNet Large Scale Visual Recognition Challenge</article-title>
          .
          <source>CoRR abs/1409.0575</source>
          (
          <year>2014</year>
          ). arXiv:1409.0575 http://arxiv.org/abs/1409.0575
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Evan</surname>
            <given-names>Shelhamer</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Long</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Trevor</given-names>
            <surname>Darrell</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Fully convolutional networks for semantic segmentation</article-title>
          .
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>39</volume>
          ,
          <issue>4</issue>
          (
          <year>2017</year>
          ),
          <fpage>640</fpage>
          -
          <lpage>651</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Tao</surname>
          </string-name>
          , Karan Sapra, and
          <string-name>
            <given-names>Bryan</given-names>
            <surname>Catanzaro</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Hierarchical Multi-Scale Attention for Semantic Segmentation</article-title>
          . arXiv preprint arXiv:2005.10821 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Yuxin</surname>
            <given-names>Wu</given-names>
          </string-name>
          , Alexander Kirillov, Francisco Massa,
          <string-name>
            <surname>Wan-Yen Lo</surname>
            , and
            <given-names>Ross</given-names>
          </string-name>
          <string-name>
            <surname>Girshick</surname>
          </string-name>
          .
          <year>2019</year>
          . Detectron2. https://github.com/facebookresearch/detectron2. (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Jimei</surname>
            <given-names>Yang</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Brian</given-names>
            <surname>Price</surname>
          </string-name>
          , Scott Cohen,
          <string-name>
            <given-names>Honglak</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <surname>Ming-Hsuan Yang</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Object contour detection with a fully convolutional encoder-decoder network</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          .
          <fpage>193</fpage>
          -
          <lpage>202</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Changqian</surname>
            <given-names>Yu</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Changxin</given-names>
            <surname>Gao</surname>
          </string-name>
          , Jingbo Wang,
          <string-name>
            <surname>Gang Yu</surname>
            ,
            <given-names>Chunhua</given-names>
          </string-name>
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>and Nong</given-names>
          </string-name>
          <string-name>
            <surname>Sang</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation</article-title>
          . arXiv preprint arXiv:2004.02147 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Jianlong</surname>
            <given-names>Yuan</given-names>
          </string-name>
          , Zelu Deng,
          <string-name>
            <given-names>Shu</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Zhenbo</given-names>
            <surname>Luo</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Multi Receptive Field Network for Semantic Segmentation</article-title>
          .
          <source>In 2020 IEEE Winter Conference on Applications of Computer Vision</source>
          (WACV). IEEE,
          <fpage>1883</fpage>
          -
          <lpage>1892</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Hang</surname>
            <given-names>Zhang</given-names>
          </string-name>
          , Chongruo Wu, Zhongyue Zhang, Yi Zhu, Zhi Zhang, Haibin Lin, Yue Sun, Tong He,
          <string-name>
            <surname>Jonas Mueller</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Manmatha</surname>
            ,
            <given-names>Mu</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>and Alexander</given-names>
          </string-name>
          <string-name>
            <surname>Smola</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>ResNeSt: Split-Attention Networks</article-title>
          . (
          <year>2020</year>
          ). arXiv:cs.CV/2004.08955
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Jiafu</surname>
            <given-names>Zhong</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            <given-names>Wang</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huisi Wu</surname>
            , Zhenkun Wen, and
            <given-names>Jing</given-names>
          </string-name>
          <string-name>
            <surname>Qin</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>PolypSeg: An Efficient Context-Aware Network for Polyp Segmentation from Colonoscopy Videos</article-title>
          .
          <source>In International Conference on Medical Image Computing and Computer-Assisted Intervention</source>
          . Springer,
          <fpage>285</fpage>
          -
          <lpage>294</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>