<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Augmentation Strategy with Lightweight Network for Polyp Segmentation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Raman Ghimire</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sahadev Poudel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sang-Woong Lee</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of IT Convergence Engineering, Gachon University</institution>
          ,
          <addr-line>Seongnam 13120</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Software, Gachon University</institution>
          ,
          <addr-line>Seongnam 13120</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Automatic segmentation of medical images is a difficult task in computer vision due to the varied backgrounds, shapes, sizes, and colors of polyps or tumors. Despite the success of deep learning (DL)-based encoder-decoder architectures in medical image segmentation, they are not always suitable for real-time clinical settings due to their high computational cost and low speed. In this EndoCV2021 challenge, we focus on a lightweight deep learning-based algorithm for the polyp segmentation task. The network applies a low-memory-traffic CNN, i.e., HarDNet68, as a backbone and a decoder. The decoder block is based on a cascaded partial decoder known for fast and accurate object detection. Further, to circumvent the issue of a small number of images during training, we propose a data augmentation strategy that increases the model's generalization by performing augmentation on the fly. Extensive experiments on the test set demonstrate that the proposed method produces outstanding segmentation accuracy.</p>
      </abstract>
      <kwd-group>
        <kwd>polyp segmentation</kwd>
        <kwd>light-weight network</kwd>
        <kwd>augmentation strategy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The accurate segmentation of medical images is a decisive step in the diagnosis and treatment
of several diseases in clinical settings. The automatic segmentation of diseases can assist
doctors and medical professionals in predicting the size of a polyp or lesion and enables continuous
monitoring, planning, and follow-up studies, resulting in treatment without delay. However,
background artifacts, noise, variations in the shape and size of polyps, and blurry boundaries
are some of the main factors that complicate accurate segmentation.</p>
      <p>
        In recent years, owing to the rapid progress in deep learning-based techniques such as
convolutional neural networks (CNNs), it is now possible to segment medical images without
human intervention. The robust, non-linear feature extraction capabilities of CNNs make them
adaptable to other domains such as medical image classification [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ], detection [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ], and image
retrieval [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In particular, methods such as fully convolutional networks [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and
encoder-decoder-based architectures such as U-Net [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and SegNet [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] have been widely applied for
image segmentation tasks. These networks consist of a contracting path that captures the context
in the image and a symmetric expanding path that uses single or multiple upsampling
techniques to enable precise localization [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8, 9, 10</xref>
        ]. Further, skip-connection techniques have
been effective in recovering fine-grained details, enhancing the network’s performance even on
complex datasets.
      </p>
      <p>
        Inspired by these methods, many approaches have been presented to solve segmentation
problems in a wide range of domains. However, complex architectures, limited resources,
and low frames per second (FPS) limit the practical implementation of U-Net variants in the
clinical domain. Therefore, reducing model size by enhancing both energy and computational
efficiency is of great importance. Usually, a reduced model size means fewer
FLOPs (floating-point operations per second) and less DRAM (dynamic random-access memory) traffic for
reading and writing feature maps and network parameters. State-of-the-art networks like
Residual Networks (ResNet) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and Densely Connected Networks (DenseNet) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] have steered
the research paradigm towards compressed models with high parameter efficiency that
maintain high accuracy. When only small training sets are available, traditional deep learning
models usually overfit; this lack of available data has been a significant bottleneck in the
research field. Moreover, even when enough data is available, a high computational
cost is involved. Therefore, we leverage a lightweight network for accurate polyp segmentation,
which provides good segmentation accuracy and speed in comparison to prior methods [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Besides decreasing the number of network parameters and the computational cost involved,
the presented method also preserves segmentation accuracy. Further, we propose an
augmentation strategy for polyp segmentation, which helps the model generalize in
complex environments by encouraging it to learn semantic features at different
variations and scales.
      </p>
      <p>
        This study’s significant contributions can be summarized as follows: First, we leverage the
high-speed HarDNet-MSEG model for accurate polyp segmentation. Second, we compare it
with other existing architectures built on the EfficientNet-B0 backbone [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Third, we propose
an augmentation strategy for polyp segmentation so that the model can generalize in
complex environments. Fourth, we evaluated the proposed methodology on different architectures,
and the experimental results show the efficiency of our method. Overall, we show that a
lightweight network with an improved augmentation strategy can be used in real-time clinical settings.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>
        Conventional architectures [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ] have achieved higher accuracy than their smaller
counterparts but have low inference speed. HarDNet [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], designed with the influence of memory
traffic on model design in mind, increases inference speed by reducing shortcuts and,
to make up for the resulting loss of accuracy, widens the channels of key layers. 1×1
convolutions are used to increase computational density. With this, not only is the inference
time reduced compared with DenseNet [
time reduced compared with DenseNet [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and ResNet [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], the model also achieves higher
accuracy on ImageNet [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        The decoder of this model follows the cascaded partial decoder [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. In the cascaded
partial decoder, the shallow features are discarded, as the deeper layers can represent the
shallow layers’ spatial details comparably well. The addition of skip connections and
appropriate convolutions also helps in aggregating the feature maps at different scales. Fig. 3(a)
shows a Receptive Field Block [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. In an RFB, varying convolutional and dilated convolutional
layers generate features with diverse receptive fields. The RFB block is used, following [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] to
enlarge the receptive fields in feature maps of different resolutions. As shown in Fig. 3(b), in dense
aggregation, we upsample the lower-scale features and perform element-wise multiplication with
another feature map of the corresponding scale.
      </p>
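      <p>The dense aggregation step described above can be sketched in a few lines. This is a minimal NumPy illustration, where nearest-neighbor upsampling stands in for the upsampling used in the network and the feature shapes are illustrative only:</p>
      <p>
```python
import numpy as np

def dense_aggregate(deep, shallow):
    # Upsample the lower-resolution (deeper) feature map to the shallow
    # map's resolution, then fuse by element-wise multiplication.
    scale = shallow.shape[-1] // deep.shape[-1]
    up = deep.repeat(scale, axis=-2).repeat(scale, axis=-1)
    return up * shallow

# Illustrative feature maps: 32 channels at 1/32 and 1/16 scale of a 352x352 input
deep = np.full((32, 11, 11), 3.0)
shallow = np.full((32, 22, 22), 2.0)
fused = dense_aggregate(deep, shallow)
print(fused.shape)  # (32, 22, 22)
```
      </p>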
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <p>3.0.1. Metrics
The most commonly used metrics for medical image segmentation are the Dice coefficient
and IoU, which can be defined as follows:</p>
      <p>Dice = (2 × TP) / (2 × TP + FP + FN)    (1)</p>
      <p>Algorithm 1 Detailed augmentation strategy for the training process
1: p indicates the probability of each augmentation performed on the image.
2: For each training image and corresponding mask:
• Crop the image to 352 × 352 pixels.
• With probability p = 0.5, perform a random rotation in [0, 90].
• With probability p = 0.5, perform a horizontal flip.
• With probability p = 0.5, perform a vertical flip.
• With p = 0.3, apply one of:
  – IAAAdditiveGaussianNoise
  – Gaussian noise
• With p = 0.3, apply one of:
  – ShiftScaleRotate with a scale limit of 0.2 and a rotate limit of 45
  – random brightness shift within the range of −10 to +10 percent
  – random contrast shift within the range of −10 to +10 percent
• With p = 0.3, apply one of:
  – motion blur
  – median blur with a blur limit of 2
• With p = 0.3, apply one of:
  – mask dropout in the RGB image
  – Gaussian noise
• With probability p = 0.3, perform color jitter of brightness (0.2), contrast (0.2),
saturation (0.2), and hue (0.2).
3: Feed the transformed image into the network in each epoch.</p>
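      <p>The probability gating in Algorithm 1 can be sketched directly on image arrays. The following is a minimal NumPy sketch of the geometric steps only: the crop here is a center crop, the rotation is restricted to 0 or 90 degrees for simplicity, and the photometric steps (noise, blur, mask dropout, color jitter) are omitted:</p>
      <p>
```python
import numpy as np

def augment(image, mask, rng):
    # Center-crop to 352 x 352 (assumes inputs are at least that large).
    h, w = 352, 352
    top = (image.shape[0] - h) // 2
    left = (image.shape[1] - w) // 2
    image = image[top:top + h, left:left + w]
    mask = mask[top:top + h, left:left + w]
    if 0.5 > rng.random():  # random rotation in [0, 90] (here: exactly 90)
        image, mask = np.rot90(image), np.rot90(mask)
    if 0.5 > rng.random():  # horizontal flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    if 0.5 > rng.random():  # vertical flip
        image, mask = np.flipud(image), np.flipud(mask)
    return image, mask

img = np.zeros((400, 400, 3), dtype=np.uint8)
msk = np.zeros((400, 400), dtype=np.uint8)
aug_img, aug_msk = augment(img, msk, np.random.default_rng(0))
print(aug_img.shape, aug_msk.shape)  # (352, 352, 3) (352, 352)
```
      </p>
      <p>Applied inside the data loader, this produces a differently transformed image in each epoch, matching the on-the-fly strategy described above.</p>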
      <p>IoU = TP / (TP + FP + FN)    (2)
where TP represents true positives, FP represents false positives, and FN represents false negatives.
Both the Dice coefficient and IoU calculate the similarity between the predicted mask and the
ground-truth mask, as shown in Eqs. (1) and (2), respectively.</p>
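      <p>Both metrics can be computed directly from binary masks via the TP/FP/FN counts in Eqs. (1) and (2); a minimal NumPy sketch:</p>
      <p>
```python
import numpy as np

def dice_iou(pred, gt):
    # Dice and IoU from binary masks via TP/FP/FN counts (Eqs. (1) and (2)).
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou

# Toy 2x2 masks: one true positive, one false positive, one false negative
pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [1, 0]])
d, i = dice_iou(pred, gt)
print(round(d, 3), round(i, 3))  # 0.5 0.333
```
      </p>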
      <sec id="sec-3-1">
        <title>3.1. Implementation Details</title>
        <p>
          We divided the whole EndoCV2021 dataset [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] into training and validation sets with a ratio of
80:20. Out of 1452 images, 1162 are used for the training set and the remaining 290
for the validation set. All the images are resized to 352 × 352, and we perform the
heavy data augmentation shown in Algorithm 1. We implement our model in PyTorch and
conduct our experiments on a GeForce RTX 2080 Ti. We use the Adam optimizer with a learning rate
of 0.00001 for all the experiments.
        </p>
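        <p>The 80:20 split described above can be reproduced as follows; the random seed here is our own choice for illustration, not a value from the challenge setup:</p>
        <p>
```python
import numpy as np

n_images = 1452                      # total EndoCV2021 images used
rng = np.random.default_rng(42)      # seed chosen for illustration only
indices = rng.permutation(n_images)
n_train = round(0.8 * n_images)      # 1162 training images
train_idx, val_idx = indices[:n_train], indices[n_train:]
print(len(train_idx), len(val_idx))  # 1162 290
```
        </p>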
        <sec id="sec-3-1-1">
          <title>3.1.1. Baselines</title>
          <p>
            We perform experiments on several state-of-the-art architectures. We employ U-Net [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], U-Net++
[
            <xref ref-type="bibr" rid="ref21">21</xref>
            ], ResUNet [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ] and ResUNet++ [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ] with the EficientNet-B0 [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ] as a backbone to design
efficient and lightweight models. Further, we use rotation, flipping, and scaling as normal
augmentation, also perform the heavy augmentation stated in Algorithm 1, and compare the
performance gain for each architecture. With heavy augmentation, the model sees a differently
transformed image in each epoch, which eventually helps generalization.
          </p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Experimental Results</title>
          <p>Figure 3 displays the qualitative results for polyp segmentation across several architectures
on multi-institutional challenging images. It can be observed that HarDNet can segment
polyps accurately, almost matching the ground truth of the original images, whereas other
architectures like U-Net and ResUNet could not segment the polyps with high confidence and
miss the polyp in some images. Further, U-Net++ over-segments the polyp, covering
unwanted parts (see row 4).</p>
          <p>From Table 1, it is observable that HarDNet achieves higher segmentation accuracy
in terms of both the Jaccard index and the Dice coefficient. The implementation of U-Net with an
EfficientNet backbone obtained the lowest Jaccard index of 0.8226 after heavy augmentation.
Similarly, ResUNet and U-Net++ achieved Jaccard indices of 0.8247 and 0.8453 under the same
settings. ResUNet++ obtains the second position with a Jaccard index of 0.8392 and a
Dice coefficient of 0.8753. Further, HarDNet also has a higher frames-per-second (FPS) rate
than other existing SOTA methods. It obtained the highest speed of 88 FPS, while
U-Net, ResUNet, U-Net++, and ResUNet++ run at 65, 48, 44, and 42 FPS, respectively. Moreover, the
augmentation strategy explained in Algorithm 1 also helps increase performance by at least 2
percent in each index. We started with simple augmentation techniques, rotation and
flips (horizontal and vertical), as a baseline augmentation, and then added other methods
like Gaussian noise, blurring, masking, and color jittering. We carefully designed the probability
ratios during transformation because substantial augmentation could lead the model to
learn nothing from the input image. Therefore, we keep higher probabilities for rotation
and flipping and comparatively smaller probabilities for the other techniques. According to our
experiments (fourth column), a strong augmentation could not generalize the model well and
instead obtained a similar accuracy to the baseline augmentations.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Discussion</title>
        <p>
          In clinical settings, an expeditious deep learning method is much needed. There is usually
a trade-off between speed and accuracy when applying a deep
learning-based algorithm. However, in this case, HarDNet surpassed other prior methods in
terms of both speed and accuracy, which is a good sign for clinical practice. From the extensive
experimental results in Figure 3 and Table 1, we can observe that HarDNet shows
improvement over all other existing methods in terms of the Jaccard index, Dice coefficient, and FPS.
To decrease network complexity, we utilized EfficientNet-B0 as the encoder backbone
for all the architectures and tried to minimize the complexity as far as possible. However,
HarDNet surpassed these architectures and achieved an unassailable lead over them in every
index. Further, our augmentation strategies can achieve a significant performance gain, which
helps the model generalize on the challenging validation set. A possible limitation of this
study is the manual setting of probabilities for the different augmentation techniques. Moreover,
the current input resolution for the network is 352 × 352, which could be increased further without
reducing the speed. The results were also uploaded for rounds I and II of the competition,
where we achieved third rank in round II based on the generalisability scores provided by the
organisers, similar to the detection generalisation defined in [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This paper presented different methods for the accurate segmentation of polyps in GI tract
diseases. We employed the EfficientNet model as an encoder backbone for all existing methods
and compared them with the recently published HarDNet model. We conclude that HarDNet
took an unassailable lead over the other methods in terms of segmentation accuracy and speed.
Further, the augmentation strategies applied to the model increase performance by 2 percent.
In the future, we plan to continue researching more efficient methods.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Acknowledgments</title>
      <p>This work was supported by the GRRC program of Gyeonggi province. [GRRC-Gachon2020
(B02), AI-based Medical Information Analysis].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <article-title>Her2net: A deep framework for semantic segmentation and classification of cell membranes and nuclei in breast cancer evaluation</article-title>
          ,
          <source>IEEE Transactions on Image Processing</source>
          <volume>27</volume>
          (
          <year>2018</year>
          )
          <fpage>2189</fpage>
          -
          <lpage>2200</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nardelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jimenez-Carretero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bermejo-Pelaez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Washko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. N.</given-names>
            <surname>Rahaghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Ledesma-Carbayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S. J.</given-names>
            <surname>Estépar</surname>
          </string-name>
          ,
          <article-title>Pulmonary artery-vein classification in ct images using deep learning</article-title>
          ,
          <source>IEEE Transactions on Medical Imaging</source>
          <volume>37</volume>
          (
          <year>2018</year>
          )
          <fpage>2428</fpage>
          -
          <lpage>2440</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Poudel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-W.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Colorectal disease classification using efficiently scaled dilation in convolutional neural network</article-title>
          ,
          <source>IEEE Access</source>
          <volume>PP</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          . doi: 10.1109/ACCESS.2020.2996770.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.-C.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Nogues</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mollura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Summers</surname>
          </string-name>
          ,
          <article-title>Deep convolutional neural networks for computer-aided detection: Cnn architectures, dataset characteristics and transfer learning</article-title>
          ,
          <source>IEEE Transactions on Medical Imaging</source>
          <volume>35</volume>
          (
          <year>2016</year>
          )
          <fpage>1285</fpage>
          -
          <lpage>1298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <article-title>Detecting anatomical landmarks from limited medical imaging data using two-stage task-oriented deep neural networks</article-title>
          ,
          <source>IEEE Transactions on Image Processing</source>
          <volume>26</volume>
          (
          <year>2017</year>
          )
          <fpage>4753</fpage>
          -
          <lpage>4764</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Bawany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Kuriyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Ramchandran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Wykoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>A novel deep learning pipeline for retinal vessel detection in fluorescein angiography</article-title>
          ,
          <source>IEEE Transactions on Image Processing</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <article-title>Visual saliency guided complex image retrieval</article-title>
          ,
          <source>Pattern Recognition Letters</source>
          <volume>130</volume>
          (
          <year>2020</year>
          )
          <fpage>64</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shelhamer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          ,
          <article-title>Fully convolutional networks for semantic segmentation</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>3431</fpage>
          -
          <lpage>3440</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>O.</given-names>
            <surname>Ronneberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          ,
          <article-title>U-net: Convolutional networks for biomedical image segmentation</article-title>
          ,
          <source>in: International Conference on Medical Image Computing and Computer-Assisted Intervention</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>241</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.</given-names>
            <surname>Badrinarayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kendall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cipolla</surname>
          </string-name>
          ,
          <article-title>Segnet: A deep convolutional encoder-decoder architecture for image segmentation</article-title>
          ,
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>39</volume>
          (
          <year>2017</year>
          )
          <fpage>2481</fpage>
          -
          <lpage>2495</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learning for image recognition</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Van Der</given-names>
            <surname>Maaten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Q.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          ,
          <article-title>Densely connected convolutional networks</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>4700</fpage>
          -
          <lpage>4708</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-L.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Hardnet-mseg: A simple encoder-decoder polyp segmentation neural network that achieves over 0.9 mean dice and 86 fps</article-title>
          , arXiv preprint arXiv:2101.07172 (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>Efficientnet: Rethinking model scaling for convolutional neural networks</article-title>
          , arXiv preprint arXiv:1905.11946 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>Residual attention network for image classification</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>3156</fpage>
          -
          <lpage>3164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kalenichenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weyand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Andreetto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <article-title>Mobilenets: Efficient convolutional neural networks for mobile vision applications</article-title>
          ,
          <source>arXiv preprint arXiv:1704.04861</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          ,
          <article-title>Imagenet: A large-scale hierarchical image database</article-title>
          ,
          <source>in: 2009 IEEE Conference on Computer Vision and Pattern Recognition</source>
          , IEEE,
          <year>2009</year>
          , pp.
          <fpage>248</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Cascaded partial decoder for fast and accurate salient object detection</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3907</fpage>
          -
          <lpage>3916</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Huang</surname>
          </string-name>
          , et al.,
          <article-title>Receptive field block net for accurate and fast object detection</article-title>
          ,
          <source>in: Proceedings of the European Conference on Computer Vision (ECCV)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>385</fpage>
          -
          <lpage>400</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ghatwary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Realdon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cannizzaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Daul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rittscher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. E.</given-names>
            <surname>Salem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lamarque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>de Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>East</surname>
          </string-name>
          ,
          <article-title>Polypgen: A multi-center polyp detection and segmentation dataset for generalisability assessment</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M. R.</given-names>
            <surname>Siddiquee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tajbakhsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Unet++: A nested u-net architecture for medical image segmentation</article-title>
          ,
          <source>in: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>F. I.</given-names>
            <surname>Diakogiannis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Waldner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Caccetta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Resunet-a: a deep learning framework for semantic segmentation of remotely sensed data</article-title>
          ,
          <source>ISPRS Journal of Photogrammetry and Remote Sensing</source>
          <volume>162</volume>
          (
          <year>2020</year>
          )
          <fpage>94</fpage>
          -
          <lpage>114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Smedsrud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Johansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>De Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. D.</given-names>
            <surname>Johansen</surname>
          </string-name>
          ,
          <article-title>Resunet++: An advanced architecture for medical image segmentation</article-title>
          ,
          <source>in: 2019 IEEE International Symposium on Multimedia (ISM)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>225</fpage>
          -
          <lpage>2255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Braden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bailey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kayser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Soberanis-Mukul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Albarqouni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Watanabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Oksuz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. W.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Realdon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Loshchenov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Schnabel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>East</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wagnieres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. B.</given-names>
            <surname>Loschenov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grisan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Daul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rittscher</surname>
          </string-name>
          ,
          <article-title>An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>10</volume>
          (
          <year>2020</year>
          )
          <fpage>2748</fpage>
          . doi:10.1038/s41598-020-59413-5.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>