<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>XP-Net: An Attention Segmentation Network by Dual Teacher Hierarchical Knowledge Distillation for Polyp Generalization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ragu B</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antony Raj</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rahul GS</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sneha Chand</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Preejith SP</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohanasankar Sivaprakasam</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Electrical Engineering</institution>
          ,
          <addr-line>IIT Madras, Chennai</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Healthcare Technology Innovation Centre</institution>
          ,
          <addr-line>Chennai</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Endoscopic imaging is widely used as the diagnostic tool for colon polyp-induced GI tract cancer. Diagnosis via image identification requires expertise that inexperienced physicians may lack. Hence, a software-aided approach to detecting these anomalies may better identify tissue abnormalities. In this paper, a novel deep learning network, 'XP-Net', with an Effective Pyramidal Squeeze Attention (EPSA) module using hierarchical adversarial knowledge distillation by a combination of two teacher networks is proposed. It adds 'complementary knowledge' to the student network, thus aiding in the improvement of network performance. The lightweight EPSA block enhances the current network architecture by capturing multi-scale spatial information of objects at a granular level with long-range channel dependency. The XP-Net compiled into the NVIDIA TensorRT engine gave better real-time performance in terms of throughput. The proposed network achieved a Dice score of 0.839 and an IoU of 0.805 on the validation data set, and it attained an average throughput of 60 fps on a mobile GPU. This deep learning-based segmentation approach is expected to aid clinicians in addressing the complications involved in the identification and removal of precancerous anomalies more competently.</p>
      </abstract>
      <kwd-group>
        <kwd>Polyp</kwd>
        <kwd>Generalization</kwd>
        <kwd>Attention block</kwd>
        <kwd>Knowledge distillation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Colorectal polyps are one of the early indicators of lower Gastro-Intestinal (GI) tract cancer. These polyps are extra lumps of tissue growth that have no particular function in bodily processes [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Although these growths are often benign, they can become cancerous. Early detection and removal of polyps in the colon region may prevent these tissues from becoming cancerous. Colonoscopy is a general diagnostic procedure widely used to investigate the colon region for any type of malformation and disease [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Generally, a trained physician visually inspects the colon region for polyps and removes them using minimally invasive endoscopic surgery. Research on visual inspection of the colon region shows that small adenomas (benign tumors) less than 5 mm in diameter have a miss rate of 27%, while adenomas greater than 10 mm have a miss rate of 6%. It has been reported that the quality of bowel preparation and the experience of the colonoscopist are major contributory factors to missed polyps during a colonoscopy [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        As a quick alternative, computer-vision-based polyp detection is a highly researched area that has been found effective in mitigating miss rates and assisting colonoscopists in faster diagnosis [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The addition of deep learning techniques proves to be much more effective, since a network like U-Net has shown promising results in biomedical imaging and is widely accepted as a state-of-the-art image-to-image translation network [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        In this paper, U-Net was chosen as the baseline model because of its ability to outperform other segmentation networks with extensive data augmentation despite a limited dataset, as reported by Ronneberger et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. A plug-and-play EPSA module, as proposed in EPSANet [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], was added to U-Net to enhance multi-scale spatial information, which enables the detection of objects at different scales [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Since the baseline U-Net with the EPSA module was found to be computationally heavy for real-time performance, model compression through knowledge distillation was applied [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Among model compression approaches, knowledge distillation shows great superiority; it transfers the knowledge of a large teacher model to a small student model [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In our proposed student network, we implemented separable filters, reducing the model size by 78% relative to the teacher network. We implemented the hierarchical knowledge distillation technique proposed in HAD-Net [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], where a single teacher network is used to distill the knowledge. In our proposed methodology, by contrast, dual teachers transfer complementary knowledge to the student network. All the models were trained on the EndoCV2022 Challenge dataset [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ][
        <xref ref-type="bibr" rid="ref10">10</xref>
        ][
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>Figure 1: (A) shows a generic hierarchical knowledge distillation using a single teacher and (B) is our proposed methodology using dual teachers to derive the student network.</p>
      <p>4th International Workshop and Challenge on Computer Vision in Endoscopy (EndoCV2022) in conjunction with the 19th IEEE International Symposium on Biomedical Imaging ISBI2022, March 28th, 2022, IC Royal Bengal, Kolkata, India. ragu.b@htic.iitm.ac.in (R. B); antony.raj@htic.iitm.ac.in (A. Raj). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <p>In summary, the contributions in this paper are as follows:</p>
      <list list-type="bullet">
        <list-item>
          <p>A hierarchical dual teacher knowledge distillation network to transfer the complementary knowledge of both networks to a student.</p>
        </list-item>
        <list-item>
          <p>A student network with a lower computational cost for real-time performance without significantly reducing accuracy.</p>
        </list-item>
        <list-item>
          <p>
            Experiments: evaluating our model's generality on the external Kvasir-Seg dataset [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] yields Dice and IoU scores of 0.782 and 0.769, respectively.
          </p>
        </list-item>
      </list>
      <p>2. XP-Net</p>
      <sec id="sec-1-1">
        <title>2.1. Methodology</title>
        <p>
          In our methodology, a student network is derived from two teacher networks through a hierarchical knowledge distillation process. The two computationally heavy teacher networks transfer their complementary knowledge to a lightweight student network. The baseline U-Net architecture has the ability to capture features at multiple scales. To enhance this visual perception of the U-Net, we implemented an Effective Pyramidal Squeeze Attention (EPSA) block at the first encoder of the U-Net. This attention mechanism boosts the allocation of the most informative feature expressions while suppressing the less useful ones, allowing the model to focus on clinically crucial areas [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The lightweight EPSA block enhances the architecture by capturing multi-scale spatial information of objects at a granular level with long-range channel dependency at the initial stage of the network.
        </p>
        <p>
          Our first teacher network comprises a U-Net with the EPSA module. Similarly, we trained the second teacher network, a baseline U-Net with the EPSA block, using pix2pix GAN [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], which has shown promising results for image-to-image translation and learns a loss adapted to the input data and task. The proposed student network consists of separable filters arranged in the same U-Net architecture with the EPSA module, which results in a reduced number of learnable parameters compared with the teacher networks. The hierarchical knowledge distillation technique used in our method was proposed in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], where a single teacher is used for knowledge distillation. However, the network that we have developed uses dual teachers via multi-step learning, as suggested in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], to map the in-between features to train the student network.
        </p>
        <p>
          The input and target of the teacher and student networks are denoted by <italic>x</italic> and <italic>y</italic>. The output segmentations of the two teachers and the student are denoted by <inline-formula><tex-math>T_{(1,2)}\hat{y}</tex-math></inline-formula> and <inline-formula><tex-math>S\hat{y}</tex-math></inline-formula>, respectively. The multi-scale feature maps of the teachers and the student are denoted by <inline-formula><tex-math>T_{(1,2)}\hat{f}</tex-math></inline-formula> and <inline-formula><tex-math>S\hat{f}</tex-math></inline-formula>. In hierarchical knowledge distillation, the student loss, denoted by <inline-formula><tex-math>L_{s}</tex-math></inline-formula>, is a weighted combination of two terms: (a) the sum of the Dice [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and Tversky [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] losses between the student-generated segmentation (<inline-formula><tex-math>S\hat{y}</tex-math></inline-formula>) and the ground truth (<italic>y</italic>), and (b) a mean square error adversarial loss. The overall student loss is given in equation 2, where <inline-formula><tex-math>\lambda</tex-math></inline-formula> weights the adversarial term and HD is the hierarchical discriminator.
        </p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>L_{seg} = [\text{Dice Loss} + \text{Tversky Loss}]</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(2)</label>
          <tex-math>L_{s} = L_{seg}[S\hat{y};\, y] + \lambda * MSE[HD(x, S\hat{y}, S\hat{f}), 1]</tex-math>
        </disp-formula>
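        <p>As a rough illustration of the student loss described above, the following plain-Python sketch combines a soft Dice loss and a Tversky loss with a weighted mean square adversarial term. The function names, the flat-list representation of masks and discriminator outputs, the Tversky weights (alpha, beta) and the adversarial weight (lam) are illustrative assumptions; the paper does not state these values.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss over flat lists of probabilities and binary labels."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def tversky_loss(pred, target, alpha=0.5, beta=0.5, eps=1e-6):
    """Tversky loss; alpha weights false positives, beta false negatives.

    alpha and beta are assumed values, not taken from the paper.
    """
    tp = sum(p * t for p, t in zip(pred, target))
    fp = sum(p * (1.0 - t) for p, t in zip(pred, target))
    fn = sum((1.0 - p) * t for p, t in zip(pred, target))
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def mse(values, constant):
    """Mean square error between a list of scores and a constant target."""
    return sum((v - constant) ** 2 for v in values) / len(values)


def student_loss(pred, target, hd_scores_on_student, lam=0.1):
    """Segmentation loss plus lam times the adversarial term.

    hd_scores_on_student stands in for the discriminator output on the
    student sample; the student is rewarded when those scores approach 1.
    lam is an assumed weight.
    """
    seg = dice_loss(pred, target) + tversky_loss(pred, target)
    adv = mse(hd_scores_on_student, 1.0)
    return seg + lam * adv
```
]]></preformat>
        <p>With alpha = beta = 0.5 the Tversky term coincides with the Dice term up to the smoothing constant; other settings shift the penalty toward false positives or false negatives.</p>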
        <sec id="sec-1-1-1">
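          <title>The hierarchical discriminator (HD) is trained using the LS-GAN loss, denoted as <inline-formula><tex-math>L_{HD}</tex-math></inline-formula>.</title>
          <p>
            <inline-formula><tex-math>L_{HD}</tex-math></inline-formula> is made up of two mean square error terms. One term is between the HD output after being passed a "fake" data sample from the student and a tensor of all zeros [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ]. The other term is a mean square error loss between the HD output after being passed a "real" data sample from either teacher 1 or teacher 2 and a tensor of all ones. The overall discriminator loss is given in equation 3, where <inline-formula><tex-math>S\hat{f}</tex-math></inline-formula> and <inline-formula><tex-math>T_{(1,2)}\hat{f}</tex-math></inline-formula> denote the multi-scale feature maps of the student and the teachers.
          </p>
          <disp-formula>
            <label>(3)</label>
            <tex-math>L_{HD} = MSE[HD(x, S\hat{y}, S\hat{f}), 0] + MSE[HD(x, T_{(1,2)}\hat{y}, T_{(1,2)}\hat{f}), 1]</tex-math>
          </disp-formula>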
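          <p>The two-term discriminator objective can be sketched in a few lines of plain Python. This is a minimal sketch under the assumption that the discriminator outputs are available as flat lists of scores; the function and argument names are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def mse(values, constant):
    """Mean square error between a list of scores and a constant target."""
    return sum((v - constant) ** 2 for v in values) / len(values)


def discriminator_loss(hd_on_student, hd_on_teacher):
    """LS-GAN style discriminator loss.

    hd_on_student: discriminator scores for the "fake" (student) sample,
                   pushed toward 0.
    hd_on_teacher: discriminator scores for the "real" (teacher 1 or 2)
                   sample, pushed toward 1.
    """
    fake_term = mse(hd_on_student, 0.0)
    real_term = mse(hd_on_teacher, 1.0)
    return fake_term + real_term
```
]]></preformat>
          <p>A discriminator that perfectly separates the two sources drives both terms to zero, while the adversarial term of the student loss pulls the student scores back toward one.</p>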
        </sec>
      </sec>
      <sec id="sec-1-2">
        <title>2.2. Network Architecture</title>
        <p>2.2.1. Teacher Network</p>
        <p>CABR32-P-CBR64-P-CBR128-P-CBR256-P-CBR512-UPCONV256-CBR256-UPCONV128</p>
        <p>Figure 2: (a) Teacher and Student Discriminator Network, (b) Teacher Network Blocks, (c) Student Network Blocks.</p>
        <sec id="sec-1-2-1">
          <title>CEK denotes a (1,1) convolution with K output feature maps and a Sigmoid activation function.</title>
        </sec>
        <sec id="sec-1-2-2">
          <title>UPCONVK represents a transpose convolution layer with kernel size (2,2) and stride (2,2), producing K output feature maps.</title>
        </sec>
        <sec id="sec-1-2-3">
          <title>P (Pool) represents a pooling layer with kernel size (2,2) and stride (2,2).</title>
          <p>2.2.2. Student Network</p>
          <p>CAPDBR32-P-CPDBR64-P-CPDBR128-P-CPDBR256-P-CPDBR512-UPCONV256-CPDBR256-UPCONV128-CPDBR128-UPCONV64-CPDBR64-UPCONV32-CPDBR32-CE1</p>
          <p>• CPDBRK represents a stack of (A) a point-wise convolution of kernel size (1,5) and a depth-wise convolution of kernel size (1,1) followed by batch norm and ReLU, and (B) a point-wise convolution of kernel size (5,1) and a depth-wise convolution of kernel size (1,1) followed by batch norm and ReLU. All the convolution layers have K output feature maps.</p>
          <p>• CAPDBRK</p>
          <p>2.2.3. Discriminator Network</p>
          <p>CAT-DC32-CAT-DC128-CAT-DC128-CAT-DC32-CAT-DC32-ENCONV</p>
          <p>• CAT is the concatenation of two different layers from either the teacher or the student network.</p>
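          <p>To give a feel for why the separable blocks of the student network reduce the number of learnable parameters, the sketch below counts convolution weights for one block. It assumes the conventional reading in which the (1,5) and (5,1) kernels are applied per channel and the (1,1) kernels mix channels, ignores biases and batch norm, and compares against a hypothetical dense (5,5) convolution; the teacher's kernel size is not stated here, so the numbers are illustrative and are not meant to reproduce the 78% reduction reported.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def standard_conv_params(c_in, c_out, kh, kw):
    """Weights of a dense convolution (bias ignored)."""
    return kh * kw * c_in * c_out


def depthwise_conv_params(channels, kh, kw):
    """Weights of a per-channel (depth-wise) convolution (bias ignored)."""
    return kh * kw * channels


def pointwise_conv_params(c_in, c_out):
    """Weights of a (1,1) channel-mixing convolution (bias ignored)."""
    return c_in * c_out


def separable_block_params(c_in, c_out, k=5):
    """Weights of a two-stage separable block under the stated assumptions."""
    # Stage (A): (1,k) per-channel filter, then (1,1) mixing c_in -> c_out.
    stage_a = depthwise_conv_params(c_in, 1, k) + pointwise_conv_params(c_in, c_out)
    # Stage (B): (k,1) per-channel filter, then (1,1) mixing c_out -> c_out.
    stage_b = depthwise_conv_params(c_out, k, 1) + pointwise_conv_params(c_out, c_out)
    return stage_a + stage_b


dense = standard_conv_params(64, 128, 5, 5)
separable = separable_block_params(64, 128, k=5)
print(dense, separable, round(separable / dense, 3))
```
]]></preformat>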
        </sec>
        <sec id="sec-1-2-4">
          <title>DCK represents a stack consisting of a convolution with kernel size (3,3) and padding and stride of (1,1), followed by instance norm and Leaky ReLU with a negative slope of 0.2.</title>
        </sec>
        <sec id="sec-1-2-5">
          <title>The hierarchical discriminator consists of five discriminator blocks (DC) and an End Convolution (ENCONV).</title>
          <p>In our proposed model, the feature maps from encoder 1, encoder 3, decoder 1, and decoder 3 of the teacher or student network are used for hierarchical knowledge distillation. The full network architecture is described in Fig. 2.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Dataset and Implementation</title>
      <sec id="sec-2-1">
        <title>3.1. Dataset</title>
        <sec id="sec-2-1-1">
          <title>Automatic polyp detection and classification requires the availability of large datasets of polyp images or videos along with high-quality manual annotations provided by experts.</title>
          <p>
            These annotations provide the ground truth necessary to train supervised deep learning models. The EndoCV2022 challenge provided us with a series of sequence datasets totalling 2631 images with their corresponding ground truth masks [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ][
            <xref ref-type="bibr" rid="ref10">10</xref>
            ][
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]. Of these, we used more than 95% of the data for training and the remainder (about 5%) for testing. The external Kvasir-Seg dataset was used to test the model's generality.
          </p>
          <p>3.1.1. Dataset augmentation</p>
          <p>All the models were trained with an input image size of 512x512. Data augmentations such as random rotation, horizontal flip, vertical flip, and perspective transform were applied. Endoscopic images are usually subject to different light sources with varying brightness, contrast, and hue, so the images were also augmented to replicate those scenarios.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Implementation</title>
        <p>Both teacher networks were trained using the Adam optimizer with an initial learning rate of 3e-4 and a step learning rate scheduler with a gamma of 0.1 and a step size of 30. The networks were trained for 450 epochs with a batch size of 8. The student network was trained using the Adam optimizer with β<sub>1</sub> = 0.5 and β<sub>2</sub> = 0.999, an initial learning rate of 1e-4, and a step learning rate scheduler with a gamma of 0.1 and a step size of 30. After multiple experiments initializing the weights with the uniform, Xavier-uniform, and Kaiming-uniform schemes provided by PyTorch, Kaiming-uniform initialization was found to give better model convergence. We also deployed our model with the NVIDIA TensorRT inference library for effective real-time throughput. All the models were trained on an NVIDIA RTX 3090 GPU.</p>
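        <p>The step learning rate schedule described above can be written as a one-line rule. The sketch below is plain Python with a hypothetical helper name; it assumes the usual step-decay convention (as in PyTorch's StepLR) in which the rate is multiplied by gamma once every step_size epochs.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def step_lr(initial_lr, epoch, step_size=30, gamma=0.1):
    """Learning rate at a given epoch under step decay.

    The rate is multiplied by gamma once for every completed
    block of step_size epochs.
    """
    return initial_lr * gamma ** (epoch // step_size)


# Teacher setting (initial rate 3e-4) and student setting (1e-4).
for epoch in (0, 29, 30, 60):
    print(epoch, step_lr(3e-4, epoch), step_lr(1e-4, epoch))
```
]]></preformat>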
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Results and Discussion</title>
      <p>The networks were evaluated and the computed metrics are reported in Table 1. On the validation data of the EndoCV dataset, the teacher 1 model achieved a Dice score of 0.893 and an IoU score of 0.889. Similarly, the teacher 2 model achieved 0.871 and 0.884 for the same metrics. The student network achieved commendable Dice and IoU scores of 0.839 and 0.805 even with the reduced number of learnable parameters. The trade-off here is replacing the larger teacher network with the lightweight student network at a minimal loss in accuracy. These metrics were also calculated for the Kvasir-Seg dataset and are reported in Table 1.</p>
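      <p>For reference, the Dice and IoU scores of a single predicted mask can be computed as in the following plain-Python sketch. The flat-list mask representation and the convention of scoring two empty masks as a perfect match are assumptions made for illustration, not details taken from the evaluation protocol.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def dice_and_iou(pred_mask, true_mask):
    """Return (dice, iou) for two binary masks given as flat sequences."""
    inter = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    pred_area = sum(1 for p in pred_mask if p)
    true_area = sum(1 for t in true_mask if t)
    union = pred_area + true_area - inter
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0, 1.0
    dice = 2.0 * inter / (pred_area + true_area)
    iou = inter / union
    return dice, iou


dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, iou)
```
]]></preformat>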
      <p>Results show that teacher 2 performs better than teacher 1 in regions with a higher amount of specular reflection. The student network thus obtains complementary knowledge from the two teacher networks. With reference to the ground truth, the student network produced proper segmentation even where one of the teachers had missed areas in its segmentation masks, as shown in Fig. 3. These results show that knowledge from multiple teachers helps segmentation generalize better.</p>
      <p>To benchmark inference time, the model was converted into a TensorRT engine for faster throughput. The model attained an average throughput of 60 fps on a GeForce RTX 3070 mobile GPU and 120 fps on an NVIDIA RTX 3090 GPU. From these results, we believe that constructing multiple teacher models that focus on various aspects of the input data can distill a superior student network.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>The proposed network is lightweight and computes faster than traditional networks used for segmentation. Since it uses dual teachers for knowledge distillation, increasing the number of teacher networks leaves room for further improvement in performance. Moreover, the sample size of the data also plays a crucial role in the accuracy of the network. Further studies can be done to design a much more intelligent network for polyps and other varieties of early cancer tissue.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Levin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Lieberman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>McFarland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Andrews</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brooks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Giardiello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Glick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , et al.,
          <article-title>Screening and surveillance for the early detection of colorectal cancer and adenomatous polyps, 2008: a joint guideline from the American Cancer Society, the US Multi-Society Task Force on Colorectal Cancer, and the American College of Radiology</article-title>
          ,
          <source>Gastroenterology</source>
          <volume>134</volume>
          (
          <year>2008</year>
          )
          <fpage>1570</fpage>
          -
          <lpage>1595</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Rex</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Petrini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Baron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Deal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Jacobson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mergener</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. T.</given-names>
            <surname>Petersen</surname>
          </string-name>
          , et al.,
          <article-title>Quality indicators for colonoscopy</article-title>
          ,
          <source>Gastrointestinal Endoscopy</source>
          <volume>63</volume>
          (
          <year>2006</year>
          )
          <fpage>S16</fpage>
          -
          <lpage>S28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S. N.</given-names>
            <surname>Bonnington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Rutter</surname>
          </string-name>
          ,
          <article-title>Surveillance of colonic polyps: are we getting it right?</article-title>
          ,
          <source>World Journal of Gastroenterology</source>
          <volume>22</volume>
          (
          <year>2016</year>
          )
          <fpage>1925</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mintz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brodie</surname>
          </string-name>
          ,
          <article-title>Introduction to artificial intelligence in medicine</article-title>
          ,
          <source>Minimally Invasive Therapy &amp; Allied Technologies</source>
          <volume>28</volume>
          (
          <year>2019</year>
          )
          <fpage>73</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Ronneberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          ,
          <article-title>U-net: Convolutional networks for biomedical image segmentation</article-title>
          , in:
          <source>International Conference on Medical Image Computing and Computer-Assisted Intervention</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>241</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Epsanet: An efficient pyramid squeeze attention block on convolutional neural network</article-title>
          ,
          <source>arXiv preprint arXiv:2105.14447</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          , et al.,
          <article-title>Distilling the knowledge in a neural network</article-title>
          ,
          <source>arXiv preprint arXiv:1503.02531</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vadacchino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Sepahvand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Nichyporuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Arbel</surname>
          </string-name>
          ,
          <article-title>Had-net: A hierarchical adversarial knowledge distillation network for improved enhanced tumour segmentation without post-contrast images</article-title>
          , in:
          <source>Medical Imaging with Deep Learning</source>
          , PMLR,
          <year>2021</year>
          , pp.
          <fpage>787</fpage>
          -
          <lpage>801</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dmitrieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ghatwary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Polat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Temizel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krenzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hekalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. B.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Matuszewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gridach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Voiculescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Yoganand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Raj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Q.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Boutry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rezvy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. H.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Subramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Balasubramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. W.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Daul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Realdon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cannizzaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lamarque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tran-Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bailey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Braden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>East</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rittscher</surname>
          </string-name>
          ,
          <article-title>Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy</article-title>
          ,
          <source>Medical Image Analysis</source>
          <volume>70</volume>
          (
          <year>2021</year>
          )
          <elocation-id>102002</elocation-id>
          . URL: https://doi.org/10.1016/j.media.2021.102002. doi:
          <pub-id pub-id-type="doi">10.1016/j.media.2021.102002</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ghatwary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Realdon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cannizzaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. E.</given-names>
            <surname>Salem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lamarque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Daul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. V.</given-names>
            <surname>Anonsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          , et al.,
          <article-title>PolypGen: A multi-center polyp detection and segmentation dataset for generalisability assessment</article-title>
          ,
          <source>arXiv preprint arXiv:2106.04463</source>
          (
          <year>2021</year>
          ). doi:
          <pub-id pub-id-type="doi">10.48550/arXiv.2106.04463</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ghatwary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Isik-Polat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Polat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Galdran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-Á. G.</given-names>
            <surname>Ballester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Thambawita</surname>
          </string-name>
          , et al.,
          <article-title>Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge</article-title>
          ,
          <source>arXiv preprint arXiv:2202.12031</source>
          (
          <year>2022</year>
          ). doi:
          <pub-id pub-id-type="doi">10.48550/arXiv.2202.12031</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Borgli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Thambawita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Smedsrud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Eskeland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Randel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. T. D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , et al.,
          <article-title>HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy</article-title>
          ,
          <source>Scientific Data</source>
          <volume>7</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Isola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Efros</surname>
          </string-name>
          ,
          <article-title>Image-to-image translation with conditional adversarial networks</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1125</fpage>
          -
          <lpage>1134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Z.-Q.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <article-title>Orderly dual-teacher knowledge distillation for lightweight human pose estimation</article-title>
          ,
          <source>arXiv preprint arXiv:2104.10414</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Dice loss for data-imbalanced NLP tasks</article-title>
          ,
          <source>arXiv preprint arXiv:1911.02855</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Nasalwai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Punn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Sonbhadra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <article-title>Addressing the class imbalance problem in medical image segmentation via accelerated Tversky loss function</article-title>
          ,
          <source>in: Pacific-Asia Conference on Knowledge Discovery and Data Mining</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>390</fpage>
          -
          <lpage>402</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>X.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Y.</given-names>
            <surname>Lau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Smolley</surname>
          </string-name>
          ,
          <article-title>Least squares generative adversarial networks</article-title>
          ,
          <source>in: Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>2794</fpage>
          -
          <lpage>2802</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>