<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Deep Learning Model Generalization with Ensemble in Endoscopic Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ayoung Hong</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giwan Lee</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hyunseok Lee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jihyun Seo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Doyeob Yeo</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Daegu-Gyeongbuk Medical Innovation Foundation</institution>
          ,
          <addr-line>Daegu</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of AI Convergence, Chonnam National University</institution>
          ,
          <addr-line>Gwangju</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Computer Convergence Software, Korea University</institution>
          ,
          <addr-line>Sejong</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Electronics and Telecommunications Research Institute</institution>
          ,
          <addr-line>Daejeon</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Robotics Engineering Convergence, Chonnam National University</institution>
          ,
          <addr-line>Gwangju</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Owing to the rapid development of deep learning technologies in recent years, autonomous diagnostic systems are widely used to detect abnormal lesions such as polyps in endoscopic images. However, the image characteristics, such as the contrast and illuminance, vary significantly depending on the center from which the data was acquired; this affects the generalization performance of the diagnostic method. In this paper, we propose an ensemble learning method based on k-fold cross-validation to improve the generalization performance of polyp detection and polyp segmentation in endoscopic images. Weighted box fusion was used to ensemble the bounding boxes obtained from each detection model trained on data from each center. The segmentation results of the center-specific models were averaged to generate the final ensemble mask. We used a Mask R-CNN-based model for both the detection and segmentation tasks. The proposed method achieved a score of 0.7269 on the detection task and 0.7423 ± 0.2839 on the segmentation task in Round 1 of the EndoCV2021 challenge.</p>
      </abstract>
      <kwd-group>
        <kwd>EndoCV2021</kwd>
        <kwd>polyp detection</kwd>
        <kwd>polyp segmentation</kwd>
        <kwd>ensemble</kwd>
        <kwd>k-fold cross-validation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Colorectal polyps are abnormal tissue growths that create flat bumps or tiny mushroom-like
stalks on the lining of the colon or rectum. Although these abnormal cells often cause rectal
bleeding and abdominal pain due to partial bowel obstruction, most colorectal polyps are
asymptomatic [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, it is very important to monitor the growth of polyps, since polyp growth is
highly associated with colorectal cancer.
      </p>
      <p>
        Colonoscopy is a common procedure used to examine the inside wall of the colon by using a
camera attached at its tip, and if necessary, polyps are removed by inserting the instrument
through its channel during the procedure. Doctors localize polyps and examine their size
and shape using the captured colonoscopic images; however, the polyp detection rate varies
depending on the abilities of individual clinicians [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This has led to the development of
automatic polyp detection systems using methods based on computer vision [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The recent
advancements in deep learning technologies have contributed to new polyp detection methods
using artificial intelligence.
      </p>
      <p>
        The Endoscopy Computer Vision Challenge (EndoCV) was initiated in 2019 as Endoscopic
Artefact Detection (EAD) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to realize reliable computer-assisted endoscopy by detecting
multiple artifacts such as pixel saturation, motion blur, defocus, bubbles, and debris. This
year, the EndoCV challenge [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] aims to develop a polyp detection and segmentation method
that works for endoscopic images obtained from multiple centers. In this paper, we propose
an ensemble learning method based on k-fold cross-validation to improve the generalization
performance.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <sec id="sec-2-1">
        <title>2.1. Detectron2</title>
        <p>
          Detectron2 is a PyTorch [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]-based software system that provides implementations of object
detection and segmentation networks, developed by Facebook AI Research [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. It provides
object detection algorithms, including Faster R-CNN [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and RetinaNet [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and includes
implementations of instance segmentation with Mask R-CNN [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and panoptic segmentation with
Panoptic FPN [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. The implemented networks are provided with weights pre-trained on the
COCO dataset, which facilitates transfer learning.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Mask R-CNN</title>
        <p>
          Mask R-CNN is a model proposed to perform image detection and segmentation
simultaneously [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], by extending Faster R-CNN with a Feature Pyramid Network (FPN), mask branch,
and Region of Interest Align (RoIAlign) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Faster R-CNN creates RoI from only one feature
map and performs classification and bounding box regression. However, a single feature map
makes it difficult to capture detailed features, and anchor boxes of many scales and ratios are
required to detect objects of various sizes, which leads to an inefficient learning process.
To overcome this limitation, FPN was introduced [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. FPN enables the detection of a large
object in a small map and a small object in a large map by creating an anchor box of the same
scale in each map. Whereas Faster R-CNN adopts RoI Pooling, Mask
R-CNN replaces it with RoIAlign [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. In this method, pixel values are generated using bilinear
interpolation [13], which leads to a more accurate segmentation model by preventing position
distortion. In the EndoCV2021 challenge, polyp detection and segmentation must be performed
together; therefore, Mask R-CNN is used in this study.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Robust Model Selection Methods</title>
        <p>To train a deep learning classification model with high accuracy, it is important to avoid
overfitting the model on the training dataset and to maintain robustness on data that the model
has never seen before. This implies that we need to choose a model with optimal hyperparameters
that yield small bias and variance on general data. The holdout evaluation is a simple model
selection method that divides the data into training and test sets. Typically, 4/5 of the available
data are used as a training set, and the remaining data are used as a test set. However, the
performance of this method relies heavily on how the test set is selected from the entire
dataset, which could cause the model to overfit the test set.</p>
        <p>[Figure 1. (a) An original training endoscopic image from center 4 with a sub-endoscope
image part. (b) The endoscopic image after removing the sub-endoscope image part.]</p>
        <p>One of the most popular model selection methods is k-fold cross-validation [14]. In this
method, we randomly partition the entire dataset into k exclusive subsets. Then, the deep learning
classifier is trained using k − 1 subsets and tested on the remaining subset, repeating the process k
times. The performance of the model is the average score obtained over the k training runs.
This prevents the model from overfitting on the test data and helps achieve better prediction
performance on data it has never seen before.</p>
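The splitting procedure described above can be sketched in Python (an illustrative helper with assumed names, not the paper's implementation):

```python
def k_fold_splits(samples, k):
    """Partition samples into k exclusive subsets and yield (train, test)
    pairs: each fold tests on one subset and trains on the other k - 1."""
    folds = [samples[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test

# Example: 10 samples with k = 5 gives 5 splits of 8 training
# and 2 test samples each; every sample is tested exactly once.
splits = list(k_fold_splits(list(range(10)), k=5))
```

Averaging the k test scores then gives the cross-validated estimate of model performance.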
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Method</title>
      <p>In this section, we describe how polyp detection and segmentation were performed in the
EndoCV2021 competition. We used ensemble inference based on k-fold cross-validation to
increase the generalization performance. In addition, we used several data augmentation
techniques provided by Detectron2 and preprocessed the input images to improve both the
segmentation and detection performance of each Mask R-CNN model.</p>
      <sec id="sec-3-1">
        <title>3.1. Preprocessing</title>
        <p>Since the characteristics of the images change depending on the center from which the endoscopic
images were acquired, we used data augmentation to prevent overfitting during training and to
improve the generalization performance. The same augmentation techniques were applied for both
the detection and segmentation tasks of each Mask R-CNN model. We applied the augmentation
techniques provided in Detectron2, namely RandomBrightness, RandomContrast, RandomSaturation, and
RandomLighting.</p>
        <p>As shown in Figure 1, several endoscopic images in the training set included a secondary
endoscopic image in addition to the main endoscopic image. Because the ground truth
labels for the provided detection/segmentation tasks do not consider the sub-endoscope image
part, we removed the sub-endoscope image part in the training dataset to improve the
detection/segmentation performance.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Ensemble</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Training</title>
          <p>The training dataset provided in the EndoCV2021 competition was given as data from five
different centers. As shown in Figure 2, we trained five Mask R-CNN models for ensemble
inference based on k-fold cross-validation. When training a single Mask R-CNN model, the
data acquired from one of the five data centers were held out; only the data acquired from the
remaining four data centers were used. In this study, the Mask R-CNN model trained excluding
center i is called “Model i” (i = 1, 2, 3, 4, 5).</p>
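The leave-one-center-out assignment of training data can be sketched as follows (illustrative Python with hypothetical names, not the paper's code):

```python
def training_sets(data_by_center):
    """For each center i, build the training set for Model i: all data
    except the data acquired from center i (leave-one-center-out)."""
    centers = sorted(data_by_center)
    return {i: [x for c in centers if c != i for x in data_by_center[c]]
            for i in centers}

# Five centers with one image each; Model 1 never sees center 1's data.
data = {1: ["img_c1"], 2: ["img_c2"], 3: ["img_c3"],
        4: ["img_c4"], 5: ["img_c5"]}
sets = training_sets(data)
```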
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Inference</title>
          <p>Ensemble inference was performed by combining the detection/segmentation inference results
of Models 1–5, as shown in Figure 3. For detection, the bounding boxes from the five models
were ensembled using the weighted box fusion technique [15]. For segmentation, the mask created
by each model has a value of 0 for the background and 1 for the polyp. We averaged the
segmentation masks from Models 1–5; where the averaged result was above a given threshold,
the pixel was assigned to the final polyp mask, and otherwise to the background. In
this study, the threshold was set to 0.6. In addition, only segmentation/detection results with
confidence values greater than 0.5 in each model were used in the ensemble.</p>
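The segmentation half of this ensemble can be sketched in pure Python (a toy illustration; the actual pipeline operates on full-size model outputs):

```python
def ensemble_masks(masks, threshold=0.6):
    """Average binary masks (0 = background, 1 = polyp) pixel-wise and
    keep a pixel in the final mask only if the mean exceeds threshold."""
    n = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    return [[1 if sum(m[y][x] for m in masks) / n > threshold else 0
             for x in range(width)]
            for y in range(height)]

# Toy 1x2 masks from five models: the first pixel is predicted as polyp
# by 4/5 models (0.8 > 0.6, kept); the second by 2/5 (0.4, suppressed).
masks = [[[1, 1]], [[1, 0]], [[1, 0]], [[1, 1]], [[0, 0]]]
final = ensemble_masks(masks)  # -> [[1, 0]]
```

Note the strict inequality: a pixel predicted by exactly 3 of 5 models (mean 0.6) does not exceed the 0.6 threshold and is treated as background.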
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <p>In this section, we compare the performance of the proposed ensemble method with that of a
single model trained with all the datasets from five centers.</p>
      <sec id="sec-4-1">
        <title>4.1. Settings of experiments</title>
        <p>
          At the time of writing this paper, not all scores were provided for the competition test set;
therefore, validation was performed using the given dataset. In addition to the data provided by
the five centers, the competition also provided a sequential image dataset. The sequential dataset
was kept separately and was used only to evaluate the model’s performance. The ensemble
model and the single model were trained using the default trainer provided by Detectron2. A
stochastic gradient descent optimizer with LR= 0.001 and a warm-up scheduler was used in the
experiments. Image augmentations such as random crop and flip were used to avoid overfitting.
We conducted the experiments with training steps of 50,000 and stored the checkpoints for
every 1,000 steps. Among the saved checkpoints, the one with the best performance on the
validation set was selected as the final model weight. For a more sophisticated evaluation,
various evaluation metrics were used: Jaccard, Dice, F2, Precision, Recall, and Accuracy for the
segmentation task; and mAP (mean average precision at IoU=0.50:0.05:0.95), APs, APm, and
APl (AP for small, medium, and large objects) for the detection task. [Figure 4. (a) Ground
truth results, (b) ensemble results, (c) single model results.] The organisers also provided
leaderboard scores that incorporated an out-of-sample generalisation metric [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
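Two of the segmentation metrics above can be stated compactly in Python (an illustrative sketch on flattened binary masks, not the organisers' evaluation code):

```python
def jaccard_dice(pred, truth):
    """Jaccard (IoU) and Dice coefficients between two flattened binary
    masks: intersection over union, and twice the intersection over the
    sum of mask sizes."""
    inter = sum(p & t for p, t in zip(pred, truth))
    p_sum, t_sum = sum(pred), sum(truth)
    union = p_sum + t_sum - inter
    jaccard = inter / union if union else 1.0
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    return jaccard, dice

# Flattened 1-D masks: 3 overlapping pixels, 4 predicted, 5 true.
j, d = jaccard_dice([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 1, 1])
# j = 3/6 = 0.5, d = 6/9 ≈ 0.667
```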
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evaluation results</title>
        <p>As shown in Figures 3 and 4, the inference results of Models 1–5 and the single model show
many false positive regions. However, the ensemble technique based on k-fold cross-validation
significantly reduced the number of false positive regions.</p>
        <p>The segmentation task result is shown in Table 1. The performance of each individual
model (Models 1 to 5) was significantly lower than that of the single model. However, the
ensemble model showed better performance than the single model, except for recall. In particular,
the precision increased significantly because the number of false positives was reduced in the
ensemble model.</p>
        <p>The experimental results for the detection task show similar trends, as shown in Table 2.
The performance of each individual model was either better or worse than that of the single model.
However, the ensemble model showed better performance than the single model, except for
APm. It should be noted that APl improved significantly.</p>
        <p>Table 3 shows the average computing time taken for the detection and segmentation inferences
using the single model and the proposed ensemble model. The computing time was measured
and averaged over the 1793 images in the test dataset. Although the ensemble was performed using
five single models, the inference time of the ensemble model was less than five times that of
the single model. In addition, the ensemble step for a single image took on average
0.008 seconds for the detection task and 0.014 seconds for the segmentation task.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this paper, we proposed an ensemble learning method based on k-fold cross-validation to
improve the generalization performance of polyp detection and segmentation in endoscopic
images. Five Mask R-CNN models were trained using only the training data collected from four of
the five centers, and the inference results of the five models were ensembled. In the ensemble, the
weighted boxes fusion technique was used for detection, and the ensemble segmentation mask
was created by averaging the inference results of five segmentation masks. In the experiment,
the detection and segmentation performances were measured using the sequential image dataset
provided in the EndoCV2021 competition as a test dataset. The experimental results showed
that in both the detection and segmentation tasks, the performance of the Mask R-CNN using the
ensemble technique was better than that of the Mask R-CNN model trained using data from all
five centers at once. The proposed method achieved a score of 0.7269 on the detection task and
0.7423 ± 0.2839 on the segmentation task in Round 1 of the EndoCV2021 challenge.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>This work was supported in part by the National Research Foundation of Korea (NRF) grant
funded by the Korea government (MSIT) (No. NRF-2020R1F1A1072201) and in part by the
National Research Council of Science &amp; Technology (NST) grant from the Korean
government (MSIP) (No. CRC-15-05-ETRI).</p>
      <p>for object detection, in: Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit, 2017,
pp. 2117–2125.
[13] E. J. Kirkland, Bilinear interpolation, in: Advanced Computing in Electron Microscopy,
Springer, 2010, pp. 261–263.
[14] M. Anthony, S. B. Holden, Cross-validation for binary classification by real-valued
functions: theoretical analysis, in: Proceedings of the eleventh annual conference on
Computational learning theory, 1998, pp. 218–229.
[15] R. Solovyev, W. Wang, T. Gabruseva, Weighted boxes fusion: Ensembling boxes from
different object detection models, Image and Vision Computing (2021) 1–6.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Bond</surname>
          </string-name>
          ,
          <article-title>Polyp guideline: diagnosis, treatment, and surveillance for patients with colorectal polyps</article-title>
          ,
          <source>Am J Gastroenterol</source>
          <volume>95</volume>
          (
          <year>2000</year>
          )
          <fpage>3053</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Urban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tripathi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Alkayali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Jalali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Karnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Baldi</surname>
          </string-name>
          ,
          <article-title>Deep learning localizes and identifies polyps in real time with 96% accuracy in screening colonoscopy</article-title>
          ,
          <source>Gastroenterology</source>
          <volume>155</volume>
          (
          <year>2018</year>
          )
          <fpage>1069</fpage>
          -
          <lpage>1078</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bernal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vilarino</surname>
          </string-name>
          ,
          <article-title>Towards automatic polyp detection with a polyp appearance model</article-title>
          ,
          <source>Pattern Recognit</source>
          <volume>45</volume>
          (
          <year>2012</year>
          )
          <fpage>3166</fpage>
          -
          <lpage>3182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Braden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bailey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          , G. Cheng, P. Zhang,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kayser</surname>
          </string-name>
          , R. D.
          <string-name>
            <surname>Soberanis-Mukul</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Albarqouni</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Watanabe</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <string-name>
            <surname>Oksuz</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          <string-name>
            <surname>Ning</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>X. W.</given-names>
          </string-name>
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Realdon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Loshchenov</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          <string-name>
            <surname>Schnabel</surname>
            ,
            <given-names>J. E.</given-names>
          </string-name>
          <string-name>
            <surname>East</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Wagnieres</surname>
            ,
            <given-names>V. B.</given-names>
          </string-name>
          <string-name>
            <surname>Loschenov</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Grisan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Daul</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Rittscher</surname>
          </string-name>
          ,
          <article-title>An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>10</volume>
          (
          <year>2020</year>
          )
          <article-title>2748</article-title>
          . doi:
          <volume>10</volume>
          .1038/s41598-020-59413-5.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ghatwary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Realdon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cannizzaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Daul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rittscher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. E.</given-names>
            <surname>Salem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lamarque</surname>
          </string-name>
          , T. de Lange,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>East</surname>
          </string-name>
          ,
          <article-title>Polypgen: A multi-center polyp detection and segmentation dataset for generalisability assessment</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paszke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          , G. Chanan,
          <string-name>
            <given-names>T.</given-names>
            <surname>Killeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gimelshein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antiga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Desmaison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>DeVito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Raison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tejani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chilamkurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Steiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chintala</surname>
          </string-name>
          ,
          <string-name>
            <surname>Pytorch:</surname>
          </string-name>
          <article-title>An imperative style, high-performance deep learning library</article-title>
          , in:
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beygelzimer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>d'Alché-Buc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Garnett</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          ,
          <publisher-name>Curran Associates, Inc.</publisher-name>
          ,
          <year>2019</year>
          , pp.
          <fpage>8024</fpage>
          -
          <lpage>8035</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kirillov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-Y.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          , Detectron2, https://github.com/facebookresearch/detectron2,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Faster R-CNN: Towards real-time object detection with region proposal networks</article-title>
          ,
          <source>arXiv preprint arXiv:1506.01497</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <article-title>Focal loss for dense object detection</article-title>
          , in:
          <source>Proc IEEE Int Conf Comput Vis</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>2980</fpage>
          -
          <lpage>2988</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gkioxari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <article-title>Mask R-CNN</article-title>
          ,
          in:
          <source>Proc IEEE Int Conf Comput Vis</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>2961</fpage>
          -
          <lpage>2969</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kirillov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <article-title>Panoptic feature pyramid networks</article-title>
          , in:
          <source>Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>6399</fpage>
          -
          <lpage>6408</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hariharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Belongie</surname>
          </string-name>
          ,
          <article-title>Feature pyramid networks for object detection</article-title>
          , in:
          <source>Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>2117</fpage>
          -
          <lpage>2125</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>