MULTI-PLATEAU ENSEMBLE FOR ENDOSCOPIC ARTEFACT SEGMENTATION AND DETECTION

                     Suyog Jadhav, Udbhav Bamba, Arnav Chavan, Rishabh Tiwari, Aryan Raj

                                        Indian Institute of Technology (ISM), Dhanbad


                                 ABSTRACT

The Endoscopic Artefact Detection challenge consists of 1) artefact
detection, 2) semantic segmentation, and 3) out-of-sample generalisation.
For the semantic segmentation task, we propose a multi-plateau ensemble of
FPN [1] (Feature Pyramid Network) with EfficientNet [2] as the feature
extractor/encoder. For the object detection task, we used a three-model
ensemble of RetinaNet [3] with a ResNet50 [4] backbone and Faster R-CNN [5]
(FPN + DC5 [6]) with a ResNeXt101 backbone [7, 8]. A PyTorch implementation
of our approach is available at github.com/ubamba98/EAD2020.

    Index Terms - Endoscopy, FPN, EfficientNet, RetinaNet, Faster R-CNN,
Artefact detection.

    Copyright (c) 2020 for this paper by its authors. Use permitted under
Creative Commons License Attribution 4.0 International (CC BY 4.0).

                               1. DATASETS

The given dataset of EndoCV-2020 [9, 10, 11] has a total of 643 images for
the segmentation task, which we divided into three parts - train (474
images), validation (99 images) and holdout (70 images) - in the sequence
they were released. We made sure that the distributions of train and
holdout were similar while that of validation was different. The validation
set ensured that the model was not overfitting to the training data while
at the same time generalizing well on holdout. For detection, a similar
strategy was adopted - train (2200 images), validation (232 images) and
holdout (99 images).

                                2. METHODS

2.1. Data Pre-processing and Augmentations

Due to the variable aspect ratios and sizes in the training data, we
adopted a stage-dependent rescaling policy. During the training stage, we
cropped the images to a fixed size of 512x512 without resizing. This made
sure that spatial information was not lost while the input to the models
stayed within trainable limits. During validation and testing, we padded
the images so that both dimensions were multiples of 128, as required by
the EfficientNet backbone (to handle max-pooling in the deeper models). As
the number of samples in the dataset was relatively low and unbalanced,
various augmentation techniques were adopted to prevent overfitting and
achieve generalization. Horizontal and vertical flips, cutout (random
holes) [12], random contrast, gamma, brightness and rotation were tested.
To strongly regularize the data, we propose the use of CutMix for
segmentation (Algorithm 1), which was toggled on and off depending upon the
variance of the model outputs.

Algorithm 1 CutMix for Segmentation
  for each iteration do
    input, target = get_minibatch(dataset)
    if mode == training then
      input_s, target_s = shuffle_minibatch(input, target)
      lambda = Unif(0, 1)
      r_x = Unif(0, W)
      r_y = Unif(0, H)
      r_w = W * Sqrt(1 - lambda)
      r_h = H * Sqrt(1 - lambda)
      x1 = Round(Clip(r_x - r_w / 2, min=0))
      x2 = Round(Clip(r_x + r_w / 2, max=W))
      y1 = Round(Clip(r_y - r_h / 2, min=0))
      y2 = Round(Clip(r_y + r_h / 2, max=H))
      input[:, :, y1:y2, x1:x2] = input_s[:, :, y1:y2, x1:x2]
      target[:, :, y1:y2, x1:x2] = target_s[:, :, y1:y2, x1:x2]
    end if
    output = model_forward(input)
    loss = compute_loss(output, target)
    model_update()
  end for

For object detection, spatial transformations - flips and random scaling
and rotation - were used.
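For concreteness, Algorithm 1 can be sketched in PyTorch as follows. This
is a minimal illustration under our own naming (the released code at
github.com/ubamba98/EAD2020 is the reference implementation); it assumes
NCHW image tensors and spatially aligned NCHW mask tensors:

    import torch

    def cutmix_segmentation(images: torch.Tensor, masks: torch.Tensor):
        # CutMix for segmentation: paste a random rectangle from a
        # shuffled copy of the batch into both images and masks.
        # images: (N, C, H, W) float tensor; masks: (N, K, H, W) tensor.
        n, _, h, w = images.shape
        perm = torch.randperm(n)           # shuffle_minibatch
        lam = torch.rand(1).item()         # lambda ~ Unif(0, 1)

        # Random box centre; box area is proportional to (1 - lambda).
        r_x = torch.randint(w, (1,)).item()
        r_y = torch.randint(h, (1,)).item()
        r_w = w * (1.0 - lam) ** 0.5
        r_h = h * (1.0 - lam) ** 0.5

        x1, x2 = max(round(r_x - r_w / 2), 0), min(round(r_x + r_w / 2), w)
        y1, y2 = max(round(r_y - r_h / 2), 0), min(round(r_y + r_h / 2), h)

        # Mix images and masks with the same box, so they stay aligned.
        images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
        masks[:, :, y1:y2, x1:x2] = masks[perm, :, y1:y2, x1:x2]
        return images, masks

Because the target masks are mixed with exactly the same box as the images,
no label-interpolation term is needed, unlike CutMix for classification.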
2.2. Multi-Plateau Approach

Due to the high variability of the dataset and its tendency towards early
overfitting, the main focus was on building a strong ensemble by training
models on different optimisation plateaus. A total of 8 different plateaus,
the permutations of two optimisers and four loss functions, were optimised
with an EfficientNet backbone whose depth, width and resolution were
increased three times, going from B3 to B5 (Table 1).
                        Table 1. Multi-Plateau Results

  Encoder          Optimizer  Loss function      Validation  Fine-tuning incl.
                                                 (DICE)      holdout (DICE)
  Efficientnet B3  Ranger     DICE               0.4917      0.4900
  Efficientnet B3  Ranger     BCE+DICE           0.4382      -
  Efficientnet B3  Ranger     BCE                0.4415      -
  Efficientnet B3  Ranger     BCE+DICE+JACCARD   0.4771      0.4630
  Efficientnet B3  Over9000   DICE               0.4509      -
  Efficientnet B3  Over9000   BCE+DICE           0.4500      -
  Efficientnet B3  Over9000   BCE                0.4170      -
  Efficientnet B3  Over9000   BCE+DICE+JACCARD   0.4525      -
  Efficientnet B4  Ranger     DICE               0.4568      -
  Efficientnet B4  Ranger     BCE+DICE           0.4759      0.4720
  Efficientnet B4  Ranger     BCE                0.4165      -
  Efficientnet B4  Ranger     BCE+DICE+JACCARD   0.4718      0.4666
  Efficientnet B4  Over9000   DICE               0.3890      -
  Efficientnet B4  Over9000   BCE+DICE           0.4597      -
  Efficientnet B4  Over9000   BCE                0.4151      -
  Efficientnet B4  Over9000   BCE+DICE+JACCARD   0.4614      -
  Efficientnet B5  Ranger     DICE               0.4761      0.4987
  Efficientnet B5  Ranger     BCE+DICE           0.4693      0.4643
  Efficientnet B5  Ranger     BCE                0.4374      -
  Efficientnet B5  Ranger     BCE+DICE+JACCARD   0.4781      0.4900
  Efficientnet B5  Over9000   DICE               0.4352      -
  Efficientnet B5  Over9000   BCE+DICE           0.4730      0.4823
  Efficientnet B5  Over9000   BCE                0.4151      -
  Efficientnet B5  Over9000   BCE+DICE+JACCARD   0.4726      0.4798


As optimisers, Ranger and Over9000 were used. Ranger is a synergistic
optimiser combining RAdam (Rectified Adam) [13] and LookAhead [14], while
Over9000 combines Ralamb [15] and LookAhead. A total of 2*4*3 = 24 models
were trained, but only the models with a Dice score greater than 0.47 were
considered for the final ensemble. Average pixel-wise ensembling was
adopted.
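Average pixel-wise ensembling here just means averaging the per-model
probability maps before thresholding. A minimal PyTorch sketch (our naming;
it assumes sigmoid-activated, multi-label mask outputs):

    import torch

    def average_ensemble(models, images):
        # Mean of the per-model class-probability maps for one batch.
        # models are assumed to be in eval() mode; images is (N, C, H, W).
        with torch.no_grad():
            probs = [torch.sigmoid(m(images)) for m in models]
        return torch.stack(probs).mean(dim=0)   # (N, K, H, W)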

2.3. Multi-Stage Training

The complete segmentation training pipeline was divided into four stages:

Stage 1 - CutMix was disabled to reduce the regularization effect; the
encoder was loaded with ImageNet weights and frozen so that the decoder
could learn spatial features without getting stuck in a saddle point; crops
were taken with at least one pixel belonging to a positive mask.

Stage 2 - CutMix was enabled for strong regularization, and the encoder was
unfrozen to learn the spatial features of endoscopic images.

Stage 3 - Training used random crops instead of non-empty crops so that the
model could learn negative samples.

Stage 4 - A few final epochs were run with CutMix disabled and the encoder
frozen, for generalization on the original data.

For every consecutive stage, the best checkpoint of the previous stage was
loaded.
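As an illustration of the stage transitions, the encoder freezing and
unfreezing can be expressed as below. The sketch assumes a
segmentation_models_pytorch-style FPN model, which exposes the encoder as
model.encoder; whether our pipeline used exactly this API is an assumption:

    import segmentation_models_pytorch as smp

    # Stage 1: ImageNet-pretrained encoder, frozen so only the decoder trains.
    model = smp.FPN(encoder_name="efficientnet-b5",
                    encoder_weights="imagenet", classes=5)
    for p in model.encoder.parameters():
        p.requires_grad = False

    # ... train Stage 1, reload the best checkpoint ...

    # Stage 2: unfreeze the encoder for end-to-end training with CutMix.
    for p in model.encoder.parameters():
        p.requires_grad = True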
2.4. Triple Threshold

After analysing predictions on the holdout set, we found that the number of
false positives was quite high. To counter this, we implemented a novel
post-processing algorithm that specifically reduces the number of false
positives in the predictions (Algorithm 2). Three sets of thresholds -
max_prob_thresh, min_prob_thresh and min_area_thresh - were tuned for the
given task.

max_prob_thresh and min_prob_thresh were tuned using grid search on the
holdout dataset, whereas min_area_thresh was calculated per class by
sorting the sums of positive pixels of every class and taking the 2.5th
percentile. The resulting min_area_thresh values were 2000, 128, 256, 256
and 1024 respectively for the five classes. The results of the triple
threshold on the single best model are compiled in Table 3, and a
comparison of our best performing models in Table 4.
Algorithm 2 Triple Threshold
  for each sample do
    output_masks = model(sample)
    final_masks = []
    i = 0
    for each output_mask in output_masks do
      max_mask = output_mask > max_prob_thresh
      if max_mask.sum() < min_area_thresh[i] then
        output_mask = zeros(output_mask.shape)
      else
        output_mask = output_mask > min_prob_thresh
      end if
      i = i + 1
      final_masks.append(output_mask)
    end for
  end for
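In NumPy, Algorithm 2 amounts to the following sketch (variable names are
ours; the per-class min_area_thresh values are the ones quoted above):

    import numpy as np

    def triple_threshold(probs,
                         max_prob_thresh=0.7,
                         min_prob_thresh=0.5,
                         min_area_thresh=(2000, 128, 256, 256, 1024)):
        # probs: (K, H, W) per-class probability maps for one sample.
        # A class is kept only if its confident region (> max_prob_thresh)
        # covers at least min_area_thresh[i] pixels; the kept mask is then
        # cut at the looser min_prob_thresh. This suppresses small,
        # low-confidence false positives.
        final_masks = np.zeros(probs.shape, dtype=bool)
        for i, prob in enumerate(probs):
            confident = prob > max_prob_thresh
            if confident.sum() >= min_area_thresh[i]:
                final_masks[i] = prob > min_prob_thresh
        return final_masks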


                       Table 3. Triple Threshold Results

               Min Thresh    Max Thresh    Val Precision
               0.5           -             0.597
               0.5           0.6           0.601
               0.5           0.7           0.608
               0.5           0.8           0.598
               0.4           -             0.588
               0.4           0.6           0.593
               0.4           0.7           0.600
               0.4           0.8           0.591

               "-" indicates no triple threshold.

             Table 4. Precision Values on Best Performing Models

          Model                           No Triple    Triple
          B3-Ranger-DICE                    0.492       0.494
          B5-Ranger-DICE                    0.597       0.608
          B5-Ranger-BCE+DICE+JACCARD        0.549       0.561
          B5-Over9000-BCE+DICE              0.520       0.530
          B5-Over9000-BCE+DICE+JACCARD      0.505       0.515
2.5. Object Detection

For object detection, individual models were trained with SGD as the
optimizer and a confidence threshold of 0.5. To counter the variance and
improve the performance of our model predictions, general ensembling was
performed. RetinaNet with a ResNet50 FPN backbone, and Faster R-CNN with
FPN and DC5 backends on a ResNeXt101 32xd backbone, were trained (Table 5).
Our ensemble strategy involves finding overlapping boxes of the same class
and averaging their positions while adding their confidences; a simplified
sketch follows Table 5. To find the best parameters for ensembling the
three models' predictions, we ran a grid search over all combinations of
the values in Table 2.

                     Table 2. Object Detection Ensembling

  Parameter     Values Tested                Description
  iou_thresh    0.4, 0.5, 0.6                If two overlapping boxes have an
                                             IoU value > iou_thresh, one of
                                             the boxes is rejected.
  score_thresh  0.4, 0.5, 0.6                If a predicted box has a
                                             confidence value < score_thresh
                                             associated with it, the box is
                                             rejected.
  weights       [1, 1, 1], [1, 1, 2],        The weights given to the
                [1, 2, 1], [2, 1, 1],        predictions by each of the
                [1, 2, 2], [2, 1, 2],        models. A model with a higher
                [2, 2, 1]                    weight has more influence on the
                                             final output than a model with a
                                             lower weight.

                      Table 5. Object Detection Results

          Model                         Val. mAP    Hold. mAP
          RetinaNet (FPN backend)         26.07       24.66
          Faster RCNN (FPN backend)       20.11       21.47
          Faster RCNN (DC5 backend)       27.64       26.15
          Ensembled                       32.33       30.12
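A simplified greedy sketch of the box-merging step follows (our naming, not
the exact implementation; iou_thresh, score_thresh and the per-model
weights are the grid-searched parameters of Table 2):

    import numpy as np

    def iou(a, b):
        # IoU of two boxes given as [x1, y1, x2, y2].
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / (union + 1e-9)

    def ensemble_boxes(preds, weights, iou_thresh=0.5, score_thresh=0.5):
        # preds: one list of (box, score, label) per model. Overlapping
        # boxes of the same class are position-averaged and their
        # (weight-scaled) confidences added.
        flat = [(np.asarray(box, dtype=float), w * score, label)
                for model_preds, w in zip(preds, weights)
                for box, score, label in model_preds
                if score >= score_thresh]
        flat.sort(key=lambda t: -t[1])          # highest confidence first

        merged = []
        for box, score, label in flat:
            for m in merged:
                if m["label"] == label and iou(box, m["box"]) > iou_thresh:
                    m["box"] = (m["box"] * m["n"] + box) / (m["n"] + 1)
                    m["score"] += score
                    m["n"] += 1
                    break
            else:
                merged.append({"box": box, "score": score,
                               "label": label, "n": 1})
        return merged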
                                 3. RESULTS

We achieved a segmentation score of 0.5675 on the final leaderboard (a
weighted linear combination of Dice, IoU and F2), and an mAP of 0.2061 on
the object detection task.

                        4. DISCUSSION & CONCLUSION

Gastric cancer accounts for around 1 million deaths each year, many of
which could be prevented by early diagnosis. In this paper we explored a
multi-plateau ensemble to generalize pixel-level segmentation and
localization of artefacts in endoscopic images. We developed novel
augmentation and post-processing algorithms for better and more robust
model convergence.
                               5. REFERENCES

 [1] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath
     Hariharan, and Serge Belongie. Feature pyramid networks for object
     detection, 2016.

 [2] Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking model scaling
     for convolutional neural networks, 2019.

 [3] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr
     Dollár. Focal loss for dense object detection, 2017.

 [4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual
     learning for image recognition, 2015.

 [5] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN:
     Towards real-time object detection with region proposal networks,
     2015.

 [6] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and
     Yichen Wei. Deformable convolutional networks, 2017.

 [7] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He.
     Aggregated residual transformations for deep neural networks, 2016.

 [8] Jie Hu, Li Shen, Samuel Albanie, Gang Sun, and Enhua Wu.
     Squeeze-and-excitation networks, 2017.

 [9] Sharib Ali, Felix Zhou, Christian Daul, Barbara Braden, Adam Bailey,
     Stefano Realdon, James East, Georges Wagnieres, Victor Loschenov,
     Enrico Grisan, et al. Endoscopy artifact detection (EAD 2019)
     challenge dataset. arXiv preprint arXiv:1905.03209, 2019.

[10] Sharib Ali, Felix Zhou, Adam Bailey, Barbara Braden, James East, Xin
     Lu, and Jens Rittscher. A deep learning framework for quality
     assessment and restoration in video endoscopy. arXiv preprint
     arXiv:1904.07073, 2019.

[11] Sharib Ali, Felix Zhou, Barbara Braden, Adam Bailey, Suhui Yang,
     Guanju Cheng, Pengyi Zhang, Xiaoqiong Li, Maxime Kayser, Roger D.
     Soberanis-Mukul, Shadi Albarqouni, Xiaokang Wang, Chunqing Wang,
     Seiryo Watanabe, Ilkay Oksuz, Qingtian Ning, Shufan Yang, Mohammad
     Azam Khan, Xiaohong W. Gao, Stefano Realdon, Maxim Loshchenov, Julia
     A. Schnabel, James E. East, Georges Wagnieres, Victor B. Loschenov,
     Enrico Grisan, Christian Daul, Walter Blondel, and Jens Rittscher. An
     objective comparison of detection and segmentation algorithms for
     artefacts in clinical endoscopy. Scientific Reports, 10, 2020.

[12] Terrance DeVries and Graham W. Taylor. Improved regularization of
     convolutional neural networks with cutout, 2017.

[13] Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu,
     Jianfeng Gao, and Jiawei Han. On the variance of the adaptive
     learning rate and beyond, 2019.

[14] Michael R. Zhang, James Lucas, Geoffrey Hinton, and Jimmy Ba.
     Lookahead optimizer: k steps forward, 1 step back, 2019.

[15] Yang You, Igor Gitman, and Boris Ginsburg. Large batch training of
     convolutional networks, 2017.