=Paper=
{{Paper
|id=Vol-2595/endoCV2020_Jadhav_et_al
|storemode=property
|title=Multi-plateau Ensemble For Endoscopic Artefact Segmentation And Detection
|pdfUrl=https://ceur-ws.org/Vol-2595/endoCV2020_paper_id_20.pdf
|volume=Vol-2595
|authors=Suyog Jadhav,Udbhav Bamba,Arnav Chavan,Rishabh Tiwari,Aryan Raj
|dblpUrl=https://dblp.org/rec/conf/isbi/JadhavBCTR20
}}
==Multi-plateau Ensemble For Endoscopic Artefact Segmentation And Detection==
Suyog Jadhav, Udbhav Bamba, Arnav Chavan, Rishabh Tiwari, Aryan Raj
Indian Institute of Technology (ISM), Dhanbad

ABSTRACT

The Endoscopic Artefact Detection challenge consists of three tasks: 1) artefact detection, 2) semantic segmentation, and 3) out-of-sample generalisation. For the semantic segmentation task, we propose a multi-plateau ensemble of FPN [1] (Feature Pyramid Network) with EfficientNet [2] as the feature extractor/encoder. For the object detection task, we used a three-model ensemble of RetinaNet [3] with a ResNet50 [4] backbone and Faster R-CNN [5] (FPN + DC5 [6]) with a ResNeXt101 backbone [7, 8]. A PyTorch implementation of our approach is available at github.com/ubamba98/EAD2020.

Index Terms: Endoscopy, FPN, EfficientNet, RetinaNet, Faster RCNN, Artefact detection.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. DATASETS

The EndoCV2020 dataset [9, 10, 11] contains a total of 643 images for the segmentation task, which we divided into three parts: train (474 images), validation (99 images) and holdout (70 images), in the sequence they were released. We made sure that the distributions of the train and holdout splits were similar while that of the validation split was different. The validation set ensured that the model was not overfitting to the training data while still generalizing well on the holdout set. For detection, a similar strategy was adopted: train (2200 images), validation (232 images) and holdout (99 images).

2. METHODS

2.1. Data Pre-processing and Augmentations

Due to the variable aspect ratios and sizes in the training data, we adopted a stage-dependent rescaling policy. During training, we cropped the images to a fixed size of 512x512 without resizing. This ensured that spatial information was not lost while keeping the input to the models within trainable limits. During validation and testing, we padded the images so that both dimensions were a multiple of 128, as required by the EfficientNet backbone (to handle max-pooling in the deeper models).

As the number of samples in the dataset was relatively low and unbalanced, various augmentation techniques were adopted to prevent overfitting and achieve generalization. Horizontal and vertical flips, cutout (random holes) [12], random contrast, gamma, brightness and rotation were tested. To strongly regularize the data, we propose the use of CutMix for segmentation (Algorithm 1), which was toggled on and off depending upon the variance of the model outputs. For object detection, spatial transformations (flips, random scaling and rotation) were used.

Algorithm 1 CutMix for Segmentation
for each iteration do
    input, target = get_minibatch(dataset)
    if mode == training then
        input_s, target_s = shuffle_minibatch(input, target)
        lambda = Unif(0, 1)
        r_x = Unif(0, W)
        r_y = Unif(0, H)
        r_w = Sqrt(1 - lambda)
        r_h = Sqrt(1 - lambda)
        x1 = Round(Clip(r_x - r_w / 2, min=0))
        x2 = Round(Clip(r_x + r_w / 2, max=W))
        y1 = Round(Clip(r_y - r_h / 2, min=0))
        y2 = Round(Clip(r_y + r_h / 2, max=H))
        input[:, :, x1:x2, y1:y2] = input_s[:, :, x1:x2, y1:y2]
        target[:, :, x1:x2, y1:y2] = target_s[:, :, x1:x2, y1:y2]
    end if
    output = model_forward(input)
    loss = compute_loss(output, target)
    model_update()
end for
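As a rough illustration, the following is a minimal PyTorch sketch of Algorithm 1. It is our own illustrative code, not the released implementation: the function name is hypothetical, and the patch size is scaled by the image dimensions as in the original CutMix formulation.

    import torch

    def cutmix_segmentation(images, masks):
        # images: (N, C, H, W) float tensor; masks: (N, K, H, W), one channel per class.
        n, _, h, w = images.shape
        perm = torch.randperm(n)                    # random pairing within the batch
        lam = torch.rand(1).item()                  # lambda ~ Unif(0, 1)
        # Rectangle centre and size; the patch shrinks as lambda grows.
        r_x = torch.randint(0, w, (1,)).item()
        r_y = torch.randint(0, h, (1,)).item()
        r_w = int(w * (1.0 - lam) ** 0.5)
        r_h = int(h * (1.0 - lam) ** 0.5)
        x1, x2 = max(r_x - r_w // 2, 0), min(r_x + r_w // 2, w)
        y1, y2 = max(r_y - r_h // 2, 0), min(r_y + r_h // 2, h)
        # Paste the same region from the shuffled batch into both images and masks,
        # so the segmentation targets stay consistent with the mixed inputs.
        images[:, :, y1:y2, x1:x2] = images[perm][:, :, y1:y2, x1:x2]
        masks[:, :, y1:y2, x1:x2] = masks[perm][:, :, y1:y2, x1:x2]
        return images, masks

Such a transform would be applied per mini-batch during training and simply skipped whenever CutMix is toggled off for a stage (see Section 2.3).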
2.2. Multi-Plateau Approach

Due to the high variability of the dataset and the tendency of models to overfit early, the main focus was on building a strong ensemble by training models on different optimisation plateaus. A total of 8 different plateaus, given by the permutations of two optimisers and four loss functions, were trained with an EfficientNet backbone, increasing the depth, width and resolution three times by going from B3 to B5 (Table 1). For the optimisers, Ranger and Over9000 were used. Ranger is a synergistic optimiser combining RAdam (rectified Adam) [13] and LookAhead [14], and Over9000 is a combination of Ralamb [15] and LookAhead. A total of 2*4*3 = 24 models were trained, but only the models with a Dice score greater than 0.47 were considered for the final ensemble. Average pixel-wise ensembling was adopted.

Table 1. Multi-Plateau Results
Encoder | Optimizer | Loss function | Validation (Dice) | Fine-tuning including holdout
EfficientNet B3 | Ranger | DICE | 0.4917 | 0.4900
EfficientNet B3 | Ranger | BCE+DICE | 0.4382 | –
EfficientNet B3 | Ranger | BCE | 0.4415 | –
EfficientNet B3 | Ranger | BCE+DICE+JACCARD | 0.4771 | 0.4630
EfficientNet B3 | Over9000 | DICE | 0.4509 | –
EfficientNet B3 | Over9000 | BCE+DICE | 0.4500 | –
EfficientNet B3 | Over9000 | BCE | 0.4170 | –
EfficientNet B3 | Over9000 | BCE+DICE+JACCARD | 0.4525 | –
EfficientNet B4 | Ranger | DICE | 0.4568 | –
EfficientNet B4 | Ranger | BCE+DICE | 0.4759 | 0.4720
EfficientNet B4 | Ranger | BCE | 0.4165 | –
EfficientNet B4 | Ranger | BCE+DICE+JACCARD | 0.4718 | 0.4666
EfficientNet B4 | Over9000 | DICE | 0.3890 | –
EfficientNet B4 | Over9000 | BCE+DICE | 0.4597 | –
EfficientNet B4 | Over9000 | BCE | 0.4151 | –
EfficientNet B4 | Over9000 | BCE+DICE+JACCARD | 0.4614 | –
EfficientNet B5 | Ranger | DICE | 0.4761 | 0.4987
EfficientNet B5 | Ranger | BCE+DICE | 0.4693 | 0.4643
EfficientNet B5 | Ranger | BCE | 0.4374 | –
EfficientNet B5 | Ranger | BCE+DICE+JACCARD | 0.4781 | 0.4900
EfficientNet B5 | Over9000 | DICE | 0.4352 | –
EfficientNet B5 | Over9000 | BCE+DICE | 0.4730 | 0.4823
EfficientNet B5 | Over9000 | BCE | 0.4151 | –
EfficientNet B5 | Over9000 | BCE+DICE+JACCARD | 0.4726 | 0.4798

2.3. Multi-Stage Training

The complete segmentation training pipeline was divided into four stages:
Stage 1 - CutMix was disabled to reduce the regularization effect; the encoder was loaded with ImageNet weights and frozen so that the decoder could learn spatial features without getting stuck in a saddle point; crops were taken with at least one pixel belonging to a positive mask.
Stage 2 - CutMix was enabled for strong regularization, and the encoder was unfrozen to learn spatial features of endoscopic images.
Stage 3 - Random crops were used instead of non-empty crops so that the model could learn from negative samples.
Stage 4 - A few final epochs with CutMix disabled and the encoder frozen, for generalization on the original data.
For every consecutive stage, the best checkpoint of the previous stage was loaded.

2.4. Triple Threshold

After analysing predictions on the holdout set, we found that the number of false positives was quite high. To counter this, we implemented a novel post-processing algorithm which specifically reduces the number of false positives in the predictions (Algorithm 2). Three sets of thresholds - max_prob_thresh, min_prob_thresh and min_area_thresh - were tuned for this task.

max_prob_thresh and min_prob_thresh were tuned using grid search on the holdout dataset, whereas min_area_thresh was calculated per class by sorting the sums of positive pixels of every class and taking the 2.5th percentile. The resulting min_area_thresh values were 2000, 128, 256, 256 and 1024 for the five classes respectively. The results of the triple threshold on the single best model are compiled in Table 3, and a comparison of our best performing models is given in Table 4.

Algorithm 2 Triple Threshold
for each sample do
    output_masks = model(each sample)
    final_masks = []
    i = 0
    for each output_mask in output_masks do
        max_mask = output_mask > max_prob_thresh
        if max_mask.sum() < min_area_thresh[i] then
            output_mask = zeros(output_mask.shape)
        else
            output_mask = output_mask > min_prob_thresh
        end if
        i = i + 1
        final_masks.append(output_mask)
    end for
end for

Table 3. Triple Threshold Results ("-" indicates no triple threshold)
Min Thresh | Max Thresh | Val Precision
0.5 | - | 0.597
0.5 | 0.6 | 0.601
0.5 | 0.7 | 0.608
0.5 | 0.8 | 0.598
0.4 | - | 0.588
0.4 | 0.6 | 0.593
0.4 | 0.7 | 0.600
0.4 | 0.8 | 0.591

Table 4. Precision of Best Performing Models with and without the Triple Threshold
Model | No Triple | Triple
B3-Ranger-DICE | 0.492 | 0.494
B5-Ranger-DICE | 0.597 | 0.608
B5-Ranger-BCE+DICE+JACCARD | 0.549 | 0.561
B5-Over9000-BCE+DICE | 0.520 | 0.530
B5-Over9000-BCE+DICE+JACCARD | 0.505 | 0.515
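Below is a minimal Python/NumPy sketch of Algorithm 2 for a single image; this is our own illustrative code, and the function name and array layout are assumptions rather than the authors' implementation.

    import numpy as np

    def triple_threshold(prob_masks, max_prob_thresh, min_prob_thresh, min_area_thresh):
        # prob_masks: (K, H, W) per-class probability maps for one image.
        # min_area_thresh: K per-class minimum areas, e.g. [2000, 128, 256, 256, 1024].
        final_masks = []
        for i, prob in enumerate(prob_masks):
            # Keep a class only if enough pixels clear the high-confidence threshold...
            if (prob > max_prob_thresh).sum() < min_area_thresh[i]:
                final_masks.append(np.zeros(prob.shape, dtype=bool))
            else:
                # ...in which case the final mask is grown using the lower threshold.
                final_masks.append(prob > min_prob_thresh)
        return np.stack(final_masks)

With min_prob_thresh = 0.5 and max_prob_thresh = 0.7, this post-processing gave the best precision (0.608) in Table 3.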
2.5. Object Detection

For object detection, the individual models were trained with SGD as the optimizer and a confidence threshold of 0.5. To counter the variance and improve the performance of the predictions, general ensembling was performed. RetinaNet with an FPN + ResNet50 backbone, and Faster R-CNN with FPN and DC5 backends on a ResNeXt101 32xd backbone, were trained (Table 5). Our ensemble strategy finds overlapping boxes of the same class, averages their positions and adds their confidences. To find the best parameters for ensembling the predictions of the three models, we ran a grid search over all possible combinations of the values in Table 2.

Table 2. Object Detection Ensembling
Parameter | Values Tested | Description
iou_thresh | 0.4, 0.5, 0.6 | If two overlapping boxes have an IoU value > iou_thresh, one of the boxes is rejected.
score_thresh | 0.4, 0.5, 0.6 | If a predicted box has a confidence value < score_thresh, the box is rejected.
weights | [1, 1, 1], [1, 1, 2], [1, 2, 1], [2, 1, 1], [1, 2, 2], [2, 1, 2], [2, 2, 1] | The weights given to the predictions of each model; a model with a higher weight has more influence on the final output than a model with a lower weight.

Table 5. Object Detection Results
Model | Val. mAP | Holdout mAP
RetinaNet (FPN backend) | 26.07 | 24.66
Faster RCNN (FPN backend) | 20.11 | 21.47
Faster RCNN (DC5 backend) | 27.64 | 26.15
Ensembled | 32.33 | 30.12
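A rough sketch of this weighted box ensembling follows, under our own interpretation: the greedy clustering against a running mean box, the tuple format and the function names are assumptions, not the authors' implementation.

    import numpy as np

    def box_iou(a, b):
        # Boxes are [x1, y1, x2, y2].
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / (union + 1e-9)

    def ensemble_boxes(model_preds, weights, iou_thresh=0.5, score_thresh=0.5):
        # model_preds: one list per model of (class_id, score, [x1, y1, x2, y2]) tuples.
        pooled = [(c, s * w, np.asarray(b, dtype=float))
                  for preds, w in zip(model_preds, weights)
                  for (c, s, b) in preds if s >= score_thresh]
        clusters = []  # each cluster: [class_id, summed confidence, list of boxes]
        for c, s, box in sorted(pooled, key=lambda p: -p[1]):
            for cl in clusters:
                # Merge into an existing cluster of the same class if it overlaps enough.
                if cl[0] == c and box_iou(box, np.mean(cl[2], axis=0)) > iou_thresh:
                    cl[1] += s
                    cl[2].append(box)
                    break
            else:
                clusters.append([c, s, [box]])
        # Final box = average position of the merged boxes, confidence = summed scores.
        return [(c, s, np.mean(boxes, axis=0)) for c, s, boxes in clusters]

A grid search over iou_thresh, score_thresh and the per-model weights from Table 2 can then select the best-performing combination.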
3. RESULTS

On the final leaderboard we achieved a segmentation score (a weighted linear combination of Dice, IoU and F2) of 0.5675, and an mAP of 0.2061 on the object detection task.

4. DISCUSSION & CONCLUSION

Gastric cancer accounts for around 1 million deaths each year, many of which could be prevented by early diagnosis. In this paper we explored a multi-plateau ensemble to generalize pixel-level segmentation and localization of artefacts in endoscopic images. We developed novel augmentation and post-processing algorithms for better and more robust model convergence.

5. REFERENCES

[1] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection, 2016.
[2] Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking model scaling for convolutional neural networks, 2019.
[3] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection, 2017.
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015.
[5] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks, 2015.
[6] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. Deformable convolutional networks, 2017.
[7] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks, 2016.
[8] Jie Hu, Li Shen, Samuel Albanie, Gang Sun, and Enhua Wu. Squeeze-and-excitation networks, 2017.
[9] Sharib Ali, Felix Zhou, Christian Daul, Barbara Braden, Adam Bailey, Stefano Realdon, James East, Georges Wagnieres, Victor Loschenov, Enrico Grisan, et al. Endoscopy artifact detection (EAD 2019) challenge dataset. arXiv preprint arXiv:1905.03209, 2019.
[10] Sharib Ali, Felix Zhou, Adam Bailey, Barbara Braden, James East, Xin Lu, and Jens Rittscher. A deep learning framework for quality assessment and restoration in video endoscopy. arXiv preprint arXiv:1904.07073, 2019.
[11] Sharib Ali, Felix Zhou, Barbara Braden, Adam Bailey, Suhui Yang, Guanju Cheng, Pengyi Zhang, Xiaoqiong Li, Maxime Kayser, Roger D. Soberanis-Mukul, Shadi Albarqouni, Xiaokang Wang, Chunqing Wang, Seiryo Watanabe, Ilkay Oksuz, Qingtian Ning, Shufan Yang, Mohammad Azam Khan, Xiaohong W. Gao, Stefano Realdon, Maxim Loshchenov, Julia A. Schnabel, James E. East, Georges Wagnieres, Victor B. Loschenov, Enrico Grisan, Christian Daul, Walter Blondel, and Jens Rittscher. An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy. Scientific Reports, 10, 2020.
[12] Terrance DeVries and Graham W. Taylor. Improved regularization of convolutional neural networks with cutout, 2017.
[13] Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. On the variance of the adaptive learning rate and beyond, 2019.
[14] Michael R. Zhang, James Lucas, Geoffrey Hinton, and Jimmy Ba. Lookahead optimizer: k steps forward, 1 step back, 2019.
[15] Yang You, Igor Gitman, and Boris Ginsburg. Large batch training of convolutional networks, 2017.