Pyramid-Focus-Augmentation: Medical Image Segmentation with Step-Wise Focus

Vajira Thambawita 1,2, Steven Hicks 1,2, Pål Halvorsen 1,2, Michael A. Riegler 1
1 SimulaMet, Norway
2 Oslo Metropolitan University, Norway
Contact: vajira@simula.no

ABSTRACT
Segmentation of findings in the gastrointestinal tract is a challenging but important task and a key building block for reliable automatic decision support systems. In this work, we present our solution for the Medico 2020 task, which focused on the problem of colon polyp segmentation. We present our simple but efficient idea of using an augmentation method that applies grids in a pyramid-like manner (large to small) for segmentation. Our results show that the proposed method works as intended and achieves results comparable to competing methods.

Figure 1: Training steps for a segmentation model with the new augmentation technique. The input image stacked with a grid is passed to the DL model (Unet), which produces a predicted mean and std; the loss is calculated between the converted GT and the predicted output.

1 INTRODUCTION
Segmented polyp regions in Gastrointestinal Tract (GI) images [1] can give doctors a detailed analysis for identifying the correct areas to treat, compared to other computer-aided analyses such as classification [2, 9, 10] and detection [7], which provide less detailed information about the exact region and size of the affected area. However, training Deep Learning (DL) models to perform segmentation on medical data is challenging because of the lack of medical domain images as a result of tight privacy restrictions, the high cost of annotating medical data using experts, and a lower number of true positive findings compared to true negatives.
In this paper, we present our approach for the participation in the 2020 Medico Segmentation Challenge [4], for which we introduce a novel augmentation technique called pyramid-focus-augmentation (PYRA). PYRA can be used to improve the performance of segmentation tasks when we have a small dataset to train our DL models or when the number of positive findings is small. Further, our method can draw doctors' attention to polyp regions gradually. In addition, the output resolution of the method is adjustable, meaning we can present a coarser grid if that is sufficient for the task at hand, which can save processing time. Finally, our technique can be applied to any segmentation task using any deep learning segmentation model.
2 METHOD
Our method has two main steps: data augmentation with PYRA using pre-defined grid sizes, followed by training of a DL model with the resulting augmented data. The source code for our method can be found in our GitHub repository: https://vlbthambawita.github.io/PYRA/. The development dataset [5] provided by the organizers has 1000 polyp images with corresponding ground truth masks. We divided it into two parts such that 800 images are used for model training and 200 for testing.

2.1 PYRA Data Augmentation
As the first step in PYRA, we generate checkerboard grids, as illustrated in the first row of Figure 2, with sizes of N × N for N values of 2, 4, 8, 16, 32, 64, 128 and 256. N should be selected such that image_size % N = 0. Applying these eight grid augmentations to the training dataset with 800 images increases the training data to 800 × 8 = 6400 images.
For the second step, we convert the Ground Truth (GT) segmentation masks into a grid-based representation of the GT corresponding to the grid sizes. For example, if the grid size is 8 × 8, then the corresponding GT is an 8 × 8 converted GT. The transformation of the ground truth masks to gridded masks is performed as follows: (i) we divide the GT into cells of the input grid size, (ii) we count the true pixels of each grid cell, and (iii) if the number of true pixels is larger than 0, we convert the whole cell into a true cell. An example of a converted GT is depicted at the top of Figure 1.
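The two PYRA steps above (grid generation and GT-to-grid conversion) can be sketched as follows; the function names and array layout are our illustration, not taken from the released code:

```python
import numpy as np

def make_grid(image_size: int, n: int) -> np.ndarray:
    """Checkerboard grid with n x n cells over an image_size x image_size canvas.
    Assumes image_size % n == 0, as PYRA requires."""
    cell = image_size // n
    rows = np.arange(image_size) // cell   # cell row index of each pixel row
    cols = np.arange(image_size) // cell   # cell column index of each pixel column
    # Alternate 0/1 cells in a checkerboard pattern.
    return ((rows[:, None] + cols[None, :]) % 2).astype(np.uint8)

def grid_ground_truth(gt: np.ndarray, n: int) -> np.ndarray:
    """Convert a binary GT mask into its n x n grid-based representation."""
    h, w = gt.shape
    # (i) split the mask into n x n cells of size (h//n, w//n).
    cells = gt.reshape(n, h // n, n, w // n)
    # (ii)+(iii) any true pixel inside a cell marks the whole cell as true.
    return cells.any(axis=(1, 3))
```

For a 256 × 256 mask, `grid_ground_truth(gt, 8)` yields the 8 × 8 converted GT used as the training target for that grid size.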
2.2 Experimental Setup and Model Training
We set up four experiments, Exp-1, Exp-2, Exp-3, and Exp-4, to show the performance of PYRA. Exp-1 and Exp-2 represent two baseline experiments. Exp-1 uses only the 800 training images without any augmentations. In Exp-2, we used general augmentations such as Affine, Coarse Dropout, and Additive Gaussian Noise from the imgaug library [6]. Exp-3 and Exp-4 use our PYRA with the data from Exp-1 and Exp-2, respectively. The training dataset size was changed from 800 to 6400 after applying PYRA. However, we validated our experiments only using the 200 images reserved for testing. We used one data loader for all experiments to maintain a fair evaluation. The baseline experiments Exp-1 and Exp-2 used the data loader with a grid size of 256 × 256, which represents the original GT masks without any conversion.

Figure 2: A representation of the input and corresponding outputs of grid-augmentation-based segmentation for grid sizes 2 × 2, 4 × 4, 8 × 8, 16 × 16, 32 × 32, 64 × 64, 128 × 128 and 256 × 256. The first row shows an input image and all grid sizes used, stacked with the input image. The second row represents the ground truth. The third and fourth rows show the predicted mean and std output images calculated from 30 samples.

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). MediaEval'20, December 14-15 2020, Online
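The dataset expansion behind the 800 → 6400 step, and the baseline's fixed 256 × 256 grid, can be sketched as follows (`pyra_index` is a hypothetical helper of ours, with image identifiers standing in for the actual files):

```python
# Every training image is paired with each of the eight grid sizes,
# so 800 images become 800 * 8 = 6400 (image, grid-size) samples.
GRID_SIZES = [2, 4, 8, 16, 32, 64, 128, 256]

def pyra_index(image_ids):
    """Build the expanded sample list used by the PYRA data loader."""
    return [(img, n) for img in image_ids for n in GRID_SIZES]

samples = pyra_index(range(800))
assert len(samples) == 6400
# The baseline experiments (Exp-1/Exp-2) correspond to always choosing
# n = 256, i.e. the unconverted ground-truth masks.
```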
We used the Unet architecture [8] as our DL model to perform the polyp segmentation task. We trained the Unet model with a stacked input consisting of a polyp image and a random grid image selected from the eight sizes. The model was then trained to predict the converted GT, which was formed by converting the real GT into a grid-based GT as described in the previous section.
The Unet model used dropout layers with a probability of 0.5. We then used our Unet model as a stochastic model to perform Monte Carlo sampling on the validation data. We kept our Unet model in the training state to perform this sampling while predicting the output for the validation data. In the PyTorch library, which is used for all our implementations, we can do this simply by keeping the model in the model.train() state. We ran 50 iterations for a single input to predict the output. We calculated the mean of these 50 predictions, which is used as the final prediction for the competition, and Standard Deviation (std) images to gauge the model's confidence in the predictions. The whole training process is illustrated in Figure 1 with an example image and a grid size of 8 × 8 as input. However, we submitted the predicted mean images for the grid size of 256 × 256, which generates predictions with the size of the true GT (without any transformations). All experiments used a fixed learning rate of 0.001 with the RMSprop optimizer [3], selected from preliminary experiments.
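The Monte Carlo sampling step can be sketched framework-agnostically as below; in our actual PyTorch code the stochastic model is the dropout-enabled Unet kept in model.train() mode, whereas `stochastic_model` here is just a placeholder callable:

```python
import statistics

def mc_predict(stochastic_model, x, n_samples=50):
    """Run the same input n_samples times through a model whose repeated
    calls differ (e.g. a network with dropout left active) and return the
    per-output mean (final prediction) and std (confidence map).
    Outputs are flat lists of floats in this sketch, not image tensors."""
    runs = [stochastic_model(x) for _ in range(n_samples)]
    n_out = len(runs[0])
    mean = [sum(r[i] for r in runs) / n_samples for i in range(n_out)]
    std = [statistics.pstdev([r[i] for r in runs]) for i in range(n_out)]
    return mean, std
```

A low std at a pixel means the 50 stochastic forward passes agreed there, which is how the std images in Figure 2 are read.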
3 RESULTS AND DISCUSSION
Table 1 summarizes the Mean Intersection over Union (mIoU) and the Dice Coefficient (DC) for the validation dataset and the test dataset. The final results for the competition were collected from the mean images calculated by sampling 50 outputs for the same input with a grid size of 256. Additionally, we calculated std images for the validation dataset to show the benefits of using PYRA. Example outputs for a given input image are illustrated in Figure 2.

Table 1: Results collected from the validation data and the test data. All test data results were provided by the organizers of the Medico task at MediaEval 2020.

                Validation results        Official test results
Method          mIoU        Dice          mIoU        Dice
Exp-1           0.7640      0.8422        0.6934      0.7817
Exp-2           0.7077      0.7957        0.6759      0.7700
Exp-3           0.7693      0.8447        0.6981      0.7887
Exp-4           0.6898      0.7822        0.6696      0.7665

According to the results in Table 1, Exp-3, which uses only pyramid-focus-augmentation, shows the best validation results with an mIoU of 0.7693 and a DC of 0.8447, and the best test results with an mIoU of 0.6981 and a DC of 0.7887. The advantage of our pyramid-focus-augmentation can be seen by reading the third row of Figure 2 alongside the fourth row of the same figure. Our model focuses on polyp regions step by step: the third row of Figure 2 shows how our model predicts the correct polyp cells at the 2 × 2, 4 × 4, 8 × 8, 16 × 16, 32 × 32, 64 × 64, 128 × 128 and 256 × 256 grid sizes, respectively. When we compare this row with the last row of std images, we can see that the model has high confidence for the identified polyp regions. For example, it shows high confidence (black regions) for the middle part of the polyps. In contrast, our model shows less confidence (yellow regions) for the polyps' outer borders.

4 CONCLUSION AND FUTURE WORK
In this paper, we presented a novel augmentation method called pyramid-focus-augmentation (PYRA), which can be used to train segmentation DL models. Our method shows a large benefit in the medical diagnosis use-case by focusing a doctor's attention on regions with findings step by step.
Our experiments did not use post-processing to clean up the output corresponding to the input grid. In future work, we will evaluate our approach with additional post-processing steps for smaller grid sizes. For example, we can apply convolution operations to the output using a convolutional window equal to the input grid size to clean the results. However, post-processing will not improve the final results when the grid size equals the input image's resolution.
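One possible reading of the proposed post-processing, sketched under our own assumptions (cell-wise averaging, i.e. a convolution whose window and stride equal the cell size, followed by a majority threshold of 0.5 that we chose ourselves):

```python
import numpy as np

def clean_cells(pred: np.ndarray, n: int, thr: float = 0.5) -> np.ndarray:
    """Snap a raw probability map to the n x n grid: average each cell
    (equivalent to a convolution with a window equal to the cell size and
    matching stride) and threshold, so every cell is uniformly on or off.
    Illustrative future-work sketch, not the paper's evaluated pipeline."""
    h, w = pred.shape
    cells = pred.reshape(n, h // n, n, w // n).mean(axis=(1, 3))
    on = cells > thr
    # Expand each cell decision back to pixel resolution.
    return np.kron(on, np.ones((h // n, w // n))).astype(bool)
```

When n equals the image resolution, each cell is a single pixel and this reduces to plain thresholding, matching the remark that such post-processing cannot help at full grid resolution.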
5 ACKNOWLEDGMENT
The research has benefited from the Experimental Infrastructure for Exploration of Exascale Computing (eX3), which is financially supported by the Research Council of Norway under contract 270053.

REFERENCES
[1] M. Akbari, M. Mohrekesh, E. Nasr-Esfahani, S. M. R. Soroushmehr, N. Karimi, S. Samavi, and K. Najarian. 2018. Polyp Segmentation in Colonoscopy Images Using Fully Convolutional Network. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 69–72. https://doi.org/10.1109/EMBC.2018.8512197
[2] Steven Alexander Hicks, Pia H. Smedsrud, Pål Halvorsen, and Michael Riegler. 2018. Deep Learning Based Disease Detection Using Domain Specific Transfer Learning. In Proc. of MediaEval.
[3] Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. 2012. Neural networks for machine learning, lecture 6a: Overview of mini-batch gradient descent. (2012).
[4] Debesh Jha, Steven A. Hicks, Krister Emanuelsen, Håvard Johansen, Dag Johansen, Thomas de Lange, Michael A. Riegler, and Pål Halvorsen. 2020. Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation. In Proc. of the MediaEval 2020 Workshop.
[5] Debesh Jha, Pia H. Smedsrud, Michael A. Riegler, Pål Halvorsen, Thomas de Lange, Dag Johansen, and Håvard D. Johansen. 2020. Kvasir-SEG: A segmented polyp dataset. In International Conference on Multimedia Modeling. Springer, 451–462.
[6] Alexander B. Jung, Kentaro Wada, Jon Crall, Satoshi Tanaka, Jake Graving, Christoph Reinders, Sarthak Yadav, Joy Banerjee, Gábor Vecsei, Adam Kraft, Zheng Rui, Jirka Borovec, Christian Vallentin, Semen Zhydenko, Kilian Pfeiffer, Ben Cook, Ismael Fernández, François-Michel De Rainville, Chi-Hung Weng, Abner Ayala-Acevedo, Raphael Meudec, Matias Laporte, and others. 2020. imgaug. https://github.com/aleju/imgaug. (2020). Online; accessed 01-Nov-2020.
[7] Ji Young Lee, Jinhoon Jeong, Eun Mi Song, Chunae Ha, Hyo Jeong Lee, Ja Eun Koo, Dong-Hoon Yang, Namkug Kim, and Jeong-Sik Byeon. 2020. Real-time detection of colon polyps during colonoscopy using deep learning: systematic validation with four independent datasets. Scientific Reports 10, 1 (2020), 8379.
[8] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 234–241.
[9] Vajira Thambawita, Debesh Jha, Hugo Lewi Hammer, Håvard D. Johansen, Dag Johansen, Pål Halvorsen, and Michael A. Riegler. 2020. An Extensive Study on Cross-Dataset Bias and Evaluation Metrics Interpretation for Machine Learning Applied to Gastrointestinal Tract Abnormality Classification. ACM Trans. Comput. Healthcare 1, 3, Article 17 (June 2020), 29 pages. https://doi.org/10.1145/3386295
[10] Vajira Thambawita, Debesh Jha, Michael Riegler, Pål Halvorsen, Hugo Lewi Hammer, Håvard D. Johansen, and Dag Johansen. 2018. The Medico-Task 2018: Disease detection in the gastrointestinal tract using global features and deep learning. In Proc. of MediaEval.