=Paper=
{{Paper
|id=Vol-2882/MediaEval_20_paper_13
|storemode=property
|title=Pyramid-Focus-Augmentation: Medical Image Segmentation with Step-Wise Focus
|pdfUrl=https://ceur-ws.org/Vol-2882/paper13.pdf
|volume=Vol-2882
|authors=Vajira Thambawita,Steven Hicks,Pål Halvorsen,Michael A. Riegler
|dblpUrl=https://dblp.org/rec/conf/mediaeval/ThambawitaHHR20
}}
==Pyramid-Focus-Augmentation: Medical Image Segmentation with Step-Wise Focus==
Vajira Thambawita (1,2), Steven Hicks (1,2), Pål Halvorsen (1,2), Michael A. Riegler (1)
(1) SimulaMet, Norway; (2) Oslo Metropolitan University, Norway
Contact: vajira@simula.no
ABSTRACT
Segmentation of findings in the gastrointestinal tract is a challenging but important task and an important building block for automatic decision support systems. In this work, we present our solution for the Medico 2020 task, which focused on the problem of colon polyp segmentation. We present our simple but efficient idea of using an augmentation method that applies grids in a pyramid-like manner (large to small) for segmentation. Our results show that the proposed method works as intended and can also lead to comparable results when competing with other methods.

Figure 1: Training steps for a segmentation model with the new augmentation technique. (Diagram: the input image stacked with a grid is fed to the DL model (Unet), which predicts mean and std outputs; the loss is calculated between the converted GT and the predicted output.)
1 INTRODUCTION
Segmented polyp regions in Gastrointestinal Tract (GI) images [1] can provide doctors with a detailed analysis to identify the correct areas to proceed with treatment, compared to other computer-aided analyses such as classification [2, 9, 10] and detection [7], which provide less detailed information about the exact region and size of the affected area. However, training Deep Learning (DL) models to perform segmentation on medical data is challenging because of the lack of medical domain images as a result of tight privacy restrictions, the high cost of annotating medical data using experts, and a lower number of true positive findings compared to true negatives. In this paper, we present our approach for the participation in the 2020 Medico Segmentation Challenge [4], for which we introduce a novel augmentation technique called pyramid-focus-augmentation (PYRA). PYRA can be used to improve the performance of segmentation tasks when we have a small dataset to train our DL models or when the number of positive findings is small. Further, our method can focus doctors' attention on regions of polyps gradually. In addition, the output of the method is adjustable, meaning we can present a lower-resolution grid if this is sufficient for the task at hand, which can help to save processing time. Finally, our technique can be applied to any segmentation task using any deep learning segmentation model.

2 METHOD
Our method has two main steps: data augmentation with PYRA using pre-defined grid sizes, followed by training of a DL model with the resulting augmented data. The source code for our method can be found in our GitHub repository: https://vlbthambawita.github.io/PYRA/. The development dataset [5] provided by the organizers has 1000 polyp images with corresponding ground truth masks. We divided it into two parts such that 800 images are used for model training and 200 for testing.

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval’20, December 14-15 2020, Online

2.1 PYRA Data Augmentation
As the first step in PYRA, we generate checkerboard grids as illustrated in the first row of Figure 2, with sizes of 𝑁 × 𝑁 for 𝑁 values of 2, 4, 8, 16, 32, 64, 128 and 256. 𝑁 should be selected such that 𝑖𝑚𝑎𝑔𝑒_𝑠𝑖𝑧𝑒 % 𝑁 = 0. Applying these eight grid augmentations to the training dataset with 800 images increases the training data to 800 × 8 = 6400 images.

For the second step, we convert the Ground Truth (GT) segmentation masks into a grid-based representation of the GT corresponding to the grid sizes. For example, if the grid size is 8 × 8, then the corresponding GT is an 8 × 8 converted GT.

The transformation of the ground truth masks to gridded masks is performed as follows: (i) we divide the GT into the input grid size, (ii) we count the true pixels of each grid cell, and (iii) if the number of true pixels is larger than 0, we convert the whole cell into a true cell. An example of a converted GT is depicted at the top of Figure 1.

2.2 Experimental Setup and Model Training
We have set up four experiments, Exp-1, Exp-2, Exp-3, and Exp-4, to show the performance of PYRA. Exp-1 and Exp-2 represent two baseline experiments. Exp-1 uses only the 800 training images without any augmentations. In Exp-2, we used general augmentations such as Affine, Coarse Dropout, and Additive Gaussian Noise from the library imgaug [6]. Exp-3 and Exp-4 use our PYRA with the data from Exp-1 and Exp-2, respectively. The training dataset size changed from 800 to 6400 after applying PYRA. However, we validated our experiments only using the 200 images reserved for testing. We used one data loader for all experiments to maintain a fair evaluation. The baseline experiments Exp-1 and Exp-2 used the data loader with a grid size of 256 × 256, which represents the original GT masks without any conversion.
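The two PYRA steps described in Section 2.1, generating a checkerboard grid and converting a GT mask into its grid-based form, can be sketched as follows. This is a minimal NumPy sketch, not the released implementation (see the GitHub repository for that); the function names are illustrative:

```python
import numpy as np

def checkerboard_grid(image_size: int, n: int) -> np.ndarray:
    """Build an image_size x image_size checkerboard with n x n cells.

    Assumes image_size % n == 0, as PYRA requires.
    """
    assert image_size % n == 0
    cell = image_size // n
    rows = np.arange(image_size) // cell
    cols = np.arange(image_size) // cell
    # Alternating 0/1 cells in a checkerboard pattern.
    return ((rows[:, None] + cols[None, :]) % 2).astype(np.float32)

def convert_gt(mask: np.ndarray, n: int) -> np.ndarray:
    """Convert a square binary GT mask into its n x n grid-based GT:
    any cell containing at least one true pixel becomes fully true."""
    size = mask.shape[0]
    assert size % n == 0
    cell = size // n
    # View the mask as (n, cell, n, cell) blocks and test each block.
    blocks = mask.reshape(n, cell, n, cell)
    cell_true = blocks.any(axis=(1, 3))  # (n, n) booleans, one per cell
    # Expand back to full resolution so the converted GT matches the input size.
    return np.kron(cell_true, np.ones((cell, cell))).astype(np.float32)
```

Stacking `checkerboard_grid(256, n)` with a 256 × 256 polyp image and training against `convert_gt(mask, n)` for each of the eight grid sizes yields the 800 × 8 = 6400 training pairs described above.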
Figure 2: A representation of the input and corresponding outputs of grid-augmentation-based segmentation. The first row shows an input image and all grid sizes (2 × 2 to 256 × 256) used as stacked grid images with the input image. The second row represents the ground truth. The third and fourth rows show the predicted mean and std output images calculated from 30 samples.
Table 1: Results collected from validation data and test data. All test data results were provided by the organizers of the Medico task at MediaEval 2020.

Method   Validation mIoU   Validation Dice   Test mIoU   Test Dice
Exp-1    0.7640            0.8422            0.6934      0.7817
Exp-2    0.7077            0.7957            0.6759      0.7700
Exp-3    0.7693            0.8447            0.6981      0.7887
Exp-4    0.6898            0.7822            0.6696      0.7665

We have used the Unet architecture [8] as our DL model to perform the polyp segmentation task. We trained the Unet model with a stacked input consisting of a polyp image and a random grid image selected from the eight sizes. The model was then trained to predict the converted GT, which was formed by converting the real GT into a grid-based GT as described in the previous section.

The Unet model used dropout layers with a probability of 0.5. We then used our Unet model as a stochastic model to perform Monte Carlo sampling for the validation data. We kept our Unet model in the training state to perform this sampling while predicting the output for the validation data. In the Pytorch library, which is used for all our implementations, we can do this simply by keeping the model in the model.train() state. We iterated 50 times for a single input to predict the output. We calculated the mean from these 50 predictions, which is used as the final prediction for the competition, and Standard Deviation (std) images to gauge the model's confidence in its predictions. The whole training process is illustrated in Figure 1 with an example image and a grid size of 8 × 8 as input. However, we submitted the predicted mean images for the grid size of 256 × 256, which generates predictions with the size of the true GT (without any transformation). All the experiments used a fixed learning rate of 0.001 with the RMSprop optimizer [3], which was selected from preliminary experiments.

3 RESULT AND DISCUSSION
Table 1 summarizes the Mean Intersection over Union (mIoU) and the Dice Coefficient (DC) for the validation dataset and the test dataset. The final results for the competition were collected from mean images calculated by sampling 50 outputs for the same input with the grid size of 256. Additionally, we have calculated std images for the validation dataset to show the benefits of using PYRA. Example outputs for a given input image are illustrated in Figure 2.

According to the results in Table 1, Exp-3, which uses only Pyramid-focus-augmentation, shows the best validation results with a mIoU of 0.7693 and a DC of 0.8447, and the best test results with a mIoU of 0.6981 and a DC of 0.7887. The advantage of our Pyramid-focus-augmentation can be identified by comparing the third row of Figure 2 with the fourth row of the same figure. We can see that our model can focus on polyp regions step by step. The third row of Figure 2 shows how our model predicts the correct polyp cells in the 2 × 2, 4 × 4, 8 × 8, 16 × 16, 32 × 32, 64 × 64, 128 × 128 and 256 × 256 grid sizes, respectively. When we compare this row with the last row of std images, we can see that the model has high confidence for the identified polyp regions. For example, it shows high confidence (black region) for the middle part of the polyps. In contrast, our model shows less confidence (yellow region) for the polyps' outer borders.

4 CONCLUSION AND FUTURE WORK
In this paper, we presented a novel augmentation method called Pyramid-focus-augmentation (PYRA), which can be used to train segmentation DL methods. Our method shows a large benefit in the medical diagnosis use-case by focusing a doctor's attention on regions with findings step by step.

Our experiments did not use post-processing to clean up the output corresponding to the input grid. In future work, we will evaluate our approach with additional post-processing steps for smaller grid sizes. For example, we can apply convolution operations to the output, using a convolutional window equal to the input grid size, to clean the results. However, post-processing techniques will not improve the final results when the grid size equals the input images' resolution.

5 ACKNOWLEDGMENT
The research has benefited from the Experimental Infrastructure for Exploration of Exascale Computing (eX3), which is financially supported by the Research Council of Norway under contract 270053.
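The Monte Carlo sampling used for the final predictions, keeping dropout active via model.train() and averaging repeated forward passes, can be sketched as follows. This is a minimal PyTorch sketch under the setup described in Section 2.2; the model and the sample count are placeholders:

```python
import torch

@torch.no_grad()
def mc_predict(model: torch.nn.Module, x: torch.Tensor, samples: int = 50):
    """Monte Carlo dropout inference: run the model `samples` times on the
    same input and return the per-pixel mean (used as the final prediction)
    and std (used as a confidence map)."""
    model.train()  # keep dropout layers stochastic during inference
    preds = torch.stack([model(x) for _ in range(samples)], dim=0)
    return preds.mean(dim=0), preds.std(dim=0)
```

Low std (dark regions in Figure 2) marks confident predictions such as polyp centers, while high std (bright regions) marks uncertain areas such as polyp borders.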
REFERENCES
[1] M. Akbari, M. Mohrekesh, E. Nasr-Esfahani, S. M. R. Soroushmehr, N. Karimi, S. Samavi, and K. Najarian. 2018. Polyp Segmentation in Colonoscopy Images Using Fully Convolutional Network. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 69–72. https://doi.org/10.1109/EMBC.2018.8512197
[2] Steven Alexander Hicks, Pia H Smedsrud, Pål Halvorsen, and Michael Riegler. 2018. Deep Learning Based Disease Detection Using Domain Specific Transfer Learning. Proc. of MediaEval.
[3] Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. 2012. Neural networks for machine learning, lecture 6a: overview of mini-batch gradient descent. (2012).
[4] Debesh Jha, Steven A. Hicks, Krister Emanuelsen, Håvard Johansen, Dag Johansen, Thomas de Lange, Michael A. Riegler, and Pål Halvorsen. 2020. Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation. In Proc. of the MediaEval 2020 Workshop.
[5] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Pål Halvorsen, Thomas de Lange, Dag Johansen, and Håvard D Johansen. 2020. Kvasir-SEG: A segmented polyp dataset. In International Conference on Multimedia Modeling. Springer, 451–462.
[6] Alexander B. Jung, Kentaro Wada, Jon Crall, Satoshi Tanaka, Jake Graving, Christoph Reinders, Sarthak Yadav, Joy Banerjee, Gábor Vecsei, Adam Kraft, Zheng Rui, Jirka Borovec, Christian Vallentin, Semen Zhydenko, Kilian Pfeiffer, Ben Cook, Ismael Fernández, François-Michel De Rainville, Chi-Hung Weng, Abner Ayala-Acevedo, Raphael Meudec, Matias Laporte, and others. 2020. imgaug. https://github.com/aleju/imgaug. (2020). Online; accessed 01-Nov-2020.
[7] Ji Young Lee, Jinhoon Jeong, Eun Mi Song, Chunae Ha, Hyo Jeong Lee, Ja Eun Koo, Dong-Hoon Yang, Namkug Kim, and Jeong-Sik Byeon. 2020. Real-time detection of colon polyps during colonoscopy using deep learning: systematic validation with four independent datasets. Scientific Reports 10, 1 (2020), 8379.
[8] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 234–241.
[9] Vajira Thambawita, Debesh Jha, Hugo Lewi Hammer, Håvard D. Johansen, Dag Johansen, Pål Halvorsen, and Michael A. Riegler. 2020. An Extensive Study on Cross-Dataset Bias and Evaluation Metrics Interpretation for Machine Learning Applied to Gastrointestinal Tract Abnormality Classification. ACM Trans. Comput. Healthcare 1, 3, Article 17 (June 2020), 29 pages. https://doi.org/10.1145/3386295
[10] Vajira Thambawita, Debesh Jha, Michael Riegler, Pål Halvorsen, Hugo Lewi Hammer, Håvard D Johansen, and Dag Johansen. 2018. The Medico-Task 2018: Disease detection in the gastrointestinal tract using global features and deep learning. Proc. of MediaEval (2018).