Medico 2021: Medical Image Augmentation and Segmentation
using Combination of Segmentation Neural Networks
Zeshan Khan, Mubasher Khan, Mubashir Yasin, Muhammad Hassan, Muhammad Atif Tahir
{zeshan.khan,p180111,p180143,p176157,atif.tahir}@nu.edu.pk
FAST School of Computing, National University of Computer and Emerging Sciences, Pakistan

ABSTRACT
Polyp identification is a critical task for the early detection of colon cancer. Proper
removal of a polyp requires an accurate estimate of its size and shape, which can be
obtained through polyp segmentation. This research investigated various polyp segmentation
approaches, evaluated on the Kvasir benchmark dataset. The best results, obtained with the
UNet++ architecture trained on augmented data, were an accuracy of 0.92 and a Dice
coefficient of 0.53.

1   INTRODUCTION
Capsule endoscopy has been used to diagnose endoscopic abnormalities for more than 10
years. Endoscopic images support the detection of several abnormalities in the
Gastrointestinal Tract (GI-Tract), including various types of cancer, ulcers, and polyps.
Analysing such video frames takes a lot of medical experts' time, which can be reduced by
computer-aided diagnostics. With the increase in the processing power of computational
machines, deep learning (DL) based automated diagnostics can achieve good accuracy and
efficiency.

2   RELATED WORK
GI-Tract disease detection is an active area of research, driven by the benefits of
computer-aided diagnostics for various endoscopic diseases. Several works address the
segmentation of polyps using neural network approaches.
   Jha et al. [9] investigated the semantic segmentation of polyps in the GI-Tract using an
auto-encoder-based ResUNet architecture. A modified version of ResUNet, named ResUNet++,
was proposed [7].
   Trinh et al. [13] used an auto-encoder-based approach in which ReLU is replaced with
Leaky ReLU. Leaky ReLU is a ReLU variant that keeps a small non-zero slope for negative
inputs, which avoids the dead neurons of a plain ReLU and improved the results. The encoder
and decoder networks used by the authors are based on ResNet50 trained on ImageNet [5]. The
approach achieved an accuracy of 0.95 when tested on the MediaEval 2020 challenge dataset
[8].
   Brandao et al. [4] converted Convolutional Neural Networks (CNN) to Fully Convolutional
Networks (FCN) in their architecture for the segmentation of polyps. The basic idea was to
apply deconvolution to the images before passing them to the neural network, which decides
for each pixel whether it belongs to a polyp or not.

3   APPROACH
The methodology of the research consists of data augmentation followed by segmentation.
Data augmentation is done by applying several noise and reshaping operations to the images.
The sequence of image operations is: cropping, adding noise, horizontal and vertical
flipping, mirroring, scaling, brightness change, contrast change, and sharpening. The
parameters of the augmentation operations were drawn at random from ranges chosen so that
the output image has size 224 × 224 with three channels. The augmentation was applied to
generate 1300 images from the training set, and the same operations were applied to the
ground-truth masks.
   The segmentation of the images is done using various neural networks and clustering
techniques. The methodologies were evaluated on the evaluation data, which was taken as 20%
of the training data. The neural network approaches worked better on the evaluation data
than the clustering techniques.
   The neural network approaches for segmentation were based on auto-encoder architectures
inspired by U-Net [12], U-Net++ [15], ResUNet++ [9], and SegNet [2]. These auto-encoder
approaches were evaluated on the validation dataset, and the UNet++ based approaches gave
good accuracy and Dice scores. Based on the validation data, the best approaches, used for
the segmentation of the test data, were a 10-layered UNet++ based auto-encoder and a
ResUNet++ auto-encoder.

Figure 1: Architecture of UNet++.
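
The augmentation step of Section 3 applies the same operations to each training image and its ground-truth mask. A minimal sketch of that pairing idea follows, assuming toy single-channel images stored as nested lists rather than 224 × 224 RGB arrays. The function names, the 0.5 flip probability, and the 0.8 to 1.2 brightness range are hypothetical choices for illustration, not values from this work. The sketch also assumes that only geometric operations are mirrored on the mask, since a mask stores labels rather than intensities.

```python
import random


def hflip(img):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Reverse the order of the rows (top-to-bottom flip)."""
    return img[::-1]


def adjust_brightness(img, factor):
    """Scale pixel intensities by `factor`, clipped to the 0..255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def augment_pair(image, mask, rng):
    """Apply one random augmentation to an image and its ground-truth mask.

    Flips are applied to both inputs so the mask stays aligned with the image.
    The brightness change is applied to the image only (assumption).
    """
    if rng.random() < 0.5:  # hypothetical probability
        image, mask = hflip(image), hflip(mask)
    if rng.random() < 0.5:  # hypothetical probability
        image, mask = vflip(image), vflip(mask)
    factor = rng.uniform(0.8, 1.2)  # hypothetical range, not from the paper
    image = adjust_brightness(image, factor)
    return image, mask


# Toy 2 x 3 image whose bright pixels are the "polyp" marked in the mask.
image = [[10, 20, 200], [30, 40, 220]]
mask = [[0, 0, 1], [0, 0, 1]]
aug_image, aug_mask = augment_pair(image, mask, random.Random(0))
```

Passing an explicit `random.Random` instance keeps each augmented pair reproducible, which helps when checking that a mask still lines up with its image.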

This research work was funded by Higher Education Commission (HEC) Pakistan under NRPU
Project 10225/2017.
Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
MediaEval’21, December 13-15 2021, Online

   The UNet++ auto-encoder has 4 CNN layers in the encoder, then a CNN layer as a bridge,
and then another set of 4 CNN layers with the same filters as in the encoder but in reverse
order. The auto-encoder output is again passed through a CNN layer to compute the masks
of the provided image. The architecture of the UNet++ is shown in Figure 1. The
auto-encoder was trained for 100 epochs with a batch size of 25, chosen based on the
available resources. Training for 100 epochs with a binary cross-entropy loss function and
a learning rate of 0.0001 resulted in the best validation Dice and accuracy.
   The auto-encoder built on the inspiration of ResUNet++ [9] consists of an encoder and a
decoder that use residual blocks. The encoder uses one stem block and three residual blocks
of three convolutions each, with some pooling and fully connected layers. The decoder is
similar, with the same number of convolution and pooling layers and with attention layers
added in between them. The decoder's convolution layers follow the same pattern as in the
encoder but in reverse order. The model was trained for 200 epochs with a learning rate of
0.0001 and a batch size of 8 images. As with the UNet++ model, the binary cross-entropy
loss function was used with the Adam optimizer.
   The overall training methodology, data augmentation followed by segmentation, is shown
in Figure 2.

Figure 2: System Architecture.

4   DATASET
The dataset for this research is provided by Simula Lab: the HyperKvasir dataset [3] for
training and the MediaEval 2021 data [6] for testing. The training data consists of 1360
RGB images for polyp segmentation, with ground-truth masks for these images. The images
have various dimensions, from 352 × 449 to 1072 × 1024. The test data consists of 200
unlabeled RGB images with sizes varying from 576 × 576 to 1072 × 1072.

5   RESULTS AND ANALYSIS
The various polyp segmentation approaches were trained on the training dataset, with 30% of
it held out as validation data. The image segmentation networks and the pixel clustering
approaches were investigated with this 70% training and 30% validation split, using
accuracy and the Dice coefficient as evaluation measures. The task rules allowed the
submission of the 5 best runs, so the top 5 methodologies based on validation results were
selected for submission. The official results from the organizers are shown in Table 1. The
best results were achieved by applying UNet++ with the Adam optimizer to augmented images.

Table 1: Test Data Results

Model                             Dice   Accuracy   Jaccard Index   P      R
UNet++ With Augmentation          0.53   0.92       0.42            0.66   0.53
UNet++ Without Augmentation       0.34   0.88       0.25            0.49   0.36
ResUNet++ With Augmentation       0.5    0.91       0.40            0.64   0.50
ResUNet++ Without Augmentation    0.34   0.88       0.25            0.45   0.37
Averaged All With Augmentation    0.44   0.91       0.34            0.65   0.40

   P in Table 1 denotes precision and R denotes recall. Table 1 compares the test results
of the top 5 approaches applied to polyp segmentation. Three evaluation measures are used
in the comparison: the Dice coefficient, pixel accuracy, and the Jaccard index. The Dice
coefficient is twice the area common to the predicted and ground-truth masks divided by the
sum of the areas of both masks. Incorrect segmentation sharply reduces the common area and
therefore the Dice coefficient, so the Dice coefficient of the various approaches is only
0.34 to 0.53. Pixel accuracy is the proportion of pixels classified correctly out of the
total number of pixels. In the polyp segmentation task, less than 20% of an image is polyp
and the remaining 80% or more is not. As a result, cases where polyps were present but not
detected correctly had little impact on accuracy, which did not fall below 0.88. The
Jaccard index is the intersection over the union. Incorrectly segmented images can miss a
large portion of the intersection, the polyp, which causes a low Jaccard index.

6   CONCLUSION AND FUTURE WORK
This research was conducted using augmentation of polyp images and segmentation with
CNN-based auto-encoder architectures. The approach shows good segmentation accuracy on the
validation data. For some endoscopic images, however, the segmentation is incorrect and the
segmented polyp region contains zero pixels. This problem of no detection can be addressed
by applying detection before segmentation. In the future, we will investigate detection as
a pre-step for the segmentation of polyp images, based on the Kvasir polyp detection
datasets [10, 11]. The second problem is low segmentation accuracy for images with light
reflections on the polyp [1, 14]; we will investigate reflection removal methodologies to
improve the segmentation accuracy.
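
The evaluation measures discussed in Section 5 can be computed from pixel-level confusion counts. The following is a minimal sketch for a single pair of binary masks, assuming masks stored as nested lists of 0 and 1. The function names and the convention of scoring 0 when a denominator is empty are assumptions for illustration; the official task evaluation may aggregate over images differently. The toy example uses a mask with 20% polyp pixels to show why accuracy stays high even when the polyp is missed entirely.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives between two binary masks."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_scores(pred, truth):
    """Return Dice, Jaccard, accuracy, precision, and recall for one mask pair."""
    tp, fp, fn, tn = confusion_counts(pred, truth)

    def ratio(num, den):
        # Assumption: an empty denominator scores 0 instead of raising an error.
        return num / den if den else 0.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),      # 2|A∩B| / (|A| + |B|)
        "jaccard": ratio(tp, tp + fp + fn),           # |A∩B| / |A∪B|
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


# Toy 10-pixel mask: 2 polyp pixels (20%) and 8 background pixels.
truth = [[1, 1, 0, 0, 0, 0, 0, 0, 0, 0]]
missed = [[0] * 10]                           # no polyp detected at all
partial = [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]]    # half of the polyp found

scores_missed = segmentation_scores(missed, truth)
scores_partial = segmentation_scores(partial, truth)
```

In this toy case the fully missed polyp still yields an accuracy of 0.8 while Dice and Jaccard are both 0, and finding half of the polyp gives an accuracy of 0.9 with a Jaccard index of 0.5.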


REFERENCES
 [1] Mojtaba Akbari, Majid Mohrekesh, Kayvan Najariani, Nader Karimi, Shadrokh
     Samavi, and SM Reza Soroushmehr. 2018. Adaptive specular reflection detection
     and inpainting in colonoscopy video frames. In 2018 25th IEEE International
     Conference on Image Processing (ICIP). IEEE, 3134–3138.
 [2] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. 2017. Segnet: A
     deep convolutional encoder-decoder architecture for image segmentation. IEEE
     transactions on pattern analysis and machine intelligence 39, 12 (2017), 2481–2495.
 [3] Hanna Borgli, Vajira Thambawita, Pia H Smedsrud, Steven Hicks, Debesh Jha,
     Sigrun L Eskeland, Kristin Ranheim Randel, Konstantin Pogorelov, Mathias Lux,
     Duc Tien Dang Nguyen, et al. 2020. HyperKvasir, a comprehensive multi-class
     image and video dataset for gastrointestinal endoscopy. Scientific data 7, 1 (2020),
     1–14.
 [4] Patrick Brandao, Odysseas Zisimopoulos, et al. 2018. Towards a computed-
     aided diagnosis system in colonoscopy: automatic polyp segmentation using
     convolution neural networks. Journal of Medical Robotics Research 3, 02 (2018),
     1840002.
 [5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual
     learning for image recognition. In Proceedings of the IEEE conference on computer
     vision and pattern recognition. 770–778.
 [6] Steven Hicks, Debesh Jha, Vajira Thambawita, Hugo Hammer, Thomas de Lange,
     Sravanthi Parasa, Michael Riegler, and Pål Halvorsen. 2021. Medico Multime-
     dia Task at MediaEval 2021: Transparency in Medical Image Segmentation. In
     Proceedings of MediaEval 2021 CEUR Workshop.
 [7] Debesh Jha, Sharib Ali, Nikhil Kumar Tomar, Håvard D Johansen, Dag Johansen,
     Jens Rittscher, Michael A Riegler, and Pål Halvorsen. 2021. Real-time polyp
     detection, localization and segmentation in colonoscopy using deep learning.
     IEEE Access 9 (2021), 40496–40510.
 [8] Debesh Jha, Steven A Hicks, Krister Emanuelsen, Håvard Johansen, Dag Jo-
     hansen, Thomas de Lange, Michael A Riegler, and Pål Halvorsen. 2020. Medico
     multimedia task at mediaeval 2020: Automatic polyp segmentation. arXiv preprint
     arXiv:2012.15244 (2020).
 [9] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Dag Johansen, Thomas De Lange,
     Pål Halvorsen, and Håvard D Johansen. 2019. Resunet++: An advanced architec-
     ture for medical image segmentation. In 2019 IEEE International Symposium on
     Multimedia (ISM). IEEE, 225–2255.
[10] Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Steven Hicks, Kristin Ran-
     heim Randel, Duc Tien Dang Nguyen, Mathias Lux, Olga Ostroukhova, and
     Thomas de Lange. 2018. Medico multimedia task at mediaeval 2018. In CEUR
     Workshop Proceedings, Vol. 2283. Technical University of Aachen, 1–4.
[11] Michael Riegler, Konstantin Pogorelov, Pål Halvorsen, Carsten Griwodz, Thomas
     Lange, Kristin Randel, Sigrun Eskeland, Duc-Tien Dang-Nguyen, Mathias Lux,
     and Concetto Spampinato. 2017. Multimedia for medicine: the medico task at
     mediaeval 2017. (2017).
[12] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional
     networks for biomedical image segmentation. In International Conference on
     Medical image computing and computer-assisted intervention. Springer, 234–241.
[13] Quoc-Huy Trinh, Minh-Van Nguyen, Thiet-Gia Huynh, and Minh-Triet Tran.
     2020. HCMUS-Juniors 2020 at Medico Task in MediaEval 2020: Refined Deep
     Neural Network and U-Net for Polyps Segmentation. (2020).
[14] Rui Yao, Yilun Wu, Wei Yang, Xiaolin Lin, Shidan Chen, and Su Zhang. 2010.
     Specular reflection detection on gastroscopic images. In 2010 4th International
     Conference on Bioinformatics and Biomedical Engineering. IEEE, 1–4.
[15] Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang.
     2018. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In
     Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical
     Decision Support. Lecture Notes in Computer Science, Vol. 11045. Springer, 3–11.
     https://doi.org/10.1007/978-3-030-00889-5_1