=Paper=
{{Paper
|id=Vol-3181/paper32
|storemode=property
|title=HCMUS-Juniors at Medico Polyp Segmentation Task 2021: Efficient U-Net for Polyps Segmentation
|pdfUrl=https://ceur-ws.org/Vol-3181/paper32.pdf
|volume=Vol-3181
|authors=Quoc-Huy Trinh,Trong-Hieu Nguyen Mau,Minh-Van Nguyen,Van-Son Ho,Tan-Cong Nguyen,Hai-Dang Nguyen,Minh-Triet Tran
|dblpUrl=https://dblp.org/rec/conf/mediaeval/TrinhMNHNNT21
}}
==HCMUS-Juniors at Medico Polyp Segmentation Task 2021: Efficient U-Net for Polyps Segmentation==
<pdf width="1500px">https://ceur-ws.org/Vol-3181/paper32.pdf</pdf>
<pre>
HCMUS-Juniors at Medico Polyp Segmentation Task 2021:
Efficient U-Net for Polyps Segmentation

Quoc-Huy Trinh (1,4), Trong-Hieu Nguyen Mau (1,4), Minh-Van Nguyen (1,4),
Van-Son Ho (1,4), Tan-Cong Nguyen (2,3,4), Hai-Dang Nguyen (1,3,4), Minh-Triet Tran (1,3,4)
1 Faculty of Information Technology, University of Science, VNU-HCM
2 University of Social Sciences and Humanities, VNU-HCM
3 John von Neumann Institute, VNU-HCM
4 Vietnam National University, Ho Chi Minh City, Vietnam

{20120013,20127094,20120081,20120021}@student.hcmus.edu.vn, ntcong@hcmussh.edu.vn, nhdang@selab.hcmus.edu.vn, tmtriet@fit.hcmus.edu.vn
1 ABSTRACT
The Medico task at MediaEval 2021 targets the segmentation of polyps in endoscopic images. In this paper, we propose methods that use Efficient U-Net, and we propose the Multiscale Efficient U-Net to deal with this task. In our experiments, we also benchmark our method against other previous methods.

2 INTRODUCTION
With the development of biomedicine and information technology, medical images are now stored in digital databases. Moreover, with the increasing number of cases with abnormal findings and symptoms in the digestive system, a system is needed to help doctors accurately diagnose abnormalities and detect their positions in medical images. That is why many methods have been proposed to help diagnose polyps and other abnormalities in the digestive system from endoscopic images.
On the other hand, improvements in Convolutional Neural Network architectures have advanced the segmentation of medical images, and several architectures have been proposed, such as U-Net [7], PSP-Net [9], and PraNet [3]. However, each method has drawbacks that still need to be addressed, and researchers face many challenges in improving the performance of their methods [8].
The goal of the Medico automatic polyp segmentation challenge is to evaluate methods for automatic polyp segmentation that can detect and mask out various types of polyps (including irregular, small, or flat polyps) with high accuracy [4]. In this challenge, our goal is to segment the masks of all types of polyps in the dataset [5].

3 DATASET
To evaluate our proposed method, we use the HyperKvasir dataset, proposed in 2020. This open dataset is a comprehensive multi-class image and video dataset for gastrointestinal endoscopy, and it includes ground-truth masks and bounding boxes for multiple tasks on endoscopic images [2].
In this task, we use the segmentation part of the HyperKvasir dataset. This subset consists of 1000 images with ground-truth masks for experimenting with segmentation. For testing, we evaluate our method on the test dataset provided by the organizers of the MediaEval Medico task.

Figure 1: Polyps and corresponding masks from HyperKvasir Segmented.
Figure 2: Example polyps from the test images.

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval’21, December 13-15 2021, Online

4 METHODS
We consider five solutions, corresponding to our five submitted runs. To evaluate the performance of the proposed method, we also compare the results with those of other methods, such as ResUNet and PraNet.

4.1 Proposed Architecture
In our proposed method, we use EfficientNet for the encoder block. Moreover, to obtain a better output mask and improve the model's performance, we propose using low-scale features; we call the resulting model the Multiscale Efficient U-Net (MCEU). This architecture includes three main blocks: the Multiscale Block (MC Block), the EfficientNet Encoder Block (EEN Block), and the Decoder Block.
First, the input with shape (w, h, c) passes through the MC Block, which consists of 3 Max Pooling layers, 2 Conv2D layers, and 1 Batch Normalization layer. The MC Block creates new low-scale features for the model [6]. The parameters of these layers can be changed to adapt to the feature representation of the images.
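A minimal single-channel sketch of the MC Block may help make the layer order concrete. The order (Max Pool 2D, Conv2D, Batch Normalization, Max Pool 2D, Conv2D, Max Pool 2D) follows the Multi-scale Block inset of Figure 3; everything else is an assumption for illustration only: 2x2 pooling, a fixed 3x3 averaging kernel in place of learned weights, normalization over one feature map rather than over a batch, and the hypothetical helper names max_pool2d, conv2d_same, batch_norm, and mc_block. Treating the map after each pooling step as one of the multiscale outputs is also our assumption, not the authors' implementation.

```python
from typing import List

Image = List[List[float]]  # single-channel feature map, rows of pixel values


def max_pool2d(x: Image, k: int = 2) -> Image:
    """Non-overlapping k x k max pooling; leftover rows/columns are dropped."""
    h, w = len(x) // k, len(x[0]) // k
    return [[max(x[i * k + di][j * k + dj] for di in range(k) for dj in range(k))
             for j in range(w)]
            for i in range(h)]


def conv2d_same(x: Image, kernel: Image) -> Image:
    """2D cross-correlation with zero padding, so the output keeps the input size."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    r, c = i + di - ph, j + dj - pw
                    if 0 <= r < h and 0 <= c < w:
                        total += x[r][c] * kernel[di][dj]
            out[i][j] = total
    return out


def batch_norm(x: Image, eps: float = 1e-5) -> Image:
    """Normalize one map to zero mean, unit variance (stand-in for BatchNorm, no learned scale/shift)."""
    values = [v for row in x for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [[(v - mean) / (var + eps) ** 0.5 for v in row] for row in x]


def mc_block(x: Image) -> List[Image]:
    """Hypothetical MC Block: MaxPool -> Conv -> BN -> MaxPool -> Conv -> MaxPool.

    Returns the map after each pooling step (1/2, 1/4 and 1/8 of the input size).
    """
    kernel = [[1 / 9] * 3 for _ in range(3)]  # assumed fixed 3x3 averaging kernel
    s1 = max_pool2d(x)                                    # Max Pool 2D
    s2 = max_pool2d(batch_norm(conv2d_same(s1, kernel)))  # Conv2D, Batch Norm, Max Pool 2D
    s3 = max_pool2d(conv2d_same(s2, kernel))              # Conv2D, Max Pool 2D
    return [s1, s2, s3]


if __name__ == "__main__":
    image = [[float((i * 16 + j) % 7) for j in range(16)] for i in range(16)]
    print([(len(s), len(s[0])) for s in mc_block(image)])  # [(8, 8), (4, 4), (2, 2)]
```

Each pooling step halves the spatial resolution, so a 16x16 input yields 8x8, 4x4, and 2x2 maps; in a real model these would be multi-channel tensors with trainable convolution weights.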


From the output of the MC Block, there are two types of features: low-scale and high-scale features. These features then pass through the encoder built from EfficientNet Encoder (EEN) Blocks: the high-scale features are mapped to the Decoder Block, while the low-scale features pass through the encoder block. After the EEN Blocks, the features are concatenated at Encoder Block 4 and then continue to the Decoder Block.

[Figure 3 diagram: Input -> MC-Block -> EfficientNet encoder (E-Block 1 to E-Block 4) -> decoder (D-Block 1 to D-Block 3, with Concat connections) -> Output, trained against the Ground Truth with a Jaccard Loss. Inset, Multi-scale Block (MC-Block): Max Pool 2D -> Conv2D -> Batch Normalization -> Max Pool 2D -> Conv2D -> Max Pool 2D.]
Figure 3: Overview of the Multiscale Efficient U-Net

By using the MC Block, the model can use low-scale features to enrich the features it learns, particularly when it has to adapt to a small dataset.
A limitation of this architecture is that two types of features are fed to the encoder block, so it costs more computing resources than the traditional U-Net.

4.2 Loss Function
To train the proposed architecture, we propose using the Jaccard loss function [1]:

  JaccardLoss(y, \hat{y}) = \alpha \left(1 - \frac{\alpha + \sum_{c=1}^{C} y_c \hat{y}_c}{\alpha + \sum_{c=1}^{C} \left(y_c + \hat{y}_c - y_c \hat{y}_c\right)}\right)    (1)

where y_c and \hat{y}_c are the ground-truth and predicted values of element c of the mask, C is the number of elements, and \alpha is a smoothing constant. This loss function improves the segmentation and helps control the model's performance on the tissue regions.

4.3 Data Augmentation
To enrich the dataset, we use several augmentation methods: Center Crop, Random Rotate, GridDistortion, Horizontal Flip, and Vertical Flip, which increase the quantity of training data. Figure 4 shows samples of the data after augmentation.

Figure 4: Data Augmentation

5 RESULTS
Evaluation is performed on the test set of the MediaEval Medico task, which includes 200 endoscopic images of polyps. The benchmark table (Table 1) shows that the Efficient U-Net with low-scale features (MCEU) performs better overall than the original Efficient U-Net and PraNet. However, it does not lead on every individual metric: its Recall (0.8167) is below that of PraNet (0.8204) and ResUnet (0.8371), and its Precision (0.8295) is below that of the original Efficient U-Net (0.8442).

Figure 5: Visualization result

Qualitatively, the proposed architecture performs well: the predicted masks cover almost all of the target tissue in the images, including cases with difficult shapes.

Method          Jaccard   Dice     Recall   Precision   Accuracy   F1
EfficientUnet   0.6572    0.7425   0.7264   0.8442      0.9529     0.7425
MCEU            0.7059    0.7961   0.8167   0.8295      0.9565     0.7961
PraNet          0.6929    0.7774   0.8204   0.8160      0.9511     0.7774
ResUnet         0.6739    0.7737   0.8371   0.7766      0.9495     0.7737
Table 1: MediaEval 2021 challenge results for our team's methods.

As the benchmark table shows, the Efficient U-Net that uses low-scale features achieves the highest Jaccard, Dice, Accuracy, and F1 scores. We attribute this to the limited amount of training and validation data: augmentation can enlarge the dataset, but some augmented samples remain very similar to the originals. Lower-scale features therefore help the model adapt better to a dataset with few samples.

6 CONCLUSION
In general, we propose the Multiscale Efficient U-Net (MCEU) to deal with the segmentation task. MCEU has the merit of enriching the features used in the training process. Moreover, this architecture can help normalize the high-scale features so that the model adapts to a small dataset; however, some limitations exist. Regarding the experimental evaluation, our results are quite positive: compared to PraNet, ResUNet, and Efficient U-Net, our model achieves better overall performance. This result may offer later architectures another approach to this task.

ACKNOWLEDGMENT
This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number DS2020-42-01.
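To make Equation (1) of Section 4.2 concrete, here is a plain-Python sketch of the smoothed Jaccard loss for flattened masks. It is an illustration, not the authors' training code: the function name jaccard_loss and the default alpha = 100.0 are assumptions (the paper does not give a value), and predictions are assumed to be soft probabilities in [0, 1].

```python
from typing import Sequence


def jaccard_loss(y: Sequence[float], y_hat: Sequence[float], alpha: float = 100.0) -> float:
    """Smoothed Jaccard loss following Equation (1) for flattened masks.

    y     -- ground-truth values y_c (0 or 1)
    y_hat -- predicted probabilities y_hat_c in [0, 1]
    alpha -- smoothing constant; 100.0 is an assumed default, not from the paper
    """
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    intersection = sum(t * p for t, p in zip(y, y_hat))   # sum_c y_c * y_hat_c
    union = sum(t + p - t * p for t, p in zip(y, y_hat))  # sum_c (y_c + y_hat_c - y_c * y_hat_c)
    return alpha * (1.0 - (alpha + intersection) / (alpha + union))


print(jaccard_loss([1, 1, 0, 0], [1, 1, 0, 0]))  # 0.0: perfect overlap
print(jaccard_loss([1, 1, 0, 0], [0, 0, 1, 1]))  # about 3.85: no overlap
```

The alpha terms in the numerator and denominator keep the ratio defined when both masks are empty, and because every term is a product or sum of soft predictions, the loss stays differentiable with respect to y_hat.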

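The geometric augmentations listed in Section 4.3 must transform an image and its mask in exactly the same way, or the mask stops labelling the right pixels. The sketch below shows this on 2D lists for center crop, horizontal and vertical flips, and rotation; restricting rotation to multiples of 90 degrees is an assumption (the paper does not state the rotation range), GridDistortion is omitted, and all function names are hypothetical. Libraries such as Albumentations provide ready-made transforms with names like CenterCrop, GridDistortion, HorizontalFlip, and VerticalFlip.

```python
import random
from typing import Callable, List, Tuple

Grid = List[List[int]]  # an image channel or a mask as rows of pixel values


def center_crop(x: Grid, h: int, w: int) -> Grid:
    """Keep the central h x w window."""
    top, left = (len(x) - h) // 2, (len(x[0]) - w) // 2
    return [row[left:left + w] for row in x[top:top + h]]


def hflip(x: Grid) -> Grid:
    """Horizontal flip: mirror each row."""
    return [row[::-1] for row in x]


def vflip(x: Grid) -> Grid:
    """Vertical flip: reverse the row order."""
    return [row[:] for row in x[::-1]]


def rot90(x: Grid) -> Grid:
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*x[::-1])]


def augment(image: Grid, mask: Grid, rng: random.Random) -> Tuple[Grid, Grid]:
    """Apply one random sequence of flips/rotations to both image and mask."""
    ops: List[Callable[[Grid], Grid]] = []
    if rng.random() < 0.5:
        ops.append(hflip)
    if rng.random() < 0.5:
        ops.append(vflip)
    ops.extend([rot90] * rng.randrange(4))  # 0, 90, 180 or 270 degrees
    for op in ops:
        image, mask = op(image), op(mask)
    return image, mask


img = [[i * 4 + j for j in range(4)] for i in range(4)]
msk = [[1 if 1 <= i <= 2 and 1 <= j <= 2 else 0 for j in range(4)] for i in range(4)]
aug_img, aug_msk = augment(img, msk, random.Random(42))
```

Drawing the random choices once and applying the same operation list to both arrays is the key design point; sampling them separately for the image and the mask would silently misalign the training pairs.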

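The six metrics reported in Table 1 can all be derived from pixel-level true/false positive and negative counts. The sketch below computes them for a single pair of flattened binary masks; the helper name segmentation_metrics and the 0.0 convention for empty denominators are assumptions, and the paper does not say how the official evaluation aggregates scores over the 200 test images. It also shows why the Dice and F1 columns of Table 1 coincide: for binary masks, F1 = 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN), which is exactly the Dice coefficient.

```python
from typing import Dict, Sequence


def segmentation_metrics(y: Sequence[int], y_hat: Sequence[int]) -> Dict[str, float]:
    """Pixel-level metrics for one pair of flattened binary masks (1 = polyp, 0 = background)."""
    if len(y) != len(y_hat):
        raise ValueError("masks must have the same number of pixels")
    pairs = list(zip(y, y_hat))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = len(pairs) - tp - fp - fn  # valid because the masks are binary

    def safe_div(a: float, b: float) -> float:
        return a / b if b else 0.0  # assumed convention for empty denominators

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "jaccard": safe_div(tp, tp + fp + fn),
        "dice": safe_div(2 * tp, 2 * tp + fp + fn),
        "recall": recall,
        "precision": precision,
        "accuracy": safe_div(tp + tn, len(pairs)),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


m = segmentation_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(m["jaccard"], m["accuracy"])  # 0.5 0.75
```

Accuracy is dominated by the many background pixels, which is why all four methods in Table 1 score above 0.94 on it while their Jaccard scores differ much more.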
REFERENCES
[1] Jeroen Bertels, Tom Eelbode, Maxim Berman, Dirk Vandermeulen, Frederik Maes, Raf Bisschops, and Matthew B. Blaschko. 2019. Optimizing the Dice Score and Jaccard Index for Medical Image Segmentation: Theory and Practice. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019 (2019), 92–100. https://doi.org/10.1007/978-3-030-32245-8_11
[2] Hanna Borgli, Vajira Thambawita, Pia H. Smedsrud, Steven Hicks, Debesh Jha, Sigrun L. Eskeland, Kristin Ranheim Randel, Konstantin Pogorelov, Mathias Lux, Duc Tien Dang Nguyen, Dag Johansen, Carsten Griwodz, Håkon K. Stensland, Enrique Garcia-Ceja, Peter T. Schmidt, Hugo L. Hammer, Michael A. Riegler, Pål Halvorsen, and Thomas de Lange. 2020. HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Scientific Data 7, 1 (2020), 283. https://doi.org/10.1038/s41597-020-00622-y
[3] Deng-Ping Fan, Ge-Peng Ji, Tao Zhou, Geng Chen, Huazhu Fu, Jianbing Shen, and Ling Shao. 2020. PraNet: Parallel Reverse Attention Network for Polyp Segmentation. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, Anne L. Martel, Purang Abolmaesumi, Danail Stoyanov, Diana Mateus, Maria A. Zuluaga, S. Kevin Zhou, Daniel Racoceanu, and Leo Joskowicz (Eds.). Springer International Publishing, Cham, 263–273.
[4] Steven Hicks, Debesh Jha, Vajira Thambawita, Hugo Hammer, Thomas de Lange, Sravanthi Parasa, Michael Riegler, and Pål Halvorsen. 2021. Medico Multimedia Task at MediaEval 2021: Transparency in Medical Image Segmentation. In Proceedings of MediaEval 2021 CEUR Workshop.
[5] Debesh Jha, Steven A. Hicks, Krister Emanuelsen, Håvard Johansen, Dag Johansen, Thomas de Lange, Michael A. Riegler, and Pål Halvorsen. 2020. Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation. In Proc. of the MediaEval 2020 Workshop.
[6] Pardha Saradhi Mittapalli and Thanikaiselvan V. 2021. Multiscale CNN with compound fusions for false positive reduction in lung nodule detection. Artificial Intelligence in Medicine 113 (2021), 102017. https://doi.org/10.1016/j.artmed.2021.102017
[7] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. LNCS 9351, 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
[8] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi (Eds.). Springer International Publishing, Cham, 234–241.
[9] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. 2016. Pyramid Scene Parsing Network. CoRR abs/1612.01105 (2016). arXiv:1612.01105 http://arxiv.org/abs/1612.01105

</pre>