=Paper= {{Paper |id=Vol-2595/endoCV2020_Choi_et_al |storemode=property |title=Centernet-based Detection Model And U-net-based Multi-class Segmentation Model For Gastrointestinal Diseases |pdfUrl=https://ceur-ws.org/Vol-2595/endoCV2020_paper_id_32.pdf |volume=Vol-2595 |authors=Yoon Ho Choi,Yeong Chan Lee,Sanghoon Hong,Junyoung Kim,Hong-Hee Won,Taejun Kim |dblpUrl=https://dblp.org/rec/conf/isbi/ChoiLHKWK20 }} ==Centernet-based Detection Model And U-net-based Multi-class Segmentation Model For Gastrointestinal Diseases== https://ceur-ws.org/Vol-2595/endoCV2020_paper_id_32.pdf
         CENTERNET-BASED DETECTION MODEL AND U-NET-BASED MULTI-CLASS
              SEGMENTATION MODEL FOR GASTROINTESTINAL DISEASES

Yoon Ho Choi1, Yeong Chan Lee2, Sanghoon Hong2, Junyoung Kim3, Hong-Hee Won2†, Taejun Kim2,3†

1 Dept. of Health Sciences & Tech., Samsung Advanced Institute for Health Sciences & Tech. (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, Republic of Korea
2 Dept. of Digital Health, Samsung Advanced Institute for Health Sciences and Tech. (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, Republic of Korea
3 Dept. of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea


                          ABSTRACT

From the perspective of computer-aided diagnosis systems, it is important to build automated techniques that detect and diagnose lesions in order to reduce the miss rate of clinicians. Recently, various diagnostic techniques using computer vision and artificial intelligence have been developed. However, they need to diagnose various lesions more accurately before they can be used in actual clinical practice. Accordingly, we developed a CenterNet-based object detection model and a U-Net-based class-wise binary segmentation model. These models were trained with random augmentation methods including color and morphological changes. For the 43 test set images, our model shows a mean average precision of 0.1932 ± 0.0622 (mean ± standard deviation) in detection and a semantic score of 0.2544 ± 0.2080 in segmentation.

    Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

                     1. INTRODUCTION

    Endoscopists can recognize diverse lesions related to digestive disorders in gastrointestinal organs through endoscopic examinations. A detected lesion is clinically managed or resected in compliance with medical guidelines. However, it is typically not diagnosed until the results of the pathological examination are known. Some endoscopic examinations are effective for the early diagnosis and prevention of gastrointestinal disease, but detecting lesions is highly dependent on the skill and experience of the endoscopist. For example, some studies have reported that the miss rate of polyps during colonoscopy ranges from 17% to 28% [1].
    Recently, computer-aided systems for medical imaging have improved remarkably. In particular, recent studies have shown that artificial intelligence can meet endoscopists' needs. A prospective randomized controlled trial showed that adenoma detection rates of colorectal polyps significantly increased when endoscopists worked with a real-time automatic detection system [2]. Another randomized controlled trial showed that a deep convolutional neural network using deep reinforcement learning achieved real-time monitoring of blind spots with high accuracy during esophagogastroduodenoscopy [3].
    We participated in sub-challenge II, Endoscopic Disease Detection and Segmentation (EDD2020), of the Endoscopy Computer Vision Challenges on Segmentation and Detection (EndoCV2020). For this challenge, deep learning models were developed for detecting or segmenting lesions from 4 different organs: a CenterNet-based model was designed to detect lesions, and a class-wise U-Net-based model was developed to segment lesions.

                        2. DATASETS

    In total, 386 endoscopic images of the training set were obtained from 5 centers [4]. Every image was assigned to at least 1 of 5 disease classes, namely Barrett's esophagus (BE), high-grade dysplasia (HGD), cancer, polyp, and suspicious region, across 4 different organs. These images had corresponding bounding boxes and pixel-level labels of each lesion and were annotated by medical experts. The number of images in the training set was imbalanced across disease classes (BE: 160, HGD: 74, cancer: 53, polyp: 127, and suspicious: 88).

                         3. METHODS

3.1. Image preprocessing

    Class imbalance can lead to results biased towards a particular class during model training. Thus, prior to image pre-processing, we randomly duplicated images
in insufficient classes to balance the number of images across all classes. At this point, it was important to minimize the number of duplicated images, since indiscriminately duplicated images may cause substantial bias in the trained model. Therefore, in every round we identified the class with the highest number of images and the class with the lowest number, and then randomly duplicated images of the lowest class. To ensure this, images containing objects of the highest class were excluded from the random duplication.
    After balancing the number of images belonging to each class, we preprocessed the training data to reduce overfitting of our models and to generalize them to the test data. First, all images in the training data were standardized per channel and randomly augmented 86 times using rotation, flipping, contrast enhancement, and brightness adjustment. Next, to train the models to be invariant to scale, we randomly changed the resolution of the original image between 320 and 602 every 10 epochs and then converted it to a size of 512 × 512 pixels.

3.2. Model development for detection

    For disease detection, we focused on a single-stage object detection model with a fast execution speed that is appropriate for real-time object detection and can possibly be used in clinical practice, because endoscopic imaging consists of video frames rather than still images.
    CenterNet was shown to work more simply and efficiently by predicting both the key points and the bounding boxes of objects in an image at the same time, instead of sliding anchors that compute image features over candidate bounding boxes [5]. Because it has recently demonstrated excellent performance in real-time object detection, we applied CenterNet to endoscopic disease detection. Our CenterNet-based EDD detection model predicts the center points of the lesions, the offsets along the x and y axes, and the width and height of the bounding boxes.
    The backbone architecture of our detection model is a ResNet50 [6] model pre-trained on the PASCAL VOC 2012 and EDD2020 datasets [4] for multiclass classification. We fine-tuned this detection model with the following training options: the batch size was 8, the number of epochs was 150, and the initial learning rate was 5e-4, divided by 10 after every 80 epochs. The input image size was 512, and the test image was restored to its original size by applying an affine transformation. The threshold of the confidence score was set to 0.2.

Fig. 1. The architecture of the U-Net-based class-wise binary segmentation model

3.3. Model development for segmentation

    For disease segmentation, we modified the decoder part of the vanilla U-Net [7] to build a multi-class segmentation model that can infer an independent result for each class. Because some classes overlap with other disease classes in the EDD2020 data, it would be inappropriate to implement a general multi-class segmentation model whose final layer is a softmax operation. Therefore, we replaced the final layer of the vanilla U-Net with class-wise binary segmentation branches for multi-class segmentation. As shown in Fig. 1, we designed a branch structure in which the last up-convolution layer of U-Net performs segmentation for each class independently. Through these branches, the class-wise binary segmentation model was trained with a dice similarity coefficient loss. The same backbone architecture used for the detection model was also used for our
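The round-based duplication described above can be sketched as follows. This is a minimal illustration under our own naming (the `balance_by_duplication` helper is an assumption, not the authors' code), with each image represented simply by the set of disease classes it contains:

```python
import random

def balance_by_duplication(image_classes, rounds=1000, seed=0):
    """Round-based oversampling: each round, duplicate one random image of the
    rarest class, skipping images that also contain the most frequent class."""
    rng = random.Random(seed)
    images = list(image_classes)  # each entry: set of class labels in that image
    while rounds > 0:
        counts = {}
        for classes in images:
            for c in classes:
                counts[c] = counts.get(c, 0) + 1
        highest = max(counts, key=counts.get)
        lowest = min(counts, key=counts.get)
        if counts[highest] == counts[lowest]:
            break  # classes are balanced
        # candidates: images with the rarest class but without the commonest one
        candidates = [c for c in images if lowest in c and highest not in c]
        if not candidates:
            break
        images.append(rng.choice(candidates))
        rounds -= 1
    return images
```

For example, starting from 4 BE-only images and 1 polyp-only image, the polyp image is duplicated until both classes have 4 images.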
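The scale-augmentation step can be sketched as below, assuming a simple nearest-neighbour resize; the function names are ours, and the rotation, flipping, contrast, and brightness augmentations are omitted for brevity, with the standardization order simplified:

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour resize of an H x W x C image to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def preprocess(img, rng, min_res=320, max_res=602, out_size=512):
    """Random intermediate resolution, fixed 512 x 512 output,
    then per-channel standardization."""
    res = int(rng.integers(min_res, max_res + 1))
    img = resize_nearest(img, res)       # random scale in [320, 602]
    img = resize_nearest(img, out_size)  # final 512 x 512 network input
    img = img.astype(np.float64)
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True) + 1e-8
    return (img - mean) / std
```

Each call yields a 512 × 512 × 3 array with zero mean and unit variance per channel.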
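A CenterNet-style single-stage detector of this kind decodes its head outputs (a per-class center heatmap, sub-pixel offsets, and box sizes) into bounding boxes roughly as follows. This is a sketch under our own naming (`decode_centernet` and the 3 × 3 peak extraction are our assumptions), not the authors' implementation:

```python
import numpy as np

def decode_centernet(heat, offset, wh, threshold=0.2):
    """Turn CenterNet-style head outputs into boxes.
    heat:   (H, W, K) per-class center heatmap (after sigmoid)
    offset: (H, W, 2) sub-pixel x/y offsets of each center
    wh:     (H, W, 2) box width/height at each location
    Returns a list of (class_id, x1, y1, x2, y2, score)."""
    H, W, K = heat.shape
    boxes = []
    for k in range(K):
        h = heat[:, :, k]
        # keep local maxima over a 3x3 neighbourhood (simple NMS surrogate)
        pad = np.pad(h, 1, constant_values=-np.inf)
        neigh = np.stack([pad[i:i + H, j:j + W]
                          for i in range(3) for j in range(3)])
        peaks = (h >= neigh.max(axis=0)) & (h > threshold)
        for y, x in zip(*np.nonzero(peaks)):
            cx = x + offset[y, x, 0]
            cy = y + offset[y, x, 1]
            w_, h_ = wh[y, x]
            boxes.append((k, cx - w_ / 2, cy - h_ / 2,
                          cx + w_ / 2, cy + h_ / 2, float(h[y, x])))
    return boxes
```

Because boxes are read directly off heatmap peaks, no anchor enumeration or separate proposal stage is needed, which is what makes this family of detectors fast.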
segmentation model. Training of our segmentation model was carried out with a batch size of 4 for 150 epochs, and the initial learning rate was 5e-4, divided by 10 after every 80 epochs.

                         4. RESULTS

For the 43 test set images, our model showed a mean average precision of 0.1932 ± 0.0622 in detection and a semantic score of 0.2544 ± 0.2080 in segmentation.

               5. DISCUSSION & CONCLUSION

EndoCV2020 is an annual global competition for detecting and segmenting lesions in endoscopic images of gastrointestinal organs. We developed deep learning models for each task. The detection model achieved a mean average precision of 0.1932 ± 0.0622, and the segmentation model achieved a semantic score of 0.2544 ± 0.2080 on the test dataset.
    The main challenge was the extremely small dataset. Only 386 images were given as a training set to classify and localize 5 imbalanced classes. Moreover, the suspicious class comprised unclear regions that even endoscopists could not clearly define. To overcome this problem, the images of minority classes in the training set were oversampled to balance them with the other classes, and all images were augmented through various image preprocessing techniques.
    Further research is required to develop an artificial intelligence model that can fulfill the standard required for practical endoscopic examination.

                      6. REFERENCES

[1] Nam Hee Kim, Yoon Suk Jung, Woo Shin Jeong, Hyo-Joon Yang, Soo-Kyung Park, Kyuyong Choi, and Dong Il Park. Miss rate of colorectal neoplastic polyps and risk factors for missed polyps in consecutive colonoscopies. Intestinal Research, 15(3):411, 2017.

[2] Pu Wang, Tyler M Berzin, Jeremy Romek Glissen Brown, Shishira Bharadwaj, Aymeric Becq, Xun Xiao, Peixi Liu, Liangping Li, Yan Song, Di Zhang, et al. Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study. Gut, 68(10):1813–1819, 2019.

[3] Lianlian Wu, Jun Zhang, Wei Zhou, Ping An, Lei Shen, Jun Liu, Xiaoda Jiang, Xu Huang, Ganggang Mu, Xinyue Wan, et al. Randomised controlled trial of WISENSE, a real-time quality improving system for monitoring blind spots during esophagogastroduodenoscopy. Gut, 68(12):2161–2169, 2019.

[4] Sharib Ali, Noha Ghatwary, Barbara Braden, Dominique Lamarque, Adam Bailey, Stefano Realdon, Renato Cannizzaro, Jens Rittscher, Christian Daul, and James East. Endoscopy disease detection challenge 2020. arXiv preprint arXiv:2003.03376, 2020.

[5] Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, and Qi Tian. CenterNet: Keypoint triplets for object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 6569–6578, 2019.

[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.

[7] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
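The class-wise binary head and its dice loss can be sketched as follows. This is a minimal numpy illustration under our own naming; the actual model uses convolutional branches on the last up-convolution feature map, which we approximate here with per-class 1 × 1-convolution-style projections:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classwise_probs(features, branch_weights, branch_biases):
    """Independent per-class branches with sigmoid outputs, so overlapping
    lesions can be positive in several classes at once (unlike softmax)."""
    # features: (H, W, C); branch_weights: (K, C); branch_biases: (K,)
    logits = np.einsum('hwc,kc->hwk', features, branch_weights) + branch_biases
    return sigmoid(logits)

def dice_loss(probs, targets, eps=1e-6):
    """Mean (1 - dice similarity coefficient) over the class axis."""
    # probs, targets: (H, W, K) with targets in {0, 1}
    inter = (probs * targets).sum(axis=(0, 1))
    union = probs.sum(axis=(0, 1)) + targets.sum(axis=(0, 1))
    dice = (2 * inter + eps) / (union + eps)
    return float(1.0 - dice.mean())
```

A perfect prediction gives a loss near 0, an all-background prediction on a non-empty target gives a loss near 1, and each class is scored independently, matching the branch structure of Fig. 1.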