<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Efficient Supervision Net: Polyp Segmentation Using EfficientNet and Attention Unit</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sabari Nathan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Suganya Ramamoorthy</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Couger Inc</institution>, <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Thiagarajar College of Engineering</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>Colorectal cancer is the third most common cancer worldwide, and identifying it in its early stages has been a challenging problem. Motivated by this, the main objective of this paper is to develop a multi-supervision net algorithm for segmenting polyps on a comprehensive dataset. We use the Medico polyp challenge dataset, which consists of 1000 segmented polyp images from the gastrointestinal tract. We propose EfficientNet B4 as the pre-trained backbone of the multi-supervision net, and the model is trained with multiple output layers. We present quantitative results on the colorectal dataset to evaluate the performance and achieve good results on all performance metrics. The experimental results show that the proposed model is robust and segments polyps accurately on a comprehensive dataset across different metrics such as dice coefficient, recall, precision, and F2.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        (1) A novel Multi-Supervision Net architecture is proposed,
i.e., the model is trained with multiple output layers.
(2) We evaluated the proposed architecture on a challenging
polyp segmentation dataset.
(3) We used EfficientNetB4 as the backbone of the proposed
architecture.
(4) We achieved good experimental results in terms of accuracy,
F1 score, and loss.
      </p>
    </sec>
    <sec id="sec-2">
      <title>METHODS</title>
      <p>
        To deal with the challenge of the robust Medico automatic polyp
segmentation task, we propose a Multi-Supervision Net
architecture. The detailed architecture is presented in Figure 3. In the
following subsections, we discuss the details of each module
used in the proposed architecture.
Our proposed polyp segmentation method uses a deep
convolutional network to learn a mapping from the input polyp
images to their segmentation masks. The overall block diagram of the proposed architecture is
shown in Figure 3; it mainly consists of five components: (1) a
convolutional layer, (2) an efficient layer [6], (3) an encoder block with
EfficientNet B4, (4) a decoder block combining dense blocks and
Concurrent Spatial and Channel Attention (CSCA) [5] blocks, and (5) a
Convolutional Block Attention Module (CBAM) [7]. In this architecture, the
input image (332×487 pixels) is resized to 384×256, divided by 255,
and passed into the convolutional layer. The proposed network was
inspired by the multilevel Hyper Vision Net [1] and has the properties
of an encoder-decoder structure with supervision layers. The encoder
block uses EfficientNet B4 as the backbone of the architecture. The
decoder block consists of a combination of dense blocks [2] and
Concurrent Spatial and Channel Attention (CSCA) blocks. The
encoder and decoder are connected by a Convolutional Block Attention Module
(CBAM) block. All the decoder outputs are supervised, i.e., each
individual decoder output is upsampled through an output layer and
supervised by the loss function, and all the upsampled outputs are also
concatenated and fed into CBAM. In total, six outputs are obtained
from our proposed architecture. For upsampling, we use
convolution transpose layers. CBAM is an effective attention module
for feed-forward convolutional neural networks. CBAM has two
sequential sub-modules: a channel attention module and a spatial
attention module. The intermediate feature map is adaptively refined
through CBAM at every convolutional block of the deep network. The
CBAM block is utilized to generate spatial attention on top of the channel
attention of the encoder output and the decoder output. The output
of this attention is added to the input image. Equation 1 and
Equation 2 give the details of the mathematical operations.
      </p>
      <p>M_c(F) = σ(W_1(W_0(AvgPool(F))) + W_1(W_0(MaxPool(F))))   (1)</p>
      <p>M_s(F) = σ(f([AvgPool(F) ∥ MaxPool(F)]))   (2)</p>
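<p>The two sequential CBAM sub-modules described above, channel attention followed by spatial attention, can be sketched in NumPy. This is a minimal illustration of the idea, not the trained network's implementation; the shared-MLP weights and the 1×1 mixing that stands in for CBAM's 7×7 spatial convolution are simplifying assumptions.</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w0, w1):
    """Channel attention: sigmoid(MLP(avg-pooled F) + MLP(max-pooled F)).

    feat: (H, W, C) feature map; w0 (C, C//r) and w1 (C//r, C) are the
    shared-MLP weights. Returns a per-channel weight vector of shape (C,)."""
    avg = feat.mean(axis=(0, 1))   # global average pooling -> (C,)
    mx = feat.max(axis=(0, 1))     # global max pooling -> (C,)
    mlp = lambda v: np.maximum(v @ w0, 0.0) @ w1   # two-layer MLP with ReLU
    return sigmoid(mlp(avg) + mlp(mx))

def spatial_attention(feat, k_avg, k_max):
    """Spatial attention: sigmoid of a mix of channel-wise avg and max maps.

    The 7x7 convolution of the original CBAM is replaced here by a 1x1 mix
    (weights k_avg, k_max) to keep the sketch short."""
    return sigmoid(k_avg * feat.mean(axis=2) + k_max * feat.max(axis=2))

def cbam(feat, w0, w1, k_avg=0.5, k_max=0.5):
    """Apply the two sequential sub-modules: channel, then spatial attention."""
    f1 = feat * channel_attention(feat, w0, w1)            # broadcast over H, W
    return f1 * spatial_attention(f1, k_avg, k_max)[..., None]

rng = np.random.default_rng(0)
F = rng.standard_normal((8, 8, 16))      # toy feature map
w0 = 0.1 * rng.standard_normal((16, 4))  # reduction ratio r = 4
w1 = 0.1 * rng.standard_normal((4, 16))
out = cbam(F, w0, w1)
print(out.shape)  # (8, 8, 16): same shape, adaptively re-weighted
```

Because both attention maps pass through a sigmoid, every element of the refined feature map is a shrunk copy of the input element, which is what "adaptively refined" means here.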
    </sec>
    <sec id="sec-3">
      <title>Decoder part</title>
      <p>The output of the CSCA block was passed into CBAM. The CBAM
shares its processing results with another dense block. From here, the
algorithm follows a bottom-up approach: the dense block output is added
to the output of the 5th efficient layer and forwarded to the
consecutive dense and CSCA blocks. The CSCA block is connected to
average pooling and a convolutional layer, and the result of pooling and
convolution is fed into an output layer. Next, the result of the CSCA block
is upsampled and concatenated with the 4th efficient layer output, and the
remaining operations are the same as in the previous layer. After repeating
these operations five times, we concatenate every "average pooling
+ conv layer" result and pass the output into CBAM and another
concatenation layer. Finally, the concatenated features are sent to the output layer,
which produces the final result for the given image. The loss
function is a combination of categorical cross-entropy and dice loss.</p>
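<p>The combined loss mentioned above, categorical cross-entropy plus dice loss, can be sketched as follows. This is an illustrative NumPy version under the usual definitions of the two terms, not the exact training code.</p>

```python
import numpy as np

def dice_loss(y_true, y_pred, eps=1e-7):
    """Soft dice loss: 1 - 2|A.B| / (|A| + |B|)."""
    inter = np.sum(y_true * y_pred)
    return 1.0 - (2.0 * inter + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy over pixels; the last axis holds the class scores."""
    return -np.mean(np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=-1))

def combined_loss(y_true, y_pred):
    return categorical_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)

# 2x2 toy "image", one-hot background/polyp labels on the last axis
y_true = np.array([[[1, 0], [0, 1]],
                   [[0, 1], [1, 0]]], dtype=float)
y_pred = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.3, 0.7], [0.6, 0.4]]])
print(combined_loss(y_true, y_pred))   # > 0 for an imperfect prediction
print(combined_loss(y_true, y_true))   # ~0 for a perfect prediction
```

The cross-entropy term drives per-pixel class scores toward the labels, while the dice term directly rewards overlap between the predicted and ground-truth masks, which helps when the polyp occupies few pixels.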
    </sec>
    <sec id="sec-4">
      <title>RESULTS</title>
    </sec>
    <sec id="sec-5">
      <title>Dataset</title>
      <p>The Kvasir-SEG dataset [4] contains 1,000 polyp images and their
corresponding ground truth masks, as shown in Figure 1. The dataset
was collected from real routine clinical examinations at Baerum
Hospital in Norway by expert gastroenterologists. The resolution
of the images varies from 332 × 487 to 1920 × 1072 pixels. Some of the
images contain a green thumbnail in the lower-left corner
showing the scope position marking from the ScopeGuide
(Olympus); refer to Figure 2. The Medico 2020 team annotated
another separate dataset and used it as the test set to benchmark our
proposed approach. Figure 2 shows some examples of test images
used in the challenge.</p>
    </sec>
    <sec id="sec-6">
      <title>Training</title>
      <p>The proposed architecture has been trained and validated with the
Medico automatic polyp segmentation challenges Task 1: Polyp
Segmentation dataset. The proposed network is inspired by the
Hyper Vision Net [4]. The shared dataset consists of 1000 segmented
polyp images from the gastrointestinal tract images. We randomly
split it into 70 percentage for training and 30 percentage for
validation from the whole dataset. We used the Adam optimizer with
an initial learning rate of 0.001. The learning rate was decreased to
0.00001 while we trained our models for 500 epochs. The proposed
network was trained with the IntelCore i7 processor, GeForce GTX
We divided the image by 255 to normalize it between 0-1. We applied
data augmentation techniques such as HorizontalFlip, VerticalFlip,
Blur (limit = 3), and Rotate(-10, 10) to increase the image count.
A mixture of two diferent loss functions is used to supervise the
network model outputs: categorical cross-entropy and dice loss.
In this work, Adam optimizer is applied, which perfectly revise
network weights in an iterative approach in training data. We used
Adam optimizer with a learning rate of 0.001 to 0.00001 and 500
epochs for training the model. Our Loss function is defined by
categorical Cross entropy and dice loss.
4</p>
    </sec>
    <sec id="sec-7">
      <title>DISCUSSION</title>
      <p>Currently, there is a growing interest in the development of
computer-aided diagnosis (CADx) systems that could act as a second observer
and digital assistant for endoscopists. Algorithmic benchmarking
is an efficient approach to analyze the results of different methods.
The task uses mean Intersection over Union (mIoU), or the Jaccard
index, as an evaluation metric, which is a standard metric for
medical segmentation tasks. Our proposed algorithm achieves a
Jaccard value of 0.777 in run 1 and run 2. The other evaluation metrics,
such as dice coefficient, precision, recall, F2, and frames per second
(FPS), also show the effectiveness of our method in a comprehensive evaluation. In
the challenge overview paper [3], the organizers calculate
metrics such as the Dice coefficient, mIoU, recall, precision,
overlap, F2, and FPS for the method submitted by each team, as presented
in Table 1.</p>
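<p>The Jaccard index and dice coefficient used in this evaluation can be computed on binary masks as follows; a minimal NumPy sketch under the standard definitions, not the challenge's official evaluation code.</p>

```python
import numpy as np

def jaccard(pred, gt, eps=1e-7):
    """Jaccard index (IoU): |A ∩ B| / |A ∪ B| on binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)

def dice(pred, gt, eps=1e-7):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

pred = np.array([[1, 1, 0],
                 [0, 1, 0]], dtype=bool)
gt   = np.array([[1, 0, 0],
                 [0, 1, 1]], dtype=bool)
print(jaccard(pred, gt))  # 2 common pixels / 4 in the union = 0.5
print(dice(pred, gt))     # 2*2 / (3 + 3) = 0.667
```

Dice always equals or exceeds IoU on the same masks (dice = 2·IoU / (1 + IoU)), which is why both are reported for a comprehensive comparison.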
    </sec>
    <sec id="sec-8">
      <title>CONCLUSION</title>
      <p>We have presented a novel Multi-Supervision Net with
EfficientNetB4 as the architecture's backbone to improve image
segmentation accuracy under different factors. We accomplished
this by training a multilevel attention network on images from
the Medico challenge 2020 polyp segmentation dataset. Moreover,
we introduce a CSCA block in the decoder to improve segmentation quality.
As a major contribution, CBAM enhances the overall mechanism
and utilizes significant features from the encoder block. Extensive
evaluations and comparisons with previous state-of-the-art approaches
show that we achieve good performance in qualitative and
quantitative experiments, proving the efficacy of the proposed
architecture.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>D.</given-names> <surname>Sabarinathan</surname></string-name>,
          <string-name><given-names>M. Parisa</given-names> <surname>Beham</surname></string-name>, and
          <string-name><given-names>S. M. Md.</given-names> <surname>Mansoor Roomi</surname></string-name>.
          <year>2019</year>.
          <article-title>Hyper Vision Net: Kidney Tumor Segmentation Using Coordinate Convolutional Layer and Attention Unit</article-title>.
          arXiv preprint arXiv:1908.03339.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>Gao</given-names> <surname>Huang</surname></string-name>,
          Zhuang Liu, Laurens van der Maaten, and
          <string-name><given-names>Kilian Q.</given-names> <surname>Weinberger</surname></string-name>.
          <year>2017</year>.
          <article-title>Densely Connected Convolutional Networks</article-title>.
          arXiv:1608.06993.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>Debesh</given-names> <surname>Jha</surname></string-name>,
          Steven A. Hicks, Krister Emanuelsen, Håvard Johansen, Dag Johansen, Thomas de Lange,
          Michael A. Riegler, and
          <string-name><given-names>Pål</given-names> <surname>Halvorsen</surname></string-name>.
          <year>2020</year>.
          <article-title>Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation</article-title>.
          <source>In Proc. of the MediaEval 2020 Workshop</source>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>Debesh</given-names> <surname>Jha</surname></string-name>,
          Pia H. Smedsrud, Michael A. Riegler, Pål Halvorsen, Thomas de Lange, Dag Johansen, and
          <string-name><given-names>Håvard D.</given-names> <surname>Johansen</surname></string-name>.
          <year>2020</year>.
          <article-title>Kvasir-SEG: A segmented polyp dataset</article-title>.
          <source>In Proc. of International Conference on Multimedia Modeling</source>.
          <fpage>451</fpage>-<lpage>462</lpage>.
        </mixed-citation>
      </ref>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>Abhijit</given-names> <surname>Guha Roy</surname></string-name>,
          Nassir Navab, and
          <string-name><given-names>Christian</given-names> <surname>Wachinger</surname></string-name>.
          <year>2018</year>.
          <article-title>Concurrent Spatial and Channel Squeeze &amp; Excitation in Fully Convolutional Networks</article-title>.
          arXiv:1803.02579.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>Mingxing</given-names> <surname>Tan</surname></string-name>
          and
          <string-name><given-names>Quoc V.</given-names> <surname>Le</surname></string-name>.
          <year>2019</year>.
          <article-title>EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks</article-title>.
          arXiv:1905.11946.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>Sanghyun</given-names> <surname>Woo</surname></string-name> et al.
          <year>2018</year>.
          <article-title>CBAM: Convolutional Block Attention Module</article-title>.
          <source>In Proceedings of the European Conference on Computer Vision (ECCV)</source>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>