ARTEFACT DETECTION AND SEGMENTATION BASED ON A DEEP LEARNING SYSTEM

Xiaohong (Sharon) Gao(1), Barbara Braden(2)
(1) Department of Computer Science, Middlesex University, London, NW4 4BT, UK
(2) John Radcliffe Hospital, University of Oxford, Oxford, UK

Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT

This paper presents the results of detecting and segmenting artefacts in endoscopic video frames for the EAD2020 competition. In this competition, a deep learning based system built upon RetinaNet is applied. Since RetinaNet is a one-stage method that does not produce masks of segmented objects, the system, inspired by work on real-time instance segmentation, accomplishes object segmentation through two parallel branches that generate a set of prototype masks and predict per-object mask coefficients, respectively. Overall, a top-7 position (out of 32 entries) was achieved on the competition leaderboard.

1. METHODS

Figure 1 illustrates the network system applied in this competition [1, 2, 3, 4, 5, 6, 7, 8, 9], built upon RetinaNet. It accomplishes object segmentation through two parallel branches (prototype and prediction coefficient), which generate a set of prototype masks and predict per-object mask coefficients, respectively. The backbone model of ResNet-101 is applied for all three tasks.

Fig. 1. The network applied in the competition.

2. RESULTS

Table 1 presents the final results submitted from this work, while Table 2 gives the mAP values for segmentation at IoU thresholds of 0.50, 0.70 and 0.95; the overall classification accuracy was 63%. Although the results rank within the top 7, further enhancement is needed to improve the robustness of this model.

Model parameters                 Score            Rank
Epoch = 412, Threshold = 0.17    0.2020 / 0.0768  18
Epoch = 226, Threshold = 0.13    0.2205 / 0.0843  7

Table 1. Leaderboard scores for the two submissions put in to the EAD2020 competition.

                 mAP at IoU thresholds
Region type      0.50     0.70     0.95     All
box              77.85    61.64    0.99     47.42
mask             76.91    61.51    0.87     46.17

Table 2. The mAP values for segmentation.

3. REFERENCES

[1] Sharib Ali, Felix Zhou, Christian Daul, Barbara Braden, Adam Bailey, Stefano Realdon, James East, Georges Wagnieres, Victor Loschenov, Enrico Grisan, et al. Endoscopy artifact detection (EAD 2019) challenge dataset. arXiv preprint arXiv:1905.03209, 2019.
[2] Sharib Ali, Felix Zhou, Adam Bailey, Barbara Braden, James East, Xin Lu, and Jens Rittscher. A deep learning framework for quality assessment and restoration in video endoscopy. arXiv preprint arXiv:1904.07073, 2019.
[3] Sharib Ali, Noha Ghatwary, Barbara Braden, Dominique Lamarque, Adam Bailey, Stefano Realdon, Renato Cannizzaro, Jens Rittscher, Christian Daul, and James East. Endoscopy disease detection challenge 2020. arXiv preprint arXiv:2003.03376, 2020.
[4] Sharib Ali, Felix Zhou, Barbara Braden, Adam Bailey, Suhui Yang, Guanju Cheng, Pengyi Zhang, Xiaoqiong Li, Maxime Kayser, Roger D. Soberanis-Mukul, Shadi Albarqouni, Xiaokang Wang, Chunqing Wang, Seiryo Watanabe, Ilkay Oksuz, Qingtian Ning, Shufan Yang, Mohammad Azam Khan, Xiaohong W. Gao, Stefano Realdon, Maxim Loshchenov, Julia A. Schnabel, James E. East, Georges Wagnieres, Victor B. Loschenov, Enrico Grisan, Christian Daul, Walter Blondel, and Jens Rittscher. An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy. Scientific Reports, 10, 2020.
[5] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar. Focal loss for dense object detection. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2999-3007, 2017.
[6] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 580-587, USA, 2014. IEEE Computer Society.
[7] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In ECCV (1), volume 9905 of Lecture Notes in Computer Science, pages 21-37. Springer, 2016.
[8] Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779-788, 2016.
[9] Daniel Bolya, Chong Zhou, Fanyi Xiao, and Yong Jae Lee. YOLACT: Real-time instance segmentation. In ICCV, 2019.
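To make the two-branch design described in Section 1 concrete, the sketch below assembles YOLACT-style [9] instance masks from a set of prototype masks and per-object mask coefficients. The function name, array shapes and toy values are illustrative assumptions, not the competition code:

```python
import numpy as np

def assemble_masks(prototypes, coefficients, threshold=0.5):
    """Combine prototype masks with per-object coefficients (YOLACT-style).

    prototypes:   (H, W, k) array of k prototype masks from the prototype branch.
    coefficients: (n, k) array of mask coefficients, one row per detected object.
    Returns an (n, H, W) boolean array of instance masks.
    """
    # Linear combination of prototypes for every detection, then a sigmoid.
    logits = np.einsum("hwk,nk->nhw", prototypes, coefficients)
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold

# Toy example: 2 prototypes on a 4x4 grid, 1 detected object.
protos = np.zeros((4, 4, 2))
protos[:2, :, 0] = 5.0   # first prototype activates the top half
protos[2:, :, 1] = 5.0   # second prototype activates the bottom half
coeffs = np.array([[1.0, -1.0]])  # keep the top half, suppress the bottom
mask = assemble_masks(protos, coeffs)  # mask is True only on the top half
```

Negative coefficients let a detection subtract a prototype, which is what allows a small shared set of prototypes to express many different instance shapes.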
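Table 2 reports mAP at IoU thresholds of 0.50, 0.70 and 0.95. As a reminder of the overlap measure underlying those thresholds, a minimal box-IoU computation is sketched below; the function name and the (x1, y1, x2, y2) corner convention are assumptions for illustration, not the evaluation code used by the challenge:

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle; width/height clamp to 0 if disjoint.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes overlapping in a 1x1 region: IoU = 1 / (4 + 4 - 1) = 1/7.
iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A detection counts as a true positive at a given threshold only when its IoU with a ground-truth box reaches that threshold, which is why mAP drops sharply between IoU 0.70 and 0.95 in Table 2.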