=Paper=
{{Paper
|id=Vol-1984/Mediaeval_2017_paper_48
|storemode=property
|title=A Comparison of Deep Learning with Global Features for Gastrointestinal Disease Detection
|pdfUrl=https://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_48.pdf
|volume=Vol-1984
|authors=Konstantin Pogorelov,Michael Riegler,Pål Halvorsen,Carsten Griwodz,Thomas de Lange,Kristin Ranheim Randel,Sigrun Losada Eskeland,Duc-Tien Dang-Nguyen,Olga Ostroukhova,Mathias Lux,Concetto Spampinato
|dblpUrl=https://dblp.org/rec/conf/mediaeval/PogorelovRHGLRE17
}}
==A Comparison of Deep Learning with Global Features for Gastrointestinal Disease Detection==
Konstantin Pogorelov (1,2), Michael Riegler (1), Pål Halvorsen (1,2), Carsten Griwodz (1,2), Thomas de Lange (3), Kristin Ranheim Randel (2,3), Sigrun Losada Eskeland (4), Duc-Tien Dang-Nguyen (5), Olga Ostroukhova (8), Mathias Lux (6), Concetto Spampinato (7)

(1) Simula Research Laboratory, Norway; (2) University of Oslo, Norway; (3) Cancer Registry of Norway, Norway; (4) Vestre Viken Hospital Trust, Norway; (5) Dublin City University, Ireland; (6) University of Klagenfurt, Austria; (7) University of Catania, Italy; (8) Research Institute of Multiprocessor Computation Systems n.a. A.V. Kalyaev, Russia
konstantin@simula.no,michael@simula.no
Copyright held by the owner/author(s). MediaEval'17, 13-15 September 2017, Dublin, Ireland

ABSTRACT

This paper presents our approach for the 2017 Multimedia for Medicine Medico Task of the MediaEval 2017 Benchmark. We propose a system based on global features and deep neural networks, and we present preliminary results comparing the approaches.

1 INTRODUCTION

Following the initiative to investigate how multimedia can improve medical systems [15], the 2017 Multimedia for Medicine Medico Task [18] addresses the challenge of detecting diseases based on multimedia data collected in hospitals [13], i.e., the task focuses on detecting abnormalities, diseases and anatomical landmarks in images of the gastrointestinal (GI) tract. Some proposals in this area already exist, using various approaches [20, 21], and in this paper, we describe our solutions, based on both our global-features-based and neural-network-based EIR prototypes [12, 14, 16, 17].

2 CLASSIFICATION APPROACHES

The proposed approaches are based on the hypothesis that GI tract diseases and findings can be recognized and classified based on color, shape and texture properties. In this challenge, no detailed ground-truth regions of interest (ROIs) are provided for the training dataset; thus, existing and well-performing approaches to object recognition are not suitable for this particular task. Moreover, a relatively small amount of training data is provided, making it difficult to use modern convolutional neural network (CNN) image segmentation and region-based classification approaches. Furthermore, some objects like polyps and resection margins have a compact body and can easily be differentiated from the surrounding tissue, but other findings like ulcerative colitis show only tissue with slightly different color properties. To address these different detection challenges, we present 17 different approaches that implement our idea of using the visual properties of images to perform multi-class classification with the limited training set size. For the final classification step, we use the WEKA machine learning library [7], an open-source collection of algorithms for machine learning and data mining. For all the approaches based on global features (GFs), we use Lucene Image Retrieval (LIRE) [10], an open-source implementation of global and local feature extraction and comparison. For all the deep-learning-based approaches, we use Keras [3], an open-source high-level neural network API with Google Tensorflow [1] as a computational back-end.

2.1 Global-features-based

For the GF-based approaches, we use features that represent the overall visual properties of an image: they are easy and fast to compute, and they can be used for image comparison, distance computation and image-collection search. Here, we use indexes of visual features extracted from the training image set. A classifier is used to search the index for the image that is most similar to a given input image. The GFs we use are JCD, Tamura, Color Layout, Edge Histogram, Auto Color Correlogram and Pyramid Histogram of Oriented Gradients [10]. We decided on this combination based on our previous findings and experiments in [14, 16]. Multi-class classification is implemented as an additional classification step that determines the final image class from the ranked lists produced by a search-based classifier for each class of findings. We use the random tree (RT), random forest (RF) and logistic model tree (LMT) classifiers [7] from WEKA.

2.2 Deep-features-based

For the deep-features-based approaches, we use a combined method: deep networks for image recognition serve as the feature extractor, and a machine-learning classifier takes the extracted deep features as input for multi-class classification. We use the Inception v3 [19] and ResNet50 [8] models pre-trained on a set of general images. The models were modified to produce numerical probability output for all recognized object classes. Then, we use the class (concept) probabilities (1000 values for both networks) directly in the Concepts runs. For the Features runs, we used the same pre-trained models without the fully-connected layer at the top of the network, which gives us an output of high-level feature values (16384 values for Inception v3 and 2048 for ResNet50). Finally, we combine the values by simple early fusion into one big vector of floating-point numbers and use it as input for the same classifiers we used in the GF-based approaches.

2.3 CNN-based

For the CNN-based approach, we created and trained a custom CNN from scratch. Our CNN consists of six convolution layers. As the activation function, we used the rectified linear unit (ReLU) [6], with max-pooling for pooling. In all the layers, we also included a 0.5 dropout, and the final classification step was performed using two dense layers, the first with ReLU and the second with sigmoid activation. Both networks were trained for 200 epochs using the Adam optimizer [9].
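The six-layer architecture above is only loosely specified: the paper fixes the layer count, ReLU activations, max-pooling, 0.5 dropout and the two dense layers, but not the filter counts, kernel sizes or input resolution. As a back-of-envelope sketch, one hypothetical instantiation can be checked for spatial size and parameter count (all concrete numbers below are our assumptions, not the authors' configuration):

```python
# Hypothetical instantiation of a six-convolution-layer CNN as described above.
# Filter counts, 3x3 kernels, 'same' padding, pooling positions and the 128x128
# RGB input are our assumptions; only the overall structure follows the paper.

def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolution layer."""
    return (k * k * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """Weights plus biases of one fully-connected layer."""
    return (n_in + 1) * n_out

def cnn_summary(input_hw=128, channels=(32, 32, 64, 64, 128, 128),
                pool_after=(1, 3, 5), n_classes=8):
    """Track spatial size and total parameter count through the stack."""
    hw, c_in, total = input_hw, 3, 0
    for i, c_out in enumerate(channels):
        total += conv_params(c_in, c_out)   # 3x3 conv, 'same' padding
        c_in = c_out
        if i in pool_after:                 # 2x2 max-pooling halves H and W
            hw //= 2
    flat = hw * hw * c_in                   # flattened conv output
    total += dense_params(flat, 256)        # dense layer with ReLU
    total += dense_params(256, n_classes)   # dense layer with sigmoid output
    return hw, total
```

Under these assumptions the spatial size shrinks from 128 to 16 and the model stays in the single-digit millions of parameters, which is consistent with training from scratch on a small dataset.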
Table 1: Initial performance evaluation based on the random split of the task development dataset.

Method                      PREC   REC    SPEC   ACC    F1     RK     FPS
6 Layer CNN                 0.659  0.642  0.947  0.900  0.640  0.600  43
Inception v3 TFL            0.700  0.695  0.961  0.925  0.704  0.661  53
Inception v3 Concepts RT    0.405  0.402  0.915  0.851  0.403  0.318  66
Inception v3 Concepts RF    0.704  0.701  0.957  0.925  0.699  0.659  50
Inception v3 Concepts LMT   0.771  0.763  0.970  0.940  0.745  0.721  37
Inception v3 Features RT    0.287  0.288  0.898  0.822  0.287  0.186  56
Inception v3 Features RF    0.436  0.447  0.921  0.862  0.436  0.362  43
Inception v3 Features LMT   0.444  0.438  0.920  0.859  0.438  0.360  30
ResNet50 Concepts RT        0.507  0.500  0.929  0.875  0.501  0.431  88
ResNet50 Concepts RF        0.762  0.753  0.965  0.938  0.751  0.720  78
ResNet50 Concepts LMT       0.781  0.799  0.983  0.970  0.797  0.750  53
ResNet50 Features RT        0.479  0.478  0.925  0.869  0.477  0.403  79
ResNet50 Features RF        0.790  0.782  0.980  0.928  0.769  0.763  70
ResNet50 Features LMT       0.841  0.839  0.985  0.972  0.856  0.828  46
6 Global Features RT        0.576  0.578  0.940  0.894  0.576  0.516  130
6 Global Features RF        0.744  0.734  0.981  0.951  0.784  0.705  105
6 Global Features LMT       0.800  0.785  0.980  0.964  0.781  0.748  80

Table 2: The official classification performance evaluation results (provided by the organizers) of the submitted runs.

Run #  Method                      PREC   REC    SPEC   ACC    F1     RK     FPS
1      Inception v3 TFL            0.735  0.715  0.963  0.725  0.725  0.686  53
2      Inception v3 Concepts LMT   0.742  0.738  0.963  0.934  0.737  0.701  37
3      ResNet50 Concepts LMT       0.766  0.763  0.966  0.941  0.761  0.729  53
4      ResNet50 Features LMT       0.829  0.826  0.975  0.957  0.826  0.802  46
5      6 Global Features LMT       0.766  0.760  0.966  0.940  0.757  0.727  80

2.4 Transfer-learning-based

For the transfer-learning-based (TFL) approach, we use the pre-trained Inception v3 [19] model and a transfer learning technique [2] to train the network on our specific training set. We re-trained the base model and fine-tuned the last layers on the training set following the DeCAF approach [5]. We did not perform complex data augmentation and relied on transfer learning only. We froze all the basic convolutional layers of the network and retrained only the two top dense layers. The dense layers were retrained using the RMSprop [4] optimizer, which allows an adaptive learning rate during the training process. After 1,000 epochs, we stopped the retraining of the dense layers and started fine-tuning the convolutional layers. For that step, we analyzed the layer structure of the Inception v3 model and decided to apply the fine-tuning to the top two convolutional layers. For this training step, we used stochastic gradient descent with a low learning rate to achieve the best effect in terms of speed and accuracy [11].
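The two-stage TFL schedule (freeze every convolutional layer and retrain the top dense layers with RMSprop, then unfreeze the top two convolutional layers for low-learning-rate SGD fine-tuning) can be sketched as a trainability mask over the layer stack. The layer names below are illustrative placeholders, not Inception v3's actual layer list:

```python
# Two-stage transfer-learning schedule expressed as a trainability mask.
# Layer names are illustrative placeholders, not Inception v3's real layers.
LAYERS = [f"conv_{i}" for i in range(1, 11)] + ["dense_1", "dense_2"]

def stage_one(layers):
    """Stage 1: freeze every conv layer; only the top dense layers train."""
    return {name: name.startswith("dense") for name in layers}

def stage_two(layers):
    """Stage 2: additionally unfreeze the two topmost conv layers."""
    mask = stage_one(layers)
    conv_names = [n for n in layers if n.startswith("conv")]
    for name in conv_names[-2:]:  # the two topmost convolutional layers
        mask[name] = True
    return mask
```

In Keras this corresponds to setting `layer.trainable = False` on the frozen layers and re-compiling the model between the two stages, with RMSprop for stage one and a low-learning-rate SGD optimizer for stage two.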
3 EXPERIMENTAL RESULTS

First, we performed an initial evaluation of the approaches using only the development dataset, randomly splitting it into new training and test sets with an equal number of 2,000 images in each. We assessed 17 different methods executed in 17 internal runs using the newly generated sets. An overview of the conducted internal runs can be found in Table 1, where we provide the measured performance metrics [13]. We can see that not all our approaches perform efficiently on the given dataset. In general, we can conclude that for all the machine-learning-based classification approaches, the LMT classifier performs best, the RF classifier is slightly worse, and the RT classifier performs worst. The 6 Layer CNN and Inception v3 TFL approaches perform with comparable precision, but Inception v3 TFL has slightly better results. The Inception v3 Concepts and ResNet50 Concepts approaches also perform with comparable precision, but all the ResNet50 Concepts approaches perform slightly better. The Inception v3 Features approaches perform the worst of all the features-based approaches, even with the efficient LMT classifier, which can be caused by the huge feature-value vector generated by the Inception v3 network. Finally, the best performing approach is the ResNet50 Features approach with the LMT classifier, reaching 0.828 for RK and 0.856 for the F1 score.

Based on the initial evaluation, we selected five different approaches for the official competition submission. The approaches selected (see Table 2) are the best performing in the internal runs while keeping as much diversity of methods as possible. The official evaluation results provided by the organizers are presented in Table 2. The best performing approach is again the ResNet50 Features approach with the LMT classifier (run #4), with an RK value of 0.802 and an F1 score of 0.826. The confusion matrix of this run is presented in Table 3. The most often misclassified classes are Esophagitis and Z-line, which is caused by the nature of the used visual features. Both of these classes consist of pictures of the Z-line, but esophagitis is an inflammation of the Z-line area; thus, local image characteristics should be used to distinguish between these classes more precisely. The same reason can explain some cases of misclassification between the Dyed and Lifted Polyps, Dyed Resection Margins and Polyps classes.

Table 3: Confusion matrix for the ResNet50 Features LMT run #4 (rows: actual class; columns: detected class).

Actual class                  A    B    C    D    E    F    G    H
Esophagitis (A)             319    0    4    2  174    0    1    0
Dyed and Lifted Polyps (B)    0  385    0    6    0   59   47    3
Pylorus (C)                   6    0  460    7   19    0    7    1
Ulcerative colitis (D)        5    0    1  460    0    2   14   18
Z-line (E)                  104    0    8    0  385    0    3    0
Dyed Resection Margins (F)    0   84    1    5    0  403    5    2
Polyps (G)                    1    3    1   19    1    1  441   33
Cecum (H)                     0    1    0   29    0    0   18  452

4 CONCLUSION

In this paper, we presented 17 different combined approaches designed for multi-class classification of medical imaging data with a limited training dataset. We presented a novel comparison of the performance of various visual-features-based methods with a traditional custom CNN and with Inception v3 transfer-learning-based approaches. We used modified Inception v3 and ResNet50 networks and the LIRE library for feature extraction, with machine-learning classification algorithms from WEKA. Despite the limited training dataset and the presence of visually similar image classes, we achieved a good multi-class classification performance with an RK value of 0.802 and a classification speed of 46 frames per second. For our future research, we will investigate a combined approach with the fusion of multiple deep-network-based feature extractors for the initial coarse image classification, together with fine-tuned local-feature-based sub-classification for efficient cross-class detection between visually similar images.

ACKNOWLEDGMENTS

This work is funded by the FRINATEK project "EONS" #231687.
REFERENCES

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, and others. 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016).

[2] Souad Chaabouni, Jenny Benois-Pineau, and Chokri Ben Amar. 2016. Transfer learning with deep networks for saliency prediction in natural video. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP). 1604–1608.

[3] François Chollet. 2015. Keras: Deep learning library for theano and tensorflow. (2015). https://keras.io/ Accessed: 2017-09-01.

[4] YN Dauphin, H De Vries, J Chung, and Y Bengio. 2015. RMSProp and equilibrated adaptive learning rates for non-convex optimization. arXiv preprint arXiv:1502.04390 (2015).

[5] Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. 2014. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. In Proceedings of the 31st International Conference on Machine Learning (ICML), Vol. 32. 647–655.

[6] Richard HR Hahnloser, Rahul Sarpeshkar, Misha A Mahowald, Rodney J Douglas, and H Sebastian Seung. 2000. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405, 6789 (2000), 947–951.

[7] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H Witten. 2009. The WEKA data mining software: an update. ACM SIGKDD explorations newsletter 11, 1 (2009), 10–18.

[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770–778.

[9] Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).

[10] Mathias Lux, Michael Riegler, Pål Halvorsen, Konstantin Pogorelov, and Nektarios Anagnostopoulos. 2016. LIRE: open source visual information retrieval. In Proceedings of the 2016 ACM Conference on Multimedia Systems (MMSys). Article no. 30.

[11] Jiquan Ngiam, Adam Coates, Ahbik Lahiri, Bobby Prochnow, Quoc V Le, and Andrew Y Ng. 2011. On optimization methods for deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML). 265–272.

[12] Konstantin Pogorelov, Sigrun Losada Eskeland, Thomas de Lange, Carsten Griwodz, Kristin Ranheim Randel, Håkon Kvale Stensland, Duc-Tien Dang-Nguyen, Concetto Spampinato, Dag Johansen, Michael Riegler, and others. 2017. A holistic multimedia system for gastrointestinal tract disease detection. In Proceedings of the 8th ACM Conference on Multimedia Systems (MMSys). 112–123.

[13] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato, Duc-Tien Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt, Michael Riegler, and Pål Halvorsen. 2017. Kvasir: A Multi-Class Image Dataset for Computer Aided Gastrointestinal Disease Detection. In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys). 164–169.

[14] Konstantin Pogorelov, Michael Riegler, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Carsten Griwodz, Peter Thelin Schmidt, and Pål Halvorsen. 2017. Efficient disease detection in gastrointestinal videos – global features versus neural networks. Multimedia Tools and Applications (2017), 1–33. https://doi.org/10.1007/s11042-017-4989-y

[15] Michael Riegler, Mathias Lux, Carsten Griwodz, Concetto Spampinato, Thomas de Lange, Sigrun L Eskeland, Konstantin Pogorelov, Wallapak Tavanapong, Peter T Schmidt, Cathal Gurrin, Dag Johansen, Håvard Johansen, and Pål Halvorsen. 2016. Multimedia and Medicine: Teammates for better disease detection and survival. In Proceedings of the 2016 ACM Multimedia Conference (ACM MM). 968–977.

[16] Michael Riegler, Konstantin Pogorelov, Sigrun Losada Eskeland, Peter Thelin Schmidt, Zeno Albisser, Dag Johansen, Carsten Griwodz, Pål Halvorsen, and Thomas de Lange. 2017. From Annotation to Computer Aided Diagnosis: Detailed Evaluation of a Medical Multimedia System. Transactions on Multimedia Computing, Communications and Applications 9, 4 (2017).

[17] Michael Riegler, Konstantin Pogorelov, Pål Halvorsen, Thomas de Lange, Carsten Griwodz, Peter Thelin Schmidt, Sigrun Losada Eskeland, and Dag Johansen. 2016. EIR - Efficient Computer Aided Diagnosis Framework for Gastrointestinal endoscopies. In Proceedings of the 14th International Workshop on Content-based Multimedia Indexing (CBMI). 1–6.

[18] Michael Riegler, Konstantin Pogorelov, Pål Halvorsen, Kristin Ranheim Randel, Sigrun Losada Eskeland, Duc-Tien Dang-Nguyen, Mathias Lux, Carsten Griwodz, Concetto Spampinato, and Thomas de Lange. 2017. Multimedia for Medicine: The Medico Task at MediaEval 2017. In Proceedings of the 2017 MediaEval Benchmarking Initiative for Multimedia Evaluation.

[19] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. 2015. Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567 (2015).

[20] Yi Wang, Wallapak Tavanapong, Johnny Wong, JungHwan Oh, and Piet C De Groen. 2011. Computer-aided detection of retroflexion in colonoscopy. In Proceedings of the 24th International Symposium on Computer-Based Medical Systems (CBMS). 1–6.

[21] Yi Wang, Wallapak Tavanapong, Johnny Wong, Jung Hwan Oh, and Piet C De Groen. 2015. Polyp-alert: Near real-time feedback during colonoscopy. Computer methods and programs in biomedicine 120, 3 (2015), 164–179.