The Medico-Task 2018: Disease Detection in the Gastrointestinal
       Tract using Global Features and Deep Learning
Vajira Thambawita (1,3), Debesh Jha (1,4), Michael Riegler (1,3,5), Pål Halvorsen (1,3,5),
Hugo Lewi Hammer (2), Håvard D. Johansen (4), and Dag Johansen (4)
(1) Simula Research Laboratory, Norway    (2) Oslo Metropolitan University, Norway
(3) Simula Metropolitan, Norway    (4) University of Tromsø, Norway    (5) University of Oslo, Norway

Contact: vajira@simula.no, debesh@simula.no
Copyright held by the owner/author(s).
MediaEval’18, 29-31 October 2018, Sophia Antipolis, France

ABSTRACT
In this paper, we present our approach for the 2018 Medico Task of
classifying diseases in the gastrointestinal tract. We propose a system
based on global features and deep neural networks. The best approach
combines two neural networks, and the reproducible experimental results
demonstrate the effectiveness of the proposed model, with an accuracy of
95.80%, a precision of 95.87%, and an F1-score of 95.80% on our
validation set.

1     INTRODUCTION
Our main goal for the Medico Task [15] is to classify findings in images
from the gastrointestinal (GI) tract. The task provides two types of
input data: global features (GFs) and the original images. The 2017
Medico Task used a balanced dataset with only 8 classes [12], whereas the
current task uses a highly imbalanced dataset with 16 classes [11, 12],
which makes this year's task more complicated. The approaches used in
last year's Medico Task [5, 7, 9, 10, 14, 17] were based on GF extraction
and convolutional neural network (CNN) methods. We extend these solutions
and present methods based on both GFs and transfer learning with CNNs. We
achieve the best results by combining two CNNs and using an additional
multilayer perceptron to merge the outputs of the two networks.

2     APPROACHES
We approach the problem of GI tract disease detection with small training
datasets using five different methods: two based on GF extraction, and
three based on CNNs with transfer learning, as described below.

2.1     Global-feature-based approaches
Method 1 and Method 2 use GFs. For the extraction of GFs, we use Lucene
Image Retrieval (LIRE) [6]. GFs are easy and fast to calculate, and can
also be used for image comparison, image collection search and distance
computing [14]. Based on [13, 16], we use the Joint Composite Descriptor
(JCD), Tamura, Color Layout, Edge Histogram, Auto Color Correlogram and
Pyramid Histogram of Oriented Gradients (PHOG) features. These features
represent the overall properties of the images. Adding more GFs is
possible, but it may introduce redundant information, which can reduce
the overall classification performance.
   The extracted features are sent to different machine learning
classifiers for multi-class classification. Method 1 sends the extracted
GFs to the SimpleLogistic (SL) classifier. In Method 2, we input the same
selected set of features to the logistic model tree (LMT) classifier.

2.2    Transfer learning based approaches
Our CNN approaches use transfer learning with models pre-trained on the
ImageNet dataset [18]. Resnet-152 [3] and Densenet-161 [4] were selected
based on the top-1 and top-5 error rates of the pre-trained networks
available in the PyTorch [8] deep learning framework.
   One of the main problems of the given dataset is the "out-of-patient"
category, which has only four images, while the other classes have
considerably more. The colour distribution of this class lies in a
completely different colour domain compared to the other categories. We
identified this difference by manually inspecting the dataset, and moved
all four images of this category into the corresponding validation set
folder. The training set folder of this category was then filled with
random Google images that are not related to the GI tract. To avoid
training getting stuck in a local minimum, we use stochastic gradient
descent [1] with dynamic learning rate scheduling. The losses (loss 1 and
loss 2 in Figure 1) of the CNN methods were calculated for each network
separately. Additionally, horizontal flips, vertical flips, rotations and
resizing were applied as data augmentations to reduce over-fitting.
   Method 3 uses transfer learning with Resnet-152, which has the lowest
top-1 and top-5 error rates among these pre-trained networks. The last
fully connected layer of Resnet-152, which was originally designed to
classify the 1000 classes of the ImageNet dataset, was changed to
classify the 16 classes of the Medico task. Usually, transfer learning
freezes the pre-trained layers to avoid back-propagating the large errors
caused by newly added layers with random weights. However, we did not
freeze the pre-trained layers, because modifying only the last layer does
not propagate large errors backwards in transfer learning. The network
was trained until it reached its maximum accuracy on the validation
dataset.
   Method 4 extends Method 3 by using two parallel pre-trained models,
Resnet-152 and Densenet-161, to get a cumulative decision at the end, as
depicted in Figure 1. The classification is based on the average of the
two output probability vectors. A single loss value could then be
calculated and propagated to update the weights. However, this restricts
updating the weights of Resnet-152 and Densenet-161 separately, as each
network requires. Therefore, we calculated two different loss values
(loss 1 and loss 2 in Figure 1), one from each network, to update their
weights separately. Both
networks were trained simultaneously until they reached the best
validation accuracy, with the hyper-parameters changed manually.

[Figure 1: Block diagram of the CNN methods. The input X is fed to the
base networks Resnet-152 (16 outputs O1, loss 1) and Densenet-161 (16
outputs O2, loss 2). Method 3 outputs O1; Method 4 outputs (O1 + O2)/2;
Method 5 passes O1 and O2 through fc1 (32) and fc2 (16).]

   Method 5 was constructed to overcome the limitation of averaging the
probabilistic outputs of the two networks used in Method 4. Instead of
calculating the average with a simple mathematical formula, another
multilayer perceptron (MLP) was attached to the above network to learn a
more complex function for the cumulative decision, as illustrated in
Figure 1. We passed the probability outputs of the two networks (16
probabilities from each network) to a new MLP with 32 inputs, one hidden
layer with 32 units, and 16 outputs (via a sigmoid layer). Here, we used
the Resnet-152 and Densenet-161 networks already trained on the dataset
and froze them before training the MLP. Then, we trained only the MLP to
learn the best function for the cumulative decision.

3   RESULTS AND ANALYSIS
We divided the development dataset into a training set (70%) and a
validation set (30%). For the GF-based approaches, ensembles of the six
extracted GFs were fed to all the available machine learning classifiers
(with different parameters) in the WEKA [2] library. The SL and LMT
classifiers outperformed all other available classifiers on the dataset.
The other promising classifiers were sequential minimal optimization (RBF
kernel) and a combination of PCA with a LibSVM (RBF) classifier.
   On the validation set, all the CNN methods (3-5) show accuracies of
around 95% and specificities of around 99%. These are consistently better
than the GF-based methods (1, 2), which have accuracies of around 82% and
specificities of around 98%. According to the task organizers' evaluation
on the test dataset, Methods 3 to 5 again show accuracies and
specificities of around 99%, which indicates that our CNN methods are not
overfitted to the validation dataset.
   Methods 4 and 5, which use both Resnet-152 and Densenet-161, perform
better than Method 3, which uses only Resnet-152, because the final
answer is decided from the answers of two deep learning networks.
However, a cumulative decision based on a simple averaging function
(Method 4) performs worse than the decision taken by an MLP (Method 5):
Method 5 increases the validation accuracy from 0.9555 to 0.9580.
Therefore, Method 5 was selected as our best method, and its confusion
matrix is shown in Table 1. An overview of the results obtained from the
five experiments, along with their performance metrics, is presented in
Table 2. The results obtained from the organizers for the test dataset
are presented in Table 3.

Table 1: The confusion matrix of Method 5 in our study
(rows: actual class, columns: predicted class)
A: blurry-nothing, B: colon-clear, C: dyed-lifted-polyps,
D: dyed-resection-margins, E: esophagitis, F: instruments,
G: normal-cecum, H: normal-pylorus, I: normal-z-line, J: out-of-patient,
K: polyps, L: retroflex-rectum, M: retroflex-stomach,
N: stool-inclusions, O: stool-plenty, P: ulcerative-colitis

       A    B    C    D    E    F    G    H    I    J    K    L    M    N    O    P
  A   53    _    _    _    _    _    _    _    _    _    _    _    _    _    _    _
  B    _   81    _    _    _    _    _    _    _    _    _    _    _    _    _    _
  C    _    _  130    7    _    _    _    _    _    _    _    _    _    _    _    1
  D    _    _    3  122    _    _    _    _    _    _    _    _    _    _    _    _
  E    _    _    _    _  115    _    _    _   19    _    _    _    _    _    _    _
  F    _    _    _    _    _   10    _    _    _    _    1    _    _    _    _    _
  G    _    _    _    _    _    _  125    _    _    _    _    _    _    _    _    _
  H    _    _    _    _    _    _    _  132    _    _    _    _    _    _    _    _
  I    _    _    _    _   11    _    _    _  121    _    _    _    _    _    _    _
  J    _    _    _    _    _    1    _    _    _    3    _    _    _    _    _    _
  K    _    1    _    _    _    _    6    2    _    _  172    _    _    _    _    _
  L    _    _    _    _    _    _    1    _    _    _    _   71    _    _    _    _
  M    _    _    _    _    _    _    _    _    _    _    _    2  118    _    _    _
  N    _    _    _    _    _    _    _    _    _    _    _    _    _   39    _    _
  O    _    _    _    _    _    _    _    _    _    _    _    _    _    _  110    _
  P    _    _    _    _    1    1    2    _    _    _    4    1    _    _    _  129

Table 2: Validation results
Method   REC      PREC     SPEC     ACC      MCC      F1       FPS
1        0.855    0.793    0.989    0.816    0.814    0.823    79
2        0.816    0.817    0.984    0.816    0.800    0.815    12
3        0.9536   0.9543   0.9968   0.9536   0.9498   0.9535   64
4        0.9555   0.9563   0.9969   0.9555   0.9519   0.9554   29
5        0.9580   0.9587   0.9971   0.9580   0.9546   0.9580   29

Table 3: Official results
Method   REC      PREC     SPEC     ACC      MCC      F1
1        0.8457   0.8457   0.9897   0.9807   0.8353   0.8456
2        0.8457   0.8457   0.9897   0.9807   0.8350   0.8457
3        0.9376   0.9376   0.9958   0.9922   0.9335   0.9376
4        0.9400   0.9400   0.9960   0.9925   0.9360   0.9400
5        0.9458   0.9458   0.9964   0.9932   0.9421   0.9458

   The most notable point in the confusion matrix in Table 1 is the
misclassification between categories E: esophagitis and I: normal-z-line.
A large number of misclassifications, 30 images from the validation set,
occurred between these two categories, and a manual investigation was
done to identify the reason. We noticed that the images of these two
categories are very similar to each other because of their close location
in the GI tract, and distinguishing them is also a challenge for
physicians.

4   CONCLUSION
In this paper, we presented five different methods for the multi-class
classification of GI tract diseases. The proposed approaches are based on
GFs and on pre-trained CNNs with transfer learning. The combination of
Resnet-152 and Densenet-161 with an additional MLP achieved the highest
performance on both the validation dataset and the test dataset provided
by the task organizers. We show that a combination of deep neural models
pre-trained on ImageNet is better able to classify images into the
correct classes because of its cumulative decision-making capability. For
future work, we will combine deeper CNNs in parallel to add further
cumulative decision-making capability for multi-class classification. In
addition, Generative Adversarial Network (GAN) methods can be used to
handle the imbalanced dataset by generating more data to train deep
neural networks.
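
As a supplementary illustration of the two fusion strategies described in Section 2.2, the following minimal Python sketch contrasts the averaging of Method 4 with the MLP-based fusion of Method 5. It is not the code used in our experiments: the names `average_fusion`, `FusionMLP` and `predict` are hypothetical, the MLP weights are random placeholders rather than trained values, the ReLU hidden activation is an assumption (only the sigmoid output layer is stated in the text), and the two input vectors are made-up examples.

```python
import math
import random

NUM_CLASSES = 16  # number of Medico 2018 classes


def average_fusion(o1, o2):
    """Method 4 style fusion: element-wise mean of two probability vectors."""
    if len(o1) != len(o2):
        raise ValueError("both networks must output the same number of classes")
    return [(a + b) / 2.0 for a, b in zip(o1, o2)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class FusionMLP:
    """Method 5 style fusion: 32 inputs -> 32 hidden units -> 16 sigmoid outputs.

    The weights are random placeholders (untrained), and the ReLU hidden
    activation is an assumption made for this sketch.
    """

    def __init__(self, n_in=2 * NUM_CLASSES, n_hidden=32, n_out=NUM_CLASSES, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, o1, o2):
        x = list(o1) + list(o2)  # concatenate the two 16-dimensional outputs
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return [sigmoid(sum(w * v for w, v in zip(row, hidden)) + b)
                for row, b in zip(self.w2, self.b2)]


def predict(scores):
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up outputs of the two base networks for one image. Index 4 is
# E: esophagitis and index 8 is I: normal-z-line in the class order of Table 1.
o1 = [0.0] * NUM_CLASSES
o1[4], o1[8] = 0.6, 0.4
o2 = [0.0] * NUM_CLASSES
o2[4], o2[8] = 0.3, 0.7

print("averaged decision:", predict(average_fusion(o1, o2)))
print("MLP decision (untrained weights):", predict(FusionMLP().forward(o1, o2)))
```

In this sketch the averaging rule is fixed, whereas the MLP has free parameters that can be fitted on the frozen networks' outputs, which is the motivation given for Method 5.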

REFERENCES
 [1] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep
     learning. Vol. 1. MIT Press, Cambridge.
 [2] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter
     Reutemann, and Ian H Witten. 2009. The WEKA data mining software: an
     update. ACM SIGKDD Explorations Newsletter (SIGKDD Explor. Newsl.)
     11, 1 (2009), 10–18.
 [3] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep
     residual learning for image recognition. In Proceedings of the IEEE
     Conference on Computer Vision and Pattern Recognition (CVPR).
     770–778.
 [4] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q
     Weinberger. 2017. Densely Connected Convolutional Networks. In
     Proceedings of the IEEE Conference on Computer Vision and Pattern
     Recognition (CVPR). 2261–2269.
 [5] Yang Liu, Zhonglei Gu, and William K Cheung. 2017. HKBU at MediaEval
     2017 Medico: Medical multimedia task. In Working Notes Proceedings
     of the MediaEval 2017 Workshop (MediaEval 2017).
 [6] Mathias Lux, Michael Riegler, Pål Halvorsen, Konstantin Pogorelov,
     and Nektarios Anagnostopoulos. 2016. LIRE: open source visual
     information retrieval. In Proceedings of the 7th International
     Conference on Multimedia Systems (MMSys). ACM, 30.
 [7] Syed Sadiq Ali Naqvi, Shees Nadeem, Muhammad Zaid, and Muhammad Atif
     Tahir. 2017. Ensemble of Texture Features for Finding Abnormalities
     in the Gastro-Intestinal Tract. In Working Notes Proceedings of the
     MediaEval 2017 Workshop (MediaEval 2017).
 [8] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward
     Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and
     Adam Lerer. 2017. Automatic differentiation in PyTorch. In
     Proceedings of 31st Conference on Neural Information Processing
     Systems (NIPS).
 [9] Stefan Petscharnig and Klaus Schöffmann. 2018. Learning laparoscopic
     video shot classification for gynecological surgery. Multimedia
     Tools and Applications 77, 7 (2018), 8061–8079.
[10] Stefan Petscharnig, Klaus Schöffmann, and Mathias Lux. 2017. An
     Inception-like CNN Architecture for GI Disease and Anatomical
     Landmark Classification. In Working Notes Proceedings of the
     MediaEval 2017 Workshop (MediaEval 2017).
[11] Konstantin Pogorelov, Kristin Ranheim Randel, Thomas de Lange,
     Sigrun Losada Eskeland, Carsten Griwodz, Dag Johansen, Concetto
     Spampinato, Mario Taschwer, Mathias Lux, Peter Thelin Schmidt,
     Michael Riegler, and Pål Halvorsen. 2017. Nerthus: A Bowel
     Preparation Quality Video Dataset. In Proceedings of the 8th ACM on
     Multimedia Systems Conference (MMSYS). ACM, 170–174.
[12] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz,
     Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto
     Spampinato, Duc-Tien Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt,
     and others. 2017. Kvasir: A multi-class image dataset for computer
     aided gastrointestinal disease detection. In Proceedings of the 8th
     ACM on Multimedia Systems Conference (MMSYS). ACM, 164–169.
[13] Konstantin Pogorelov, Michael Riegler, Sigrun Losada Eskeland,
     Thomas de Lange, Dag Johansen, Carsten Griwodz, Peter Thelin
     Schmidt, and Pål Halvorsen. 2017. Efficient disease detection in
     gastrointestinal videos: global features versus neural networks.
     Multimedia Tools and Applications 76, 21 (2017), 22493–22525.
[14] Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Carsten
     Griwodz, Thomas de Lange, Kristin Ranheim Randel, Sigrun Eskeland,
     Duc-Tien Dang-Nguyen, Olga Ostroukhova, and others. 2017. A
     comparison of deep learning with global features for
     gastrointestinal disease detection. In Working Notes Proceedings of
     the MediaEval 2017 Workshop (MediaEval 2017).
[15] Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Thomas De
     Lange, Kristin Ranheim Randel, Duc-Tien Dang-Nguyen, Mathias Lux,
     and Olga Ostroukhova. 2018. Medico Multimedia Task at MediaEval
     2018. In Working Notes Proceedings of the MediaEval 2018 Workshop.
[16] Michael Riegler, Konstantin Pogorelov, Sigrun Losada Eskeland, Peter
     Thelin Schmidt, Zeno Albisser, Dag Johansen, Carsten Griwodz, Pål
     Halvorsen, and Thomas De Lange. 2017. From annotation to
     computer-aided diagnosis: Detailed evaluation of a medical
     multimedia system. ACM Transactions on Multimedia Computing,
     Communications, and Applications (TOMM) 13, 3 (2017), 26.
[17] Michael Riegler, Konstantin Pogorelov, Pål Halvorsen, Carsten
     Griwodz, Thomas de Lange, Kristin Ranheim Randel, Sigrun Eskeland,
     Duc-Tien Dang-Nguyen, Mathias Lux, and others. 2017. Multimedia for
     medicine: the Medico Task at MediaEval 2017. In Working Notes
     Proceedings of the MediaEval 2017 Workshop (MediaEval 2017).
[18] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev
     Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla,
     Michael Bernstein, and others. 2015. ImageNet Large Scale Visual
     Recognition Challenge. International Journal of Computer Vision
     (IJCV) (2015).