=Paper=
{{Paper
|id=Vol-2283/MediaEval_18_paper_17
|storemode=property
|title=Using Preprocessing as a Tool in Medical Image Detection
|pdfUrl=https://ceur-ws.org/Vol-2283/MediaEval_18_paper_17.pdf
|volume=Vol-2283
|authors=Mathias Kirkerød,Vajira Thambawita,Michael Riegler,Pål Halvorsen
|dblpUrl=https://dblp.org/rec/conf/mediaeval/KirkerodTRH18
}}
==Using Preprocessing as a Tool in Medical Image Detection==
Mathias Kirkerød [1,3], Vajira Thambawita [1,2], Michael Riegler [1,2,3], Pål Halvorsen [1,3]
[1] Simula Research Laboratory, Norway; [2] Oslo Metropolitan University; [3] University of Oslo
mathias.kirkerod@gmail.com, vajira@simula.no, michael@simula.no, paalh@simula.no

Copyright held by the owner/author(s). MediaEval'18, 29-31 October 2018, Sophia Antipolis, France

ABSTRACT

In this paper we describe our approach to gastrointestinal disease classification for the Medico task at MediaEval 2018. We propose multiple ways to inpaint problematic areas in the test and training sets to help with classification. We discuss the effect that preprocessing has on the input data with respect to removing regions with sparse information, and how preprocessing affects the training and evaluation of a dataset that is limited in size. We also compare the different inpainting methods with transfer learning using a convolutional neural network.

[Figure 1: Differences of images after inpainting. (a) Image before inpainting; (b) image after inpainting.]

1 INTRODUCTION

Medical image diagnosis is a challenging task in computer vision. In the last couple of years, as computing power has increased, machine learning has become a standard tool for image detection, segmentation, and classification. In this paper we look in depth at how machine learning can help solve classification tasks on the dataset from the Medico task [8], which focuses on image classification in the gastrointestinal (GI) tract. The data is divided into 16 different classes, and the details of the task are described in [5, 7].

As in other image detection settings, the Medico dataset faces the challenges that the amount of data is small and that the training data does not necessarily cover the full distribution of the data in the test case. The main goal of the task is to classify medical images. Our proposal is to use unsupervised machine learning to remove the green corners present in the Medico images. Our hypothesis is that if we recreate these areas as they would look without any sparse regions, the classifier can focus on the right features for classification. We propose four different methods for inpainting the corner area of the medical images: an autoencoder [4], a context conditional generative adversarial network (CC-GAN) [2, 3], a context encoder [6], and a simple crop of the image.

After preprocessing the dataset, we run it through a convolutional neural network (CNN) based on transfer learning. We chose the CNN model based on the top-5 and top-1 accuracies of the pre-trained networks listed in the Keras documentation. In our approach we use the InceptionResNetV2 [9] network, remove its top layer, and replace it with a global average pooling layer and a dense 16-unit output layer to match the number of classes. We do not freeze any layers of the model. All five submissions are run with the same hyperparameters in the transfer-learning model, so any difference in results should come only from the different training datasets we use (a sketch of this setup follows at the end of this section).

The medical data has one main feature that we focus on during preprocessing, namely the green square in the bottom left corner. A neural network often struggles with areas containing very sparse information, and our hypothesis is that simply replacing the green area with a similar black area will not yield a better result.

We use one dataset as a base case: it was not augmented, other than shrinking every image to a fixed resolution. The other datasets were augmented in ways that cover up the green square in one way or another.
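To make the classification setup concrete, a minimal Keras sketch of the transfer-learning classifier is given below. The input resolution, optimizer, and loss function are illustrative assumptions rather than our exact settings.

<pre>
# Minimal sketch of the transfer-learning classifier: InceptionResNetV2
# with its top replaced by global average pooling and a 16-way softmax.
# The 256x256 input size, Adam optimizer, and cross-entropy loss are
# assumed here for illustration.
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Load the pre-trained network without its ImageNet classification head.
base = InceptionResNetV2(weights="imagenet", include_top=False,
                         input_shape=(256, 256, 3))

# New top: global average pooling plus a dense 16-unit output layer,
# one unit per Medico class. No layers are frozen.
features = GlobalAveragePooling2D()(base.output)
outputs = Dense(16, activation="softmax")(features)
model = Model(inputs=base.input, outputs=outputs)

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
</pre>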
2 APPROACH

Our approach is divided into two steps: first preprocessing, then classifying. Our focus is mainly on preprocessing the data to remove the green corners from the medical images.

2.1 Autoencoder

For the autoencoder approach, we created and trained a custom autoencoder from scratch. It consists of an encoder-decoder network with 2D convolutions and rectified linear units as activation functions, and we include 25% dropout [1] in the layer between the encoder and the decoder.

To preprocess the medical data, we feed the whole image through the encoder-decoder network. We take the loss over the whole reconstructed image, but keep only the inpainted part. During training, the goal is to minimize the loss L(x, g(f(x̃))), where x is an image without a green corner and x̃ is the same image with an artificial green corner. In theory, we can replace any part of the image with this method.
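A small sketch of this scheme is given below, assuming NumPy float images in [0, 1] and a boolean mask marking the corner region; the helper names and the flat green fill colour are illustrative assumptions.

<pre>
# Sketch of the corner-inpainting scheme: corrupt clean images with an
# artificial green corner for training, and at preprocessing time keep
# only the reconstructed corner region.
import numpy as np

def add_green_corner(x, mask):
    """Create the artificial corruption x_tilde from a clean image x."""
    x_tilde = x.copy()
    x_tilde[mask] = (0.0, 1.0, 0.0)  # flat green fill (assumed colour)
    return x_tilde

def inpaint_corner(autoencoder, image, mask):
    """Reconstruct the whole image, but keep only the inpainted part."""
    recon = autoencoder.predict(image[None, ...])[0]  # g(f(image))
    out = image.copy()
    out[mask] = recon[mask]  # paste back only the corner area
    return out

# Training pairs the corrupted input with the clean target, minimizing
# L(x, g(f(x_tilde))) over the whole image, e.g. with an MSE loss:
#   autoencoder.compile(optimizer="adam", loss="mse")
#   autoencoder.fit(x_tilde_batch, x_batch, ...)
</pre>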
2.2 Context encoder

For the context encoder approach, we created a new encoder-decoder network. The encoder has a structure similar to the autoencoder's, but the decoder only produces an output the size of the area we want to inpaint. With the context encoder, we feed images without a green corner into the encoder-decoder network, and the output of the network has the same size as the area we want to fill. In addition to the MSE loss [6] L(x, g(f(x̂))), where x̂ is an image with an artificial green corner and x is the part that was replaced by the corner, we include an adversarial loss, as described in [6].

2.3 Context conditional generative adversarial network

For the generative adversarial approach, we create a structure similar to the autoencoder, with a constant 10% dropout at each layer of the discriminator. As with the autoencoder, the output has the same size as the input, but we keep only the parts we want to inpaint. We use the same type of loss as the context encoder, with 15% of the loss coming from the MSE term and the remaining 85% from the adversarial term (a sketch of this weighted objective follows after Section 2.4).

2.4 Clipping instead of inpainting

The last method is simply to crop the images so that the green corner is excluded (see the cropping sketch below). The clipping was done to keep as much of the center frame as possible and a minimal amount of the bottom left corner, without sacrificing too much of the image. Since every image is scaled down to 256x256 px during preprocessing, the same is done with the clipped version, so after clipping the image is again resized to 256x256.
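The weighted objective used for the CC-GAN (and, with other weights, for the context encoder) can be sketched as follows. The tensor names and the use of binary cross-entropy for the adversarial term are assumptions in the spirit of [3, 6], not our exact implementation.

<pre>
# Sketch of the generator objective: 15% reconstruction (MSE) and 85%
# adversarial loss, as used for the CC-GAN. All names are illustrative.
import tensorflow as tf

mse = tf.keras.losses.MeanSquaredError()
bce = tf.keras.losses.BinaryCrossentropy()

def generator_loss(real_patch, fake_patch, disc_out_on_fake):
    reconstruction = mse(real_patch, fake_patch)       # pixel fidelity
    adversarial = bce(tf.ones_like(disc_out_on_fake),
                      disc_out_on_fake)                # fool the discriminator
    return 0.15 * reconstruction + 0.85 * adversarial
</pre>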
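The clipping variant amounts to a fixed crop followed by rescaling. A sketch using Pillow is given below; the margin sizes are assumed values for illustration, not the crop actually used.

<pre>
# Sketch of clipping instead of inpainting: crop away the bottom-left
# region containing the green square, then rescale to 256x256.
from PIL import Image

def clip_and_resize(path, left_margin=100, bottom_margin=100,
                    size=(256, 256)):
    img = Image.open(path)
    w, h = img.size
    # Keep as much of the center frame as possible while removing a
    # strip on the left and at the bottom where the green square sits.
    img = img.crop((left_margin, 0, w, h - bottom_margin))
    return img.resize(size)
</pre>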
3 RESULTS AND ANALYSIS

We created the augmented datasets before training the classification model, so the transfer-learning model did not augment the images at runtime. We split the data into a 70% training set and a 30% validation set.

Our results on the validation set are tabulated in Table 1, and the official results on the test set are tabulated in Table 2. Table 3 shows the confusion matrix of the CC-GAN run on the official test set.

Table 1: Validation set results

Method            REC    PREC   SPEC   ACC    MCC    F1
Autoencoder       0.929  0.929  0.981  0.929  0.923  0.928
CC-GAN            0.931  0.932  1.000  0.931  0.926  0.931
Context encoder   0.926  0.928  0.945  0.926  0.920  0.926
Clipping          0.903  0.904  0.980  0.903  0.895  0.903
Non-augmented     0.925  0.927  0.981  0.925  0.919  0.924

Table 2: Official results

Method            REC    PREC   SPEC   ACC    MCC    F1
Autoencoder       0.915  0.915  0.994  0.989  0.910  0.915
CC-GAN            0.915  0.915  0.994  0.989  0.910  0.915
Context encoder   0.910  0.910  0.994  0.988  0.905  0.910
Clipping          0.904  0.904  0.993  0.988  0.898  0.904
Non-augmented     0.917  0.917  0.994  0.989  0.911  0.917

Table 3: Confusion matrix of the CC-GAN run on the official test set. A: ulcerative-colitis, B: esophagitis, C: normal-z-line, D: dyed-lifted-polyps, E: dyed-resection-margins, F: out-of-patient, G: normal-pylorus, H: stool-inclusions, I: stool-plenty, J: blurry-nothing, K: polyps, L: normal-cecum, M: colon-clear, N: retroflex-rectum, O: retroflex-stomach, P: instruments

      A    B    C    D    E    F    G    H    I    J    K    L    M    N    O    P
A   510    0    1    0    1    0    1    0   69    0    5   24    0    3    0   13
B     3  401   68    0    1    0    5    0    0    0    0    0    0    0    1    0
C     0  153  489    0    0    0    3    0    0    0    0    0    0    0    0    0
D     0    0    0  502   39    0    0    0    0    0    3    0    0    1    0   45
E     0    0    0   46  517    1    0    0    0    0    1    0    0    0    0   15
F     0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
G     2    2    3    0    0    0  547    0    0    0    0    0    0    0    1    0
H     0    0    0    0    0    0    0  486   35    0    0    0    0    0    0    0
I     3    0    0    0    2    0    0    1 1857    0    3    1    0    0    0    3
J     1    0    0    0    0    1    0    1    0   36    0    0    1    0    0    0
K     8    0    1    5    2    3    4    0    0    0  349   17    0    2    1   55
L    11    0    1    2    1    0    1    0    1    1   11  542    0    0    0    3
M     2    0    0    0    0    0    0   18    2    0    1    0 1064    0    1    3
N     2    0    0    1    1    0    0    0    0    0    1    0    0  183    4    5
O     0    0    0    0    0    0    0    0    1    0    0    0    0    2  389    0
P     0    0    0    0    0    0    0    0    0    0    0    0    0    1    0  131

On the validation set, the CC-GAN obtained the highest MCC score with 0.926, and also produced the most realistic inpaintings. The context encoder had the lowest MCC score of the inpainting methods with 0.920, and also the worst inpainted areas. The official results show the same pattern in MCC score, although the non-augmented base case obtained the best result. In both cases, clipping gave significantly worse results.

As expected, most of the images were classified correctly, but we had some problems distinguishing between esophagitis and normal-z-line. We also had a few cases where instruments were predicted although there were none.

4 CONCLUSION

In general, when training on a homogeneous dataset, preprocessing is less valuable. We want to remove areas with sparse information and areas that have nothing to do with the classification. In our experiments we used three different inpainting methods to do this, and we saw no improvement in the results. As the validation set shows, the best method gained less than one percentage point, and on the official results the score was worse than the base case.

We conclude that preprocessing the Medico dataset is not worth the hassle: the effort put into preprocessing the images yields little to no improvement in the results. We recommend instead spending the time on finding the right network with the right hyperparameters.

One reason for the lackluster results might be that the training and the test sets contain the same green squares in the same classes. We suspect that this similarity makes the squares an essential part of the image. We believe the results would be much better if the test set were completely free of the squares, as it would be with "real-time" images.

In a future test we would also recommend removing the four black edges. With the images being round, this might be a challenge, since there are no full-resolution images (without zoom) that capture the edges. On the Medico dataset, this step will probably not give a better score, given that every image in the dataset has the same four black corners.

REFERENCES

[1] David Warde-Farley, Ian J. Goodfellow, Aaron Courville, and Yoshua Bengio. 2013. An empirical analysis of dropout in piecewise linear networks. CoRR abs/1312.6197 (2013). arXiv:1312.6197 https://arxiv.org/abs/1312.6197
[2] Emily L. Denton, Sam Gross, and Rob Fergus. 2016. Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks. CoRR abs/1611.06430 (2016). arXiv:1611.06430 http://arxiv.org/abs/1611.06430
[3] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. 2672–2680.
[4] H. Bourlard and Y. Kamp. 1988. Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics 59 (1988), 291–294. http://ace.cs.ohio.edu/~razvan/courses/dl6890/papers/bourlard-kamp88.pdf
[5] Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Thomas de Lange, Kristin Ranheim Randel, Duc-Tien Dang-Nguyen, and Mathias Lux. 2018. MediaEval Medico task information. http://multimediaeval.org/mediaeval2018/medico/. Accessed: 2018-10-16.
[6] Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. 2016. Context Encoders: Feature Learning by Inpainting. CoRR abs/1604.07379 (2016). arXiv:1604.07379 http://arxiv.org/abs/1604.07379
[7] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato, Duc-Tien Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt, Michael Riegler, and Pål Halvorsen. 2017. KVASIR: A Multi-Class Image Dataset for Computer Aided Gastrointestinal Disease Detection. In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys'17). ACM, New York, NY, USA, 164–169. https://doi.org/10.1145/3083187.3083212
[8] Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Thomas de Lange, Kristin Ranheim Randel, Duc-Tien Dang-Nguyen, Mathias Lux, and Olga Ostroukhova. 2018. Medico Multimedia Task at MediaEval 2018. (2018).
[9] Christian Szegedy, Sergey Ioffe, and Vincent Vanhoucke. 2016. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. CoRR abs/1602.07261 (2016). arXiv:1602.07261 http://arxiv.org/abs/1602.07261