=Paper=
{{Paper
|id=Vol-2283/MediaEval_18_paper_17
|storemode=property
|title=Using Preprocessing as a Tool in Medical Image Detection
|pdfUrl=https://ceur-ws.org/Vol-2283/MediaEval_18_paper_17.pdf
|volume=Vol-2283
|authors=Mathias Kirkerød,Vajira Thambawita,Michael Riegler,Pål Halvorsen
|dblpUrl=https://dblp.org/rec/conf/mediaeval/KirkerodTRH18
}}
==Using Preprocessing as a Tool in Medical Image Detection==
Mathias Kirkerød [1,3], Vajira Thambawita [1,2], Michael Riegler [1,2,3], Pål Halvorsen [1,3]
[1] Simula Research Laboratory, Norway; [2] Oslo Metropolitan University; [3] University of Oslo
mathias.kirkerod@gmail.com, vajira@simula.no, michael@simula.no, paalh@simula.no

Copyright held by the owner/author(s). MediaEval'18, 29-31 October 2018, Sophia Antipolis, France

ABSTRACT

In this paper we describe our approach to gastrointestinal disease classification for the Medico task at MediaEval 2018. We propose multiple ways to inpaint problematic areas in the test and training sets to help with classification. We discuss the effect that preprocessing has on the input data with respect to removing regions with sparse information, and how preprocessing affects the training and evaluation of a dataset that is limited in size. We also compare the different inpainting methods with transfer learning using a convolutional neural network.

[Figure 1: Differences of images after inpainting. (a) Image before inpainting; (b) image after inpainting.]

1 INTRODUCTION

Medical image diagnosis is a challenging task in computer vision. In the last couple of years, as computing power has increased, machine learning has become a standard tool for image detection, segmentation, and classification. In this paper we look in depth at how machine learning can help solve classification tasks on the dataset from the Medico task [8], which focuses on image classification in the gastrointestinal (GI) tract. The data is divided into 16 different classes, and the details of the task are described in [5, 7].

As in other image detection settings, the Medico dataset faces the challenges that the amount of data is small and that the training data does not necessarily cover the full distribution of the data in the test case. The main goal of the task is to classify medical images. Our proposal is to use unsupervised machine learning to remove the green corners present in the Medico images. Our hypothesis is that if we recreate these areas as they would look without any sparse regions, the classifier can focus on the right features for classification. We propose four different methods for inpainting the corner area of the medical images: an autoencoder [4], a context conditional generative adversarial network (CC-GAN) [2, 3], a context encoder [6], and a simple crop of the image.

After preprocessing the dataset, we run it through a convolutional neural network (CNN) based on transfer learning. We chose the CNN model based on the top-5 and top-1 accuracies of the pre-trained networks listed in the Keras documentation. In our approach we use the InceptionResNetV2 [9] network, remove its top layer, and replace it with a global average pooling layer and a dense 16-unit output layer to match the number of classes. We do not freeze any layers of the model. All five submissions are run with the same hyperparameters in the transfer-learning model, so any difference in results should come only from the different training datasets we use (a sketch of this setup follows at the end of this section).

The medical data has one main feature that we focus on during preprocessing, namely the green square in the bottom left corner. A neural network often struggles with areas containing very sparse information, and our hypothesis is that simply replacing the green area with a similar black area will not yield a better result.

We use one dataset as a base case: it was not augmented, other than shrinking every image to a fixed resolution. The other datasets were augmented in ways that cover up the green square in one way or another.
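To make the classification setup concrete, a minimal Keras sketch of the transfer-learning classifier is given below. The input resolution, optimizer, and loss function are illustrative assumptions rather than our exact settings.

<pre>
# Minimal sketch of the transfer-learning classifier: InceptionResNetV2
# with its top replaced by global average pooling and a 16-way softmax.
# The 256x256 input size, Adam optimizer, and cross-entropy loss are
# assumed here for illustration.
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Load the pre-trained network without its ImageNet classification head.
base = InceptionResNetV2(weights="imagenet", include_top=False,
                         input_shape=(256, 256, 3))

# New top: global average pooling plus a dense 16-unit output layer,
# one unit per Medico class. No layers are frozen.
features = GlobalAveragePooling2D()(base.output)
outputs = Dense(16, activation="softmax")(features)
model = Model(inputs=base.input, outputs=outputs)

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
</pre>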
2 APPROACH

Our approach is divided into two steps: first preprocessing, then classifying. Our focus is mainly on preprocessing the data to remove the green corners from the medical images.

2.1 Autoencoder

For the autoencoder approach, we created and trained a custom autoencoder from scratch. It consists of an encoder-decoder network with 2D convolutions and rectified linear units as activation functions, and we include 25% dropout [1] in the layer between the encoder and the decoder.

To preprocess the medical data, we feed the whole image through the encoder-decoder network. We take the loss over the whole reconstructed image, but keep only the inpainted part. During training, the goal is to minimize the loss L(x, g(f(x̃))), where x is an image without a green corner and x̃ is the same image with an artificial green corner. In theory, we can replace any part of the image with this method.
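A small sketch of this scheme is given below, assuming NumPy float images in [0, 1] and a boolean mask marking the corner region; the helper names and the flat green fill colour are illustrative assumptions.

<pre>
# Sketch of the corner-inpainting scheme: corrupt clean images with an
# artificial green corner for training, and at preprocessing time keep
# only the reconstructed corner region.
import numpy as np

def add_green_corner(x, mask):
    """Create the artificial corruption x_tilde from a clean image x."""
    x_tilde = x.copy()
    x_tilde[mask] = (0.0, 1.0, 0.0)  # flat green fill (assumed colour)
    return x_tilde

def inpaint_corner(autoencoder, image, mask):
    """Reconstruct the whole image, but keep only the inpainted part."""
    recon = autoencoder.predict(image[None, ...])[0]  # g(f(image))
    out = image.copy()
    out[mask] = recon[mask]  # paste back only the corner area
    return out

# Training pairs the corrupted input with the clean target, minimizing
# L(x, g(f(x_tilde))) over the whole image, e.g. with an MSE loss:
#   autoencoder.compile(optimizer="adam", loss="mse")
#   autoencoder.fit(x_tilde_batch, x_batch, ...)
</pre>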
2.2 Context encoder

For the context encoder approach, we created a new encoder-decoder network. The encoder has a structure similar to the autoencoder's, but the decoder only produces an output the size of the area we want to inpaint. With the context encoder, we feed images without a green corner into the encoder-decoder network, and the output of the network has the same size as the area we want to fill. In addition to the MSE loss [6] L(x, g(f(x̂))), where x̂ is an image with an artificial green corner and x is the part that was replaced by the corner, we include an adversarial loss, as described in [6].

2.3 Context conditional generative adversarial network

For the generative adversarial approach, we create a structure similar to the autoencoder, with a constant 10% dropout at each layer of the discriminator. As with the autoencoder, the output has the same size as the input, but we keep only the parts we want to inpaint. We use the same type of loss as the context encoder, with 15% of the loss coming from the MSE term and the remaining 85% from the adversarial term (a sketch of this weighted objective follows after Section 2.4).

2.4 Clipping instead of inpainting

The last method is simply to crop the images so that the green corner is excluded (see the cropping sketch below). The clipping was done to keep as much of the center frame as possible and a minimal amount of the bottom left corner, without sacrificing too much of the image. Since every image is scaled down to 256x256 px during preprocessing, the same is done with the clipped version, so after clipping the image is again resized to 256x256.
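The weighted objective used for the CC-GAN (and, with other weights, for the context encoder) can be sketched as follows. The tensor names and the use of binary cross-entropy for the adversarial term are assumptions in the spirit of [3, 6], not our exact implementation.

<pre>
# Sketch of the generator objective: 15% reconstruction (MSE) and 85%
# adversarial loss, as used for the CC-GAN. All names are illustrative.
import tensorflow as tf

mse = tf.keras.losses.MeanSquaredError()
bce = tf.keras.losses.BinaryCrossentropy()

def generator_loss(real_patch, fake_patch, disc_out_on_fake):
    reconstruction = mse(real_patch, fake_patch)       # pixel fidelity
    adversarial = bce(tf.ones_like(disc_out_on_fake),
                      disc_out_on_fake)                # fool the discriminator
    return 0.15 * reconstruction + 0.85 * adversarial
</pre>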
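The clipping variant amounts to a fixed crop followed by rescaling. A sketch using Pillow is given below; the margin sizes are assumed values for illustration, not the crop actually used.

<pre>
# Sketch of clipping instead of inpainting: crop away the bottom-left
# region containing the green square, then rescale to 256x256.
from PIL import Image

def clip_and_resize(path, left_margin=100, bottom_margin=100,
                    size=(256, 256)):
    img = Image.open(path)
    w, h = img.size
    # Keep as much of the center frame as possible while removing a
    # strip on the left and at the bottom where the green square sits.
    img = img.crop((left_margin, 0, w, h - bottom_margin))
    return img.resize(size)
</pre>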
3 RESULTS AND ANALYSIS

We created the augmented datasets before training the classification model, so the transfer-learning model did not augment the images at runtime. We split the data into a 70% training set and a 30% validation set.

Our results on the validation set are tabulated in Table 1, and the official results on the test set are tabulated in Table 2. Table 3 shows the confusion matrix of the CC-GAN run on the official test set.

Table 1: Validation set results

Method            REC    PREC   SPEC   ACC    MCC    F1
Autoencoder       0.929  0.929  0.981  0.929  0.923  0.928
CC-GAN            0.931  0.932  1.000  0.931  0.926  0.931
Context encoder   0.926  0.928  0.945  0.926  0.920  0.926
Clipping          0.903  0.904  0.980  0.903  0.895  0.903
Non-augmented     0.925  0.927  0.981  0.925  0.919  0.924

Table 2: Official results

Method            REC    PREC   SPEC   ACC    MCC    F1
Autoencoder       0.915  0.915  0.994  0.989  0.910  0.915
CC-GAN            0.915  0.915  0.994  0.989  0.910  0.915
Context encoder   0.910  0.910  0.994  0.988  0.905  0.910
Clipping          0.904  0.904  0.993  0.988  0.898  0.904
Non-augmented     0.917  0.917  0.994  0.989  0.911  0.917

Table 3: Confusion matrix of the CC-GAN run on the official test set. A: ulcerative-colitis, B: esophagitis, C: normal-z-line, D: dyed-lifted-polyps, E: dyed-resection-margins, F: out-of-patient, G: normal-pylorus, H: stool-inclusions, I: stool-plenty, J: blurry-nothing, K: polyps, L: normal-cecum, M: colon-clear, N: retroflex-rectum, O: retroflex-stomach, P: instruments

      A    B    C    D    E    F    G    H    I    J    K    L    M    N    O    P
A   510    0    1    0    1    0    1    0   69    0    5   24    0    3    0   13
B     3  401   68    0    1    0    5    0    0    0    0    0    0    0    1    0
C     0  153  489    0    0    0    3    0    0    0    0    0    0    0    0    0
D     0    0    0  502   39    0    0    0    0    0    3    0    0    1    0   45
E     0    0    0   46  517    1    0    0    0    0    1    0    0    0    0   15
F     0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
G     2    2    3    0    0    0  547    0    0    0    0    0    0    0    1    0
H     0    0    0    0    0    0    0  486   35    0    0    0    0    0    0    0
I     3    0    0    0    2    0    0    1 1857    0    3    1    0    0    0    3
J     1    0    0    0    0    1    0    1    0   36    0    0    1    0    0    0
K     8    0    1    5    2    3    4    0    0    0  349   17    0    2    1   55
L    11    0    1    2    1    0    1    0    1    1   11  542    0    0    0    3
M     2    0    0    0    0    0    0   18    2    0    1    0 1064    0    1    3
N     2    0    0    1    1    0    0    0    0    0    1    0    0  183    4    5
O     0    0    0    0    0    0    0    0    1    0    0    0    0    2  389    0
P     0    0    0    0    0    0    0    0    0    0    0    0    0    1    0  131

On the validation set, the CC-GAN obtained the highest MCC score with 0.926, and also produced the most realistic inpaintings. The context encoder had the lowest MCC score of the inpainting methods with 0.920, and also the worst inpainted areas. The official results show the same pattern in MCC score, although the non-augmented base case obtained the best result. In both cases, clipping gave significantly worse results.

As expected, most of the images were classified correctly, but we had some problems distinguishing between esophagitis and normal-z-line. We also had a few cases where instruments were predicted although there were none.

4 CONCLUSION

In general, when training on a homogeneous dataset, preprocessing is less valuable. We want to remove areas with sparse information and areas that have nothing to do with the classification. In our experiments we used three different inpainting methods to do this, and we saw no improvement in the results. As the validation set shows, the best method gained less than one percentage point, and on the official results the score was worse than the base case.

We conclude that preprocessing the Medico dataset is not worth the hassle: the effort put into preprocessing the images yields little to no improvement in the results. We recommend instead spending the time on finding the right network with the right hyperparameters.

One reason for the lackluster results might be that the training and the test sets contain the same green squares in the same classes. We suspect that this similarity makes the squares an essential part of the image. We believe the results would be much better if the test set were completely free of the squares, as it would be with "real-time" images.

In a future test we would also recommend removing the four black edges. With the images being round, this might be a challenge, since there are no full-resolution images (without zoom) that capture the edges. On the Medico dataset, this step will probably not give a better score, given that every image in the dataset has the same four black corners.

REFERENCES

[1] David Warde-Farley, Ian J. Goodfellow, Aaron Courville, and Yoshua Bengio. 2013. An empirical analysis of dropout in piecewise linear networks. CoRR abs/1312.6197 (2013). arXiv:1312.6197 https://arxiv.org/abs/1312.6197
[2] Emily L. Denton, Sam Gross, and Rob Fergus. 2016. Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks. CoRR abs/1611.06430 (2016). arXiv:1611.06430 http://arxiv.org/abs/1611.06430
[3] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. 2672–2680.
[4] H. Bourlard and Y. Kamp. 1988. Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics 59 (1988), 291–294. http://ace.cs.ohio.edu/~razvan/courses/dl6890/papers/bourlard-kamp88.pdf
[5] Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Thomas de Lange, Kristin Ranheim Randel, Duc-Tien Dang-Nguyen, and Mathias Lux. 2018. MediaEval Medico task information. http://multimediaeval.org/mediaeval2018/medico/. Accessed: 2018-10-16.
[6] Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. 2016. Context Encoders: Feature Learning by Inpainting. CoRR abs/1604.07379 (2016). arXiv:1604.07379 http://arxiv.org/abs/1604.07379
[7] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato, Duc-Tien Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt, Michael Riegler, and Pål Halvorsen. 2017. KVASIR: A Multi-Class Image Dataset for Computer Aided Gastrointestinal Disease Detection. In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys'17). ACM, New York, NY, USA, 164–169. https://doi.org/10.1145/3083187.3083212
[8] Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Thomas de Lange, Kristin Ranheim Randel, Duc-Tien Dang-Nguyen, Mathias Lux, and Olga Ostroukhova. 2018. Medico Multimedia Task at MediaEval 2018. (2018).
[9] Christian Szegedy, Sergey Ioffe, and Vincent Vanhoucke. 2016. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. CoRR abs/1602.07261 (2016). arXiv:1602.07261 http://arxiv.org/abs/1602.07261