The 2018 Medico Multimedia Task Submission of Team NOAT using Neural Network Features and Search-based Classification

Michael Steiner1, Mathias Lux1, and Pål Halvorsen2,3
1 Alpen-Adria-Universität Klagenfurt, Austria; 2 SimulaMet, Norway; 3 University of Oslo, Norway
michstei@edu.aau.at, mlux@itec.aau.at, paalh@simula.no

Copyright held by the owner/author(s). MediaEval'18, 29-31 October 2018, Sophia Antipolis, France

ABSTRACT
In this paper, we describe our approach for the classification of medical images depicting the human gastrointestinal tract. Search-based classification is performed in three stages. In the first stage, we extract deep features for each image using pre-trained deep-learning models [1]. In the second stage, we use LIRE [3] to index the generated features, so that we can then, in the final stage, search the index for similar images and make our predictions based on the results. With this approach, we achieved an MCC score of 0.54 and an accuracy of 0.94, which shows that deep features combined with search-based classification are a viable option for medical image analysis.

1 INTRODUCTION
The aim of the 2018 MEDICO task [5] is to classify images of the gastrointestinal (GI) tract into the provided categories. The task provides images and several pre-extracted global image features. The images of the development set [4] are categorized into 16 classes. In our approach, we use the extracted features of pre-trained deep-learning models [1] to perform a search-based classification [6] using LIRE [3].

2 FEATURE EXTRACTION
The deep-learning features are extracted with a Python script, using the Keras API [1]. For this task we used features from the following models: DenseNet121, DenseNet169, DenseNet201, ResNet50, MobileNet, VGG16, VGG19, Xception.

The models are pre-trained on the ImageNet dataset [2] and were chosen based on their results on the ImageNet dataset. For the VGG16 and VGG19 models, we do not use the four top layers, which are used for the ImageNet predictions, and we use a GlobalMaxPooling2D layer as the final layer. This leaves us with feature vectors of 512 values for these two models.

For the other models we also do not use the top layers, but in addition to the GlobalMaxPooling2D layer we added a Dense layer to get feature vectors with a length of 1024 values. The generated feature vectors are then stored in comma-separated-values (CSV) files. We create one file for each model; each line contains the filename followed by the feature vector. Before indexing the files we also perform a quantization step to bring the feature vectors from double range to integer range. This step not only increased the accuracy in our tests, but also made the index smaller and reduced the processing time for indexing and searching. The different models are then combined in the indexing stage. To do this we create one entry per model per file in the index, resulting in 8 index entries per file.

3 INDEXING
In order to index the extracted features (see Section 2) with LIRE [3], we first created Java classes for each model to represent the feature. The classes are based on the existing classes for global features, which leaves open the option of combining the deep-learning features with the already implemented global features. These new classes were necessary because we need a way to retrieve the features from the CSV files rather than from the images themselves. Next, we adapted LIRE's document builder class to support the new feature classes and then indexed the images using the pre-computed feature vectors with locality sensitive hashing (bit sampling) and/or metric spaces hashing.

4 SEARCHING
Analogous to the indexing in Section 3, we had to adapt the existing LIRE classes for searching the index to support the new feature classes and to work with file paths instead of images. To find the most similar images, we used the cosine distance function, which was already implemented in LIRE.

5 CLASSIFICATION
For the classification, we took the best nine results of each model and generated predictions for each model by counting the returned categories. These intermediate predictions are weighted and combined to form the final prediction. The weights were set manually based on experiments on the development data set [4].

6 RESULTS
For our submitted runs we used all the files of the training dataset to create the index. We used the following configurations for our four submitted runs:

• Run 1: Integer Features with Bitsampling Hashing
  The extracted features are quantized, to use integer values instead of double values, and then indexed using bitsampling hashing.
• Run 2: Integer Features with Metric Spaces Hashing
  The extracted features are quantized, to use integer values instead of double values, and then indexed using metric spaces hashing.
• Run 3: Integer Features with Bitsampling and Metric Spaces Hashing
  The extracted features are quantized, to use integer values instead of double values. Then they are indexed using bitsampling hashing and metric spaces hashing, resulting in two indexes that are searched for the classification.
• Run 4: Double Features with Bitsampling Hashing
  The extracted features are not quantized and then indexed using bitsampling hashing.

Table 1: Official run submission results

        Recall  Specificity  Precision  Accuracy  F1      MCC     Rk statistic
Run 1   0.5756  0.9717       0.5756     0.9469    0.5756  0.5473  0.5368
Run 2   0.3677  0.9578       0.3677     0.9209    0.3677  0.3255  0.3194
Run 3   0.5371  0.9691       0.5371     0.9421    0.5371  0.5063  0.5039
Run 4   0.5667  0.9711       0.5667     0.9458    0.5667  0.5378  0.5282

Table 2: Confusion Matrix for Run 1. (Classes: blurry-nothing (BLN), colon-clear (COC), dyed-lifted-polyps (DLP), dyed-resection-margins (DRM), esophagitis (ESO), instruments (INS), normal-cecum (NOC), normal-pylorus (NOP), normal-z-line (NZL), out-of-patient (OOP), polyps (POL), retroflex-rectum (RER), retroflex-stomach (RES), stool-inclusions (STI), stool-plenty (STP), ulcerative-colitis (ULC))

Pred \ Act   ULC   ESO   NZL   DLP   DRM   OOP   NOP   STI   STP   BLN   POL   NOC   COC   RER   RES   INS
ULC          366    15     5    53    51     0    44     0    23     0    53     7     4    53    38    40
ESO            0     1     0     2     4     0     5     0     1     0     2     0     0     1     2     2
NZL            2   275   485     4     1     1    51     3     3     0     4     0     7     3    17     4
DLP           55    52    26   241   169     0   130    98   123     0   148    37   122    61   180    96
DRM           48    83    24   163   252     3   114    36    62    14    64    12    40    35    66    84
OOP            0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
NOP            1     1     0     3     1     0    38     0     0     0     7     0     0     0     2     2
STI            0     0     0     0     0     0     0   365     0     0     0     0     0     0     0     1
STP            3     6     0     2     2     0     4     0  1732     0     1     0     0     0     1     2
BLN            0     0     0    11    12     0     0     0     2    23     0     0     2     0     1     3
POL           21     5     2    30    26     0    89     2     8     0    41     2     2    27    22    25
NOC           43     0     0    40    40     1     2     0     2     0    45   526     0     5     0     8
COC            0     0     0     0     0     0     0     1     5     0     0     0   888     0     0     0
RER            0     0     0     1     0     0     0     0     0     0     0     0     0     4     1     1
RES            3   118    21     6     6     0    84     1     4     0     9     0     0     3    67     3
INS            0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     2

As shown in Table 1, we achieved the best results with Run 1. Here, we used the features from the deep-learning models, configured as described in Section 2, and quantized them from the double range to the integer range. Run 2 shows that metric spaces hashing seems to be less suited for this task. It gave better results in predicting Esophagitis correctly, but overall the results are significantly worse. In the combination of both hashing methods (Run 3), we noticed that, for example, the correct prediction of Esophagitis got slightly better, but overall the worse performance of metric spaces hashing had a negative impact on the prediction. Run 4 shows that the quantization from double to integer brings a slight increase in overall performance.

Table 2 shows the confusion matrix of our best results in Run 1. The main problem areas here are the classification of Esophagitis as Normal-Z-Line and mixing up Dyed-Resection-Margins with Dyed-Lifted-Polyps. The results for the classes Out-Of-Patient (OOP) and Instruments (INS) are very low; this is mostly because of the lack of example images in the dataset [5]. With only four images for OOP and 36 for INS, it is almost impossible to find similar images, especially since images in these two classes can vary a lot more compared to the other classes.

7 DISCUSSION & OUTLOOK
In this paper, we showed an approach to utilize deep feature vectors for search-based classification, by extracting the features with deep neural networks and indexing those features to make them searchable. We obtained an accuracy of 0.94 and an MCC score of 0.54. This was achieved by using bitsampling indexing combined with features quantized to the integer range, which may cause a noise reduction compared to the double features. Further experiments could be made to see if better results can be achieved by training the models on medical images rather than using the models trained on the ImageNet dataset [2].

ACKNOWLEDGEMENTS
We would like to thank our colleagues for all their input and discussions. Travel was funded by Alpen-Adria Universität Klagenfurt.

REFERENCES
[1] François Chollet and others. 2015. Keras. https://keras.io. (2015).
[2] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09.
[3] Mathias Lux and Savvas A. Chatzichristofis. 2008. Lire: Lucene Image Retrieval: An Extensible Java CBIR Library. In Proceedings of the 16th ACM International Conference on Multimedia (MM '08). ACM, 1085–1088. https://doi.org/10.1145/1459359.1459577
[4] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato, Duc-Tien Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt, Michael Riegler, and Pål Halvorsen. 2017. Kvasir: A Multi-Class Image Dataset for Computer Aided Gastrointestinal Disease Detection. In Proceedings of the 8th ACM on Multimedia Systems Conference. ACM, 164–169.
[5] Konstantin Pogorelov, Michael Riegler, Thomas De Lange, Kristin Ranheim Randel, Mathias Lux, and Olga Ostroukhova. 2018. Medico Multimedia Task at MediaEval 2018. 7 (2018), 7–9.
[6] Michael Riegler, Martha Larson, Mathias Lux, and Christoph Kofler. 2014. How 'How' Reflects What's What: Content-based Exploitation of How Users Frame Social Images. In Proceedings of the 22nd ACM International Conference on Multimedia (MM '14). ACM, New York, NY, USA, 397–406. https://doi.org/10.1145/2647868.2654894
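The quantization step mentioned in the feature extraction section (double range to integer range, applied to CSV rows of the form filename followed by the feature vector) could look roughly like the sketch below. The paper does not describe the actual quantization scheme, so the scale-and-round mapping, the default `scale=100`, and the function names `quantize` and `quantize_csv` are all assumptions made for illustration only.

```python
import csv
import io


def quantize(vector, scale=100):
    """Map float features to integers by scaling and rounding.

    ASSUMPTION: the paper does not state how features were quantized;
    scale-and-round is only one plausible choice.
    """
    return [int(round(value * scale)) for value in vector]


def quantize_csv(src, dst, scale=100):
    """Read rows of 'filename, v1, v2, ...' and write integer features."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        if not row:
            continue  # skip empty lines
        filename = row[0]
        values = [float(v) for v in row[1:]]
        writer.writerow([filename] + quantize(values, scale))


# Toy usage with in-memory files (hypothetical file name and values).
source = io.StringIO("img1.jpg,0.1234,3.987,0.0\n")
target = io.StringIO()
quantize_csv(source, target)
quantized_line = target.getvalue().strip()
```

Integer vectors of this kind are smaller to store than doubles, which fits the paper's remark that quantization reduced index size and processing time.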
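The search and classification stages (cosine distance search, best nine results per model, weighted combination of per-model category counts) can be sketched as follows. This is a minimal illustration, not the LIRE-based implementation: a linear scan stands in for the hashed index, and the function names, the toy vectors, and the weight values are hypothetical, since the paper does not list the manually chosen weights.

```python
from collections import Counter
from math import sqrt


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b); returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def top_k_labels(query, index, k=9):
    """Labels of the k indexed vectors closest to the query.

    index is a list of (feature_vector, label) pairs. A linear scan is used
    here in place of the hashed LIRE index described in the paper.
    """
    ranked = sorted(index, key=lambda entry: cosine_distance(query, entry[0]))
    return [label for _, label in ranked[:k]]


def classify(queries, indexes, weights, k=9):
    """Combine per-model category counts into one weighted prediction.

    queries: model name -> query feature vector
    indexes: model name -> list of (feature_vector, label)
    weights: model name -> weight (HYPOTHETICAL values, not from the paper)
    """
    scores = Counter()
    for model, query in queries.items():
        votes = Counter(top_k_labels(query, indexes[model], k))
        for label, count in votes.items():
            scores[label] += weights.get(model, 1.0) * count
    return max(scores, key=scores.get)


# Toy example with two models and two-dimensional features.
indexes = {
    "VGG16": [([1, 0], "polyps"), ([0.9, 0.1], "polyps"), ([0, 1], "stool-plenty")],
    "ResNet50": [([1, 1], "stool-plenty"), ([1, 0], "polyps"), ([0, 1], "stool-plenty")],
}
queries = {"VGG16": [1, 0.05], "ResNet50": [1, 0.2]}
weights = {"VGG16": 1.0, "ResNet50": 0.5}
prediction = classify(queries, indexes, weights, k=2)
```

In the toy example both VGG16 neighbours vote for polyps, while the ResNet50 neighbours split one vote each, so the weighted sum favours polyps. Ties are resolved by whichever label was counted first, which is an arbitrary choice of this sketch.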
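As a consistency check between Table 1 and Table 2, the Run 1 scores can be recomputed from the confusion matrix. The identical Recall, Precision and F1 values in Table 1 suggest micro-averaged one-vs-rest scores; the paper does not state this, so the averaging scheme is an assumption of this sketch. Under that assumption, the values computed below agree with the Run 1 row for Recall, Specificity, Accuracy and MCC to within about 0.0001.

```python
from math import sqrt

# Table 2 (Run 1): rows are predicted classes, columns are actual classes,
# both in the order ULC ESO NZL DLP DRM OOP NOP STI STP BLN POL NOC COC RER RES INS.
matrix = [
    [366, 15, 5, 53, 51, 0, 44, 0, 23, 0, 53, 7, 4, 53, 38, 40],
    [0, 1, 0, 2, 4, 0, 5, 0, 1, 0, 2, 0, 0, 1, 2, 2],
    [2, 275, 485, 4, 1, 1, 51, 3, 3, 0, 4, 0, 7, 3, 17, 4],
    [55, 52, 26, 241, 169, 0, 130, 98, 123, 0, 148, 37, 122, 61, 180, 96],
    [48, 83, 24, 163, 252, 3, 114, 36, 62, 14, 64, 12, 40, 35, 66, 84],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 3, 1, 0, 38, 0, 0, 0, 7, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0, 0, 365, 0, 0, 0, 0, 0, 0, 0, 1],
    [3, 6, 0, 2, 2, 0, 4, 0, 1732, 0, 1, 0, 0, 0, 1, 2],
    [0, 0, 0, 11, 12, 0, 0, 0, 2, 23, 0, 0, 2, 0, 1, 3],
    [21, 5, 2, 30, 26, 0, 89, 2, 8, 0, 41, 2, 2, 27, 22, 25],
    [43, 0, 0, 40, 40, 1, 2, 0, 2, 0, 45, 526, 0, 5, 0, 8],
    [0, 0, 0, 0, 0, 0, 0, 1, 5, 0, 0, 0, 888, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 1, 1],
    [3, 118, 21, 6, 6, 0, 84, 1, 4, 0, 9, 0, 0, 3, 67, 3],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2],
]

num_classes = len(matrix)
total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(num_classes))
errors = total - correct

# Micro-averaged one-vs-rest counts, summed over all classes:
# every misclassified image is one false positive and one false negative.
tp = correct
fp = errors
fn = errors
tn = (num_classes - 1) * total - errors

recall = tp / (tp + fn)        # equals micro precision and micro F1
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
```

Because the true negatives of 15 other classes dominate the one-vs-rest counts, accuracy and specificity stay high even when fewer than 60 percent of the images are assigned to the correct class, which is why Recall and MCC are the more informative columns of Table 1.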