
Transfer learning with prioritized classification and training dataset equalization for medical objects detection

Olga Ostroukhova

Konstantin Pogorelov

konstantin@simula.no 2 4

Michael Riegler

michael@simula.no 1 4

Duc-Tien Dang-Nguyen

Pål Halvorsen

1 4

0 Research Institute of Multiprocessor Computation Systems n.a. A.V. Kalyaev, Russia
1 Simula Metropolitan Center for Digital Engineering, Norway
2 Simula Research Laboratory, Norway
3 University of Bergen, Norway
4 University of Oslo, Norway

2018

29 31

This paper presents the method proposed by the organizer team (SIMULA) for MediaEval 2018 Multimedia for Medicine: the Medico Task. We used a recent transfer-learning-based image classification methodology and focused on how easy it is to implement multi-class image classifiers in general and how to improve classification performance without redesigning the deep neural network model. The goal was both to provide a baseline for the Medico task and to show the performance of out-of-the-box classifiers for the medical use-case scenario.

INTRODUCTION

This paper provides a detailed description of the methods proposed by team SIMULA for the MediaEval 2018 Multimedia for Medicine Medico Task [11]. The main goal of the task is to perform medical image classification, and the use-case scenario is gastrointestinal endoscopy. The 2018 version of the task is designed as a sixteen-class classification problem. Compared to the 2017 version, which was limited to eight classes [9], the current version comes with several additional challenges, such as an imbalanced number of samples per class, which makes the task more realistic [8, 9]. In the previous year of the task, participants proposed different methods ranging from simple handcrafted features to deep neural networks [3–6, 10, 12]. We propose a convolutional neural network (CNN) approach in combination with transfer learning. To compensate for the imbalanced dataset, we perform prioritized classification and dataset equalization.

PROPOSED APPROACH

As the organizer team for the Medico task, our aim is not to achieve the best possible classification performance. Instead, we decided to check how low the entry threshold is for medical image classification and the corresponding lesion detection challenge. To this end, and to provide a baseline for the competing teams, we applied a recent transfer-learning-based image classification methodology and checked how well we are able to (i) easily implement a multi-class image classifier and (ii) improve the classification performance without redesigning the deep neural network model.

Thus, for the basic classification algorithm, we used a CNN architecture with a transfer-learning-based classifier, which we introduced for medical image classification in our previous work [7]. This approach is based on the Inception v3 architecture [13]. To achieve the highest possible performance on the limited development set provided, we used a model pre-trained on the ImageNet dataset [1] and retrained it using the method described in [2]. We kept all the basic convolutional layers of the network and retrained only the two top fully connected (FC) layers after random initialization of their weights. The FC layers were retrained using the RMSprop [14] optimizer, which adapts the learning rate during training. We did not apply any additional enhancement or pre-processing to the images provided in the datasets. To increase the number of training samples, we performed several augmentation operations on the images in the training set: horizontal and vertical flipping, and a change of brightness in the interval of ±20%.
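
The augmentation operations named above can be illustrated with a minimal sketch. It assumes an image stored as a nested list of 8-bit grayscale pixel rows; the function names and the uniform sampling of the brightness factor are illustrative assumptions, not the implementation used for the submitted runs.

```python
import random

def flip_horizontal(image):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Reverse the row order (top-bottom flip)."""
    return [row[:] for row in image[::-1]]

def change_brightness(image, factor):
    """Scale every pixel by `factor`, clamping to the 8-bit range [0, 255]."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in image]

def augment(image, rng):
    """Return the original image plus flipped and brightness-shifted variants.

    The brightness factor is drawn uniformly from [0.8, 1.2], i.e. +/-20%.
    Uniform sampling is an assumption; the text only gives the interval.
    """
    factor = rng.uniform(0.8, 1.2)
    return [
        image,
        flip_horizontal(image),
        flip_vertical(image),
        change_brightness(image, factor),
    ]

# Tiny hypothetical 2x3 image, for illustration only.
image = [[10, 20, 30],
         [40, 50, 60]]
variants = augment(image, random.Random(0))
```

Each call turns one training image into four, so the flips and the brightness change enlarge the training set without altering the class labels.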

Initial experiments showed that the pre-trained Inception v3 model is able to efficiently extract high-level features from the given medical images and that it converges quickly during retraining with sufficient resulting classification performance (see section 3). However, due to the heavily imbalanced training dataset, and despite the training data augmentation, the detection performance for some classes was not good enough. To solve this issue, we implemented an additional training dataset balancing procedure that equalizes the training set by randomly duplicating training samples of the underfilled classes, such as instruments and blurry. This nearly doubled the number of training samples and allowed for better classification performance for the classes with a low number of images provided.
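
The balancing procedure can be sketched as random oversampling. The example below is a minimal illustration under stated assumptions: every class is filled up to the size of the largest class (the text does not give the exact target size), and the file names and class sizes are hypothetical.

```python
import random
from collections import Counter, defaultdict

def equalize(samples, rng):
    """Oversample every class up to the size of the largest class.

    `samples` is a list of (item, label) pairs. Missing samples are drawn
    by random duplication (sampling with replacement) within each class.
    Using the largest class as the target size is an assumption.
    """
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append(item)
    target = max(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((item, label) for item in items + extra)
    return balanced

# Hypothetical file names and class sizes, for illustration only.
samples = (
    [(f"polyp_{i}.jpg", "polyps") for i in range(6)]
    + [(f"instr_{i}.jpg", "instruments") for i in range(2)]
    + [("blurry_0.jpg", "blurry")]
)
balanced = equalize(samples, random.Random(0))
counts = Counter(label for _, label in balanced)
```

All original samples are kept; only the underfilled classes receive duplicates, so the well-represented classes are unchanged.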

An additional post-processing step for the classifier output was implemented to address the different importance of the different classes, as stated in the task dataset description [11]. Specifically, we performed a prioritized selection of the output class for each image based on the model's probability output. This was implemented as the selection of the first class with a detection probability higher than a set threshold from the array of classes sorted in order of their importance.
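
The prioritized selection can be sketched as follows. This is a minimal illustration, not the submitted implementation: the priority order and the probabilities are hypothetical, and the fallback to the most probable class when no class passes the threshold is an assumption, since the text does not describe that case.

```python
def select_class(probabilities, priority, threshold):
    """Return the first class, in priority order, whose probability is
    higher than `threshold`.

    Falls back to the most probable class when no class passes the
    threshold (assumed behavior, not specified in the text).
    """
    for name in priority:
        if probabilities.get(name, 0.0) > threshold:
            return name
    return max(probabilities, key=probabilities.get)

# Hypothetical priority order and model output, for illustration only.
priority = ["polyps", "esophagitis", "normal-z-line"]
probs = {"polyps": 0.30, "esophagitis": 0.15, "normal-z-line": 0.55}

high = select_class(probs, priority, 0.75)  # nothing passes, falls back to the maximum
low = select_class(probs, priority, 0.10)   # first class above 0.10 in priority order
```

If the class probabilities sum to one, at most one class can exceed a threshold of 0.5 or more, and that class is also the most probable one. Under the fallback assumed here, such a threshold therefore gives the same output as the plain max-probability selector, and only lower thresholds can change the selected class.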

RESULTS AND ANALYSIS

For the official task submission, two separate models were used, trained on different datasets. The first model was trained on the training set created from the development set using the data augmentation procedure described in section 2. The trained model was used to process the task's test set, and the classification output was post-processed using the prioritized classification selector with four different probability threshold settings from 0.75 to 0.1, resulting in runs #2 to #5. For run #1, we used the max-probability selector without class prioritization. The results of the first model were submitted as the speed runs. The second model was trained on the equalized training set, and the five runs generated with the same rules were submitted as the detection runs.

The official evaluation results for all the runs are shown in table 1. All the runs significantly outperform the ZeroR and Random baselines and show good classification performance. All the runs that use the equalized training set have slightly better classification performance. Surprisingly, the prioritized classification method did not improve detection performance for either the original or the equalized training set. With the threshold of 0.75, the classification performance is equal to that of the non-prioritized runs. This suggests that the trained classifier is performing as well as it can and that additional re-classification using the class priorities does not help for this particular dataset. However, it could still be of interest for bigger datasets or a higher number of classes. The best-performing run was detection run #1, generated using the equalized training set and the non-prioritized classifier, with a classification performance of 0.854 for the Rk statistic (MCC for k different classes). The confusion matrix for this run is shown in table 2, where the class imbalance and the corresponding training and classification challenges can be easily observed. The most challenging class was Instruments, mostly because of the different shapes, positions and visibility of the instruments in the images. There were also a number of misclassification cases for the Dyed classes as well as for the Esophagitis and Normal Z-line classes.
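
For readers unfamiliar with the Rk statistic, the sketch below computes the multi-class Matthews correlation coefficient from a confusion matrix using its standard definition. The function name and the example matrix are hypothetical; the matrix is not data from table 2, and this is not the official evaluation script.

```python
import math

def rk_statistic(confusion):
    """Multi-class Matthews correlation coefficient (Rk statistic).

    `confusion[i][j]` counts samples of true class i predicted as class j.
    Returns 0.0 when the denominator is zero (e.g. a single predicted class).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    # Number of samples per true class (row sums).
    true_counts = [sum(row) for row in confusion]
    # Number of samples per predicted class (column sums).
    pred_counts = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    numerator = correct * total - sum(
        t * p for t, p in zip(true_counts, pred_counts)
    )
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    return numerator / denominator if denominator else 0.0

# Hypothetical 3-class confusion matrix, for illustration only.
confusion = [[5, 1, 0],
             [0, 4, 1],
             [1, 0, 6]]
score = rk_statistic(confusion)
```

The statistic is 1 for a perfect classifier and close to 0 for predictions unrelated to the true labels, and because it uses the per-class totals it is less easily inflated by large classes than plain accuracy on an imbalanced dataset.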

With respect to processing speed, the proposed classifier can process approximately 43 frames per second on a GPU-enabled consumer-grade personal computer, regardless of whether the post-processing class prioritization is enabled or disabled.

CONCLUSIONS AND FUTURE WORK

In this paper, we presented an out-of-the-box solution that uses a modern pre-trained CNN for the task of medical image classification. The goal was to provide a baseline for the task and to show the performance of basic methods without any modification of the deep architecture. The best performance achieved was a Matthews correlation coefficient for k different classes of 0.854 at a speed of 43 frames per second. This is already a good result for an out-of-the-box method.

[1] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 248-255.

[2] Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. 2014. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. In Proc. of ICML, Vol. 32. 647-655.

[3] Yang Liu, Zhonglei Gu, and William K Cheung. 2017. HKBU at MediaEval 2017 Medico: Medical multimedia task. In Working Notes Proceedings of the MediaEval 2017 Workshop (MediaEval 2017).

[4] Syed Sadiq Ali Naqvi, Shees Nadeem, Muhammad Zaid, and Muhammad Atif Tahir. 2017. Ensemble of Texture Features for Finding Abnormalities in the Gastro-Intestinal Tract. In Working Notes Proceedings of the MediaEval 2017 Workshop (MediaEval 2017).

[5] Stefan Petscharnig and Klaus Schöffmann. 2018. Learning laparoscopic video shot classification for gynecological surgery. Multimedia Tools and Applications 77, 7 (2018), 8061-8079.

[6] Stefan Petscharnig, Klaus Schöffmann, and Mathias Lux. 2017. An Inception-like CNN Architecture for GI Disease and Anatomical Landmark Classification. In Working Notes Proceedings of the MediaEval 2017 Workshop (MediaEval 2017).

[7] Konstantin Pogorelov, Sigrun Losada Eskeland, Thomas de Lange, Carsten Griwodz, Kristin Ranheim Randel, Håkon Kvale Stensland, Duc-Tien Dang-Nguyen, Concetto Spampinato, Dag Johansen, Michael Riegler, and others. 2017. A holistic multimedia system for gastrointestinal tract disease detection. In Proceedings of the 8th ACM on Multimedia Systems Conference. ACM, 112-123.

[8] Konstantin Pogorelov, Kristin Ranheim Randel, Thomas de Lange, Sigrun Losada Eskeland, Carsten Griwodz, Dag Johansen, Concetto Spampinato, Mario Taschwer, Mathias Lux, Peter Thelin Schmidt, Michael Riegler, and Pål Halvorsen. 2017. Nerthus: A Bowel Preparation Quality Video Dataset. In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSYS). ACM, 170-174.

[9] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato, Duc-Tien Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt, and others. 2017. Kvasir: A multi-class image dataset for computer aided gastrointestinal disease detection. In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSYS). ACM, 164-169.

[10] Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Carsten Griwodz, Thomas de Lange, Kristin Ranheim Randel, Sigrun Eskeland, Duc-Tien Dang-Nguyen, Olga Ostroukhova, and others. 2017. A comparison of deep learning with global features for gastrointestinal disease detection. In Working Notes Proceedings of the MediaEval 2017 Workshop (MediaEval 2017).

[11] Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Thomas De Lange, Kristin Ranheim Randel, Duc-Tien Dang-Nguyen, Mathias Lux, and Olga Ostroukhova. 2018. Medico Multimedia Task at MediaEval 2018. In Working Notes Proceedings of the MediaEval 2018 Workshop.

[12] Michael Riegler, Konstantin Pogorelov, Pål Halvorsen, Carsten Griwodz, Thomas de Lange, Kristin Ranheim Randel, Sigrun Eskeland, Duc-Tien Dang-Nguyen, Mathias Lux, and others. 2017. Multimedia for medicine: the Medico Task at MediaEval 2017. In Working Notes Proceedings of the MediaEval 2017 Workshop (MediaEval 2017).

[13] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. 2015. Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567 (2015).

[14] Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning 4, 2 (2012).