=Paper=
{{Paper
|id=Vol-2283/MediaEval_18_paper_39
|storemode=property
|title=Majority Voting of Heterogeneous Classifiers for Finding Abnormalities in the Gastro-Intestinal Tract
|pdfUrl=https://ceur-ws.org/Vol-2283/MediaEval_18_paper_39.pdf
|volume=Vol-2283
|authors=Zeshan Khan,Muhammad Atif Tahir
|dblpUrl=https://dblp.org/rec/conf/mediaeval/KhanT18
}}
==Majority Voting of Heterogeneous Classifiers for Finding Abnormalities in the Gastro-Intestinal Tract==
Zeshan Khan, Muhammad Atif Tahir

School of Computer Science, National University of Computer and Emerging Sciences, Karachi Campus, Pakistan

{zeshan.khan,atif.tahir}@nu.edu.pk

===Abstract===
An endoscopy is a procedure in which a doctor uses specialized instruments to view and operate on the internal organs and vessels of the body. This paper aims to detect diseases and abnormalities in the Gastro-Intestinal (GI) tract using multimedia data. It differs from other projects in the medical domain because it does not use medical imaging such as X-rays or CT scans. The dataset, which comprises 5293 images, is provided by the MediaEval Benchmarking Initiative for Multimedia Evaluation. The data was collected during traditional colonoscopy procedures. Techniques from the fields of multimedia content analysis (to extract information from the visual data) and machine learning (for classification) have been used. On the testing data, an accuracy of 98%, an F1 score of 0.75 and an MCC of 0.76 are achieved using majority voting of logistic regression, random forest, and extra trees classifiers.

===1 Introduction===
Medical image diagnosis is one of the most challenging tasks in computer vision. Most recent work has been done on CT scans, X-rays, and MRI. The Medico Task of 2018 [5] (http://www.multimediaeval.org/mediaeval2018/medico/index.html) challenged its participants to predict abnormalities in the Gastro-Intestinal tract through endoscopic examination [4]. The challenge therefore works with multimedia images instead of traditional medical images [4]. Deep analysis of GI tract images can help to predict abnormalities and diseases in their initial stages. 5293 images were used for training and 8740 images were reserved for testing. Different pre-processing techniques were applied and machine learning models were deployed to build an accurate system.

===2 Approach===
Figure 1: Proposed Model.

Feature engineering is one of the most challenging and important parts of any machine learning problem. Figure 1 shows the proposed model. Discriminating features are a requirement for function approximation. The task organizers provided 6 pre-computed visual features for every image: JCD, Tamura, Color Layout, Edge Histogram, Auto Color Correlogram and PHOG. Alongside these pre-computed visual features, deep learning features are also used to extract meaningful information for classification, since some visual features can be extracted using deep networks. As the training dataset of 5293 images is too small to train a deep learning model from scratch, the pre-trained model VGG19 is used [6]. VGG19 is a very deep convolutional network with 19 weight layers (16 convolutional layers and 3 fully-connected layers) for large-scale image classification. After pre-training on the large dataset from the ImageNet challenge and retraining of the last 2 layers with these medical images, the VGG19 model is used to extract plentiful visual concepts.

Logistic regression [1], random forest [2] and extremely randomized trees [3] classifiers are trained on the extracted features. There are two categories of features: the pre-computed texture features and the VGG features, that is, the features extracted using the pre-trained VGG19 model. The final model is an ensemble that applies weighted majority voting among all the independent models trained on all features. The weight of each classifier in the ensemble is its accuracy. Various advanced machine learning techniques were investigated, but the best results were obtained using logistic regression, random forest and extremely randomized trees classifiers, and only these are reported in this paper.

The interesting characteristics of this competition were the limited data available to train the models and the class imbalance. Resampling is used to generate more data for each class. The resampling generated additional instances of each class and resulted in the same number of instances for each of the 16 available classes. The resampling also enlarged the training dataset, and the enlarged dataset is used to train and validate the different models.

===3 Results and Analysis===
The logistic regression, extremely randomized trees and random forest models were implemented using Python's scikit-learn package. We trained logistic regression, random forest, and extremely randomized trees on both deep and global features. The results were first evaluated on the training data using 10-fold cross validation. With the proposed model, we obtained an accuracy of 97%, an F1 score of 0.90 and an MCC of 0.81 on the 10-fold cross validation of the training data. Based on this initial investigation, the following runs were submitted to evaluate the performance of the classifiers independently. Three runs focus on speed of result generation and three runs focus on accuracy.

* Run1: Ensemble of 7 features [JCD, Tamura, Color Layout, Edge Histogram, Auto Color Correlogram, PHOG and VGG features] trained on 60 images, using voting of the logistic regression, random forest and extremely randomized trees classification algorithms.
* Run2: Same as Run1 but trained on 300 images.
* Run3: Same as Run1 but trained on all 5293 images.
* Run4: Ensemble of 6 features [JCD, Tamura, Color Layout, Edge Histogram, Auto Color Correlogram, and PHOG] trained on 60 images, using voting of the logistic regression, random forest and extremely randomized trees classification algorithms.
* Run5: Same as Run4 but trained on 300 images.
* Run6: Same as Run4 but trained on all 5293 images.

{| class="wikitable"
|+ Table 1: Accuracy, F1, and MCC on different runs of testing data.
! Run !! Accuracy !! F1 !! MCC
|-
| Run1 || 0.956 || 0.625 || 0.614
|-
| Run2 || 0.957 || 0.587 || 0.603
|-
| Run3 || 0.954 || 0.549 || 0.572
|-
| Run4 || 0.961 || 0.611 || 0.597
|-
| Run5 || 0.976 || 0.745 || 0.741
|-
| Run6 || 0.979 || 0.752 || 0.756
|}

Table 1 summarizes the evaluation metrics for each run. The best run reaches an accuracy of 97.9% with an F1 score of 0.75 and an MCC of 0.76. It is interesting to see that the best run is obtained using just the global features, without any deep learning features. We will investigate in the future why the deep features perform poorly. An initial investigation has indicated that many samples that should belong to class "ulcerative-colitis" are misclassified as class "esophagitis" when using deep features. The best run is Run6, in which all 5293 images are used with an ensemble of 6 features (JCD, Tamura, Edge Histogram, Color Layout, Auto Color Correlogram and PHOG). Logistic regression, random forest and extremely randomized trees are combined as the classifier using weighted majority voting.

Table 2 is the confusion matrix summarized over all classes. In total, around 1469 samples are misclassified. Two categories are mainly responsible for the misclassification: "dyed-lifted-polyps" and "dyed-resection-margins". Around 500 samples are misclassified in these 2 categories (Tables 3 and 4). Table 5 shows the confusion matrix for the polyps versus non-polyps class. Overall, the performance is satisfactory, but state-of-the-art texture and local features still need to be investigated to further improve the performance.

{| class="wikitable"
|+ Table 2: Confusion matrix of all classes. There are 16 classes in total and a summary over all classes is shown.
! Predicted / Actual !! ALL !! non-ALL
|-
! ALL
| 7271 || 1469
|-
! non-ALL
| 1469 || 129631
|}

{| class="wikitable"
|+ Table 3: Confusion matrix for class dyed-lifted-polyps versus non-dyed-lifted-polyps. df = dyed-lifted.
! Predicted / Actual !! df-polyps !! non-df-polyps
|-
! dyed-lifted-polyps
| 339 || 236
|-
! non-dyed-lifted-polyps
| 217 || 7948
|}

{| class="wikitable"
|+ Table 4: Confusion matrix for class dyed-resection-margins versus non-dyed-resection-margins. dr = dyed-resection.
! Predicted class / Actual class !! dr-margins !! non-dr-margins
|-
! dyed-resection-margins
| 387 || 232
|-
! non-dyed-resection-margins
| 177 || 7944
|}

{| class="wikitable"
|+ Table 5: Confusion matrix for class polyps versus non-polyps.
! Predicted / Actual !! polyps !! non-polyps
|-
! polyps
| 241 || 281
|-
! non-polyps
| 133 || 8085
|}

===4 Challenges and Future Work===
The results produced for many classes are quite accurate. However, some classes confuse the system. Future work aims to target these classes hierarchically and to improve the performance using local features.

===5 Conclusion===
A model to classify gastro-intestinal abnormalities using endoscopic images is presented. The training data (5293 samples) and testing data (8740 samples) were provided by the MediaEval Benchmarking Initiative for Multimedia Evaluation. As mentioned in the introduction, the study used multimedia content analysis, machine learning and ensemble learning techniques for classification. The best results were obtained with majority voting of three models (logistic regression, random forest and extremely randomized trees) on 6 different features (JCD, Tamura, Color Layout, Edge Histogram, Auto Color Correlogram and PHOG), which resulted in an accuracy of 97.9% with an F1 score of 0.75 and an MCC of 0.76 on the testing data.

Copyright held by the owner/author(s). MediaEval'18, 29-31 October 2018, Sophia Antipolis, France.

===References===
[1] Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer.

[2] Leo Breiman. 2001. Random Forests. Machine Learning 45, 1 (01 Oct 2001), 5–32.

[3] Pierre Geurts, Damien Ernst, and Louis Wehenkel. 2006. Extremely randomized trees. Machine Learning 63, 1 (2006), 3–42.

[4] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato, Duc-Tien Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt, Michael Riegler, and Pål Halvorsen. 2017. KVASIR: A Multi-Class Image Dataset for Computer Aided Gastrointestinal Disease Detection. In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys'17). ACM, 164–169.

[5] Konstantin Pogorelov, Michael Riegler, Pål Halvorsen, Thomas de Lange, Kristin Ranheim Randel, Duc-Tien Dang-Nguyen, Mathias Lux, and Olga Ostroukhova. 2018. Medico Multimedia Task at MediaEval 2018. In MediaEval'18, 29-31 October 2018, Sophia Antipolis, France.

[6] K. Simonyan and A. Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR (2014).
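The weighted majority voting described in Section 2 can be illustrated with a minimal sketch. It assumes that each classifier has already produced one label per image and that each classifier's weight is its accuracy, as the paper states. The function name `weighted_majority_vote`, the example predictions and the three weight values are hypothetical and are not taken from the paper; the tie-breaking rule is also an assumption, since the paper does not describe one.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Fuse per-classifier label predictions by weighted majority voting.

    predictions: list of label sequences, one per classifier, equal length.
    weights: one non-negative weight per classifier (e.g. its accuracy).
    Returns a list with the winning label for every sample.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same samples")

    fused = []
    for i in range(n_samples):
        # Sum the weights of all classifiers that voted for each label.
        scores = defaultdict(float)
        for clf_preds, weight in zip(predictions, weights):
            scores[clf_preds[i]] += weight
        # Assumption: ties go to the label that sorts first, so the
        # result is deterministic.
        fused.append(max(sorted(scores), key=scores.get))
    return fused


# Hypothetical predictions of three classifiers on four images.
lr = ["polyps", "esophagitis", "dyed-lifted-polyps", "esophagitis"]
rf = ["polyps", "ulcerative-colitis", "dyed-resection-margins", "polyps"]
et = ["dyed-lifted-polyps", "ulcerative-colitis",
      "dyed-resection-margins", "ulcerative-colitis"]

# Hypothetical accuracies used as voting weights.
weights = [0.95, 0.90, 0.92]

fused = weighted_majority_vote([lr, rf, et], weights)
print(fused)
```

When two classifiers agree, their combined weight outvotes the third; when all three disagree, as on the last image, the label from the classifier with the highest weight wins. In scikit-learn, `VotingClassifier` with `voting="hard"` and a `weights` list provides comparable behaviour, although the paper does not say whether that class was used.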