Ensemble of Texture Features for finding abnormalities in the Gastro-Intestinal Tract

Syed Sadiq Ali Naqvi, Shees Nadeem, Muhammad Zaid, Muhammad Atif Tahir
School of Computer Science
National University of Computer and Emerging Sciences, Karachi Campus, Pakistan
{k142268,k142092,k142009,atif.tahir}@nu.edu.pk

Copyright held by the owner/author(s). MediaEval'17, 13-15 September 2017, Dublin, Ireland

ABSTRACT

An endoscopy is a procedure in which a doctor uses specialized instruments to view and operate on the internal organs and vessels of the body. This paper aims to predict diseases and abnormalities in the Gastro-Intestinal Tract using multimedia data. It differs from other projects in the medical domain because it does not use medical imaging such as X-rays or CT scans. The dataset, which comprises 4000 images, is provided by the MediaEval Benchmarking Initiative for Multimedia Evaluation. The data is collected during traditional colonoscopy procedures. Techniques from the fields of multimedia content analysis (to extract information from the visual data) and machine learning (for classification) have been used. On the testing data, 94% accuracy and an MCC of 0.73 are achieved using logistic regression and an ensemble over different features.

1 INTRODUCTION

Medical image diagnosis is one of the most challenging tasks in computer vision. Most recent work has been done on CT scans, X-rays and MRI. The Medico Task of 2017 challenged its participants to predict abnormalities in the Gastro-Intestinal (GI) tract through endoscopic examination [1]. The challenge therefore uses multimedia images instead of traditional medical images [2]. In-depth analysis of GI tract images can help to predict abnormalities and diseases in their initial stages [1]. 4000 images were used for training and the same number was reserved for testing. Different pre-processing techniques were applied and machine learning models were deployed to produce good results.

2 OUR PROPOSED APPROACH

Feature engineering is one of the most challenging and important parts of any machine learning project. Discriminating features are a requirement for function approximation. The task organizers provided 6 pre-computed visual features for every image: JCD, Tamura, Color Layout, Edge Histogram, Auto Color Correlogram and PHOG.

Texture plays an important role in the recognition of any object in an image and has been widely used in computer vision tasks such as facial recognition. We therefore compute the texture of the images using two of the most common methods, Local Binary Pattern [3] and Haralick features [4]. This drastically improves the classifier accuracy. Through 10-fold cross validation, it was found that some features perform very poorly compared to others. Hence, they were removed from the model. The refined features were JCD, Edge Histogram, Color Layout, Auto Color Correlogram, Local Binary Pattern with radius 1 and Haralick texture features.

Figure 1: Our proposed model. Pre-computed features are provided by the organizer, including ColorLayout, JCD, EdgeHistogram etc.

We then train a separate model for each feature, using logistic regression [7] and kernel discriminant analysis using spectral regression [5, 6], because of the composite nature of the features. An ensemble technique was then applied to the predictions. Ensemble here means that the final model makes use of majority voting among all the independent models trained on each feature. It should be noted that we investigated various advanced machine learning techniques, but the best results were obtained using logistic regression and are thus reported in this paper.

One of the interesting characteristics of this competition was the limited use of data to train the models. We therefore use K-means clustering [8] to come up with a reduced data set representing the whole distribution. We divide the dataset into 10 clusters and extract images from each cluster in an equal ratio. In this way, we extract 732 of the 4000 images to train the models.

3 RESULTS AND ANALYSIS

The logistic regression model was implemented using Python's scikit-learn package. Among the parameters of logistic regression, two of the most important are "solver" and "multi_class", for which we used the values "lbfgs" and "ovr" respectively. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm is an iterative method for solving unconstrained nonlinear optimization problems; "lbfgs" is its limited-memory variant. One-Versus-Rest (ovr), also known as one-vs-all, is a strategy which fits one classifier per class. For each classifier, the class is fitted against all the other classes. In addition to its computational efficiency (only 8 classifiers are needed), one advantage of this approach is its interpretability. Since each class is represented by one and only one classifier, it is possible to gain knowledge about the class by inspecting its corresponding classifier. This is the most commonly used strategy for multiclass classification and is a fair default choice.

We train logistic regression on each feature, resulting in 6 different models. Each model provides 8 probabilities, where each probability represents a class confidence score. These probabilities are added together across the models and the class with the highest total score is chosen as the predicted label. By applying the proposed model, we obtained an accuracy of 90% with an F1-score of 0.89 and an MCC of 0.8 on the training data. On the testing data, which was independently evaluated by the organizers, we obtained an accuracy of 94% with an F1-score of 0.76 and an MCC of 0.73 (Table 2). The best result is obtained by Run 1, in which all 4000 images are used. This approach is an ensemble of 6 features (JCD, Edge Histogram, Color Layout, Auto Color Correlogram, Local Binary Pattern with radius 1 and Haralick texture features), with logistic regression as the classifier. In summary, the following 5 runs were submitted for abnormality detection:

3.1 Run 1

Ensemble of 6 features [JCD, Edge Histogram, Color Layout, Auto Color Correlogram, LBP, Haralick] trained on 4000 images, using Logistic Regression.

3.2 Run 2

Same as Run 1, but 2000 images were randomly selected.

3.3 Run 3

Same as Run 1 with the addition of another feature. The new feature was formulated by Kernel Discriminant Analysis (for dimensionality reduction), which takes all 6 features as input. For this run, 4000 images were used.

3.4 Run 4

The model was trained only on the reduced dimensions obtained by KDA. Nearest Neighbour was used as the classifier. The complete training data (4000 images) is used.

3.5 Run 5

Firstly, K-means clustering [8] is applied to obtain 10 clusters from each class. From these clusters, 732 images were selected such that uniformity among the dataset is maintained. Run 1 was then repeated on these selected 732 images.

Table 2: Results from testing data independently evaluated by the organizers.

Approach  Precision  Specificity  MCC    F1     Accuracy
Run 1     0.7665     0.966        0.736  0.767  0.942
Run 2     0.764      0.966        0.734  0.765  0.941
Run 3     0.745      0.963        0.712  0.745  0.936
Run 4     0.564      0.937        0.565  0.509  0.891
Run 5     0.688      0.955        0.649  0.689  0.922

Table 1: Confusion matrix of the best run (Run 1). Rows are predicted classes and columns are actual classes (each column sums to 500 test images). n-cecum = normal-cecum. n-z-line = normal-z-line. n-pylorus = normal-pylorus. d = dyed.

Predicted / Actual        polyps  n-cecum  n-z-line  n-pylorus  esophagitis  d-res-margins  d-lifted-polyps  ulcerative-colitis
polyps                       341       13         0          0            0              4               20                  98
normal-cecum                  71      485         0          0            0              3                4                  90
normal-z-line                  0        0       451          0          225              0                0                   0
normal-pylorus                 4        0         1        500            0              0                0                  10
esophagitis                    3        0        48          0          275              0                0                   4
dyed-resection-margins         0        0         0          0            0            360              119                   0
dyed-lifted-polyps             5        2         0          0            0            133              356                   0
ulcerative-colitis            76        0         0          0            0              0                1                 298

Table 1 shows the confusion matrix of the best run. The model performs remarkably well for Normal-pylorus (all 500 true positives) and Normal-cecum (485). It also classifies Normal-z-line quite accurately (451); however, Esophagitis is often confused with Normal-z-line (225 images). Polyps are classified moderately well (341), but they are also confused with Ulcerative-colitis (and vice versa) and with Normal-cecum. Lastly, Dyed-resection-margins and Dyed-lifted-polyps are confused with each other in some cases (119 and 133 images). The model appears to be somewhat overfit on the Normal-cecum class.

4 CONCLUSION

We present our proposed model to classify gastro-intestinal abnormalities using endoscopic images. Training (4000 samples) and testing (4000 samples) data were provided by the MediaEval Benchmarking Initiative for Multimedia Evaluation. As mentioned in the introduction, the study used multimedia content analysis, machine learning and ensemble learning techniques for classification. The best results were obtained with logistic regression using an ensemble over 6 different features (including Local Binary Pattern and Haralick texture features), which resulted in an accuracy of 94% with an F1-score of 0.76 and an MCC of 0.73 on the testing data.

REFERENCES

[1] Michael Riegler, Konstantin Pogorelov, Pål Halvorsen, Carsten Griwodz, Thomas de Lange, Kristin Ranheim Randel, Sigrun Losada Eskeland, Duc-Tien Dang-Nguyen, Mathias Lux, Concetto Spampinato, "Multimedia for Medicine: The Medico Task at MediaEval 2017", MediaEval'17, 13-15 September 2017, Dublin, Ireland.
[2] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato, Duc-Tien Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt, Michael Riegler, Pål Halvorsen, "Kvasir: a multi-class image dataset for computer aided gastrointestinal disease detection", Proceedings of ACM on Multimedia Systems Conference (MMSYS), pp. 164-169, 2017.
[3] T. Ojala, M. Pietikäinen, and T. T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971-987, 2002.
[4] Robert M. Haralick and Karthikeyan Shanmugam, "Textural features for image classification", IEEE Transactions on Systems, Man, and Cybernetics, 6, pp. 610-621, 1973.
[5] D. Cai, X. He, and J. Han, "Speed up kernel discriminant analysis", The VLDB Journal, 20(1), pp. 21-33, 2011.
[6] M. A. Tahir et al., "A robust and scalable visual category and action recognition system using kernel discriminant analysis with spectral regression", IEEE Transactions on Multimedia, 15(7), 2013.
[7] Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[8] James MacQueen, "Some methods for classification and analysis of multivariate observations", Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967.
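
Supplementary sketch: score fusion. The fusion step described in Section 3 (each per-feature model outputs 8 class probabilities, the probabilities are added, and the highest total wins) can be illustrated with a few lines of plain Python. This is a minimal sketch under stated assumptions, not the code used for the submitted runs: the function name fuse_by_sum, the class ordering and the probability values are all hypothetical, and the per-feature classifiers themselves are not shown.

```python
from typing import Dict, Sequence

# Class order is an assumption made for this illustration only.
CLASSES = [
    "polyps", "normal-cecum", "normal-z-line", "normal-pylorus",
    "esophagitis", "dyed-resection-margins", "dyed-lifted-polyps",
    "ulcerative-colitis",
]


def fuse_by_sum(per_feature_probs: Dict[str, Sequence[float]]) -> str:
    """Add the class scores of all feature models and return the top class."""
    totals = [0.0] * len(CLASSES)
    for feature, probs in per_feature_probs.items():
        if len(probs) != len(CLASSES):
            raise ValueError(f"{feature}: expected {len(CLASSES)} scores")
        for i, p in enumerate(probs):
            totals[i] += p
    best = max(range(len(CLASSES)), key=totals.__getitem__)
    return CLASSES[best]


# Hypothetical scores for one image from three of the six feature models.
example = {
    "JCD":      [0.50, 0.10, 0.05, 0.05, 0.05, 0.05, 0.05, 0.15],
    "LBP":      [0.20, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.50],
    "Haralick": [0.45, 0.10, 0.05, 0.05, 0.05, 0.05, 0.05, 0.20],
}
predicted = fuse_by_sum(example)
print(predicted)
```

In this made-up example the LBP model alone would prefer ulcerative-colitis, but the summed scores favour polyps, which is the kind of disagreement the ensemble is meant to resolve.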
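
Supplementary sketch: cross-checking Table 2 against Table 1. The Run 1 figures can be recomputed from the confusion matrix with a short script. The sketch below assumes that rows are predicted classes and columns are actual classes, and that the reported accuracy and specificity are averages of per-class one-vs-rest values; the paper does not state the averaging explicitly, so this is an interpretation that happens to agree with the published numbers rather than a description of the organizers' evaluation code.

```python
# Confusion matrix of Run 1 (Table 1): rows = predicted, columns = actual.
CONFUSION = [
    [341,  13,   0,   0,   0,   4,  20,  98],  # polyps
    [ 71, 485,   0,   0,   0,   3,   4,  90],  # normal-cecum
    [  0,   0, 451,   0, 225,   0,   0,   0],  # normal-z-line
    [  4,   0,   1, 500,   0,   0,   0,  10],  # normal-pylorus
    [  3,   0,  48,   0, 275,   0,   0,   4],  # esophagitis
    [  0,   0,   0,   0,   0, 360, 119,   0],  # dyed-resection-margins
    [  5,   2,   0,   0,   0, 133, 356,   0],  # dyed-lifted-polyps
    [ 76,   0,   0,   0,   0,   0,   1, 298],  # ulcerative-colitis
]

n = len(CONFUSION)
total = sum(sum(row) for row in CONFUSION)
correct = sum(CONFUSION[k][k] for k in range(n))

# Fraction of images whose predicted label equals the true label.
overall_correct = correct / total

per_class_accuracy = []
per_class_specificity = []
for k in range(n):
    tp = CONFUSION[k][k]
    fp = sum(CONFUSION[k]) - tp                        # predicted k, actually other
    fn = sum(CONFUSION[i][k] for i in range(n)) - tp   # actually k, predicted other
    tn = total - tp - fp - fn
    per_class_accuracy.append((tp + tn) / total)
    per_class_specificity.append(tn / (tn + fp))

mean_accuracy = sum(per_class_accuracy) / n
mean_specificity = sum(per_class_specificity) / n

print(f"correctly labelled fraction: {overall_correct:.4f}")
print(f"mean one-vs-rest accuracy:   {mean_accuracy:.4f}")
print(f"mean one-vs-rest specificity:{mean_specificity:.4f}")
```

Under these assumptions, 3066 of the 4000 test images are labelled correctly (0.7665, the same value as the Precision column for Run 1), while the averaged one-vs-rest accuracy is about 0.9416 and the averaged specificity about 0.9666, close to the 0.942 and 0.966 reported in Table 2. The 94% accuracy is therefore best read as a per-class average rather than the fraction of correctly labelled images.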