Texture Analysis from 3D Model and Individual Slice Extraction for Tuberculosis MDR Detection, Type Classification and Severity Scoring

Notebook for ImageCLEF 2018

Md Sajib Ahmed, Md Obaidullah Sk, Mohan Jayatilake, Teresa Gonçalves, and Luís Rato
Department of Informatics, University of Évora, Portugal
{jack6148,sk.obaidullah,jayatiml}@gmail.com, {tcg,lmr}@uevora.pt

Abstract. Tuberculosis (TB) is a dreaded bacterial infection that affects the human lungs and has been known to mankind since ancient times. The Tuberculosis task of ImageCLEF 2018 proposes a set of subtasks based on Computed Tomography (CT) scans of patients' lungs: multi-drug resistance (MDR) detection, tuberculosis type (TBT) classification and severity scoring (SVR). In this work, two different methods are presented to solve these problems. Texture analysis based methods (a 3D modeling approach and a slice extraction approach) were used to generate feature values from the CT scans, and different classifiers were tested. The 3D modeling approach computes seven statistical features: mean, skewness, kurtosis, homogeneity, energy, entropy and fractal dimension. The slice extraction approach computes a 96-dimensional feature vector based on contrast, correlation, energy, homogeneity, entropy and mean. According to the ranking given by the organizers, this approach was ranked 1st for multi-drug resistance detection, 5th for tuberculosis type classification and 3rd for tuberculosis severity scoring.

Keywords: Tuberculosis, Computed Tomography, Classification, Texture Analysis, Fractal Dimension, Machine Learning

1 Introduction

Tuberculosis (TB) is an infection caused by the bacterium Mycobacterium tuberculosis. This bacterium generally attacks the lungs, but it can also damage other parts of the body. TB spreads through the air when an infected person coughs, sneezes or talks. According to the World Health Organization (WHO) report, tuberculosis is one of the top ten causes of death worldwide [12].

The greatest problem that can happen to a TB patient is that the organisms become resistant to two or more of the standard drugs. In contrast to drug-sensitive (DS) tuberculosis, its multi-drug resistant (MDR) form is much more difficult and costly to recover from. Thus, early identification of the drug resistance (DR) status is of great importance for an effective treatment. The most frequently used methods for DR detection are either costly or take a long time (up to several months); hence, there is an urgent need for fast, precise and cheap techniques. One possible technique is the automatic analysis of CT scans of the patient's lungs to help characterize tuberculosis.

ImageCLEF (the Image Retrieval and Analysis Evaluation Campaign of the Cross-Language Evaluation Forum) has organized challenges on image classification and retrieval since 2003 [11]. The 2018 edition [8] proposes 3 main tasks (and a pilot task), one being related to the analysis of tuberculosis from lung CT images [4]. This task includes three independent subtasks: multi-drug resistance (MDR) tuberculosis detection, tuberculosis type (TBT) classification and severity scoring (SVR). This work presents the University of Évora approach to tackle these subtasks.
The rest of the paper is organized as follows: Section 2 describes the methodology (theory of the methods, explanation of the 3D modeling approach and of the slice extraction approach) and Section 3 introduces the experiments and submitted runs (dataset description, evaluation metrics, system configuration, top-5 submitted runs for each subtask and results on the test dataset). Finally, Section 4 concludes the paper.

2 Methodology

To tackle the ImageCLEF 2018 tuberculosis task [4], two different approaches were tested: one based on 3D modeling and another based on slice extraction information. The next subsections present the underlying theory and techniques of the proposed approaches.

2.1 Theory

The present work is based on the mean, higher order moments (skewness and kurtosis) and texture analysis to classify tuberculosis images. We propose two basic models for the ImageCLEF 2018 [8] competition. In this section, we present the theoretical basis for the estimation of the mean and higher order moments, the fractal dimension and the GLCM-based texture analysis (energy and homogeneity) techniques that are used in both models.

Mean and Higher Order Moments. The mean is the 1st moment of the pixel value distribution of the whole bronchi ROI, the skewness (3rd moment) is a measure of its asymmetry, and the kurtosis (4th moment) is the normalized form of its fourth central moment. The mean is found by adding the pixel values and dividing by the number of pixels. If the skewness is negative, the pixel values are spread out more to the left of the mean than to the right; if the skewness is positive, the pixel values are spread out more to the right. The kurtosis indicates the degree of peakedness of the values; it is based on the size of the tail of the pixel value distribution. The skewness and kurtosis of the pixel values within the whole bronchi can be determined using Equation (1). Here, $p_i$ and $i$ represent the signal intensity of the $i$-th pixel and the pixel index within the bronchi region of interest (ROI), respectively. Further, $\bar{p}$ and $f(p_i)$ denote the mean of the pixel values within the bronchi and the probability of the pixel value falling within a specific value given by the range of this variable's density, respectively.

\[ n\text{th moment} = \sum_{i} (p_i - \bar{p})^{n} f(p_i) \qquad (1) \]

Fractal Dimension. According to Peleg et al. [13], the image within the whole ROI is treated as a hilly terrain whose height is proportional to the gray level of the image. Points with distance ε from the surface on both sides create a blanket of thickness 2ε. The area of the blanket can be estimated as [2,5]:

\[ A(\varepsilon) = F \varepsilon^{2-D} \qquad (2) \]

where F is a constant and D is the fractal dimension of the surface. When log(A(ε)) is plotted against log(ε), a straight line is obtained with a slope equal to 2 − D, which gives an estimate of the fractal dimension (Equation (3)).

\[ \log(A(\varepsilon)) = \log(F) + (2 - D)\log(\varepsilon) \qquad (3) \]
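As a concrete illustration of Equations (1)–(3), the following Python/NumPy sketch computes the n-th moment of a bronchi ROI and a blanket-based estimate of the fractal dimension. It is a minimal sketch under our own assumptions (a 3×3 neighbourhood for the blankets and ε = 1,…,10), not the MATLAB implementation used in the experiments.

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter


def nth_moment(pixels, n):
    """Equation (1): sum_i (p_i - mean)^n * f(p_i) over the ROI pixel values."""
    pixels = np.asarray(pixels, dtype=float).ravel()
    values, counts = np.unique(pixels, return_counts=True)
    probs = counts / counts.sum()            # empirical density f(p_i)
    return np.sum((values - pixels.mean()) ** n * probs)


def skewness_kurtosis(pixels):
    """Normalized 3rd and 4th moments of the bronchi pixel value distribution."""
    sigma = np.sqrt(nth_moment(pixels, 2))
    return nth_moment(pixels, 3) / sigma ** 3, nth_moment(pixels, 4) / sigma ** 4


def fractal_dimension(image, max_eps=10):
    """Blanket method of Peleg et al.: grow upper/lower blankets, estimate
    A(eps) and fit log A(eps) = log F + (2 - D) log eps (Equations (2)-(3))."""
    g = np.asarray(image, dtype=float)
    upper, lower = g.copy(), g.copy()
    log_eps, log_area = [], []
    for eps in range(1, max_eps + 1):
        upper = np.maximum(upper + 1, maximum_filter(upper, size=3))
        lower = np.minimum(lower - 1, minimum_filter(lower, size=3))
        area = (upper - lower).sum() / (2.0 * eps)   # blanket surface area A(eps)
        log_eps.append(np.log(eps))
        log_area.append(np.log(area))
    slope, _ = np.polyfit(log_eps, log_area, 1)      # slope estimates 2 - D
    return 2.0 - slope
```

The skewness and kurtosis follow by normalizing the 3rd and 4th moments by the corresponding power of the standard deviation, as done in skewness_kurtosis above.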
Gray Level Co-occurrence Matrix. Energy, homogeneity, contrast, correlation and entropy are statistical texture features. These features are based on the normalized gray-level co-occurrence matrix (GLCM): for a 2D bronchi map f, the co-occurrence matrix $M_{f,\delta}(k,l)$ represents the joint probability of occurrence of pixel pairs (with a defined spatial relationship) having gray level values k and l, for a given spatial offset δ = (δx, δy) between the pair [7,14]. $M_{f,\delta}(k,l)$ is defined by Equation (4):

\[ M_{f,\delta}(k,l) = \sum_{x=1}^{n} \sum_{y=1}^{n} \begin{cases} 1, & \text{if } f(x,y) = k \text{ and } f(x+\delta x, y+\delta y) = l \\ 0, & \text{otherwise} \end{cases} \qquad (4) \]

The co-occurrence matrix $M_{f,\delta}(k,l)$ has dimension n×n, where n is the number of gray levels in f. The GLCM accounts for the spatial inter-dependency or co-occurrence of two pixels at specific relative positions. Co-occurrence matrices are calculated for the directions of 0°, 45°, 90° and 135°, and the average matrix over all offsets can be used [1,10]. In this study, the 2D formulation with 8-connectivity (computed with 8 different offsets) was used. On the basis of this matrix, the second-order statistical features of energy and homogeneity [7] for each image slice were derived. Then, the average of energy and homogeneity over all slices was estimated.

Energy is defined as the measure of the extent of pixel pair repetitions in the matrix M and can be estimated using Equation (5). When pixels are very similar, the energy value will be large.

\[ \text{Energy} = \sum_{k=1}^{n} \sum_{l=1}^{n} M_{f,\delta}(k,l)^{2} \qquad (5) \]

Homogeneity is the statistical measure of the similarity of pixels in the matrix M and can be estimated using Equation (6). A diagonal gray level co-occurrence matrix gives a homogeneity of 1, and the value becomes large if local textures only have minimal changes.

\[ \text{Homogeneity} = \sum_{k=1}^{n} \sum_{l=1}^{n} \frac{M_{f,\delta}(k,l)}{1 + |k - l|} \qquad (6) \]

Contrast measures the intensity contrast between a pixel and its neighbors over the entire image. Contrast is also known as variance or inertia and is given by Equation (7).

\[ \text{Contrast} = \sum_{k=1}^{n} \sum_{l=1}^{n} |k - l|^{2} \, M_{f,\delta}(k,l) \qquad (7) \]

Correlation measures how correlated a pixel is to its neighbors over the entire image. Correlation ranges between -1 and 1; it is 1 or -1 for a perfectly positively or negatively correlated image and is NaN for a constant image. It can be calculated through Equation (8).

\[ \text{Correlation} = \sum_{k=1}^{n} \sum_{l=1}^{n} \frac{(k - \mu_k)(l - \mu_l)\, M_{f,\delta}(k,l)}{\sigma_k \sigma_l} \qquad (8) \]

Entropy returns a scalar value representing the entropy of a grayscale image. Entropy is a statistical measure of randomness that can be used to characterize the texture of an image and is defined by Equation (9).

\[ \text{Entropy} = -\sum_{k=1}^{n} \sum_{l=1}^{n} M_{f,\delta}(k,l) \ln(M_{f,\delta}(k,l)) \qquad (9) \]
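The sketch below illustrates Equations (4)–(9) for a single offset: it builds the normalized co-occurrence matrix of an integer-valued ROI and derives the five texture features from it. It is an illustrative NumPy version written for this paper's notation, not the MATLAB code used for the experiments; gray levels are assumed to be already quantized to the range [0, n).

```python
import numpy as np


def glcm(image, offset, n_levels):
    """Normalized co-occurrence matrix M_{f,delta}(k, l) of Equation (4).
    `image` must hold integer gray levels in the range [0, n_levels)."""
    dy, dx = offset
    img = np.asarray(image, dtype=int)
    rows, cols = img.shape
    M = np.zeros((n_levels, n_levels), dtype=float)
    for y in range(rows):
        for x in range(cols):
            yy, xx = y + dy, x + dx
            if 0 <= yy < rows and 0 <= xx < cols:
                M[img[y, x], img[yy, xx]] += 1
    return M / M.sum() if M.sum() > 0 else M   # normalize to joint probabilities


def glcm_features(M):
    """Energy, homogeneity, contrast, correlation and entropy (Equations (5)-(9))."""
    k = np.arange(M.shape[0])[:, None]
    l = np.arange(M.shape[1])[None, :]
    energy = np.sum(M ** 2)                                   # Equation (5)
    homogeneity = np.sum(M / (1.0 + np.abs(k - l)))           # Equation (6)
    contrast = np.sum((k - l) ** 2 * M)                       # Equation (7)
    mu_k, mu_l = np.sum(k * M), np.sum(l * M)
    sigma_k = np.sqrt(np.sum((k - mu_k) ** 2 * M))
    sigma_l = np.sqrt(np.sum((l - mu_l) ** 2 * M))
    correlation = np.sum((k - mu_k) * (l - mu_l) * M) / (sigma_k * sigma_l)  # Eq. (8)
    nz = M > 0                                                # avoid log(0)
    entropy = -np.sum(M[nz] * np.log(M[nz]))                  # Equation (9)
    return energy, homogeneity, contrast, correlation, entropy
```

Note that, as stated above, the correlation becomes NaN for a constant image, since both σk and σl are zero in that case.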
2.2 3D Modeling Approach

The first method is based on 3D modeling and further extraction of texture patterns. Its block diagram is presented in Figure 1. First, the input dataset is pre-processed: slices are extracted and the 2D data is converted to 3D data by selecting lung ROIs (using the given masks [3]). Then, the bronchi are extracted using a pre-defined threshold value. The texture analysis is performed within the bronchi. Finally, a feature vector is computed, followed by the application of multiple classifiers and the analysis of their performance. The main parts of the system are described below.

Fig. 1. Overview of the 3D modeling approach

Data Acquisition and ROI Selection. All CT images were provided in NIFTI (Neuroimaging Informatics Technology Initiative) format and the 3D images were extracted. The masks of each subject were also provided in NIFTI format and were applied to extract the 3D lung images of each subject. Then, a threshold value was defined in order to extract only the bronchi within the lung. Based on this threshold, the ROIs were selected (see Figure 1). Then, the statistical features of mean, higher order moments (skewness and kurtosis), homogeneity, energy and entropy of the whole bronchi pixel value distribution of each individual subject were calculated.

Estimation of Mean. For quantitative data analysis, the ROIs on multiple image slices were extracted within the bronchi. Then, the mean signal intensity of the whole 3D bronchi ROI was computed as the weighted (by pixel number) average of the slice values (using all pixel values within the multi-slice ROIs encompassing the bronchi). The histograms of signal intensity were plotted together with the mean values.

Higher Order Moments and Fractal Dimension Analysis. The higher order moments (skewness and kurtosis) of the whole bronchi pixel value distribution and the fractal dimension of the pixel values were calculated within the bronchi ROI using Equations (1) and (3) for each image slice covering the entire lung. The fractal dimension was estimated after linear fitting of log(A(ε)) vs. log(ε).

Homogeneity and Energy Analysis. Image texture analysis was performed on the pixel values within the bronchi for all 2D slices within the 3D ROI. The normalized GLCM was calculated for each 2D slice and, based on the GLCM obtained, the two feature measures of energy and homogeneity were computed (Equations (5) and (6)).

2.3 Slice Extraction Approach

In this method, the focus was on individual slices. Using a threshold and averaging the attributes computed on each slice, the final feature values for a particular image were obtained. First, the input dataset is pre-processed: slice extraction, ROI generation using a mask and ROI selection based on threshold values are performed; then, texture analysis is applied to each ROI and features are extracted by averaging the slice-wise values; finally, a feature vector is computed, followed by the application of multiple classifiers and the analysis of their performance. The block diagram is presented in Figure 2.

Fig. 2. Overview of the slice extraction approach

Individual Slice Consideration. Since the images were in NIFTI format (where a single image has several slices), in this approach we took the individual slices for further processing and finally performed a weighted average over all slices. Figure 3 shows a slice of a tuberculosis CT scan. After slice extraction, an ROI for each slice was defined based on the provided mask [3]. Figure 4 shows the mask for the image depicted in Figure 3. Observing the pattern of each slice, a threshold value of 15000 (pixel area) was chosen to ensure that no slices with meaningful information would be missed. Here, meaningful information corresponds to some dots being present in the ROI.

Fig. 3. A slice of a tuberculosis CT scan

Fig. 4. Provided mask image of the lung

Feature Extraction. The feature extraction task has two parts: texture analysis on each ROI and averaging of the attribute values. For texture analysis, Gray Level Co-occurrence Matrix (GLCM) [7] based texture features were computed from the selected slices. For each ROI, a 96-dimensional feature vector was extracted. In the present work, 16 offsets are considered for the generation of the GLCMs, as shown below. The values 1, 2, 3 and 4 represent the pixel distance from the point of interest in each of the four directions: 0, 45, 90 and 135 degrees. Altogether, 16 offsets are available, which eventually generate 16 co-occurrence matrices:

(a) (0 1); (0 2); (0 3); (0 4); → at 0 degree direction
(b) (-1 1); (-2 2); (-3 3); (-4 4); → at 45 degree direction
(c) (-1 0); (-2 0); (-3 0); (-4 0); → at 90 degree direction
(d) (-1 -1); (-2 -2); (-3 -3); (-4 -4) → at 135 degree direction

From each of these co-occurrence matrices, six features are computed, namely contrast, correlation, energy, homogeneity, entropy and mean. Finally, a 16 × 6 = 96 dimensional feature vector is generated.
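To make the 16-offset construction explicit, the sketch below lists the offsets in (row, column) form and assembles the 96-dimensional per-slice descriptor, reusing the hypothetical glcm() and glcm_features() helpers sketched in Section 2.1. The number of gray levels (8) and the definition of the "mean" feature (taken here as the GLCM marginal mean, since the text does not define it) are assumptions.

```python
import numpy as np

# (row, column) displacements for distances 1-4 at 0, 45, 90 and 135 degrees,
# following the (0 d), (-d d), (-d 0), (-d -d) convention listed above.
OFFSETS = [(0, d) for d in range(1, 5)] + [(-d, d) for d in range(1, 5)] + \
          [(-d, 0) for d in range(1, 5)] + [(-d, -d) for d in range(1, 5)]


def slice_feature_vector(roi, n_levels=8):
    """96-dimensional descriptor of one slice ROI: contrast, correlation,
    energy, homogeneity, entropy and mean for each of the 16 offsets."""
    features = []
    for offset in OFFSETS:
        M = glcm(roi, offset, n_levels)                      # Equation (4)
        energy, homogeneity, contrast, correlation, entropy = glcm_features(M)
        # 'mean' is not defined in the text; the GLCM marginal mean is assumed here
        mean = float(np.sum(np.arange(n_levels)[:, None] * M))
        features.extend([contrast, correlation, energy, homogeneity, entropy, mean])
    return np.asarray(features)                              # 16 * 6 = 96 values
```

For a whole CT volume, such a function would be applied to every selected slice and the resulting vectors averaged, as described next.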
Averaging the attribute values over all slices was done to generate the final feature vector, obtaining the corresponding 96 average feature values (for contrast, correlation, energy, homogeneity, entropy and mean).

2.4 Classifiers

For each of the previous approaches, several machine learning algorithms were used to build classification models. The tested classifiers were: Linear Discriminant Analysis (LDA), Logistic Regression (L), Multilayer Perceptron (MLP), Simple Logistic (SL), Sequential Minimal Optimization (SMO), Logistic Model Trees (LMT), Random Forest (RF) and Random Tree (RT). A simple voting scheme (Vote) was also tested.

3 Experiments and Submitted Runs

As already mentioned, there are 3 different subtasks: multi-drug resistance (MDR) detection, tuberculosis type (TBT) classification and severity scoring (SVR). This section describes the datasets and introduces the evaluation measures and the system configuration.

3.1 Datasets Description

Table 1 presents, for the MDR subtask, the number of subjects of each class (multi-drug resistant vs. drug-sensitive patients) for the training and test sets.

Table 1. MDR dataset: number of patients per class

Patients   Training set   Test set
DS         134            101
MDR        125            113
Total      259            214

The second subtask is a multi-class classification problem with five tuberculosis types: infiltrative, focal, tuberculoma, miliary and fibro-cavernous. No information about the relation between the classes is given. Table 2 presents the details for the training and test sets. The training set contains 677 patients but 1008 chest CT scans labelled with the TB type, since some patients have more than one scan; similarly, the test set contains 317 patients but 505 chest CT scans.

Table 2. TBT dataset: number of patients (number of scans) per class

Patients          Training set   Test set
Infiltrative      228 (376)      89 (176)
Focal             210 (273)      80 (115)
Tuberculoma       100 (154)      60 (86)
Miliary           79 (106)       50 (71)
Fibro-cavernous   60 (99)        38 (57)
Total             677 (1008)     317 (505)

The third subtask consists of chest CT scans of TB patients with the corresponding severity score (1 to 5) and severity level ("low" and "high"). Table 3 presents the details for the training and test sets.

Table 3. SVR dataset: number of patients per class

Patients        Training set   Test set
Low severity    90             62
High severity   80             47
Total           170            109

Moreover, automatically extracted lung segmentations [3] were also provided. In our work, we used this segmentation to restrict the region of interest to the lungs. Figure 5 shows sample slices of the Computed Tomography (CT) images with segmented lungs.

Fig. 5. Sample slices of CT images with segmented lungs [9]

3.2 Evaluation Metrics and System Configuration

For the MDR subtask, the performance of the system was measured using the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC); the ROC curve is created by plotting the true positive rate against the false positive rate. Cohen's Kappa coefficient (Kappa) is used to measure the system performance for the TBT subtask; the Kappa statistic measures inter-rater agreement for qualitative (categorical) items. For the SVR subtask, the performance of the system was measured using the Root Mean Square Error (RMSE), which corresponds to the sample standard deviation of the differences between observed and predicted values.
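For reference, the three official measures can be reproduced as follows with scikit-learn; this is a hedged sketch with hypothetical label and prediction arrays, not the organizers' evaluation scripts.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, cohen_kappa_score, mean_squared_error

# MDR: AUC of the ROC curve, from the predicted probability of the MDR class
y_true_mdr = np.array([0, 1, 1, 0, 1])
p_mdr = np.array([0.2, 0.7, 0.6, 0.4, 0.9])
auc = roc_auc_score(y_true_mdr, p_mdr)

# TBT: Cohen's Kappa between the true and predicted TB types (five classes)
y_true_tbt = np.array([1, 2, 3, 4, 5, 1, 2])
y_pred_tbt = np.array([1, 2, 3, 5, 5, 2, 2])
kappa = cohen_kappa_score(y_true_tbt, y_pred_tbt)

# SVR: root mean square error between true and predicted severity scores (1-5)
y_true_svr = np.array([1, 3, 5, 2, 4])
y_pred_svr = np.array([2, 3, 4, 2, 5])
rmse = np.sqrt(mean_squared_error(y_true_svr, y_pred_svr))

print(f"AUC={auc:.3f}  Kappa={kappa:.3f}  RMSE={rmse:.3f}")
```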
To evaluate the systems, a stratified k-fold cross-validation approach was used; the value of k was chosen experimentally. Regarding resources, all experiments were carried out using MATLAB 2017b and the Weka 3.8.1 toolkit [6] on a system with a 3.5 GHz CPU and 8 GB of RAM.

3.3 Top-5 Submitted Runs for each Subtask

From the total of 30 runs allowed for submission to the ImageCLEF 2018 Tuberculosis classification challenge (10 runs for each task), 23 runs were uploaded. The runs were selected based on the best accuracy, AUC, Kappa coefficient and RMSE measures (calculated with a cross-validation procedure over the training dataset), with the machine learning algorithms fine-tuned experimentally. Tables 4, 5 and 6 show the configuration of the best 5 runs for the MDR, TBT and SVR subtasks, respectively.

Table 4. MDR subtask: top five submitted runs

Method                           Classifier        Run Name
3D modeling                      SL                MDR-Run-06-Mohan
3D modeling                      Vote (LDA, SMO)   MDR-Run-08-Mohan
Slice extraction                 SL                MDR-Run-09-Sk
3D modeling + Slice extraction   Vote (LDA, SL)    MDR-Run-10-Mix
Slice extraction                 LDA               MDR-Run-07-Sk

Table 5. TBT subtask: top five submitted runs

Method                           Classifier        Run Name
3D modeling                      RF                TBT-Run-02-Mohan
3D modeling                      RF                TBT-Run-05-Mohan
3D modeling                      RF                TBT-Run-03-Mohan
3D modeling + Slice extraction   RF                TBT-Run-06-Mix
3D modeling                      Vote (RF, LMT)    TBT-Run-04-Mohan

Table 6. SVR subtask: top five submitted runs

Method        Classifier        Run Name
3D modeling   MLP               SVR-Run-07-Mohan
3D modeling   MLP               SVR-Run-03-Mohan
3D modeling   Vote (MLP, SL)    SVR-Run-06-Mohan
3D modeling   RF                SVR-Run-02-Mix
3D modeling   RF                SVR-Run-05-Mohan

3.4 Results on Test Set

Tables 7, 8 and 9 show the performance measures and rank of the best 3 submitted runs for the MDR, TBT and SVR subtasks, respectively. As can be seen, very competitive results were achieved for the MDR subtask; for the TBT subtask, positions 5 and 7 of the top 10 were obtained, and for the SVR subtask a 3rd place was reached.

In each case, the best run over the test set was also the best obtained on the training set. For the MDR subtask, it uses the 3D modeling approach (Section 2.2) with the patient personal data and the Simple Logistic (SL) classifier. For the TBT subtask, it uses the 3D modeling approach (Section 2.2) with the Random Forest (RF) algorithm (with the following parameters: numFeatures=20, numIterations=1500 and seed=20), and for the SVR subtask, the 3D modeling approach with the Multilayer Perceptron (MLP) algorithm with the parameter trainingTime=100.

Table 7. MDR subtask: performance on the test set based on the Area Under the Curve (AUC)

Run                AUC      ACC      Rank
MDR-Run-06-Mohan   0.6178   0.5593   1
MDR-Run-08-Mohan   0.6065   0.5424   3
MDR-Run-09-Sk      0.5921   0.5763   4

Table 8. TBT subtask: performance on the test set based on Cohen's Kappa coefficient (Kappa)

Run                Kappa    ACC      Rank
TBT-Run-02-Mohan   0.1664   0.3785   5
TBT-Run-05-Mohan   0.1621   0.3754   7
TBT-Run-03-Mohan   0.1335   0.3502   14

Table 9. SVR subtask: performance on the test set based on the Root Mean Square Error (RMSE)

Run                RMSE     ACC      Rank
SVR-Run-07-Mohan   0.8883   0.6239   3
SVR-Run-03-Mohan   1.0091   0.6371   17
SVR-Run-06-Mohan   1.0536   0.6356   21
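To make the run-selection procedure concrete, the sketch below cross-validates a Random Forest configured with the parameters reported above for the best TBT run (numIterations=1500, numFeatures=20, seed=20), mapped to their approximate scikit-learn equivalents (n_estimators, max_features, random_state). The authors used Weka 3.8.1, so this is only an analogue; the placeholder feature matrix, labels and the choice of k=5 folds are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(677, 96))      # placeholder feature matrix (random values)
y = rng.integers(1, 6, size=677)    # placeholder labels for the five TB types

# numIterations -> n_estimators, numFeatures -> max_features, seed -> random_state
clf = RandomForestClassifier(n_estimators=1500, max_features=20, random_state=20)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=20)
scores = cross_val_score(clf, X, y, cv=cv, scoring=make_scorer(cohen_kappa_score))
print(f"cross-validated Kappa: {scores.mean():.3f} +/- {scores.std():.3f}")
```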
4 Conclusion

This work presents texture analysis based approaches to categorize TB images for 3 different problems: multi-drug resistance (MDR) detection, tuberculosis type (TBT) classification and severity scoring (SVR). Two different texture analyses, one based on 3D modeling and another based on slice extraction, were proposed, and their individual and combined performances were tested using different machine learning classifiers. Although both approaches are very competitive in terms of accuracy, according to the ImageCLEF 2018 TB task performance measures (AUC for the MDR, Kappa for the TBT and RMSE for the SVR subtasks) the 3D modeling features seem more promising. In the future, we will use the patients' clinical information to improve the overall performance on the three tasks.

5 Acknowledgment

The authors thank the LEADER and gLINK Erasmus Mundus projects for supporting this research.

References

1. M.-C. Asselin, J. P. O'Connor, R. Boellaard, N. A. Thacker, and A. Jackson. Quantifying heterogeneity in human tumours using MRI and PET. European Journal of Cancer, 48(4):447–455, 2012.
2. J. Baish and R. Jain. Fractals and cancer. Cancer Research, 60(14):104, 2000.
3. Y. Dicente Cid, O. A. Jiménez del Toro, A. Depeursinge, and H. Müller. Efficient and fully automatic segmentation of the lungs in CT volumes. In O. Goksel, O. A. Jiménez del Toro, A. Foncubierta-Rodríguez, and H. Müller, editors, Proceedings of the VISCERAL Anatomy Grand Challenge at the 2015 IEEE ISBI, CEUR Workshop Proceedings, pages 31–35. CEUR-WS, May 2015.
4. Y. Dicente Cid, V. Liauchuk, V. Kovalev, and H. Müller. Overview of ImageCLEFtuberculosis 2018 - detecting multi-drug resistance, classifying tuberculosis type, and assessing severity score. In CLEF2018 Working Notes, CEUR Workshop Proceedings, Avignon, France, September 10-14 2018. CEUR-WS.org.
5. B. G and P. F. Fractals and pathology. Journal of Biostatistics and Biometric Applications, 1(1):104, 2015.
6. M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA data mining software: an update. SIGKDD Explorations, 11(1):10–18, 2009.
7. R. M. Haralick, K. Shanmugam, et al. Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, SMC-3(6):610–621, 1973.
8. B. Ionescu, H. Müller, M. Villegas, A. G. S. de Herrera, C. Eickhoff, V. Andrearczyk, Y. D. Cid, V. Liauchuk, V. Kovalev, S. A. Hasan, Y. Ling, O. Farri, J. Liu, M. Lungren, D.-T. Dang-Nguyen, L. Piras, M. Riegler, L. Zhou, M. Lux, and C. Gurrin. Overview of ImageCLEF 2018: Challenges, datasets and evaluation. In Experimental IR Meets Multilinguality, Multimodality, and Interaction, Proceedings of the Ninth International Conference of the CLEF Association (CLEF 2018), Avignon, France, September 10-14 2018. Lecture Notes in Computer Science, Springer.
9. V. Liauchuk and V. Kovalev. ImageCLEF 2017: Supervoxels and co-occurrence for tuberculosis CT image classification. In CLEF2017 Working Notes, CEUR Workshop Proceedings, Dublin, Ireland, September 11-14 2017. CEUR-WS.org.
10. A. Materka, M. Strzelecki, et al. Texture analysis methods – a review. Technical University of Lodz, Institute of Electronics, COST B11 report, Brussels, pages 9–11, 1998.
11. H. Müller, P. Clough, T. Deselaers, and B. Caputo, editors. ImageCLEF: Experimental Evaluation in Visual Information Retrieval. The Information Retrieval Series, 32:1–554, Springer, 2010.
12. World Health Organization. Global tuberculosis report 2016. World Health Organization, 2016.
13. S. Peleg, J. Naor, R. Hartley, and D. Avnir. Multiple resolution texture analysis and classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(4):518–523, 1984.
14. X. Zhang, J. Cui, W.
Wang, and C. Lin. A study for texture feature extraction of high-resolution satellite images based on a direction measure and gray level co-occurrence matrix fusion algorithm. Sensors, 17(7):1474, 2017.