Automated Classification of Lung Tuberculosis Using 3D Deep Convolutional Neural Networks

Sushaanth. Srinivasan¹, Sharvesh. Shankar¹, Nitheesh Kumar. N¹, Sabarivasan. Velayutham¹, Thejas. N¹, Vikash Anand. N¹, Lekshmi. Kalinathan¹ and Prabavathy. Balasundaram¹

¹ Department of CSE, SSN College of Engineering, Rajiv Gandhi Salai, Chennai, Tamil Nadu, India

Abstract
Automated classification of tuberculosis (TB) and related disease is urgently needed, as traditional diagnostic procedures are slow and inefficient. Existing literature focuses on TB identification and classification using 2D images. Because 3D images contain additional depth information that helps model the disease cavern more accurately, an approach based on a 3D-CNN model is proposed to classify the type of caverns present in lung CT scans, in order to support prompt treatment.

Keywords
Tuberculosis, Deep Learning, 3D CNN Classification, Cavern

1. Introduction
Tuberculosis (TB) is a bacterial infection caused by Mycobacterium tuberculosis. It usually affects the lungs and can then spread to other parts of the body such as the brain and the spine. A TB cavern is characterised by three properties: whether it has thick walls, whether foci are present around it, and whether calcification is present. The traditional diagnostic procedures include skin tests, blood tests, imaging modalities, sputum testing and culture tests. The results of a sputum smear require several days, while the results of a culture need several weeks. This reduces diagnostic efficiency and frequently delays the isolation of infectious individuals. These tests also have low sensitivity, and TB diagnosis can be extremely difficult, especially in smear-negative patients. Because TB is highly transmissible and spreads from one person to the next through the air, a missed diagnosis allows it to spread further. Hence, it is a dangerous disease and can be fatal if not treated correctly.
According to the World Health Organization, a total of 1.6 million people died of tuberculosis in 2020. The existing research summarised in Table 1 has largely been applied to 2D images. However, 3D images provide better insights than 2D images, although training on 3D images is more time consuming than training on 2D images. It is therefore reasonable to expect that a model trained on a 3D dataset will be more reliable, and it is crucial to develop new deep learning solutions that detect TB from 3D images and provide higher accuracy.

Table 1
Existing work along with its methodology and performance

| Existing work | Methodology used | Performance |
|---|---|---|
| Automated TB classification using ensemble of deep architectures. Multimedia Tools and Applications [3] | Ensemble classifier using AlexNet, GoogleNet and ResNet to classify 2D CXR images | Accuracy - 88.24%; Area under Curve - 0.93 |
| Simple Neural Network based TB Classification. [4] | A simple shallow neural network with three layers is employed to classify 3D CT images | Validation Accuracy - 20%; Testing Accuracy - 22.1% |
| Tuberculosis (TB) detection system using deep neural networks. Neural Computing and Applications [5] | Model learns from the pre-trained weights of Inception V3 and classifies the data using a support vector machine (SVM) from the transferred knowledge | Accuracy - 95.05% |
| Application of a convolutional neural network using transfer learning for tuberculosis detection. [6] | ConvNet model that uses VGG16 to classify 2D CXR images | Accuracy - 80% without augmentation; Accuracy - 81.25% with augmentation |
| A deep learning approach for the classification of TB from NIH CXR dataset. [7] | A custom-built CNN architecture to classify 2D CXR images | Accuracy - 92.5% |
| Analysis of Tuberculosis (TB) on X-ray Image Using SURF Feature Extraction and the K-Nearest Neighbor (KNN) Classification Method. [8] | SURF feature extraction and K-Nearest Neighbor (KNN) classification to classify 2D X-ray images | Average Accuracy - 73% |
| Diagnosing tuberculosis using deep convolutional neural network. [9] | Deep Convolutional Neural Network (CNN) to classify 2D CXR images | Validation Accuracy - 87.1% |

CLEF 2022: Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy
sushaanth19113@cse.ssn.edu.in (Sushaanth. Srinivasan); sharvesh19101@cse.ssn.edu.in (Sharvesh. Shankar); nitheesh2010343@ssn.edu.in (N. Kumar. N); sabarivasan2010624@ssn.edu.in (Sabarivasan. Velayutham); thejas2010679@ssn.edu.in (Thejas. N); vikashanand2010015@ssn.edu.in (V. Anand. N); lekshmik@ssn.edu.in (Lekshmi. Kalinathan); prabavathyb@ssn.edu.in (Prabavathy. Balasundaram)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

2. Task and Dataset Description
The dataset used for this task is from ImageCLEF 2022 [1]. The goal of the Caverns Report task [2] is to predict three binary features of caverns, namely Has Thick Walls, Has Foci Around and Has Calcification. A single 3D image is provided for each of the 60 patients. Each 3D image contains around 100 2D slices of 512x512 pixels. All the CT images are stored in the NIfTI file format with the .nii.gz file extension (gzipped .nii files). This file format stores raw voxel intensities in Hounsfield units (HU) as well as the corresponding image metadata, such as image dimensions, voxel size in physical units and slice thickness. Two versions of automatically extracted lung masks were provided for each CT image, alongside the CT images of the patients.
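A lung mask of this kind is usually applied voxel by voxel. The paper does not state whether the masks were used in the pipeline, so the following is only an illustrative sketch. It assumes the CT volume and the mask have already been read into nested lists indexed as [slice][row][column], that any non-zero mask label marks lung tissue, and that voxels outside the mask are replaced by the Hounsfield value of air. The function name apply_lung_mask and the toy values are hypothetical.

```python
AIR_HU = -1000  # Hounsfield value of air; using it as background is an assumption


def apply_lung_mask(ct_volume, mask_volume, background=AIR_HU):
    """Return a copy of ct_volume with voxels outside the mask set to background.

    Both arguments are nested lists indexed as [slice][row][column].
    Any non-zero mask label is treated as lung tissue (an assumption).
    """
    if len(ct_volume) != len(mask_volume):
        raise ValueError("CT volume and mask must have the same number of slices")
    masked = []
    for ct_slice, mask_slice in zip(ct_volume, mask_volume):
        masked.append([
            [hu if label else background for hu, label in zip(ct_row, mask_row)]
            for ct_row, mask_row in zip(ct_slice, mask_slice)
        ])
    return masked


# Toy 1x2x3 volume in Hounsfield units with a matching mask (hypothetical values).
toy_ct = [[[-850, 40, -900], [30, -870, 60]]]
toy_mask = [[[1, 0, 2], [0, 1, 0]]]
masked_ct = apply_lung_mask(toy_ct, toy_mask)
print(masked_ct)
```

Returning a new volume rather than modifying the input keeps the raw intensities available, which matters if both mask versions are to be compared on the same scan.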
The first version of the segmentation provides accurate masks, but tends to miss features in severe TB cases where large abnormal regions of the lungs are present. In contrast, the second segmentation provides only rough bounds, but includes lesion areas.

Figure 1: 3D-CNN Architecture

3. Techniques Used
The 3D-CNN architecture is frequently used for stacks of 2D images, particularly medical images, because it can assess the positions of defects along the third dimension (depth for a CT volume, time for a video). During the convolution step, the 3D-CNN generates a 3D activation map, which is required to capture temporal or volumetric context. To compute low-level feature representations, a three-dimensional filter is applied to the dataset in a 3D convolution. The CNN is a variant of the classic neural network in which convolution and pooling layers connect only to local regions around each input. This sparse connectivity yields a hierarchical representation of the input, allowing CNNs to represent images at multiple levels, from edge details to general forms. The 3D-CNN is an extension of the 2D-CNN that captures discriminative features in both the spatial and temporal dimensions by dividing hierarchical 3D visual information into small cubes rather than 2D patches. Figure 1 illustrates the notion of a 3D convolution. The input 3D image is split into 2D slices. The 3D convolution begins with the 2D slices, with x_ij denoting the jth receptive field on the ith slice. Local receptive fields are constructed on the 2D slices to generate 2D features y_ij, which are convolved along the slice direction i (the temporal direction for video data) to generate the feature vector y_j at the convolution layer. A 3D pooling layer follows the same steps as a 2D pooling layer, taking the maximum or average value over each 3D window. After repeated convolution and pooling operations, once the image information is adequately compressed, the CNN is ready for hierarchical learning of features ranging from edges to motifs.
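The convolution and pooling operations described above can be sketched as follows. This is a minimal pure-Python illustration on nested lists, not the TensorFlow model used in this work. The function names conv3d_valid and max_pool3d, the 5x5x5 toy volume and the 2x2x2 averaging filter are hypothetical choices made for the example. As in most deep learning frameworks, the "convolution" here is a cross-correlation (the filter is not flipped).

```python
from itertools import product


def shape3d(volume):
    """Return (depth, height, width) of a nested-list volume."""
    return (len(volume), len(volume[0]), len(volume[0][0]))


def conv3d_valid(volume, kernel):
    """3D cross-correlation with stride 1 and no padding ('valid' mode)."""
    depth, height, width = shape3d(volume)
    kd, kh, kw = shape3d(kernel)
    od, oh, ow = depth - kd + 1, height - kh + 1, width - kw + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(od)]
    for z, y, x in product(range(od), range(oh), range(ow)):
        # Each output voxel is a weighted sum over a small cube of the input.
        out[z][y][x] = sum(
            volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
            for dz, dy, dx in product(range(kd), range(kh), range(kw))
        )
    return out


def max_pool3d(volume, size=2):
    """Non-overlapping 3D max pooling with a cubic window of the given size."""
    depth, height, width = shape3d(volume)
    od, oh, ow = depth // size, height // size, width // size
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(od)]
    for z, y, x in product(range(od), range(oh), range(ow)):
        out[z][y][x] = max(
            volume[z * size + dz][y * size + dy][x * size + dx]
            for dz, dy, dx in product(range(size), repeat=3)
        )
    return out


# Toy 5x5x5 volume whose voxel value encodes its position (hypothetical data).
toy_volume = [[[float(25 * z + 5 * y + x) for x in range(5)]
               for y in range(5)] for z in range(5)]
# 2x2x2 averaging filter: every weight is 1/8.
mean_kernel = [[[0.125] * 2 for _ in range(2)] for _ in range(2)]

activation = conv3d_valid(toy_volume, mean_kernel)  # 3D activation map
pooled = max_pool3d(activation, size=2)             # compressed representation
print(shape3d(activation), shape3d(pooled))
```

Frameworks such as TensorFlow provide the same operations as optimized layers with learned filter weights. The explicit loops here only make the indexing over the depth, height and width of the volume visible.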
The resulting feature sequences are fed into the final hidden layers, where a fully connected neural network learns the image's attributes and generates the output as data sequences.

Figure 2: Proposed 3D-CNN Architecture

4. Implementation
The model was trained on a system with an Intel Core i7 CPU, a Samsung 1TB SSD and an Nvidia GeForce RTX 2060 Super GPU. Google Colaboratory, which provides a 12GB Nvidia Tesla K80 GPU, was also used. The deep learning framework TensorFlow was used along with tools such as Anaconda, the Jupyter Notebook environment, CUDA, NumPy, Pandas and scikit-learn. A TB classification system was built as a set of binary classifiers, one for each of the three classes: Has Thick Walls, Has Foci Around and Has Calcification. The procedure for implementing the system is as follows:
• The labels were separated by class in order to implement a binary classifier for each of the three classes. The separated labels were stored in a NumPy array.
• The given dataset for the TB task consists of 3D images. Each 3D image was split into a set of 2D image slices. Each slice was resized to 128x128 pixels using spline-interpolated zoom and inter-cubic interpolation.
• The training images were loaded into NumPy arrays in order to train the model.
• The training data was normalized, which brings the different pixel intensities onto a common scale, makes computation more efficient and helps the model converge faster.
• The normalized data was split into training and testing sets.
• During training, the learning rate is automatically reduced if the validation loss stops improving for a predefined number of epochs. This callback monitors the validation loss and helps prevent stagnation.
• If the validation loss still shows no improvement after a predefined number of epochs, even after the learning rate has been reduced, training is stopped automatically.
• The architecture of the model is shown in Figure 2.
• K-fold cross-validation was performed to estimate how the model performs on unseen data and as a preventive measure against overfitting. The folds preserve the proportion of the feature of interest across the original data, the training set and the test set, which gives a more accurate estimate of the model's performance.
• After training, the model was used to predict values for the test data.

5. Result and Analysis
The model was first trained on a small dataset of five patients and then scaled up to the sixty patient records. The model attained an accuracy of 75% on the training data and 60% on the validation data. The area under the ROC curve for the validation data is 0.78 for the class Has Calcification, 0.74 for the class Has Foci Around, and 0.65 for the class Has Thick Walls.

Figure 3: (a) roc-thickwalls (validation data) (b) trainval-acc-loss-thickwalls
Figure 4: (a) roc-foci around (validation data) (b) trainval-acc-loss-foci around
Figure 5: (a) roc-calcification (validation data) (b) trainval-acc-loss-calcification

For the classes Has Foci Around and Has Calcification, most of the ROC curve lies to the left of the line denoting the random classifier, but the curve intersects that line. The validation accuracy is much lower than the training accuracy, and the validation loss is much greater than the training loss. It can therefore be inferred from Figures 3, 4 and 5 that the model might be overfitting the data. This is attributed to the imbalance between the positive and negative classes in the separated datasets for Has Foci Around and Has Calcification. However, for the class Has Thick Walls, the ROC curve lies to the left of the line denoting the random classifier and does not intersect it. The training and validation accuracy curves are close to each other, and the same is true of the loss curves.
This shows that the model has fitted the data for this class correctly. Thus, it can be inferred from this graph that the model performs well on this class, as the positive and negative samples in its dataset are balanced. Table 2 lists the probabilities of the three classes predicted by the model.

Table 2
Probability of having Thick Walls, Foci Around and Calcification for the 16 test patients

| Image | Thick Walls | Foci Around | Calcification |
|---|---|---|---|
| TST_00 | 0.9978415 | 0.8283372 | 0.9964127 |
| TST_01 | 0.99999547 | 0.34543616 | 0.83471966 |
| TST_02 | 0.999368 | 0.69200265 | 0.9884094 |
| TST_03 | 0.9999981 | 0.8152172 | 0.9991033 |
| TST_04 | 0.999992 | 0.38898218 | 0.9299915 |
| TST_05 | 0.9999901 | 0.4413950 | 0.8158201 |
| TST_06 | 0.9999924 | 0.73378474 | 0.9003653 |
| TST_07 | 0.99952185 | 0.72178733 | 0.6093098 |
| TST_08 | 0.9999995 | 0.30586368 | 0.9996055 |
| TST_09 | 0.9991689 | 0.98074776 | 0.9726256 |
| TST_10 | 0.99997044 | 0.9738817 | 0.96510625 |
| TST_11 | 0.929831 | 0.5132923 | 0.99828976 |
| TST_12 | 0.9966671 | 0.99855965 | 0.995698 |
| TST_13 | 0.9999987 | 0.839497 | 0.915082 |
| TST_14 | 0.99993134 | 0.93050194 | 0.79498357 |
| TST_15 | 0.9987043 | 0.7850096 | 0.8628713 |

6. Conclusions
In summary, a deep 3D convolutional neural network model has been proposed to predict the classification of caverns present in lung CT scans of tuberculosis patients. In order to improve the predictions of the model, the parameters and hyperparameters of the 3D-CNN model were tuned. The optimal model resulted in a training accuracy of 75%. The model was evaluated on the test data of 16 patients and achieved a mean AUC of 0.536.

Acknowledgments
We thank the CSE department of SSN College of Engineering for letting us utilize the GPU machine extensively to implement this task.

References
[1] B. Ionescu, H. Müller, R. Peteri, J. Rückert, A. Ben Abacha, A. G. S. de Herrera, C. M. Friedrich, L. Bloch, R. Brüngel, A. Idrissi-Yaghir, H. Schäfer, S. Kozlovski, Y. D. Cid, V. Kovalev, L.-D. Ştefan, M. G. Constantin, M. Dogariu, A. Popescu, J. Deshayes-Chossart, H. Schindler, J. Chamberlain, A. Campello, A. Clark, Overview of the ImageCLEF 2022: Multimedia retrieval in medical, social media and nature applications, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction, Proceedings of the 13th International Conference of the CLEF Association (CLEF 2022), LNCS Lecture Notes in Computer Science, Springer, Bologna, Italy, 2022.
[2] S. Kozlovski, Y. Dicente Cid, V. Kovalev, H. Müller, Overview of ImageCLEF Tuberculosis 2022 - CT-based caverns detection and report, in: CLEF2022 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Bologna, Italy, 2022.
[3] H. Rahul, A. Mittal, S. Sofat, Automated TB classification using ensemble of deep architectures, in: Multimedia Tools and Applications, 2019, 78(22):31515-31532.
[4] A. Anand, K. R. Anandan, B. Jayaraman, M. T. Thai, Simple Neural Network based TB Classification, 2021.
[5] R. Dinesh Jackson Samuel, B. Rajesh Kanna, Tuberculosis (TB) detection system using deep neural networks, in: Neural Computing and Applications, 2019, 31(5):1533-1545.
[6] M. Ahsan, R. Gomes, A. Denton, Application of a convolutional neural network using transfer learning for tuberculosis detection, in: IEEE International Conference on Electro Information Technology (EIT), 2019, pp. 427-433.
[7] S. Z. Y. Zaidi, M. U. Akram, A. Jameel, N. S. Alghamdi, A deep learning approach for the classification of TB from NIH CXR dataset, in: IET Image Processing, 2022, 16(3):787-796.
[8] R. A. Rizal, N. O. Purba, L. A. Siregar, K. Sinaga, N. Azizah, Analysis of Tuberculosis (TB) on X-ray Image Using SURF Feature Extraction and the K-Nearest Neighbor (KNN) Classification Method, in: Journal of Applied Information and Communication Technologies (JAICT), 2020, 5(2):9-12.
[9] M. Oloko-Oba, S. Viriri, Diagnosing tuberculosis using deep convolutional neural network, in: Proceedings of International Conference on Image and Signal Processing, 2020, pp. 151-161.