<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ImageCLEF 2018 Tuberculosis Task: Ensemble of 3D CNNs with Multiple Inputs for Tuberculosis Type Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Adam Ishay</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oge Marques</string-name>
          <email>omarquesg@fau.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University</institution>
          ,
          <addr-line>33431 Boca Raton FL</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Convolutional neural networks have achieved state-of-the-art results in general image classification tasks and have shown success in several applications within the medical imaging domain. In this paper, we apply a 3D convolutional neural network (CNN) to a dataset of tuberculosis-positive computed tomography (CT) scans to solve the task of automatically categorizing each tuberculosis (TB) case into one of five possible TB types in the context of the ImageCLEFtuberculosis 2018 challenge. The size of the volumetric scans poses unique constraints on the network and the training process. The CT volumes are segmented using the provided masks and further pre-processed prior to training our model. Our best run ranked 2nd with an unweighted Cohen's Kappa of 0.1736 and an accuracy of 35.33%.</p>
      </abstract>
      <kwd-group>
        <kwd>3D-CNN</kwd>
        <kwd>Medical Imaging</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Tuberculosis</kwd>
        <kwd>Image Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        For the second year, ImageCLEF [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] has proposed the ImageCLEFtuberculosis
2018 task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], in efforts to reduce the time and cost of medical image
analysis. This year there are three subtasks: multi-drug resistance (MDR)
detection, tuberculosis type classification, and severity scoring. The goal of the
MDR task is to predict the probability of a patient having a drug-resistant form of
tuberculosis. The third task, severity scoring, aims at predicting a severity score
from 1 (very bad) to 5 (very good). Finally, the task that this paper addresses
is tuberculosis type classification: we are tasked with classifying the type of
tuberculosis, given a positive image. These types are: (1) Infiltrative, (2) Focal,
(3) Tuberculoma, (4) Miliary, and (5) Fibro-cavernous.
      </p>
      <p>
        Deep learning approaches have been shown to be successful on a large variety
of computer vision and image analysis tasks [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Deep learning and CNNs in
particular have now broadly been applied to medical imaging [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We apply a
deep 3D CNN to the medical image dataset for classification.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Data Pre-processing</title>
      <p>The training set provided by the ImageCLEF organizers consisted of patient
chest CT scans of five different types of TB along with their labels. Patients often
had multiple scans, and all scans of the same patient were of the same
type. There were 228, 210, 100, 79, and 60 patients belonging to the Infiltrative,
Focal, Tuberculoma, Miliary, and Fibro-cavernous types, respectively. The dataset
totaled 677 patients with 1008 scans (Table 1). Each scan consists of
approximately 100 slices of 512×512 pixels. The depth of each scan varies and was changed to a
constant number of slices.</p>
      <p>
        The pre-processing stage consisted of 7 steps (Figure 1). The supplied masks [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
were applied to the original scans to segment the lungs. The distance between
slices, along with the resolution of each slice, varied among scans. For easier
training, the images were resampled to an isotropic resolution of 1×1×1 mm. After
this step, the scans were roughly 300×300×300 voxels in size. All scans were then cropped to
remove the excess zeros in the background left by the mask. The voxel values were
clipped between -1000 and 400 and normalized between 0 and 1; values
outside of this range are not useful. Then, the largest lungs were used to find the
new width and height to which all images would be padded, using the common
background voxel value in the scans. The scans were also padded in the depth
dimension to the depth of the largest scan. Next, the mean voxel value was calculated
and subtracted from all scans to zero-center the data for better training. Finally,
the resulting scans were resized to reduce the data to a more reasonable size for
the network. Using this process, two datasets of different-sized images were
created (see Figure 2). The purpose of this was to combine two different networks
to predict the label. The batch sizes used are a function of the size of the input
and the architecture of the network. In the networks used, most of the memory
consumption was due to the first few layers of the network, since in these layers
the images were still large.
      </p>
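      <p>The clipping, normalization, and zero-centering steps above can be sketched in NumPy. This is a minimal illustration, not the authors' code; the Hounsfield-unit window comes from the text, while the function names and the dataset-wide mean computation are assumptions.</p>

```python
import numpy as np

HU_MIN, HU_MAX = -1000.0, 400.0  # clipping window stated in the paper

def clip_and_normalize(volume: np.ndarray) -> np.ndarray:
    """Clip voxel intensities to [-1000, 400] and rescale to [0, 1]."""
    clipped = np.clip(volume, HU_MIN, HU_MAX)
    return (clipped - HU_MIN) / (HU_MAX - HU_MIN)

def zero_center(volumes):
    """Subtract the dataset-wide mean voxel value to zero-center the data."""
    mean_voxel = np.mean([v.mean() for v in volumes])
    return [v - mean_voxel for v in volumes]

# Toy 2x2 "scan": one value below the window, one above, two inside.
scan = np.array([[-2000.0, -1000.0], [0.0, 1000.0]])
norm = clip_and_normalize(scan)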
      <p>
        The two datasets each had a train/validation split of 80/20.
Initially, this split was done by scan, and validation accuracy was relatively high.
However, when submitting results on the test set, the accuracy was much lower
and close to random. This was thought to be at least partially due to the
splitting method: because some patients had multiple scans, the training and
validation sets contained scans from the same patient. Upon visual inspection, scans
from the same patient were indeed similar (see Figure 3).
      </p>
      <p>
        The trained models were 3D convolutional neural networks built using the software
library Keras [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] with TensorFlow [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] backend. We opted for 3D convolutions
because they naturally capture the 3D nature of the scans. We trained two
networks, one for each dataset created in the pre-processing stage. The combination
of the two networks achieved better results than either of them alone. To
alleviate the class imbalance problem shown in Table 1, oversampling was used during
the training phase. Classes three, four, and five were oversampled to
approximately match the test distribution. This meant that a full epoch of training was
reached when roughly 900 patients (677 + 200) were processed by the network.
      </p>
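      <p>The patient-level splitting and oversampling described above can be sketched as follows. This is an illustrative reconstruction, assuming scans are identified by (scan ID, patient ID) pairs and that per-class oversampling targets are given explicitly; neither detail is specified in the paper.</p>

```python
import random
from collections import defaultdict

def split_by_patient(scans, val_frac=0.2, seed=0):
    """Split by patient, so no patient appears in both train and validation."""
    by_patient = defaultdict(list)
    for scan_id, patient_id in scans:
        by_patient[patient_id].append(scan_id)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_val = int(len(patients) * val_frac)
    val_patients = set(patients[:n_val])
    train = [s for p in patients[n_val:] for s in by_patient[p]]
    val = [s for p in val_patients for s in by_patient[p]]
    return train, val

def oversample(items_by_class, targets):
    """Repeat minority-class items until each class reaches its target count."""
    out = []
    for cls, items in items_by_class.items():
        target = targets.get(cls, len(items))
        out.extend(items[i % len(items)] for i in range(target))
    return out
```

Splitting by patient rather than by scan removes the leakage that made the initial scan-level validation accuracy optimistic.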
      <p>
        Each network (Figure 4) had five convolution layers with rectified linear unit
(ReLU) activations, each followed by batch normalization and max pooling
with dropout. These led to two fully connected layers, each with batch
normalization and dropout. Finally, these activations went through a softmax
layer, which output a tensor of size five, one entry per category. Categorical
cross-entropy was used as the loss function, and Adam [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] was used for optimization.
      </p>
      <p>One of the most restrictive aspects of training this model was the batch size: the
memory footprint of the volumes forced very small batches, which makes convergence harder.</p>
      <p>Our most successful model was the combination of the two best models,
which had inputs of different-sized image volumes. The output of this ensemble
was five probabilities, one for each class. The probabilities were summed across
the two models, and this vector was then iteratively scaled by a weight vector
calculated from the class distribution. This resulted in output labels
that more closely matched the data distribution. This combination of networks,
shown in Figure 4, was used for predicting the test labels.</p>
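      <p>A minimal sketch of the ensemble step, assuming the weight vector is applied as a single element-wise reweighting of the summed probabilities; the paper's exact iterative scaling rule is not specified, so this is a simplified stand-in.</p>

```python
import numpy as np

def ensemble_predict(probs_a: np.ndarray, probs_b: np.ndarray,
                     class_weights: np.ndarray) -> np.ndarray:
    """Sum the two models' class probabilities, reweight by a vector derived
    from the class distribution, and return the predicted label per sample."""
    combined = (probs_a + probs_b) * class_weights  # element-wise reweighting
    return combined.argmax(axis=-1)
```

Reweighting nudges the predicted-label histogram toward the expected class distribution without retraining either network.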
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>Runs were only submitted for subtask 2, tuberculosis type classification. Our initial
submissions' accuracies were barely better than random chance (~28%). After
combining models and weighting probabilities, the accuracy and kappa score
improved. Our best run (indicated in bold in Table 2) ranked second in unweighted
kappa coefficient, but tenth in accuracy.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>This paper applies a 3D CNN to pre-processed CT scans of the lungs. The
question of whether a CNN can extract the information necessary for labeling
types of TB remains open. Making predictions on image data alone has proved
a challenging problem. The large size of the images, together with the small size and class
imbalance of the datasets, is characteristic of medical imaging tasks. In this
analysis, batch sizes were restricted to five and fourteen samples for the
two networks used. A feasible way of effectively training with a much larger batch
size is to accumulate the gradients of each batch and only update the weights of
the network after storing a sufficient number of batches' gradients. The average
of the gradients over these batches can then be used to update the weights. This allows
for effectively training on larger batch sizes, circumventing memory problems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abadi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barham</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brevdo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Citro</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghemawat</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harp</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Irving</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jozefowicz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaiser</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kudlur</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levenberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mane</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monga</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murray</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olah</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuster</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shlens</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steiner</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talwar</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tucker</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoucke</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasudevan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viegas</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Warden</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wattenberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wicke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>TensorFlow: Large-scale machine learning on heterogeneous systems (</article-title>
          <year>2015</year>
          ), https://www.tensorflow.org/, software available from tensorflow.org
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chollet</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , et al.: Keras. https://keras.io (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Dicente</given-names>
            <surname>Cid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Liauchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <surname>V.</surname>
          </string-name>
          , , Muller, H.:
          <article-title>Overview of ImageCLEFtuberculosis 2018 - detecting multi-drug resistance, classifying tuberculosis type, and assessing severity score</article-title>
          .
          <source>In: CLEF2018 Working Notes. CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.org&gt;
          , Avignon,
          <source>France (September 10-14</source>
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Dicente</given-names>
            <surname>Cid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Jimenez del Toro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.A.</given-names>
            ,
            <surname>Depeursinge</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          , Muller, H.:
          <article-title>Efficient and fully automatic segmentation of the lungs in CT volumes</article-title>
          . In: Goksel,
          <string-name>
            <given-names>O.</given-names>
            ,
            <surname>Jimenez del Toro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.A.</given-names>
            ,
            <surname>Foncubierta-Rodríguez</surname>
          </string-name>
          , A., Muller, H. (eds.)
          <article-title>Proceedings of the VISCERAL Anatomy Grand Challenge at the 2015 IEEE ISBI</article-title>
          . pp.
          <volume>31</volume>
          –
          <fpage>35</fpage>
          . CEUR Workshop Proceedings, CEUR-WS (May
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Villegas</surname>
          </string-name>
          , M.,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andrearczyk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cid</surname>
            ,
            <given-names>Y.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farri</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lungren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          : Overview of ImageCLEF 2018:
          <article-title>Challenges, datasets and evaluation. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction</article-title>
          .
          <source>Proceedings of the Ninth International Conference of the CLEF Association (CLEF</source>
          <year>2018</year>
          ),
          <source>LNCS Lecture Notes in Computer Science</source>
          , Springer, Avignon,
          <source>France (September 10-14</source>
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
          </string-name>
          , J.:
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>arXiv preprint arXiv:1412.6980</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>LeCun</surname>
          </string-name>
          , Y.,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
          </string-name>
          , G.:
          <article-title>Deep learning</article-title>
          .
          <source>Nature</source>
          <volume>521</volume>
          (
          <issue>7553</issue>
          ),
          <volume>436</volume>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Litjens</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kooi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bejnordi</surname>
            ,
            <given-names>B.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Setio</surname>
            ,
            <given-names>A.A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciompi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghafoorian</surname>
          </string-name>
          , M.,
          <string-name>
            <surname>van der Laak</surname>
          </string-name>
          , J.A.,
          <string-name>
            <surname>van Ginneken</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanchez</surname>
            ,
            <given-names>C.I.:</given-names>
          </string-name>
          <article-title>A survey on deep learning in medical image analysis</article-title>
          .
          <source>Medical image analysis 42</source>
          ,
          <volume>60</volume>
          –
          <fpage>88</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>