<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multi-Classification Study of the Tuberculosis with 3D CBAM-ResNet and EfficientNet</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xing Lu</string-name>
          <email>lvxingvir@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Y Chang</string-name>
          <email>e8chang@health.ucsd.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chun-Nan Hsu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiang Du</string-name>
          <email>jiangdu@health.ucsd.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amilcare Gentili</string-name>
          <email>agentili@ucsd.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>San Diego VA Health Care System</institution>
          ,
          <addr-line>San Diego, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of California</institution>
          ,
          <addr-line>San Diego, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The detection and characterization of tuberculosis, including the evaluation of lesion characteristics, are challenging. To provide a solution for this multi-classification task, we performed a deep learning study that relied on the use of 3D ResNet and EfficientNet. With proper application of the provided masks, lung images were cropped, masked, and rearranged with different windowings. Stratified sampling for the train/validation split and a balanced sampler within each batch during training were used to address the data imbalance problem. A convolutional block attention module (CBAM) was used to add an attention mechanism to each block of the ResNet to further improve the performance of the convolutional neural network (CNN).</p>
      </abstract>
      <kwd-group>
        <kwd>Tuberculosis</kwd>
        <kwd>Computed Tomography</kwd>
        <kwd>Image Classification</kwd>
        <kwd>Tuberculosis Type</kwd>
        <kwd>CBAM</kwd>
        <kwd>EfficientNet</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <p>2.1. Data
The training set provided for the tuberculosis task contained a total of 917 patients,
with labels provided for five categories. To avoid bias in the training and validation cohorts, a
balanced train/validation strategy was employed to split each class according to an 8:2 ratio, as
shown in Figure 1. Figure 1(a) shows the results of a random train/validation split, whereas
Figure 1(b) shows the balanced split. As can be seen in Figure 1(a), the fibro-cavernous class
had only a few examples in the validation dataset, whereas the balanced split shown in
Figure 1(b) gave each class a similar distribution across the training and validation datasets.</p>
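      <p>The stratified split described above can be sketched in plain Python (the function name, seed, and default ratio are illustrative, not taken from the original code):</p>
      <preformat>
```python
import random
from collections import defaultdict

def stratified_split(labels, val_frac=0.2, seed=0):
    """Split sample indices so each class keeps roughly an 8:2 train/val ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, val = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_val = max(1, round(val_frac * len(idxs)))
        val.extend(idxs[:n_val])
        train.extend(idxs[n_val:])
    return sorted(train), sorted(val)
```
      </preformat>
      <p>Because every class contributes proportionally to the validation set, a rare class such as fibro-cavernous is guaranteed at least one validation example.</p>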
      <sec id="sec-2-1">
        <title>2.2. Preprocessing</title>
        <p>
          The preprocessing of the images for the deep learning model is shown in Figure 2. The images for
the ImageCLEF TB task were provided as NIFTI 3D datasets. Two versions of lung segmentation
masks were also provided. The first version of segmentation (denoted as Mask-1) provided
more accurate masks, containing individual masks for left and right laterality (values equal 1
for left and 2 for right), but in the most severe TB cases, there was a tendency to miss large
abnormal regions in the lungs [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. On the other hand, the second segmentation (denoted as
Mask-2) provided less precise boundaries, given that it contained the entire lung area (i.e., both
left and right sides of the lung), but was more stable in terms of including lesion areas [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. As
there was no need to locate lesions in terms of lung side, only Mask-2 was used in this study.
        </p>
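        <p>As a minimal sketch of how Mask-2 can be applied, the volume can be multiplied by the mask and cropped to the mask's bounding box (the function name and array layout are assumptions, not the authors' code):</p>
        <preformat>
```python
import numpy as np

def crop_to_mask(volume, mask):
    """Zero out voxels outside the lung mask and crop to its bounding box."""
    coords = np.argwhere(mask)       # voxel coordinates inside the mask
    lo = coords.min(axis=0)
    hi = coords.max(axis=0) + 1
    region = tuple(slice(a, b) for a, b in zip(lo, hi))
    return volume[region] * mask[region]
```
        </preformat>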
        <p>
          As shown in Fig. 2, the original NIFTI-formatted dataset was transformed into image data
by first applying the NiBabel package. Next, the reformatted images were adjusted to three
different window levels, namely baseline, lung, and soft tissue, and then normalized. For the baseline
window level, the foreground was obtained via the Otsu thresholding algorithm provided in the
OpenCV package; for the lung and soft-tissue windows, the image levels were set as [-600, 1500] and [50, 350],
respectively. Then, images were normalized to [0, 1] with their mean and standard deviation values. Finally, all
three windows and levels of data were saved, and annotation files were rearranged for use in
further training.
        </p>
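        <p>The windowing and normalization steps can be sketched as follows (helper names are assumed, and the bracketed pairs are treated as [lower, upper] intensity bounds as given in the text):</p>
        <preformat>
```python
import numpy as np

def apply_window(volume_hu, lower, upper):
    """Clip a CT volume to the given intensity window and scale it to [0, 1]."""
    vol = np.clip(volume_hu.astype(np.float64), lower, upper)
    return (vol - lower) / (upper - lower)

def standardize(volume):
    """Normalize a volume with its own mean and standard deviation."""
    return (volume - volume.mean()) / (volume.std() + 1e-8)

# lung and soft-tissue windows from the text
lung_window = lambda v: apply_window(v, -600, 1500)
soft_tissue_window = lambda v: apply_window(v, 50, 350)
```
        </preformat>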
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Network and Training</title>
        <p>
          In this study, a 3D convolutional block attention module (CBAM)-ResNet and a 3D EfficientNet
were employed to train the model for 5-class classification based on the PyTorch framework.
Similar to our last year’s work [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], a standard 3D ResNet-34 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] was used as the convolutional
neural network (CNN) backbone, with three fully connected (fc) layers as the classifier. CBAM [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] was used to
implement channel and spatial attention mechanisms in each block of the ResNet, and sigmoid
was used as the activation function for the attention maps. Given our computing
resources, EfficientNet-B5 was the optimal 3D EfficientNet for training [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
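        <p>For reference, CBAM's channel-attention branch applies a shared MLP to average- and max-pooled descriptors and gates the feature map with a sigmoid. A NumPy sketch for a single 3D feature map follows (weight shapes and names are illustrative, not the trained model's):</p>
        <preformat>
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w_down, w_up):
    """CBAM-style channel gate for one feature map of shape (C, D, H, W)."""
    c = feat.shape[0]
    flat = feat.reshape(c, -1)
    avg_desc = flat.mean(axis=1)                        # global average pooling
    max_desc = flat.max(axis=1)                         # global max pooling
    mlp = lambda d: w_up @ np.maximum(w_down @ d, 0.0)  # shared two-layer MLP
    gate = sigmoid(mlp(avg_desc) + mlp(max_desc))       # (C,) attention weights
    return feat * gate.reshape(c, 1, 1, 1)
```
        </preformat>
        <p>The spatial-attention branch is analogous but pools over the channel axis and convolves the pooled maps before the sigmoid gate.</p>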
        <p>To train the neural networks, we used a workstation with 4 Nvidia GTX 1080 Ti video cards,
128 GB of RAM, and a 1 TB solid-state drive. During the training process, to avoid overfitting,
image augmentation and a balanced sampler were implemented in each batch. For the image
augmentation, traditional data augmentation methods, including brightness adjustment, shear, scale, and
flip, were applied. The balanced sampler strategy, which equalized the data sampled from all
five classes in each batch, was adopted during the training process.</p>
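        <p>The per-batch balanced sampler can be sketched as below (class quota and names are illustrative); each batch draws the same number of samples from every class, oversampling rare classes with replacement:</p>
        <preformat>
```python
import random
from collections import defaultdict

class BalancedBatchSampler:
    """Yield batches containing an equal number of samples from each class."""

    def __init__(self, labels, per_class=2, seed=0):
        self.rng = random.Random(seed)
        self.per_class = per_class
        self.by_class = defaultdict(list)
        for idx, label in enumerate(labels):
            self.by_class[label].append(idx)

    def next_batch(self):
        batch = []
        for idxs in self.by_class.values():
            # sample with replacement so small classes can fill their quota
            batch.extend(self.rng.choices(idxs, k=self.per_class))
        self.rng.shuffle(batch)
        return batch
```
        </preformat>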
      </sec>
      <sec id="sec-2-3">
        <title>2.4. Experiments and Model Selection</title>
        <p>Three experiments were conducted during model training. For the 3D ResNet, the datasets
fed into the network were dynamically generated from saved metadata, with the different window
levels each serving as a single channel, and were interpolated to two input sizes: 3 × 64 × 256 × 256
and 3 × 16 × 384 × 384. For the 3D EfficientNet, only 3 × 64 × 256 × 256 was used. For each experiment,
the model was trained for 60 epochs with a cosine annealing warm-up learning rate.
To find the best model for each experiment during training, the epochs with either the minimum loss
or the highest accuracy were selected and saved for further submission.</p>
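        <p>A common form of the cosine annealing schedule with warm-up is sketched below (the base learning rate and warm-up length are illustrative assumptions; the exact hyperparameters are not specified in the text):</p>
        <preformat>
```python
import math

def cosine_warmup_lr(epoch, total_epochs=60, warmup_epochs=5, base_lr=1e-3):
    """Linear warm-up followed by cosine annealing toward zero."""
    if epoch >= warmup_epochs:
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
    return base_lr * (epoch + 1) / warmup_epochs
```
        </preformat>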
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results and Submissions</title>
      <p>The provided TST dataset included 421 image files for testing. With our preprocessing pipeline,
the TST data were cropped according to Mask-2 to generate calibrated image files. After
evaluation with the trained models, the results were rearranged according to the submission requirements and
saved as a .txt file to be submitted. As mentioned in the Methods, we saved six models with
different metrics for evaluating the TST datasets; their performances are displayed in Table 1.</p>
      <p>Per the submitted results, 3D ResNet-34 achieved both better accuracy and a better Kappa than
EfficientNet-B5. For 3D ResNet-34, the model with tensor size 3 × 64 × 256 × 256 selected by minimum
loss achieved the best Kappa of 0.190 and an accuracy of 0.371, under the submission name 154940_loss.
We also tried to ensemble the results from different models into a single submission, 137652, but the
result was not significantly improved.</p>
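      <p>As a hypothetical sketch of the ensembling step (the actual combination rule used for submission 137652 is not described in the text), a simple majority vote over per-model predictions looks like this:</p>
      <preformat>
```python
from collections import Counter

def ensemble_vote(model_preds):
    """Majority vote across models; ties go to the earliest-listed model."""
    n_cases = len(model_preds[0])
    fused = []
    for i in range(n_cases):
        votes = [preds[i] for preds in model_preds]
        # Counter preserves first-seen order, so sorting by count is tie-stable
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused
```
      </preformat>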
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and Conclusion</title>
      <p>To provide a deep learning solution for the multi-classification task of tuberculosis, we performed
experiments using 3D CBAM-ResNet and 3D EfficientNet as CNN backbones. There were several
challenges for this task, such as the severe class imbalance and 3D dimensionality of the CT
images, so we tried several techniques to improve the models’ performance. First, we properly
applied stratified sampling of each class for the train/validation split to mitigate bias in the
training and validation cohorts. Furthermore, a balanced sampler in each batch sampler was
used to address the data imbalance problem. Second, CBAM was used to add an attention
mechanism to each block of the Resnet to further improve the performance of the CNN. Third,
different windowings of the CT images were concatenated to further focus the CNN on features
of the disease, as suggested by a radiologist. Using all the aforementioned techniques, we achieved a
kappa of 0.190 in the evaluation of the test dataset and placed third in this competition.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Acknowledgments</title>
      <p>This work was supported in part by the Office of the Assistant Secretary of Defense for Health
Affairs through the Accelerating Innovation in Military Medicine Program under Award No.
W81XWH-20-1-0693.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Peteri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sarrouti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kozlovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Liauchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dicente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G. S.</given-names>
            <surname>de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jacutprakart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Berari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tauteanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fichou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Brie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dogariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Ştefan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Constantin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chamberlain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Campello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Oliver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Moustahfid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Deshayes-Chossart</surname>
          </string-name>
          ,
          <article-title>Overview of the ImageCLEF 2021: Multimedia retrieval in medical, nature, internet and social media applications</article-title>
          , in: Experimental IR Meets Multilinguality, Multimodality, and Interaction,
          <source>Proceedings of the 12th International Conference of the CLEF Association (CLEF</source>
          <year>2021</year>
          ),
          <source>LNCS Lecture Notes in Computer Science</source>
          , Springer, Bucharest, Romania,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kozlovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Liauchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dicente Cid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          , Overview of ImageCLEFtuberculosis 2021 -
          <article-title>CT-based tuberculosis type classification</article-title>
          ,
          <source>in: CLEF2021 Working Notes, CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.org&gt;, Bucharest, Romania,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dicente Cid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. A.</given-names>
            <surname>Jiménez del Toro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Depeursinge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Efficient and fully automatic segmentation of the lungs in CT volumes</article-title>
          , in:
          <string-name>
            <given-names>O.</given-names>
            <surname>Goksel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. A.</given-names>
            <surname>Jiménez del Toro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Foncubierta-Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the VISCERAL Anatomy Grand Challenge at the 2015 IEEE ISBI, CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.org&gt;,
          <year>2015</year>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Liauchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          ImageCLEF
          <year>2017</year>
          :
          <article-title>Supervoxels and co-occurrence for tuberculosis CT image classification</article-title>
          ,
          <source>in: CLEF2017 Working Notes, CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.org&gt;, Dublin, Ireland,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>X.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gentili</surname>
          </string-name>
          ,
          <article-title>Imageclef2020: Laterality-reduction three-dimensional cbam-resnet with balanced sampler for multi-binary classification of tuberculosis and CT auto reports</article-title>
          , in:
          <string-name>
            <given-names>L.</given-names>
            <surname>Cappellato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Eickhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Névéol</surname>
          </string-name>
          (Eds.), Working Notes of CLEF 2020 -
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          , Thessaloniki, Greece,
          <source>September 22-25</source>
          ,
          <year>2020</year>
          , volume
          <volume>2696</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2020</year>
          . URL: http://ceur-ws.org/Vol-2696/paper_70.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learning for image recognition</article-title>
          ,
          <year>2015</year>
          . arXiv:1512.03385.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Woo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Kweon</surname>
          </string-name>
          ,
          <article-title>CBAM: Convolutional block attention module</article-title>
          ,
          <year>2018</year>
          . arXiv:1807.06521.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>EfficientNet: Rethinking model scaling for convolutional neural networks</article-title>
          ,
          <year>2020</year>
          . arXiv:1905.11946.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>