ImageCLEF 2021: Deep categorizing tuberculosis cases using
normalization and pseudo-color CT image
Tetsuya Asakawa1, Riku Tsuneda2, Kazuki Shimizu3, Takuyuki Komoda4 and Masaki Aono5
1,2,5 Toyohashi University of Technology, 1-1 Hibarigaoka Tenpaku, Toyohashi, Aichi, Japan
3,4 Toyohashi Heart Center, 21-1 Gobutori Tenpaku, Oyama, Toyohashi, Aichi, Japan


                  Abstract
                  The ImageCLEF 2021 Tuberculosis task is an example of a challenging research problem in
                  the ﬁeld of computed tomography (CT) image analysis. The purpose of this study is to make
                  accurate estimates for ﬁve labels (inﬁltrative, focal, tuberculoma, miliary, and fibrocavernous)
                  based on lung images. We describe the tuberculosis task and approach for chest CT image
                  analysis and then perform a single-label CT image analysis using the task dataset. We propose
                  an image processing and ﬁne-tuning deep neural network model that uses inputs from
                  convolutional neural network features. This paper presents several approaches for applying
                  normalization and pseudo-color to the extracted 2D images, for applying mask data to the
                  extracted 2D image data, and for extracting a set of 2D projection images based on the 3D
                  chest CT data. Our submissions for the task test dataset achieved an unweighted Cohen’s kappa
                  of 0.117 and an accuracy of 0.382.

                  Keywords 1
                  Tuberculosis, Deep Learning, Normalization, Pseudo-color

1. Introduction

    With the spread of various diseases (e.g., tuberculosis (TB), COVID-19, and influenza), medical
research has been performed to develop and implement the necessary treatments for these diseases.
However, identifying such diseases early remains difficult. An early diagnosis method is
needed to provide the necessary treatment, develop specific medicines, and prevent the deaths of
patients.
    Accordingly, a significant amount of effort has been invested in medical image analysis research in
recent years. In fact, a task dedicated to TB has been part of the ImageCLEF evaluation
campaign for the last five years [1][2][3][4][5]. In ImageCLEF 2021, the main task [6],
“ImageCLEFmed Tuberculosis,” is based on computed tomography (CT) data. The goal of this
task is to automatically categorize each TB case into one of the following five types: infiltrative,
focal, tuberculoma, miliary, or fibrocavernous. Accordingly, the goal of this study is to automatically
categorize the TB type from 3D CT images of TB patients.
    In this paper, we employ a new fine-tuning neural network model that uses features extracted by
pre-trained convolutional neural network (CNN) models as input. Because the existing CNN model
classified the TB types poorly, we propose two new fully connected layers. The contributions of this
paper are novel feature building techniques, the incorporation of features from the
proposed CNN model, and the use of several forms of pre-processing to predict the TB type from the images.
In Section 2, we describe the conducted task and the ImageCLEF 2021 dataset. In Section 3, we
introduce the image pre-processing, experimental settings, and features used in this study. In Section 4,
we describe the experiments we performed. In Section 5, we provide our conclusions.

1
 CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania
EMAIL: asakawa@kde.cs.tut.ac.jp (A. 1); tsuneda@kde.cs.tut.ac.jp (A. 2); shimizu@heart-center.or.jp (A. 3); komoda@heart-center.or.jp
(A. 4); aono@tut.jp (A. 5)
ORCID: 0000-0003-1383-1076 (A. 1); 0000-0002-3063-7489 (A. 2); 0000-0002-8345-7094 (A. 5)
              ©️ 2021 Copyright for this paper by its authors.
              Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
              CEUR Workshop Proceedings (CEUR-WS.org)

2. ImageCLEF 2021 Dataset

   The TB task of the ImageCLEF 2021 Challenge included partial 3D chest CT images of patients [7].
The dataset contained chest CT scan imaging data, including 917 images for the training
(development) dataset and 421 images for the test dataset. Some of the scans include additional
meta-information, which may vary depending on data availability for different cases. Each CT image
corresponds to only one TB type, and in this edition, each CT scan corresponds to one patient. Using the
CT image data, our goal is to automatically categorize each TB case into one of the following
five types: (1) infiltrative, (2) focal, (3) tuberculoma, (4) miliary, and (5) fibrocavernous. Table 1 lists the
number of patients per label in the training dataset.

Table 1
Presence of labels for the chest CT scan in the training dataset.
    Label             In training set (number of patients)
    Infiltrative      420
    Focal             226
    Tuberculoma       101
    Miliary           100
    Fibrocavernous     70
    Total             917


3. Proposed Method

   We propose a single-label analysis system to predict the TB type from CT scan images. The ﬁrst
step is input data pre-processing. After introducing our pre-processing of the input data, we describe
our deep neural network model, which enables single-label outputs given the CT scan images. In
addition, optionally in the ﬁrst step, we can use a CT scan movie instead of CT scan images. We detail
our proposed system in the following subsections.

    3.1.        Input data pre-processing

    The 3D CT scans in the training and test datasets are provided in compressed NIfTI format. We
decompressed the files and extracted the slices along the z-axis of the 3D image, as shown in Fig. 1. For
each NIfTI image, we obtained between 110 and 250 slices along the z-dimension, depending on the
image dimensions. After extracting the slices along the z-axis, we filtered the slices of each
patient using the mask1 and mask2 data [8][9]. The mask1 data provide more accurate masks but tend to
miss large abnormal regions of the lungs in the most severe TB cases. The mask2 data provide
rougher bounds but behave more stably in terms of including lesion areas. We noticed that all slices
contain bone, space, fat, and skin in addition to the lung regions that could help classify the samples.
This is why we added a filtering step and selected a number of slices per patient. We call the
resulting filtered CT scan images the Applying mask CT data.
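
    The masking and slice-selection step described above can be sketched as follows. This is a minimal, dependency-free illustration, not the implementation used in our experiments: the volume and the mask are assumed to be already loaded as nested lists indexed as volume[z][y][x], and the function names, the background value of 0, and the min_lung_pixels threshold are hypothetical choices made for the example.

```python
def apply_mask(ct_slice, mask_slice, background=0):
    """Keep CT values where the lung mask is non-zero; set the rest to background."""
    return [
        [value if m else background for value, m in zip(ct_row, mask_row)]
        for ct_row, mask_row in zip(ct_slice, mask_slice)
    ]


def lung_area(mask_slice):
    """Count the pixels that the mask marks as lung."""
    return sum(1 for row in mask_slice for m in row if m)


def masked_slices(volume, mask, min_lung_pixels=1):
    """Return (z, masked slice) for every axial slice with enough lung pixels.

    min_lung_pixels is an assumed threshold; the paper does not state one.
    """
    selected = []
    for z, (ct_slice, mask_slice) in enumerate(zip(volume, mask)):
        if lung_area(mask_slice) >= min_lung_pixels:
            selected.append((z, apply_mask(ct_slice, mask_slice)))
    return selected


# Toy 2x2x2 volume: the first slice has no lung pixels and is dropped.
toy_volume = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]
toy_mask = [[[0, 0], [0, 0]], [[1, 0], [0, 1]]]
print(masked_slices(toy_volume, toy_mask))
```

    Running the same function once with the mask1 data and once with the mask2 data would yield the two variants of the Applying mask CT data.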
    In addition, as shown in Fig. 2, we applied normalization to the Applying mask CT data.
We call this data the normalization mask CT data.
    Furthermore, as shown in Fig. 3, we applied pseudo-color to the normalization mask CT data. We call
this data the pseudo-color CT data.
Figure 1: Pre-processing of the input data applying mask data.


Figure 2: Pre-processing of the input data using normalization.
Figure 3: Pre-processing of the input data using pseudo color.
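
    The normalization and pseudo-color steps can be illustrated with the short sketch below. It is only an example under stated assumptions: the text does not specify the normalization formula or the color map, so min-max scaling to [0, 1] and a simple blue-green-red ramp are assumed here, and all function names are hypothetical.

```python
def min_max_normalize(image):
    """Scale the pixel values of a 2D image linearly to the range [0, 1]."""
    pixels = [v for row in image for v in row]
    low, high = min(pixels), max(pixels)
    span = high - low
    if span == 0:
        # A constant image carries no contrast; map it to all zeros.
        return [[0.0 for _ in row] for row in image]
    return [[(v - low) / span for v in row] for row in image]


def pseudo_color(t):
    """Map a normalized intensity t in [0, 1] to an (R, G, B) tuple.

    Assumed ramp: blue at 0, green at 0.5, red at 1.
    """
    if t < 0.5:
        r, g, b = 0.0, 2 * t, 1 - 2 * t
    else:
        r, g, b = 2 * t - 1, 2 - 2 * t, 0.0
    return tuple(int(round(255 * c)) for c in (r, g, b))


def colorize(image):
    """Normalize a grayscale slice, then convert every pixel to pseudo-color."""
    return [[pseudo_color(t) for t in row] for row in min_max_normalize(image)]


print(colorize([[0, 5], [10, 10]]))
```

    The output of colorize has three channels per pixel, which matches the RGB input expected by CNNs pre-trained on ImageNet.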

    3.2.        Proposed deep neural network model

  To solve this single-label problem, we propose fine-tuning neural network models that accept
end-to-end CNN features as inputs.


Figure 4: Our proposed method for feature extraction.

        3.2.1. Training and Validation sets
    The training dataset consists of 105,494 and 107,955 images extracted along the z-axis from the
applying mask1 and mask2 CT datasets, respectively.
    We divided the training dataset at random into training and validation datasets with a ratio of 8:2.
The CNN features were extracted using pre-trained CNN-based neural networks, including EfficientNet
B05. To deal with the above features, we propose a deep neural network architecture.
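
    A random 8:2 split of the extracted images can be sketched as below. The function name, the fixed seed, and the use of integer image identifiers are assumptions made for illustration only.

```python
import random


def split_train_validation(items, train_ratio=0.8, seed=0):
    """Shuffle the items and split them into training and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical image identifiers split 8:2.
train_ids, val_ids = split_train_validation(range(10))
print(len(train_ids), len(val_ids))
```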
    Our system incorporates CNN features, which can be extracted using deep CNNs pre-trained on
ImageNet [10], such as EfficientNet B05 [11]. Because the amount of labeled training data is limited,
we adopted transfer learning for the feature extraction to prevent overfitting. We decreased the
dimensions of the fully connected layers used in the CNN models. In addition, we extracted a
2048-dimensional feature vector.
        3.2.2. Training and Validation sets and Test data

    We employed the unweighted Cohen’s kappa and the accuracy as the metrics when fine-tuning the above CNN model.
    As illustrated in Fig. 4, the CNN features are combined into an integrated feature as a
linearly weighted average, where w3 denotes the weights for the CNN features. The CNN features are passed
through “Fusion” processing to generate the integrated features, followed by a “softmax” activation
function.

    3.3.        Single-label probability

   We propose the method illustrated in Algorithm 1. The input is a collection of features extracted
from each image with K types of diseases, while the output is a K-dimensional one-hot vector.
   In Algorithm 1, we assume that the extracted CNN features are represented by their probabilities.
For each TB case, we sum the features and then take the median of the result, which is denoted as T_ik in
Algorithm 1. In short, the vector S_i represents the output one-hot vector of each case. We repeat this
computation until all the test (unknown) images are processed.
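
   The aggregation from per-slice probabilities to one label per TB case can be sketched as follows. This is an illustrative reading of the description above rather than a transcription of Algorithm 1: how the sum and the median are combined is not fully specified in the text, so the sketch offers either statistic as the per-class score (playing the role of T_ik) and returns the one-hot vector (playing the role of S_i) of the highest-scoring class. The function name and the reducer parameter are hypothetical.

```python
from statistics import median


def aggregate_case(slice_probs, reducer="sum"):
    """Turn per-slice class probabilities of one TB case into a one-hot vector.

    slice_probs: list of K-length probability lists, one per slice.
    reducer: "sum" or "median", the statistic used as the per-class score.
    """
    num_classes = len(slice_probs[0])
    columns = [[probs[k] for probs in slice_probs] for k in range(num_classes)]
    if reducer == "sum":
        scores = [sum(column) for column in columns]
    elif reducer == "median":
        scores = [median(column) for column in columns]
    else:
        raise ValueError("reducer must be 'sum' or 'median'")
    best = max(range(num_classes), key=scores.__getitem__)
    return [1 if k == best else 0 for k in range(num_classes)]


# Three slices of one case, K = 5 (infiltrative, focal, tuberculoma,
# miliary, fibrocavernous); the values are made up for the example.
case_probs = [
    [0.6, 0.1, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.1, 0.1, 0.1],
    [0.5, 0.2, 0.1, 0.1, 0.1],
]
print(aggregate_case(case_probs))
```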


4. Experiments
   4.1.       Unweighted Cohen’s Kappa and Accuracy of training and
        validation sets

   The training dataset consists of the Applying mask1 and mask2 CT data and the normalization mask1
and mask2 CT data. It contains 105,494 and 107,955 images extracted for the mask1
and mask2 CT data, respectively.
   Here, we divided the filtered data into training and validation datasets with a ratio of 8:2. We
set the following hyper-parameters: a batch size of 256, stochastic gradient descent as the
optimization function with a learning rate of 0.001 and a momentum of 0.9, and 200 epochs.
For the implementation, we employed TensorFlow [12] as our deep learning framework.
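
   The training configuration above can be expressed in TensorFlow roughly as follows. This is a sketch under assumptions, not our exact network: the backbone is taken to be the Keras EfficientNetB5 model, whose globally pooled output is 2048-dimensional, and the hidden layer size, the input resolution, and the loss function are hypothetical choices that the text does not specify.

```python
# Hyper-parameters stated in the text.
HYPERPARAMS = {
    "batch_size": 256,
    "learning_rate": 0.001,
    "momentum": 0.9,
    "epochs": 200,
    "num_classes": 5,
}


def build_model(hidden_units=512, input_shape=(456, 456, 3), weights="imagenet"):
    """Build an EfficientNet backbone followed by two fully connected layers.

    hidden_units and input_shape are assumptions made for this sketch.
    """
    # Imported lazily so that HYPERPARAMS can be used without TensorFlow installed.
    import tensorflow as tf

    backbone = tf.keras.applications.EfficientNetB5(
        include_top=False, weights=weights, input_shape=input_shape, pooling="avg"
    )
    x = tf.keras.layers.Dense(hidden_units, activation="relu")(backbone.output)
    outputs = tf.keras.layers.Dense(
        HYPERPARAMS["num_classes"], activation="softmax"
    )(x)
    model = tf.keras.Model(inputs=backbone.input, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(
            learning_rate=HYPERPARAMS["learning_rate"],
            momentum=HYPERPARAMS["momentum"],
        ),
        loss="sparse_categorical_crossentropy",  # assumed loss for integer labels
        metrics=["accuracy"],
    )
    return model
```

   Training would then call model.fit on batches of HYPERPARAMS["batch_size"] images for HYPERPARAMS["epochs"] epochs, with the validation split passed as validation data.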
   For the evaluation of the single-label classification, we employed the unweighted Cohen’s kappa
and the accuracy. Table 2 shows the results. Finally, we employed EfficientNet B05 for the training and
validation datasets and the test data. The results are given in Section 4.2.
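
   For reference, the two evaluation measures can be computed as in the following sketch. The unweighted Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the label frequencies. The function names and the toy labels are made up for the example.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa between true and predicted labels."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when only one label occurs; this sketch returns 0.0.
        return 0.0
    return (p_o - p_e) / (1 - p_e)


# Always predicting the majority class gives a fair accuracy but a kappa of zero.
truth = ["infiltrative", "infiltrative", "infiltrative", "focal"]
majority = ["infiltrative"] * 4
print(accuracy(truth, majority), cohen_kappa(truth, majority))
```

   The toy case shows why both measures are reported: with imbalanced labels such as those in Table 1, the accuracy alone can look reasonable for a model that ignores the minority classes.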

Table 2
Unweighted Cohen’s Kappa and Accuracy of training and validation sets for fine-tuning EfficientNet
B05.
    Mask     Pre-processing                                Unweighted Cohen’s Kappa    Accuracy
    mask1    applying mask                                 0.213                       0.443
    mask1    applying mask and normalization               0.199                       0.443
    mask1    applying mask, normalization, pseudo color    0.215                       0.475
    mask2    applying mask                                 0.215                       0.495
    mask2    applying mask and normalization               0.244                       0.489
    mask2    applying mask, normalization, pseudo color    0.183                       0.448

Table 3
Numbers of images with the five labels for the chest CT scans in the training dataset.
    Mask     Label             In training set (number of images)
    mask1    Infiltrative      49058
    mask1    Focal             25722
    mask1    Tuberculoma       11293
    mask1    Miliary           11692
    mask1    Fibrocavernous     7729
    mask2    Infiltrative      50035
    mask2    Focal             26203
    mask2    Tuberculoma       11552
    mask2    Miliary           12030
    mask2    Fibrocavernous     8135


    4.2.         Results for the training and validation datasets and the test data
           using our proposed model

   The test dataset consisted of 59,835 and 60,758 images extracted from the applying mask1 and
mask2 CT data, respectively; Table 3 lists the per-label image counts for the training dataset.
   It is likely that our proposed models will give better results after more advanced data pre-processing,
including the use of several types of CT images and data augmentation. Here, as described above, we
fine-tuned EfficientNet B05 with several pre-processing methods.
   Table 4 shows the results in terms of the unweighted Cohen’s kappa and the accuracy. With mask1
and normalization, the fine-tuned EfficientNet B05 achieved the highest unweighted Cohen’s kappa
(0.117) among our runs, together with an accuracy of 0.382.
   In addition, the results of the other participants’ submissions are shown in Table 5 with their
unweighted Cohen’s kappa and accuracy.
   For our team, KDE-Lab, the listed run is the one with the best unweighted Cohen’s kappa and
accuracy among our submissions.
   The results achieved by our submissions can be compared with those at the top of the list
given in Table 5. Note that several runs in the table belong to the same teams and likely do not differ
significantly. In terms of the unweighted Cohen’s kappa, our model ranks 8th. In terms of the accuracy,
our model ranks 7th.

Table 4
Results of experiments for single-label classification.
    Mask     Pre-processing                                Unweighted Cohen’s Kappa    Accuracy
    mask1    applying mask                                 0.016                       0.382
    mask1    applying mask and normalization               0.117                       0.382
    mask1    applying mask, normalization, pseudo color    0.069                       0.371
    mask2    applying mask                                 0.015                       0.372
    mask2    applying mask and normalization               0.085                       0.375
    mask2    applying mask, normalization, pseudo color    0.081                       0.373

Table 5
The best participants’ runs submitted for the CTR subtask.
         Group name                 Rank           Unweighted Cohen’s              Accuracy
                                                          Kappa
      SenticLab.UAIC                 1                     0.221                    0.446
         hasibzunair                 2                     0.200                    0.423
        SDVA-UCSD                    3                     0.190                    0.371
    Emad-Aghajanzadeh                4                     0.181                    0.333
     MIDL-NCAI-CUI                   5                     0.140                    0.333
           uaic2021                  6                     0.129                    0.401
         IALab PUC                   7                     0.120                    0.401
           KDE-Lab                   8                     0.117                    0.381
            JBTTM                    9                     0.038                    0.221
          Zhao_Shi_                  10                    0.015                    0.380
          YNUZHOU                    11                    -0.08                    0.385


5. Conclusions
   In this study, we proposed image pre-processing and a CNN model for predicting five labels
(infiltrative, focal, tuberculoma, miliary, and fibrocavernous) from chest CT images. We performed a
lung CT image analysis in which we proposed a deep neural network model that takes CNN features
as its inputs. To predict the five labels, we introduced a threshold-based single-label
prediction algorithm.
   Specifically, after training our deep neural network using the pre-processed images, we were able to
predict the categories of the five types of TB cases from unknown CT scan images. The experimental
results demonstrate that our proposed models outperform some models in terms of the unweighted
Cohen’s kappa and the accuracy. Our best run, which used mask1 with normalization, achieved an
unweighted Cohen’s kappa of 0.117 on the test data.
As a consequence, we believe that using normalization to pre-process an image is effective.
   In the future, we intend to determine the optimal weights for the neural networks given an
arbitrary X-ray, CT, echo, or magnetic resonance image. Moreover, we hope our proposed model will
encourage further research into the early detection of diseases (such as TB, COVID-19, and influenza)
or unknown diseases.

6. Acknowledgment

  A part of this research was carried out with the support of the Grant for Toyohashi Heart Center
Smart Hospital Joint Research Course and the Grant for Education and Research in Toyohashi
University of Technology.

7. References

[1] Yashin Dicente Cid, Alexander Kalinovsky, Vitali Liauchuk, Vassili Kovalev, and
    Henning Müller. Overview of ImageCLEFtuberculosis 2017 - predicting tuberculosis type
    and drug resistances. In CLEF2017 Working Notes, CEUR Workshop Proceedings, Dublin,
    Ireland, September 11-14 2017. CEUR-WS.org <http://ceur-ws.org>.
[2] Bogdan Ionescu, Henning Müller, Mauricio Villegas, Alba García Seco de Herrera,
    Carsten Eickhoff, Vincent Andrearczyk, Yashin Dicente Cid, Vitali Liauchuk, Vassili
    Kovalev, Sadid A. Hasan, Yuan Ling, Oladimeji Farri, Joey Liu, Matthew Lungren, Duc-Tien
    Dang-Nguyen, Luca Piras, Michael Riegler, Liting Zhou, Mathias Lux, and Cathal Gurrin.
    Overview of ImageCLEF 2018: Challenges, datasets and evaluation. In Experimental IR Meets
    Multilinguality, Multimodality, and Interaction, Proceedings of the Ninth International
    Conference of the CLEF Association (CLEF 2018), Avignon, France, September 10-14 2018.
    LNCS Lecture Notes in Computer Science, Springer.
[3] Bogdan Ionescu, Henning Müller, Renaud Péteri, Yashin Dicente Cid, Vitali
    Liauchuk, Vassili Kovalev, Dzmitri Klimuk, Aleh Tarasau, Asma Ben Abacha, Sadid A. Hasan,
    Vivek Datla, Joey Liu, Dina Demner-Fushman, Duc-Tien Dang-Nguyen, Luca Piras, Michael
    Riegler, Minh-Triet Tran, Mathias Lux, Cathal Gurrin, Obioma Pelka, Christoph M. Friedrich,
    Alba García Seco de Herrera, Narciso Garcia, Ergina Kavallieratou, Carlos Roberto del
    Blanco, Carlos Cuevas Rodríguez, Nikos Vasillopoulos, Konstantinos Karampidis, Jon
    Chamberlain, Adrian Clark, and Antonio Campello. ImageCLEF 2019: Multimedia Retrieval
    in Medicine, Lifelogging, Security and Nature: Multimedia Retrieval in Medicine,
    Lifelogging, Security and Nature. In Experimental IR Meets Multilinguality, Multimodality,
    and Interaction, volume 2380 of Proceedings of the 10th International Conference of the CLEF
    Association (CLEF 2019), Lugano, Switzerland, September 9-12 2019. LNCS Lecture Notes
    in Computer Science, Springer.
[4] Obioma Pelka, Christoph M Friedrich, Alba García Seco de Herrera, and Henning
    Müller. Medical image understanding: Overview of the ImageCLEFmed 2020 concept
    prediction task. In CLEF2020 Working Notes, Workshop Proceedings, Thessaloniki, Greece,
    September 22-25 2020. CEUR-WS.org.
[5] Serge Kozlovski, Vitali Liauchuk, Yashin Dicente Cid, Aleh Tarasau, Vassili
    Kovalev, and Henning Müller. Overview of ImageCLEFtuberculosis 2021 - automatic CT-based
    report generation. In Overview of ImageCLEF tuberculosis 2021 - CT-based Tuberculosis Type
    Classification, CEUR Workshop Proceedings, Bucharest, Romania, September 21-24 2021.
    CEUR-WS.org <http://ceur-ws.org>.
[6] Bogdan Ionescu, Henning Müller, Renaud Péteri, Asma Ben Abacha, Vivek Datla,
    Sadid A. Hasan, Dina Demner-Fushman, Serge Kozlovski, Vitali Liauchuk, Yashin Dicente
    Cid, Vassili Kovalev, Obioma Pelka, Christoph M. Friedrich, Alba García Seco de Herrera,
    Van-Tu Ninh, Tu-Khiem Le, Liting Zhou, Luca Piras, Michael Riegler, Pål Halvorsen,
    Minh-Triet Tran, Mathias Lux, Cathal Gurrin, Duc-Tien Dang-Nguyen, Jon Chamberlain, Adrian
    Clark, Antonio Campello, Dimitri Fichou, Raul Berari, Paul Brie, Mihai Dogariu, Liviu Daniel
    Ștefan, and Mihai Gabriel Constantin. Overview of the ImageCLEF 2020: Multimedia
    Retrieval in Medical, Lifelogging, Nature, and Internet Applications. In Experimental IR Meets
    Multilinguality, Multimodality, and Interaction, volume 12260 of Proceedings of the 11th
    International Conference of the CLEF Association (CLEF 2020), Thessaloniki, Greece,
    September 22-25 2020. LNCS Lecture Notes in Computer Science, Springer.
[7] Serge Kozlovski, Vitali Liauchuk, Yashin Dicente Cid, Aleh Tarasau, Vassili
    Kovalev, and Henning Müller. Overview of ImageCLEFtuberculosis 2020 - automatic CT-based
    report generation. In CLEF2020 Working Notes, CEUR Workshop Proceedings, Thessaloniki,
    Greece, September 22-25 2020. CEUR-WS.org <http://ceur-ws.org>.
[8] Yashin Dicente Cid, Oscar Alfonso Jiménez del Toro, Adrien Depeursinge, and
    Henning Müller. Efficient and fully automatic segmentation of the lungs in CT volumes. In
    Orcun Goksel, Oscar Alfonso Jiménez del Toro, Antonio Foncubierta-Rodríguez, and
    Henning Müller, editors, Proceedings of the VISCERAL Anatomy Grand Challenge at the
    2015 IEEE ISBI, CEUR Workshop Proceedings, pages 31–35. CEUR-WS, May 2015.
[9] Vitali Liauchuk and Vassili Kovalev. ImageCLEF 2017: Supervoxels and co-occurrence
    for tuberculosis CT image classification. In Linda Cappellato, Nicola Ferro, Lorraine Goeuriot,
    and Thomas Mandl, editors, Working Notes of CLEF 2017 - Conference and Labs of the
    Evaluation Forum, Dublin, Ireland, September 11-14, 2017, volume 1866 of CEUR Workshop
    Proceedings. CEUR-WS.org, 2017.10.
[10] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma,
    Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and
    Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of
    Computer Vision (IJCV), 115(3):211–252, 2015.
[11] Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking model scaling for
    convolutional neural networks. ICML 2019, 05 2019.
[12] Google. TensorFlow. https://github.com/tensorflow.