<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Identifying tuberculosis type in CTs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cosmin Moisii</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Radu Miron</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mihaela Elena Breaban</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Computer Science, "Alexandru Ioan Cuza" University of Iasi</institution>
          ,
          <country country="RO">Romania</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SenticLab</institution>
          ,
          <addr-line>Iasi</addr-line>
          ,
          <country country="RO">Romania</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper proposes and compares two distinct approaches based on deep learning for tuberculosis classification in CTs, highlighting the benefits of building the inference engine at slice level over a volumetric approach. The methods are evaluated in the context of the ImageCLEF 2021 Tuberculosis task and the reported work belongs to the SenticLab.UAIC team, which ranked first in the competition.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>According to the World Health Organization, tuberculosis (TB) is one of the top 10 causes
of death worldwide and the leading cause from a single infectious agent. It is present in all
countries and age groups and was the cause of a total of 1.4 million deaths in 2019, with an
estimate of 10 million people infected worldwide. Generally, TB can be cured with antibiotics.
An estimated 60 million lives were saved through TB diagnosis and treatment between 2000
and 2019. However, the different types of TB require different treatments; therefore, the
detection of the TB type and its characteristics is an important real-world task.</p>
      <p>
        In this regard, the 2021 edition of the Tuberculosis task within ImageCLEFmed [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] aimed
at automatically categorizing CTs of TB patients into one of the following five types: (1)
Infiltrative, (2) Focal, (3) Tuberculoma, (4) Miliary, (5) Fibro-cavernous. The current paper
reports the approaches developed by the SenticLab.UAIC team, which obtained the best results in the
competition.
      </p>
      <p>
        Given the 3-dimensional nature of the CTs, several ways to tackle the classification problem
in terms of input type exist. In previous work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we compared three different strategies: 1)
compressing the 3D matrix to 2D representations by computing projections onto 3 distinct
planes, 2) treating the 3D volume as a whole, thus using 3D convolutions or fusing the
information at slice level, and 3) bringing the inference process to the slice level. The third
approach proved to significantly outperform the others in terms of accuracy, at a higher cost of
data preparation but with a much lower computational burden compared to a 3D approach; training
data preparation in this case involves identifying the slices in the CT that present the affection
specified as the label for the entire CT.
      </p>
      <p>The current work experiments with two of the approaches above, the third approach proving
again to be the winning solution in the competition. Another key component of the winning
solution was the aggregation step applied to the inference results obtained at slice level, of great
importance especially for the objective of the 2021 evaluation task, where only one label had to
be output per CT, although our analysis highlighted the existence of several affections for some
CTs.</p>
      <p>The paper is structured as follows. Section 2 describes the challenge and the dataset. Section 3
presents the approach we used to exploit the whole volumetric information. Section 4 describes
the architectures used to process the information at slice level and the heuristics used to produce
the diagnosis report at the CT level. Section 5 summarizes the results obtained on the blind test
set in the competition and discusses comparatively the performance of the methods. Section 6
concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. ImageCLEFmed Tuberculosis 2021: tasks, data, evaluation</title>
      <p>The challenge in the 2021 ImageCLEFmed Tuberculosis competition is the automatic
classification of CTs into one of 5 TB types - illustrated in Figure 1.</p>
      <p>The training dataset consists of chest CT scans of 917 TB patients, each CT scan being
categorized in only one TB class. The test set consists of 421 CT scans. Part of the training data
also has some additional metadata, but because such information was not available for the test
data, we did not include it in the analysis.</p>
      <p>The resolution is 512×512 pixels, with a variable number of slices - 580 at maximum (illustrated in
Figure 2) - and various spacings, the slice thickness varying from 0.6 to 5 mm with a median at 2.5
mm.</p>
      <p>The distribution of classes is imbalanced, as illustrated in Figure 3.</p>
      <p>The metrics used to measure the performance of the algorithms are Cohen’s Kappa and
accuracy, the former being used to rank the entries in the competition.</p>
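      <p>Cohen's kappa normalizes the observed agreement by the agreement expected from the label marginals alone. A minimal sketch of the computation (the function name and toy label lists are our own, for illustration):</p>
      <preformat>
```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance."""
    n = len(y_true)
    p_o = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    freq_t = Counter(y_true)
    freq_p = Counter(y_pred)
    # chance agreement: sum over labels of the product of marginal rates
    p_e = sum(freq_t[c] / n * freq_p[c] / n for c in freq_t)
    return (p_o - p_e) / (1 - p_e)

# toy example with the 5 TB classes encoded as 1..5
truth = [1, 1, 2, 3, 4, 5, 5, 2]
preds = [1, 2, 2, 3, 4, 5, 1, 2]
print(round(cohens_kappa(truth, preds), 3))  # 0.68
```
      </preformat>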
    </sec>
    <sec id="sec-3">
      <title>3. Learning from volumetric data</title>
      <p>
        2D convolutional neural networks have been very successful in a wide range of 2D vision
tasks, from classification to object detection and segmentation. Ever since the appearance of
AlexNet [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], state of the art results have been obtained on several benchmarks.
      </p>
      <p>
        For these reasons we chose to work with convolutions for this competition. Although
impressive results have been obtained on 2D image tasks using 2D convolutions, 3D convolutions
have yet to emerge as the de facto architecture for 3D image tasks. Since the convolution operation
is local, searching for features in the neighbourhood of a pixel, while the tuberculosis type might
be influenced by several pathologies found in different and distant slices of the same patient,
we chose as our main model a 3D ResNet with Non-Local Features [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. ResNets [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have become
popular due to their residual connections which prevent failure when training very deep neural
networks.
      </p>
      <p>
        Pretraining has had a significant role in increasing the performance of convolutional neural
networks. We chose to use a 3D ResNet-50 with non-local features pretrained on Kinetics. Due to
the good results reported in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] we chose the Inflated 3D architecture with weights pretrained on
ImageNet (pretrained weights from https://github.com/facebookresearch/video-nonlocal-net).
      </p>
      <p>We converted each volume to a sequence of images, each image being the RGB
representation of one slice. We chose a window width equal to 1500 and a window level equal
to -600 for the whole volume and stored only 8 bits of information per slice, thus obtaining
a 1-channel image with values in the [0, 255] range. In order to take advantage of the pretrained models, which
require 3-channel images as input, we simply duplicated the first channel over the second and
the third channel. We chose to resize each image slice to 128 × 128 and 256 × 256 pixels. Each
input image is an s × s × 120 part of the whole volume, where s is the pixel size of an image.
Padding with 0-filled slices is done if necessary.</p>
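      <p>The windowing step above can be sketched as follows (a numpy-based illustration; the function name and dummy input are ours, while the window width/level and the channel duplication come from the text):</p>
      <preformat>
```python
import numpy as np

def window_to_rgb(slice_hu, level=-600.0, width=1500.0):
    """Map a CT slice from Hounsfield units to an 8-bit 3-channel image
    using the given window level/width (lung window here)."""
    lo = level - width / 2.0   # -1350 HU
    hi = level + width / 2.0   #   150 HU
    clipped = np.clip(slice_hu, lo, hi)
    scaled = (clipped - lo) / (hi - lo) * 255.0
    gray = scaled.astype(np.uint8)           # single 8-bit channel
    return np.stack([gray, gray, gray], -1)  # duplicate to 3 channels

slice_hu = np.full((512, 512), -600.0)       # dummy slice at window level
rgb = window_to_rgb(slice_hu)
print(rgb.shape)  # (512, 512, 3)
```
      </preformat>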
      <p>We used 2 training phases for the final model. The first phase uses as input slices with
128 × 128 dimensions and is trained with no augmentation techniques. The total number of
epochs is 100. As loss function we use cross-entropy. The initial learning rate is 1e-3. The
learning rate scheduler decreases the rate by a factor of 0.5 every 20 epochs. We use Stochastic
Gradient Descent as optimizer with a weight decay of 1e-6. We call this the warm-up
phase.</p>
      <p>The second phase is the proper training. This time we use 256 × 256 images as input. As
augmentations we use horizontal and vertical flips, contrast and color distortions and Gaussian
blur, each with a 0.5 chance. With a 0.5 chance we also invert the volume. The total
number of epochs is 100. The initial learning rate is 1e-3 with a decrease factor of 0.5 every 15
epochs. The loss used is cross-entropy with label smoothing in order to avoid overfitting.
Each volume was normalized using the ImageNet mean and standard deviation. We use Stochastic
Gradient Descent as optimizer with a weight decay of 1e-6. We use as initial weights the final
weights of the previous warm-up phase, so as not to start from scratch. For this
approach we did not use the masks for lung segmentation provided by the competition, nor any
other method to segment the lungs. The entire CT was used as is.</p>
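      <p>The schedule and loss ingredients described above can be sketched in isolation (pure Python; the smoothing factor of 0.1 is an illustrative choice, not a value reported in the text):</p>
      <preformat>
```python
def stepped_lr(epoch, base_lr=1e-3, factor=0.5, step=15):
    """Learning rate halved every `step` epochs (the second phase uses
    step=15, the warm-up phase step=20)."""
    return base_lr * factor ** (epoch // step)

def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: move eps of the probability mass from the target
    class to a uniform distribution over all classes."""
    k = len(one_hot)
    return [(1.0 - eps) * p + eps / k for p in one_hot]

print(stepped_lr(0), stepped_lr(30))  # 0.001 0.00025
print(smooth_labels([0, 0, 1, 0, 0]))
```
      </preformat>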
      <p>We use test time augmentations. We perform for each image 6 inference steps. One step
with the original image and another step with the reversed image. For the other 4 steps we use
random augmentations that we used during the training phase. We used the last model saved
during training for inference and also the last 10 models saved during training as an ensemble,
leading to a total of 66 predictions per CT. We use different techniques for aggregating the results
of the ensemble and different test time augmentations. The first method is to pick as final label
the most frequent label predicted (FS 3DNLR50). In case of frequency equality the prediction
with the highest score is chosen. The second method is based on the mean of the scores for
each label (MS 3DNLR50). The label with the highest mean is picked. The third method uses
as final label the one with the highest score predicted among all predictions (HS 3DNLR50).
The second method obtains the best performance. The single model inference consisting of
applying the model in the last epoch, with no test-time augmentation, gives the poorest results
(S 3DNLR50). We believe this is due to the fact that a single volume can’t always present
pathologies for a single type of Tuberculosis. This is hinted by the instability of the performance
metrics (loss and accuracy) computed on the training set, even on the final epochs with a small
learning rate. Label smoothing also prevents overfitting, acting as a strong regularizer: training
on the 128 × 128 images without label smoothing reaches almost perfect accuracy after 150
epochs, whereas training with label smoothing reaches less than 80% accuracy.</p>
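      <p>The three aggregation rules can be sketched as follows (pure Python; representing each of the 66 predictions as a dictionary of per-class scores is our own illustrative encoding):</p>
      <preformat>
```python
from collections import Counter

def aggregate(score_vectors, labels):
    """Aggregate per-prediction class-score vectors with the three rules
    described above (FS / MS / HS)."""
    argmaxes = [max(labels, key=lambda c: v[c]) for v in score_vectors]
    counts = Counter(argmaxes)
    top = max(counts.values())
    tied = [c for c in counts if counts[c] == top]
    # FS: most frequent predicted label, ties broken by highest score
    fs = max(tied, key=lambda c: max(v[c] for v in score_vectors))
    # MS: label with the highest mean score across predictions (the sum
    # gives the same argmax, since the number of predictions is constant)
    ms = max(labels, key=lambda c: sum(v[c] for v in score_vectors))
    # HS: label of the single highest score among all predictions
    hs = max(labels, key=lambda c: max(v[c] for v in score_vectors))
    return fs, ms, hs

labels = ["Infiltrative", "Focal", "Tuberculoma", "Miliary", "Fibro-cavernous"]
votes = [
    {c: 0.0 for c in labels} | {"Focal": 0.7},
    {c: 0.0 for c in labels} | {"Focal": 0.6},
    {c: 0.0 for c in labels} | {"Miliary": 0.9},
]
print(aggregate(votes, labels))  # ('Focal', 'Focal', 'Miliary')
```
      </preformat>
      <p>On this toy input the three rules already disagree, which mirrors the score differences observed between the FS, MS and HS runs.</p>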
    </sec>
    <sec id="sec-4">
      <title>4. Learning at slice level</title>
      <p>Having a closer look at the training set, one can observe that the lesions on the lungs are
usually located only on a small number of slices of the whole volume. A natural idea is to try a
2D model that could differentiate between healthy lung slices and lung slices with lesions and
construct the CT report based on the findings at slice level. For this purpose we need training
data labeled at slice level and not at CT level.</p>
      <p>We manually selected slices, at lung level, that we considered relevant for the respective
label. This means we carefully selected only the slices that contained the representative label,
even though, in our opinion, confirmed later by a radiologist, that CT contained pathologies
corresponding to other labels as well. The selection was made by us, briefly trained using short
descriptions found on the internet. We strove to build a balanced dataset, but due to the
nature of the pathology some classes were easier to gather than others. Also, since healthy slices
are easy to select, we took the opportunity to build a large set of healthy slices.</p>
      <p>
        Using the same architecture as last year [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], EfficientNet-B4 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], we trained a classifier with
6 labels (the initial 5 tuberculosis types plus a healthy class). We then aggregated the prediction
probabilities for each volume and used them as a data point for a different, simpler classifier (we
tried a Multi-Layer Perceptron and Logistic Regression), with the final volume label as output.
This way we aggregated all slices into a single label. We further increased the scores by using
test-time augmentations (only horizontal flip) and averaging the results of several second-stage
classifiers trained with different parameters.
      </p>
      <p><bold>4.1. Training</bold></p>
      <p>We used approximately the same approach as last year in the ImageCLEF CTReport challenge,
which we briefly describe here.</p>
      <p>Given a selected slice, we grouped it together with the previous and the next slice in the
volume. We changed its window and level values to highlight the lung features. The selected
slices were split in half, corresponding to each lung, and we kept only the side with the
affection. We cropped the images, using a simple threshold method to remove the padding, and
kept only the body. The resulting images were resized to 256 × 256 pixels. These were then
normalized, with values in the range [0, 255], corresponding to 3 black and white images which
were concatenated at channel level. These mini-volumes of 3 consecutive slices, we thought,
could better highlight the difference between an infiltration and an artery, or a cavern and a
lumen, as these can be very similar at a certain point in space but continue in different
manners.</p>
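      <p>The grouping and cropping steps can be sketched as follows (numpy; the threshold value and the edge-clamping behaviour are our own illustrative choices):</p>
      <preformat>
```python
import numpy as np

def three_slice_stack(volume, i):
    """Group slice i with its previous and next neighbours into a
    3-channel mini-volume, clamping at the ends of the scan."""
    lo = max(i - 1, 0)
    hi = min(i + 1, len(volume) - 1)
    return np.stack([volume[lo], volume[i], volume[hi]], -1)

def crop_to_body(img, thresh=10):
    """Drop padding rows/columns whose intensities stay below a simple
    threshold, keeping only the body region."""
    mask = img > thresh
    rows = np.where(mask.any(1))[0]
    cols = np.where(mask.any(0))[0]
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

volume = np.zeros((4, 8, 8), dtype=np.uint8)
volume[1, 2:6, 3:7] = 200                   # fake body region on slice 1
print(three_slice_stack(volume, 1).shape)   # (8, 8, 3)
print(crop_to_body(volume[1]).shape)        # (4, 4)
```
      </preformat>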
      <p>
        As augmentations we used a random crop of size 224 × 224, a random horizontal flip with a
probability of 0.5, and normalized the image. We trained an EfficientNet-B4 for 90 epochs
with batches of 32. We used this network to predict on each slice of a volume in the training
set (processed in the fashion explained above, after the volume was resized to a fixed depth
of 128) and obtained the probabilities of each affection type. These probabilities were
concatenated into an array of size [128, 6], corresponding to [no. of slices, no. of labels]. These
arrays were used as input to train a simpler classifier (for example a logistic regression classifier),
each array corresponding to a label. We did not use any masks or segmentation algorithms to
extract the lungs.
      </p>
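      <p>The hand-off between the two stages can be sketched as follows (numpy; the nearest-centroid model here is a stand-in for the MLP/LogReg classifiers actually used, and all names are ours):</p>
      <preformat>
```python
import numpy as np

def slice_probs_to_feature(probs):
    """Flatten a [128, 6] array of per-slice class probabilities into a
    single feature vector describing the whole CT."""
    return np.asarray(probs).reshape(-1)    # shape (768,)

def fit_centroids(features, labels):
    """Toy second-stage classifier: one mean feature vector per CT label."""
    return {c: np.mean([f for f, l in zip(features, labels) if l == c], 0)
            for c in set(labels)}

def predict(model, feature):
    """Assign the label of the nearest centroid."""
    return min(model, key=lambda c: np.linalg.norm(model[c] - feature))

# two synthetic CTs: the slice classifier is confident in class 1 vs class 3
focal = np.zeros((128, 6))
focal[:, 1] = 0.9
miliary = np.zeros((128, 6))
miliary[:, 3] = 0.9
feats = [slice_probs_to_feature(focal), slice_probs_to_feature(miliary)]
model = fit_centroids(feats, ["Focal", "Miliary"])
print(predict(model, feats[0]))  # Focal
```
      </preformat>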
      <p>The first submission (Eff MLP) using this approach resulted in a kappa score of 0.203 and used
9/10 of the data, the remaining 1/10 being used for internal validation. The second-stage classifier was
an MLP classifier with 2 hidden layers of size 100 and 30, respectively. No test-time augmentation
was performed. The second submission (Eff MLP LogReg), with a kappa score of 0.221, was a
mean of 4 predictions: one MLP classifier and one LogReg classifier tested on the original and
flipped images. This submission scored the highest. The rest of the submissions (Eff comb)
correspond to different training parameters and means of scores (second-stage classifiers trained
on flipped images, means of first-stage classifier probabilities on original and flipped images,
etc.).</p>
    </sec>
    <sec id="sec-5">
      <title>5. Comparative results</title>
      <p>The winning submission corresponds to a kappa score of 0.221. The low scores obtained
generally in the competition are not a surprise for us since, during the phase of manual slice
labeling, we identified CTs in the training set presenting several affection types, not only the
labeled one. In our opinion, the task should be framed as a multi-label classification problem,
giving the possibility to report all the affections present. We could not find the rationale behind
the CT labeling for the cases that present more than one lesion type, and neither could the AI, as
the results indicate.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>In light of the results we obtained in both the 2020 and the 2021 ImageCLEF TB evaluation
tasks, we may conclude that an approach based on inference at slice level is superior to those
using the entire volume in such classification tasks. The effort of including in the training
set the needed information in the form of labeled slices, which essentially reduces to identifying the
sequence of slices presenting a certain affection, is definitely rewarding, not only in terms of
accuracy gain, but also in terms of the type of inference: models built in this way are able to
provide more valuable information, such as the localization and size of the affection.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgements</title>
      <p>This research was partially supported by the Competitiveness Operational Programme Romania
under project number SMIS 124759 - RaaS-IS (Research as a Service Iasi).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Peteri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sarrouti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kozlovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Liauchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dicente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G. S.</given-names>
            <surname>de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jacutprakart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Berari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tauteanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fichou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Brie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dogariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Ştefan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Constantin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chamberlain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Campello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Oliver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Moustahfid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Deshayes-Chossart</surname>
          </string-name>
          ,
          <article-title>Overview of the ImageCLEF 2021: Multimedia retrieval in medical, nature, internet and social media applications</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 12th International Conference of the CLEF Association (CLEF 2021)</source>
          ,
          <source>LNCS Lecture Notes in Computer Science</source>
          , Springer, Bucharest, Romania,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kozlovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Liauchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dicente Cid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          , Overview of ImageCLEFtuberculosis 2021 -
          <article-title>CT-based tuberculosis type classification</article-title>
          , in:
          <source>CLEF2021 Working Notes, CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.org&gt;, Bucharest, Romania,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Miron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Moisii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Breaban</surname>
          </string-name>
          ,
          <article-title>Revealing lung affections from CTs. A comparative analysis of various deep learning approaches for dealing with volumetric data</article-title>
          , in:
          <string-name>
            <given-names>L.</given-names>
            <surname>Cappellato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Eickhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Névéol</surname>
          </string-name>
          (Eds.), Working Notes of CLEF 2020 -
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          , Thessaloniki, Greece,
          <source>September 22-25</source>
          ,
          <year>2020</year>
          , volume
          <volume>2696</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2020</year>
          . URL: http://ceur-ws.org/Vol-2696/paper_105.pdf
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          ,
          <source>in: Advances in neural information processing systems</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Non-local neural networks</article-title>
          ,
          <year>2018</year>
          . arXiv:1711.07971
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learning for image recognition</article-title>
          ,
          <source>CoRR abs/1512</source>
          .03385 (
          <year>2015</year>
          ). URL: http://arxiv.org/abs/1512.03385. arXiv:1512.03385
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Carreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>Quo vadis, action recognition? A new model and the kinetics dataset</article-title>
          ,
          <source>CoRR abs/1705</source>
          .07750 (
          <year>2017</year>
          ). URL: http://arxiv.org/abs/1705.07750. arXiv:1705.07750
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          , EfficientNet:
          <article-title>Rethinking model scaling for convolutional neural networks</article-title>
          , arXiv preprint arXiv:1905.11946 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>