Tuberculosis detection using optical flow and the
           activity description vector

    Fernando Llopis, Andrés Fuster-Guilló, Juan Ramón Rico-Juan, Jorge
                      Azorı́n-López, and Irene Llopis

University of Alicante. Carretera San Vicente del Raspeig s/n 03690 San Vicente del
                    Raspeig - Alicante, Spain informacio@ua.es
http://www.ua.es {fernando.llopis, jazorin, juanramonrico, fuster}@ua.es,
                                   ilq2@alu.ua.es


      Abstract. Early detection of tuberculosis can save many lives as it re-
      mains one of the leading causes of death, half a century after its discovery.
      The analysis of chest CT scanned images can be a quick and economic
      mechanism for detecting not only the type of tuberculosis, but also dif-
      ferentiating whether or not the disease is multi-drug resistant. These are
      two of the objectives of the ImageClef Tuberculosis task of 2018, and are
      the ones studied by the group of the University of Alicante in this edition.
      We have followed two approaches: one based exclusively on Deep Learning
      techniques applied to a sequence of 2D images extracted from a 3D
      tomography, and a second one that uses Optical Flow to convert the 3D
      tomography into a motion representation in order to calculate the ADV
      (a descriptor previously proposed by the group). This
      descriptor is able to synthesize the information of a sequence into one
      image. This article presents the experiments carried out and the results
      obtained within the task.

      Keywords: Tuberculosis · Optical Flow · Activity Description · Deep
      Learning.


1   Introduction

ImageCLEFtuberculosis is one of the tasks of ImageCLEF 2018 [11]. The 2018
edition of the task [5] includes three independent subtasks.

1. Subtask 1: MDR detection. The goal of this subtask is to assess the
   probability of a TB patient having a resistant form of tuberculosis based on
   the analysis of a chest CT scan.
2. Subtask 2: TBT classification. The goal of this subtask is to automatically
   categorize each TB case into one of the following five types: (1) Infiltrative,
   (2) Focal, (3) Tuberculoma, (4) Miliary, (5) Fibro-cavernous.
3. Subtask 3: Severity scoring. This subtask is aimed at assessing the TB severity
   score based on a chest CT image. The severity score is a cumulative score of
   the severity of a TB case assigned by a medical doctor.
    In this first participation, our initial objective was to compare two models,
Deep Learning and Optical Flow, and to check their results in subtask 1. Finally,
we also made a submission for subtask 2 using the second model, which had given
us better results in the experimentation for subtask 1.
    This document is structured as follows: in Section 2 we present the
architectures of the models used, Deep Learning and Optical Flow. In Section 3 we
show the experimentation done with both models. Section 4 presents the official
results of the experiments, and Section 5 summarizes the document and offers a
series of proposals for future work.


2     Our approaches to the solution

2.1    Deep Learning

Deep neural networks have managed to solve problems or increase efficiency in
problems related to image processing [7]. On the one hand, the convolutional
layers manage to extract discriminative characteristics from the images so that
they can be evaluated by subsequent layers [12]. On the other hand, recurrent
neural networks have also evolved in their approach and are mainly used in
sequence analysis [15].
    To address the first subtask (detecting the resistant form of tuberculosis),
3D chest CT scan images are used. In a first stage, each 3D image is transformed
into a sequence of 2D images, which constitute the input of the neural
networks.
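
    As an illustration of this first stage, the following minimal sketch picks a
fixed number of evenly spaced slices from a volume stored as a NumPy array. The
function name volume_to_slices, the even spacing and the min-max scaling to
[0, 1] are assumptions made for the example; the text does not specify how the
2D images were selected or scaled. The default of seven slices follows the
number of images reported in Section 3.1.

```python
import numpy as np

def volume_to_slices(volume, n_slices=7):
    """Return n_slices evenly spaced 2D slices from a (depth, H, W) volume."""
    volume = np.asarray(volume, dtype=np.float32)
    depth = volume.shape[0]
    # Evenly spaced slice indices along the depth axis (assumed strategy).
    idx = np.linspace(0, depth - 1, n_slices).round().astype(int)
    slices = volume[idx]
    # Min-max scale to [0, 1]; a constant volume maps to all zeros.
    lo, hi = slices.min(), slices.max()
    if hi > lo:
        slices = (slices - lo) / (hi - lo)
    else:
        slices = np.zeros_like(slices)
    return slices  # shape: (n_slices, H, W)

# Example with a synthetic volume of 100 slices of 64x64 pixels.
ct = np.random.default_rng(0).normal(size=(100, 64, 64))
seq = volume_to_slices(ct)               # sequence view: (7, 64, 64)
multichannel = np.moveaxis(seq, 0, -1)   # multichannel view: (64, 64, 7)
```

    The same slices can thus be arranged either as one multichannel image or as
a sequence, depending on the network that consumes them.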


 [Figure 1: Input (multichannel image) -> Conv1 -> Pool1 -> Conv2 -> Pool2 -> Dense -> Output]

 Fig. 1. Basic scheme of a deep convolutional neural network for classification task.


   Different approaches will be proposed to address the problem of tuberculosis
detection:

 1. Convolutional neural network (CNN) with data augmentation: The main
    idea is to use the advantages of convolutional layers for a single multi-channel
    image. In this case, each channel would be a 2D gray image (see Figure 1).
 2. Convolutional layers combined with a recurrent neural network: The natural
    way to combine the advantages of convolutional layers and sequential
    treatment is to combine them in networks with multiple inputs per tomography.
    Figure 2 shows a basic scheme of this approach.
 3. Pretrained network and classification: As a first approximation to extracting
    features from an image, the VGG16 deep convolutional neural network [16] is
    used with the weights learned on ImageNet [12] (4096 features per image). The
    main idea is to concatenate the features of each input image belonging to the
    same tomography to get the final feature vector, which is then classified in
    a classical way.
 4. Pretrained network and classification as a sequence using a recurrent neural
    network: In this case, the extraction of features is similar to the one
    described in the previous item, and each feature vector is considered as a
    component of a sequence to be treated by a well-known recurrent neural
    network called Long Short-Term Memory (LSTM) [10].

 [Figure 2: Input images -> convolutional modules (convolutional layers) -> LSTM (internal state; h1, h2, ..., hn) -> Dense layer -> Output]

Fig. 2. Basic scheme of a combination of convolutional layers with a Long Short-Term
Memory network for classification task.


2.2   Optical Flow plus ADV

In this section we propose a combined method based on optical flow and a
characterization method called ADV, to deal with the classification of chest CT
scan images affected by different types of tuberculosis. The key point of this
method is the interpretation of the set of cross-sectional chest images provided
by CT scan, not as a volume but as a sequence of video images. We can extract
movement descriptors capable of classifying tuberculosis affections by analyzing
deformations or movements produced in these video sequences.
   The concept of optical flow refers to the estimation of displacements of
intensity patterns. This concept has been extensively used in computer vision in
different application domains: robot or vehicle navigation, car driving, video
surveillance or facial expression [6]. In the biomedical context, optical flow
has been used to analyze organ deformations [9,17]. We can find different
methods in the literature to obtain the optical flow [3]. One of the most used
methods to estimate motion at each pixel is Lucas Kanade [13]. In this work we
will use the Lucas Kanade method to extract optical flow by comparing sequences
of consecutive images. Nevertheless, we need not only to estimate motion but
also to describe this motion.
   In order to describe motion there are several methods used in different
computer vision contexts like human behavior recognition [8]. A successful method
to describe human behavior based on trajectory analysis is presented in [1]. The
paper proposes a description vector called ADV (Activity Description Vector),
tested in several contexts [2]. In summary, the ADV vector describes the activity
in an image sequence by counting, for each region of the image, the movements
produced in four directions of the 2D space. A detailed description of the method
can be found in [1]. In this paper we propose the use of ADV to describe motion
in the optical flow obtained from sequences of cross-sectional chest images
provided by CT scan.
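
   To make the idea behind the Lucas Kanade method more concrete, the sketch
below estimates a single displacement (u, v) for a whole patch by solving the
brightness constancy equation Ix*u + Iy*v = -It in the least-squares sense. It is
a simplified illustration in Python with NumPy, not the Matlab implementation
used in the experiments; the function name lucas_kanade_patch and the synthetic
test pattern are assumptions introduced for the example. A dense flow field
would apply the same least-squares step to a small window around each pixel.

```python
import numpy as np

def lucas_kanade_patch(prev, curr):
    """Estimate one (u, v) displacement, in pixels, for a whole patch."""
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    # Spatial gradients (axis 0 is y, axis 1 is x) and temporal difference.
    iy, ix = np.gradient(prev)
    it = curr - prev
    # Brightness constancy: ix*u + iy*v = -it, solved by least squares.
    a = np.stack([ix.ravel(), iy.ravel()], axis=1)
    b = -it.ravel()
    (u, v), *_ = np.linalg.lstsq(a, b, rcond=None)
    return u, v

# Synthetic check: a smooth pattern shifted 0.5 pixels to the right.
y, x = np.mgrid[0:32, 0:32].astype(float)
frame0 = np.sin(0.2 * x) + np.cos(0.3 * y)
frame1 = np.sin(0.2 * (x - 0.5)) + np.cos(0.3 * y)
u, v = lucas_kanade_patch(frame0, frame1)  # u close to 0.5, v close to 0
```

   Applied to consecutive cross-sectional images, u and v play the role of the
horizontal and vertical displacements that the ADV later accumulates.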

 [Figure 3: cross-sectional chest CT scan images (n) -> video image transformation -> video sequence (chest XxYxn images) -> Lucas Kanade optical flow -> optical flow (chest 64x64xn images) -> Activity Description Vector (ADV 3x3x5; components r, l, u, d, f) -> normalization (normalized ADV 3x3x5) -> classification (SVM, K-nn) -> TB label]

                         Fig. 3. Optical flow plus ADV process stages

    Figure 3 summarizes the successive stages of the process for extracting the
activity descriptors (optical flow+ADV) that will be the input of a classifier. In
the first stage, a transformation over the cross-sectional chest images provided
by the CT scan is performed in order to transform image formats into video
sequences adapted to calculate optical flow. The second stage implements the
Lucas Kanade method to obtain the optical flow. The third stage calculates the
activity description vector ADV (3x3x5), accumulating, within each region of a
3x3 grid over the image, the displacements of the optical flow in four directions
of a 2D space (right, left, up, down). The fifth component of the ADV calculates
the frequencies in direction changes. In the fourth stage, a normalization of the
ADV vector is performed. Finally, the last stage uses the normalized ADV vector as
the input for a generic classifier in order to evaluate the results.
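
    The third and fourth stages can be sketched as follows. This is a simplified
illustration under stated assumptions, not the implementation of [1]: the flow
is given as two arrays u and v of shape (frames, height, width), the fifth
component is approximated by counting sign changes of the flow between
consecutive frames, and the normalization divides each component by its total
over the grid. The names adv_from_flow and normalize_adv are hypothetical.

```python
import numpy as np

def adv_from_flow(u, v, grid=3):
    """Accumulate flow into a (grid, grid, 5) descriptor: r, l, u, d, changes."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    _, height, width = u.shape
    # Per-pixel displacement accumulated over time in each direction.
    right = np.clip(u, 0, None).sum(axis=0)
    left = np.clip(-u, 0, None).sum(axis=0)
    down = np.clip(v, 0, None).sum(axis=0)   # image rows grow downwards
    up = np.clip(-v, 0, None).sum(axis=0)
    # Assumed definition of a direction change: a sign flip of u or v
    # between two consecutive frames at the same pixel.
    flips = ((np.sign(u[1:]) * np.sign(u[:-1]) < 0)
             | (np.sign(v[1:]) * np.sign(v[:-1]) < 0)).sum(axis=0)
    adv = np.zeros((grid, grid, 5))
    rows = np.array_split(np.arange(height), grid)
    cols = np.array_split(np.arange(width), grid)
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            cell = np.ix_(r, c)
            adv[i, j] = [right[cell].sum(), left[cell].sum(),
                         up[cell].sum(), down[cell].sum(), flips[cell].sum()]
    return adv

def normalize_adv(adv):
    """Scale each of the 5 components so that it sums to 1 over the grid."""
    total = adv.sum(axis=(0, 1), keepdims=True)
    return np.divide(adv, total, out=np.zeros_like(adv), where=total > 0)

# Toy flow: every pixel moves one pixel to the right in two frames.
flow_u = np.ones((2, 6, 6))
flow_v = np.zeros((2, 6, 6))
adv = adv_from_flow(flow_u, flow_v)
adv_norm = normalize_adv(adv)
```

    Flattening the normalized 3x3x5 array gives a 45-dimensional vector that a
generic classifier such as SVM or k-nn can take as input.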


3     Experimentation
3.1   Preliminary experiments using Deep Learning
To validate the results, the widely used 10-fold cross-validation (10-CV)
technique is applied, and seven 2D images are extracted from each original 3D
tomography. Keras v2.1.6 [4] and scikit-learn v0.19.1 [14] (Python software) are
used to build the deep neural networks and to apply the classifiers,
respectively.
    Table 1 shows the results of the first approach, using a CNN. The accuracy
is close to 50%, which means that the network has not learned the difference
between the two classes.
    The second approach combines a CNN with an RNN. In this case, 2 convolutional
layers are used, with 32 (3x3) and 64 (3x3) filters. The mean accuracy is 0.50
and the per-fold results are [0.54 0.58 0.42 0.50 0.48 0.52 0.52 0.44 0.48 0.52].
These results are also unsatisfactory, which led us to try a new approach.


Conv.    Detail                    Accuracy   10-CV
layers   (filters x kernel)        mean       results
4        64x7, 64x3, 64x3, 64x3    0.52       [0.46 0.54 0.54 0.54 0.52 0.52 0.52 0.52 0.52 0.52]
2        64x7, 64x3                0.51       [0.46 0.54 0.54 0.50 0.52 0.52 0.52 0.52 0.48 0.52]
                     Table 1. Classification results using CNN.


    The third approach uses a pretrained network (VGG16) with the ImageNet
weights. In this case, VGG16 is used to extract the outputs of the penultimate
layer as descriptors of each image. These features, extracted from the last
layers of the neural network, are called neural codes. The final number of
features is 28672, corresponding to 7 images per tomography times 4096 neural
codes per image. Table 2 summarizes the experiments using classifiers belonging
to different families of algorithms, applied either to the neural codes directly
or after normalizing them with the L2 norm.
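
    The construction of the per-tomography descriptor can be sketched as below.
Random numbers stand in for the VGG16 outputs, so the example only illustrates
the shapes involved (7 x 4096 = 28672) and the optional L2 step. Applying the L2
normalization to the whole concatenated vector, rather than to each image
separately, is an assumption, as is the function name tomography_descriptor.

```python
import numpy as np

N_SLICES, N_CODES = 7, 4096

def tomography_descriptor(codes, l2_normalize=False):
    """Concatenate per-image neural codes into one vector per tomography."""
    codes = np.asarray(codes, dtype=float)
    if codes.shape != (N_SLICES, N_CODES):
        raise ValueError("expected one 4096-dimensional code per slice")
    vector = codes.reshape(-1)  # 7 * 4096 = 28672 features
    if l2_normalize:
        norm = np.linalg.norm(vector)
        if norm > 0:
            vector = vector / norm  # unit Euclidean length (assumed scope)
    return vector

rng = np.random.default_rng(0)
fake_codes = rng.random((N_SLICES, N_CODES))  # stand-in for VGG16 outputs
original = tomography_descriptor(fake_codes)
normalized = tomography_descriptor(fake_codes, l2_normalize=True)
```

    The two resulting vectors correspond to the "original" and "L2" variants of
the neural codes compared in Table 2.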
    The last approach using deep learning architectures obtains the neural codes
as in the previous approach and classifies the sequence of 7 images with a
recurrent neural network (LSTM). Again, the results are close to chance: the
accuracy is 0.49 and the per-fold results are [0.62 0.54 0.42 0.46 0.28 0.48 0.48
0.48 0.56 0.56].
    In general, the results per fold (10-CV) vary widely. This is probably due
to the nature of neural networks, with random initialization of neurons and
optimizers that have to adjust thousands of parameters and end up in local
minima, and also to the small number of images available to train a neural
network, where small differences between the training and test sets make some
splits easier to classify than others. In addition, no preprocessing has been
applied to the 2D images, which could also contribute to the high variation in
the results.

Neural    Classifier            Accuracy  10-CV
codes     algorithms            mean      results
original  Nearest Neighbors     0.48      [0.54 0.46 0.65 0.35 0.36 0.6 0.32 0.56 0.44 0.52]
original  Linear SVM            0.48      [0.46 0.46 0.69 0.54 0.4 0.6 0.24 0.52 0.44 0.48]
original  RBF SVM               0.53      [0.54 0.54 0.54 0.54 0.52 0.52 0.52 0.52 0.52 0.52]
original  Decision Tree         0.54      [0.62 0.54 0.58 0.5 0.44 0.44 0.48 0.72 0.56 0.52]
original  Random Forest         0.41      [0.27 0.54 0.5 0.38 0.4 0.56 0.28 0.4 0.4 0.4]
original  AdaBoost              0.49      [0.46 0.65 0.58 0.42 0.28 0.52 0.52 0.64 0.36 0.48]
original  Naive Bayes           0.47      [0.54 0.58 0.5 0.27 0.48 0.52 0.36 0.48 0.48 0.44]
original  Logistic Regression   0.48      [0.5 0.46 0.69 0.54 0.44 0.52 0.28 0.48 0.4 0.44]
original  XGBoost               0.52      [0.46 0.65 0.65 0.46 0.32 0.6 0.4 0.6 0.6 0.44]
L2        Nearest Neighbors     0.50      [0.62 0.5 0.69 0.42 0.4 0.48 0.32 0.64 0.4 0.48]
L2        Linear SVM            0.53      [0.54 0.54 0.54 0.54 0.52 0.52 0.52 0.52 0.52 0.52]
L2        RBF SVM               0.45      [0.42 0.54 0.54 0.46 0.24 0.44 0.28 0.6 0.44 0.56]
L2        Decision Tree         0.52      [0.58 0.5 0.5 0.58 0.56 0.4 0.52 0.6 0.36 0.56]
L2        Random Forest         0.52      [0.42 0.73 0.42 0.42 0.6 0.48 0.48 0.52 0.52 0.6]
L2        AdaBoost              0.47      [0.54 0.58 0.54 0.31 0.4 0.56 0.24 0.52 0.44 0.56]
L2        Naive Bayes           0.47      [0.54 0.58 0.5 0.27 0.48 0.52 0.36 0.48 0.48 0.44]
L2        Logistic Regression   0.47      [0.42 0.62 0.58 0.46 0.2 0.48 0.28 0.6 0.56 0.52]
L2        XGBoost               0.50      [0.65 0.54 0.62 0.42 0.32 0.56 0.4 0.6 0.44 0.48]
Table 2. Classification results applied to features extraction with VGG16 pretrained
network.


3.2   Preliminary experiments using Optical Flow plus ADV

For these experiments, the widely used 10-fold cross-validation (10-CV) technique
has been applied again. All images of the original 3D tomography are used to
calculate the optical flow for each patient. Matlab R2013b has been used to
calculate the optical flow and the ADV and to run the classifiers.
   Table 3 shows the performance results of the proposed method.


      Classifier   OF size   ADV   Accuracy   MDR Accuracy   DS Accuracy
      SVM          64x64     3x3   0.5097     0.312          0.6567
      3-knn        64x64     3x3   0.5135     0.52           0.4627
             Table 3. Classification results using Optical flow plus ADV
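
The evaluation protocol behind these figures can be sketched in Python as a
10-fold cross-validation of a 3-nearest-neighbour classifier over flattened ADV
descriptors. The experiments themselves were run in Matlab, so this is only an
illustration: the helper names knn_predict and cross_val_accuracy, the Euclidean
distance, the random fold assignment and the toy data are all assumptions.

```python
import numpy as np

def knn_predict(x_train, y_train, x_test, k=3):
    """Majority vote among the k nearest training samples (labels 0 or 1)."""
    # Euclidean distances between every test and training descriptor.
    d = np.linalg.norm(x_test[:, None, :] - x_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]
    return (votes.mean(axis=1) > 0.5).astype(int)

def cross_val_accuracy(x, y, n_folds=10, k=3, seed=0):
    """Return the accuracy obtained on each of the n_folds test folds."""
    order = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(order, n_folds)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = knn_predict(x[train_idx], y[train_idx], x[test_idx], k)
        scores.append(float((pred == y[test_idx]).mean()))
    return np.array(scores)

# Toy data: 100 flattened 3x3x5 descriptors in two well-separated classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
x = rng.normal(size=(100, 45)) + 3.0 * y[:, None]
scores = cross_val_accuracy(x, y)
mean_accuracy = scores.mean()
```

The per-class accuracies reported in Table 3 would be obtained in the same way,
restricting the comparison to the test samples of each class.
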
3.3     Frequency Matrix with Deep Learning

A modification of the Optical Flow experiment was to use the frequency matrices
generated as input to a neural network.


                             Fig. 4. Frequency Matrix.


      Figure 4 shows an example of a frequency matrix.


4      Results

 1. Run 1: MDR Baseline. The baseline is a probabilistic model in which the
    image is not analyzed and only sex and age data are taken into account.
 2. Run 2: ADV 3x3, SVM, 1000 SMOTE upsampling.
 3. Run 3: Frequency Normalized. In this model we apply Deep Learning techniques
    to the normalized frequency matrix obtained through the Optical Flow.
 4. Run 4: In this model we apply Deep Learning techniques (Decision Tree) to
    a subset of 2D images of the tomography.
 5. Run 5: In this model we apply Deep Learning techniques (Decision Tree) to
    a subset of 2D images of the tomography.
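
   Several runs rely on SMOTE upsampling. As a reminder of how this kind of
oversampling works, the sketch below creates synthetic minority samples by
interpolating between a sample and one of its nearest neighbours of the same
class. It is a simplified illustration, not the code used for the submitted
runs: the function name smote, the number of neighbours and the number of
generated samples are assumptions, and the meaning of the value 1000 in the run
descriptions is not specified here.

```python
import numpy as np

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a (n, features) minority set."""
    minority = np.asarray(minority, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(minority)
    k = min(k, n - 1)  # requires at least two minority samples
    # Pairwise distances within the minority class, excluding self-matches.
    d = np.linalg.norm(minority[:, None] - minority[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    # Pick a base sample, one of its neighbours, and a random point between.
    base = rng.integers(0, n, size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[pick] - minority[base])

# Toy example: 10 minority descriptors of length 45, 20 synthetic samples.
minority_adv = np.random.default_rng(2).random((10, 45))
synthetic = smote(minority_adv, n_new=20)
```

   Each synthetic descriptor lies on a segment between two real minority
descriptors, so it stays inside the range of values observed for that class.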

   As can be seen in Table 4, the Optical Flow model with SVM (testSVMSMOTE)
obtains the best results among our runs that analyze the images, using only
selected images.


     Table 4. Results of University of Alicante vs better results at SubTask 1

Run                                   AUC      Rank (AUC)   ACC      Rank (ACC)
VISTA@UEvora                          0.6178   1            0.5593   8
San Diego VA HCS/UCSD                 0.6114   2            0.6144   1
MDRBaseline0                          0.5669   10           0.4873   32
testSVMSMOTE                          0.5509   15           0.5339   20
testOpticalFlowwFrequencyNormalized   0.5473   16           0.5127   24
DecisionTree25v2                      0.5049   26           0.5000   29
testOFFullVersion2                    0.4971   29           0.4958   31
testOpticalFlowFull                   0.4845   32           0.5169   23
testFrequency                         0.4781   34           0.4788   34
testflowI                             0.4740   35           0.4492   39


    Because we had little time available for the second task, we only submitted
the two Optical Flow models, SVM and 3-nn:
    Run 1: ADV 3x3, SVM, 1000 SMOTE upsampling.
    Run 2: ADV 3x3, 3-nn, 1000 SMOTE upsampling.
    The results were considerably better using 3-nn, but still very far from
those of the rest of the participants (Table 5).


     Table 5. Results of University of Alicante vs better results at SubTask 2

Run           Kappa     Rank (Kappa)   ACC      Rank (ACC)
UIIP BioMed   0.2312    1              0.4227   1
T23nnFinal    0.0204    32             0.2587   31
T2SVMFinal    -0.0920   38             0.1167   38


5   Conclusions and future work
Early detection of tuberculosis is a major social challenge, given the devastating
effects of the disease. On the other hand, it represents a scientific challenge of
the highest level. As the organizers claim, “you have to work to get methods
that allow a correct detection of the disease that kills thousands and thousands
of people”. In this paper we have proposed two different approaches to face the
problem. The first one is based on the use of Deep Learning techniques on a
sequence of 2D images extracted from a 3D tomography. The second approach
uses Optical Flow to convert the 3D tomography into a motion representation in
order to calculate the ADV (a previous descriptor provided by the group). This
descriptor is able to synthesize the information of a sequence into one image.
The experiments carried out in these two approaches allow us to confirm the
interest of these lines of research and encourage us to seek improvements in the
proposed methodologies.


References
 1. Azorin-Lopez, J., Saval-Calvo, M., Fuster-Guillo, A., Garcia-Rodriguez, J.: Hu-
    man behaviour recognition based on trajectory analysis using neural networks.
    In: Proceedings of the International Joint Conference on Neural Networks (2013).
    https://doi.org/10.1109/IJCNN.2013.6706724
 2. Azorin-Lopez, J., Saval-Calvo, M., Fuster-Guillo, A., Garcia-Rodriguez, J., Ca-
    zorla, M., Signes-Pont, M.T.: Group activity description and recognition based
    on trajectory analysis and neural networks. In: 2016 International Joint Con-
    ference on Neural Networks (IJCNN). vol. 2016-Octob, pp. 1585–1592 (2016).
    https://doi.org/10.1109/IJCNN.2016.7727387
 3. Chao, H., Gu, Y., Napolitano, M.: A survey of optical flow techniques for robotics
    navigation applications. Journal of Intelligent and Robotic Systems: Theory and
    Applications 73(1-4), 361–372 (2014). https://doi.org/10.1007/s10846-013-9923-6
 4. Chollet, F., et al.: Keras. https://keras.io (2015)
 5. Dicente Cid, Y., Liauchuk, V., Kovalev, V., Müller, H.: Overview of
    ImageCLEFtuberculosis 2018 - detecting multi-drug resistance, classifying
    tuberculosis type, and assessing severity score. In: CLEF2018 Working Notes.
    CEUR Workshop Proceedings, CEUR-WS.org <http://ceur-ws.org>, Avignon, France
    (September 10-14 2018)
 6. Fortun, D., Bouthemy, P., Kervrann, C.: Optical flow modeling and computa-
    tion: A survey. Computer Vision and Image Understanding 134, 1–21 (2015).
    https://doi.org/10.1016/j.cviu.2015.02.008
 7. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016),
    http://www.deeplearningbook.org
 8. Gowsikhaa, D., Abirami, S., Baskaran, R.: Automated human behavior analysis
    from surveillance videos: a survey. Artificial Intelligence Review 42(4),
    747–765 (2014). https://doi.org/10.1007/s10462-012-9341-3
 9. Hata, N., Nabavi, A., Wells, W.M., Warfield, S.K., Kikinis, R., Black, P.M.L.,
    Jolesz, F.A.: Three-dimensional optical flow method for measurement of volumetric
    brain deformation from intraoperative MR images. Journal of Computer Assisted
    Tomography 24(4), 531–538 (2000). https://doi.org/10.1097/00004728-200007000-
    00004
10. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation
    9(8), 1735–1780 (1997)
11. Ionescu, B., Muller, H., Villegas, M., de Herrera, A.G.S., Eickhoff, C., Andrea-
    rczyk, V., Cid, Y.D., Liauchuk, V., Kovalev, V., Hasan, S.A., Ling, Y., Farri, O.,
    Liu, J., Lungren, M., Dang-Nguyen, D.T., Piras, L., Riegler, M., Zhou, L., Lux, M.,
    Gurrin, C.: Overview of ImageCLEF 2018: Challenges, datasets and evaluation. In:
    Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceed-
    ings of the Ninth International Conference of the CLEF Association (CLEF 2018),
    vol. 11018. LNCS Lecture Notes in Computer Science, Springer, Avignon, France
    (September 10-14 2018)
12. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep con-
    volutional neural networks. In: Advances in neural information processing systems.
    pp. 1097–1105 (2012)
13. Patel, D., Saurahb, U.: Optical flow measurement using Lucas Kanade method.
    Int J Comput Appl 61(10), 6–10 (2013)
14. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
    Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
    Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine
    learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
15. Rumelhart, D., Hinton, G., Williams, R.: Learning sequential structure in simple
    recurrent networks. Parallel distributed processing: Experiments in the microstruc-
    ture of cognition 1 (1986)
16. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale
    image recognition. arXiv preprint arXiv:1409.1556 (2014)
17. Xavier, M., Lalande, A., Walker, P.M., Brunotte, F., Legrand, L.: An adapted
    optical flow algorithm for robust quantification of cardiac wall motion from stan-
    dard cine-MR examinations. IEEE Transactions on Information Technology in
    Biomedicine 16(5), 859–868 (2012). https://doi.org/10.1109/TITB.2012.2204893