HKBU at MediaEval 2017 Medico: Medical Multimedia Task
Yang Liu¹,², Zhonglei Gu¹, William K. Cheung¹
¹ Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, China
² Institute of Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China
csygliu@comp.hkbu.edu.hk, cszlgu@comp.hkbu.edu.hk, william@comp.hkbu.edu.hk

ABSTRACT
In this paper, we describe our model designed for automatic detection of diseases based on multimedia data collected in hospitals. Specifically, a two-stage learning strategy is designed to predict the diseases. In the first stage, a dimensionality reduction method called bidirectional marginal Fisher analysis (BMFA) is proposed to project the original data to a low-dimensional space, with the key discriminant information being well preserved. In the second stage, a multi-class support vector machine (SVM) is applied in the low-dimensional space for detection. Experimental results demonstrate the efficiency of the designed model.

1    INTRODUCTION
We have witnessed an increasing trend of applying multimedia processing and analysis methods, such as computer vision and medical image processing, to assist the diagnosis, detection, and interpretation of medical abnormalities [7]. While there exist a number of successful applications related to healthcare decision support, predictive analytics is still considered challenging for some specific medical multimedia data such as endoscopy and ultrasound images, due to the high complexity of these medical tasks [8]. Moreover, most of the existing methods make use of a limited amount of information, and possibly useful information sources such as sensory and temporal information are mostly not considered [5, 10, 11].
   The 2017 MediaEval Medico Task aims to improve the detection and localization of abnormalities by designing an integrated approach that combines video and image information with other sensory information and the assistance of medical experts. The dataset used in this task consists of 8,000 GI tract images that are annotated and verified by medical doctors for the ground truth. More details about the task requirements and the dataset can be found in [8].
   In order to perform efficient detection of diseases, we design a two-stage learning strategy. In the first stage, a manifold learning method called bidirectional marginal Fisher analysis (BMFA) is proposed to learn a compact representation of the original data, with the key discriminant information being well preserved. In the second stage, a multi-class support vector machine (SVM) is applied to the compact representation of each data point for detection.

Copyright held by the owner/author(s).
MediaEval’17, 13-15 September 2017, Dublin, Ireland

2    METHOD
2.1    Bidirectional Marginal Fisher Analysis
Let $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n]$ denote the data matrix, where $n$ is the number of data points in the training set and $\mathbf{x}_i \in \mathbb{R}^D$ is the high-dimensional feature vector of the $i$-th data point. Let $l_i \in \{1, 2, \dots, 8\}$ denote the label of $\mathbf{x}_i$, and let $\mathbf{l} = [l_1, l_2, \dots, l_n]$ collect the true labels of all data points. The proposed bidirectional marginal Fisher analysis (BMFA) aims to learn a subspace of the original features in which the marginal discriminant information is well preserved.
   A manifold learning algorithm typically involves the definition of a neighborhood graph. The effectiveness of the algorithm very often depends on how well the defined neighborhood graph preserves the local topological properties of the data manifold. To achieve a robust neighborhood graph, we use the following two-way connection criterion to construct the graph [4]: we connect $\mathbf{x}_i$ and $\mathbf{x}_j$ only if $\mathbf{x}_i$ is one of the $K$-nearest neighbors of $\mathbf{x}_j$ and $\mathbf{x}_j$ is also one of the $K$-nearest neighbors of $\mathbf{x}_i$. This criterion adopts the “and” hypothesis: two data points are connected if and only if each is a neighbor of the other.
   The within-class adjacency matrix and between-class adjacency matrix are then defined as follows:

$$A^w_{ij} = \begin{cases} \exp\!\left(-\dfrac{\|\mathbf{x}_i - \mathbf{x}_j\|_F^2}{2\sigma}\right), & \text{if } j \in \mathcal{N}_i \text{ and } i \in \mathcal{N}_j \text{ and } l_i = l_j, \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

$$A^b_{ij} = \begin{cases} \exp\!\left(-\dfrac{\|\mathbf{x}_i - \mathbf{x}_j\|_F^2}{2\sigma}\right), & \text{if } j \in \mathcal{N}_i \text{ and } i \in \mathcal{N}_j \text{ and } l_i \neq l_j, \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

where $\mathcal{N}_i$ denotes the index set of the $K$ nearest neighbors of $\mathbf{x}_i$, and $\sigma$ is empirically set by $\sigma = \sum_{i=1}^{n} \|\mathbf{x}_i - \mathbf{x}_{iK}\|_F^2 / n$, where $\mathbf{x}_{iK}$ is the $K$th nearest neighbor of $\mathbf{x}_i$.
   BMFA aims to learn a transformation matrix $\mathbf{W} \in \mathbb{R}^{D \times d}$ that maximizes the following objective function:

$$\mathbf{W} = \arg\max_{\mathbf{W}} \frac{\sum_{i,j=1}^{n} A^b_{ij}\, \|\mathbf{W}^T \mathbf{x}_i - \mathbf{W}^T \mathbf{x}_j\|_F^2}{\sum_{i,j=1}^{n} A^w_{ij}\, \|\mathbf{W}^T \mathbf{x}_i - \mathbf{W}^T \mathbf{x}_j\|_F^2}. \tag{3}$$

   The above optimization problem can be relaxed to the following ratio trace problem:

$$\mathbf{W} = \arg\max_{\mathbf{W}} \operatorname{tr}\!\left(\frac{\mathbf{W}^T \mathbf{S}^b \mathbf{W}}{\mathbf{W}^T \mathbf{S}^w \mathbf{W}}\right), \tag{4}$$

where $\mathbf{S}^b = \mathbf{X} \mathbf{L}^b \mathbf{X}^T$ and $\mathbf{S}^w = \mathbf{X} \mathbf{L}^w \mathbf{X}^T$, in which $\mathbf{L}^b = \mathbf{D}^b - \mathbf{A}^b$ and $\mathbf{L}^w = \mathbf{D}^w - \mathbf{A}^w$ are the Laplacian matrices [1, 3], and $\mathbf{D}^b$ and $\mathbf{D}^w$ are diagonal matrices defined as $D^b_{ii} = \sum_{j=1}^{n} A^b_{ij}$

Table 1: Official evaluation results (provided by the organizers) of the proposed model on the Kvasir dataset.

                                        Recall    Specificity     Precision   Accuracy     F1     MCC       𝑅𝑘 statistic
            Run 1 for both subtasks     0.6975     0.9568          0.6975      0.9244    0.6975   0.6543      0.6571
            Run 2 for both subtasks     0.7028     0.9575          0.7028      0.9257    0.7028   0.6603      0.6626
            Run 3 for both subtasks     0.6890     0.9556          0.6890      0.9223    0.6890   0.6446      0.6453
            Run 4 for both subtasks     0.6988     0.9570          0.6988      0.9247    0.6988   0.6557      0.6585
            Run 5 for both subtasks     0.6918     0.9560          0.6918      0.9229    0.6918   0.6477      0.6483

and $D^w_{ii} = \sum_{j=1}^{n} A^w_{ij}$ ($i = 1, \dots, n$). The optimization problem in Eq. (4) can be solved by generalized eigen-decomposition.
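
The step from Eq. (3) to Eq. (4) rests on a standard graph-Laplacian identity. The derivation below is a sketch of that step, not part of the original text. It assumes that the adjacency matrices are symmetric (which the two-way connection criterion guarantees) and it reads the fraction in Eq. (4) in the usual ratio-trace sense; the shorthand $\mathbf{y}_i$ is introduced here only for compactness.

```latex
% For a symmetric adjacency matrix A with D_{ii} = \sum_j A_{ij}, L = D - A,
% and y_i = W^T x_i:
\begin{align*}
\sum_{i,j=1}^{n} A_{ij}\, \|\mathbf{y}_i - \mathbf{y}_j\|_F^2
  &= \sum_{i,j=1}^{n} A_{ij} \left( \mathbf{y}_i^T \mathbf{y}_i
     + \mathbf{y}_j^T \mathbf{y}_j - 2\, \mathbf{y}_i^T \mathbf{y}_j \right) \\
  &= 2 \sum_{i=1}^{n} D_{ii}\, \mathbf{y}_i^T \mathbf{y}_i
     - 2 \sum_{i,j=1}^{n} A_{ij}\, \mathbf{y}_i^T \mathbf{y}_j \\
  &= 2 \operatorname{tr}\!\left( \mathbf{W}^T \mathbf{X} (\mathbf{D} - \mathbf{A})
     \mathbf{X}^T \mathbf{W} \right)
   = 2 \operatorname{tr}\!\left( \mathbf{W}^T \mathbf{X} \mathbf{L}
     \mathbf{X}^T \mathbf{W} \right).
\end{align*}
% Applying this to A^b and A^w turns Eq. (3) into the trace ratio
%   tr(W^T S^b W) / tr(W^T S^w W),
% whose ratio-trace relaxation
%   tr( (W^T S^w W)^{-1} (W^T S^b W) )
% is maximized by the d generalized eigenvectors with the largest eigenvalues:
\[
\mathbf{S}^b \mathbf{w}_k = \lambda_k\, \mathbf{S}^w \mathbf{w}_k ,
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d .
\]
```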
   For each high-dimensional data point $\mathbf{x}_i$, we can obtain its low-dimensional representation by $\mathbf{y}_i = \mathbf{W}^T \mathbf{x}_i$.
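
The procedure of this subsection can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used for the submitted runs. The function name `bmfa_fit`, the default values of `K` and `d`, and the ridge term `reg` are assumptions introduced here; the source does not report `K` and does not mention regularization. Samples are stored as rows, so $\mathbf{X}\mathbf{L}\mathbf{X}^T$ in the text becomes `X.T @ L @ X` in the code.

```python
import numpy as np


def bmfa_fit(X, labels, K=5, d=2, reg=1e-6):
    """Hypothetical BMFA sketch. X: (n, D), one sample per row; labels: (n,)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, D = X.shape

    # Pairwise squared Euclidean distances; the diagonal is set to +inf so
    # that a point is never counted as its own neighbor.
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(dist2, np.inf)

    # K-nearest-neighbor indicator, then the two-way ("and") criterion.
    nn_idx = np.argsort(dist2, axis=1)[:, :K]
    knn = np.zeros((n, n), dtype=bool)
    knn[np.arange(n)[:, None], nn_idx] = True
    mutual = knn & knn.T

    # sigma: mean squared distance from each point to its K-th neighbor.
    sigma = np.mean(dist2[np.arange(n), nn_idx[:, -1]])

    # Heat-kernel weights split into within-class and between-class graphs,
    # following Eqs. (1) and (2).
    heat = np.exp(-np.where(mutual, dist2, 0.0) / (2.0 * sigma))
    same = labels[:, None] == labels[None, :]
    A_w = np.where(mutual & same, heat, 0.0)
    A_b = np.where(mutual & ~same, heat, 0.0)

    # Graph Laplacians L = D - A and the matrices S^w and S^b.
    L_w = np.diag(A_w.sum(axis=1)) - A_w
    L_b = np.diag(A_b.sum(axis=1)) - A_b
    S_w = X.T @ L_w @ X
    S_b = X.T @ L_b @ X

    # Assumption (not in the source): a small ridge keeps S_w positive definite.
    S_w = S_w + reg * (np.trace(S_w) / D) * np.eye(D)

    # Generalized eigenproblem S_b v = lambda S_w v via Cholesky whitening:
    # with S_w = C C^T, solve the symmetric problem for C^{-1} S_b C^{-T}.
    C = np.linalg.cholesky(S_w)
    C_inv = np.linalg.inv(C)
    M = C_inv @ S_b @ C_inv.T
    evals, evecs = np.linalg.eigh((M + M.T) / 2.0)
    top = np.argsort(evals)[::-1][:d]
    W = C_inv.T @ evecs[:, top]
    return W, evals[top]
```

Under these assumptions, `W, evals = bmfa_fit(X_train, labels, d=8)` followed by `Y = X_train @ W` would give the kind of 8-dimensional representation used in Run 1. The sketch requires `K < n` and at least one within-class edge in the graph.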

2.2    Support Vector Classification
To predict diseases, we apply the multi-class support vector
machine (SVM) to the low-dimensional representation of
each data point. Specifically, we employ the one-against-all
strategy for the multi-class classification and use the 𝜈-SVM
[9] on each class.
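
A one-against-all ν-SVM of this kind can be sketched with scikit-learn, whose `NuSVC` class wraps LIBSVM. This is an illustration under stated assumptions rather than the configuration used for the submitted runs: the helper name `train_one_vs_rest_nu_svm` is hypothetical, the linear kernel and ν = 0.2 are taken from the settings reported in Section 3, and scikit-learn's `NuSVC` exposes no cost parameter, so that setting is not reproduced.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import NuSVC


def train_one_vs_rest_nu_svm(Y_train, labels, nu=0.2):
    """Fit one linear nu-SVM per class on low-dimensional features.

    Y_train: (n, d) array of projected data points (one row per sample).
    labels:  (n,) array of class labels.
    """
    # One binary nu-SVC per class; each separates that class from the rest.
    base = NuSVC(nu=nu, kernel="linear")
    clf = OneVsRestClassifier(base)
    clf.fit(np.asarray(Y_train, dtype=float), np.asarray(labels))
    return clf
```

At prediction time, `clf.predict(Y_test)` assigns each point to the class whose binary classifier returns the largest decision value. Note that ν must be feasible for every binary sub-problem, which limits how large it can be when a class is small relative to the rest.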

3     RESULTS
In this section, we report the experimental settings and the evaluation results. The Kvasir dataset is used to evaluate the performance of the proposed model [6]. The size of the training set is 4,000 (500 for each of the eight classes) and the dimension of the original feature space is 1,185. We employ the same setting on both the detection subtask and the efficient detection subtask (i.e., the five runs of the detection subtask and those of the efficient detection subtask are identical). Specifically:

      ∙ For Run 1 of both subtasks, we randomly select 1,000 data samples (125 for each class) for training, and project the original data to the 8-dimensional subspace via BMFA;
      ∙ For Run 2 of both subtasks, we randomly select 1,000 data samples for training, and project the original data to the 9-dimensional subspace via BMFA;
      ∙ For Run 3 of both subtasks, we randomly select 800 data samples (100 for each class) for training, and project the original data to the 9-dimensional subspace via BMFA;
      ∙ For Run 4 of both subtasks, we randomly select 1,000 data samples for training, and project the original data to the 7-dimensional subspace via BMFA;
      ∙ For Run 5 of both subtasks, we randomly select 800 data samples for training, and project the original data to the 7-dimensional subspace via BMFA.

In the following step, we use the LIBSVM toolbox for classification [2]. The $\nu$-SVC with linear kernel is selected, and the parameters are set as $\nu = 0.2$ and $cost = 1$.
   Table 1 shows the official evaluation results (provided by the task organizers) of the proposed model on the Kvasir dataset. From the table, we can see that the proposed model achieves an accuracy above 0.92 and an F1 score of about 0.69 to 0.70 in all five runs, given that the task is a multi-class classification problem with only a portion of the available data being utilized for training. This indicates that the learned subspace is able to capture the discriminant information of the dataset. Moreover, by reducing the original data to a very low-dimensional subspace, the learning process becomes more efficient.
   In addition to the overall performance, we analyze the contribution of each dimension in the original feature space. The contribution of the $i$-th dimension is defined as $\mathrm{Contribution}_i = \sum_j \lambda_j |w_{ij}|$, where $\lambda_j$ denotes the $j$-th eigenvalue, $w_{ij}$ denotes the $(i, j)$-th element of $\mathbf{W}$, and $|\cdot|$ denotes the absolute value operator. From Figure 1 we can see that the edge histogram and auto color correlogram features contribute more than the others, which indicates their importance in medical image classification.

Figure 1: Contribution of each individual feature.

4    CONCLUSION
This paper described the model designed for disease detection based on multimedia data. A novel dimensionality reduction algorithm dubbed bidirectional marginal Fisher analysis (BMFA) was presented to extract the discriminant information from the original feature space. After that, SVM was applied to the low-dimensional subspace for multi-class classification. Experimental results on the Kvasir dataset validated the effectiveness and efficiency of the proposed model.

ACKNOWLEDGMENTS
This work was supported in part by the National Natural Science Foundation of China under Grant 61503317, and in part by the Faculty Research Grant of Hong Kong Baptist University (HKBU) under Project FRG2/16-17/032.


REFERENCES
 [1] M. Belkin and P. Niyogi. 2001. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. In Advances in Neural Information Processing Systems 14 (NIPS). 585–591.
 [2] Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 3 (2011), 27:1–27:27.
 [3] X. He and P. Niyogi. 2003. Locality Preserving Projections. In Advances in Neural Information Processing Systems 16 (NIPS). 153–160.
 [4] Yang Liu, Yan Liu, and Keith C. C. Chan. 2011. Ordinal regression via manifold learning. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI). 398–403.
 [5] Konstantin Pogorelov, Sigrun Losada Eskeland, Thomas de Lange, Carsten Griwodz, Kristin Ranheim Randel, Håkon Kvale Stensland, Duc-Tien Dang-Nguyen, Concetto Spampinato, Dag Johansen, Michael Riegler, and Pål Halvorsen. 2017. A Holistic Multimedia System for Gastrointestinal Tract Disease Detection. In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSYS). 112–123.
 [6] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato, Duc-Tien Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt, Michael Riegler, and Pål Halvorsen. 2017. Kvasir: A Multi-Class Image Dataset for Computer Aided Gastrointestinal Disease Detection. In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSYS). 164–169.
 [7] Michael Riegler, Mathias Lux, Carsten Griwodz, Concetto Spampinato, Thomas de Lange, Sigrun L. Eskeland, Konstantin Pogorelov, Wallapak Tavanapong, Peter T. Schmidt, Cathal Gurrin, Dag Johansen, Håvard Johansen, and Pål Halvorsen. 2016. Multimedia and Medicine: Teammates for Better Disease Detection and Survival. In Proceedings of the 2016 ACM on Multimedia Conference (ACMMM). 968–977.
 [8] Michael Riegler, Konstantin Pogorelov, Pål Halvorsen, Carsten Griwodz, Thomas de Lange, Kristin Ranheim Randel, Sigrun Losada Eskeland, Duc-Tien Dang-Nguyen, Mathias Lux, and Concetto Spampinato. 2017. Multimedia for Medicine: The Medico Task at MediaEval 2017. In Working Notes Proceedings of the MediaEval 2017 Workshop.
 [9] Bernhard Schölkopf, Alex J. Smola, Robert C. Williamson, and Peter L. Bartlett. 2000. New Support Vector Algorithms. Neural Comput. 12, 5 (2000), 1207–1245.
[10] Y. Wang, W. Tavanapong, J. Wong, J. Oh, and P. C. de Groen. 2011. Computer-aided detection of retroflexion in colonoscopy. In Proceedings of 24th International Symposium on Computer-Based Medical Systems (CBMS). 1–6.
[11] Yi Wang, Wallapak Tavanapong, Johnny Wong, Jung Hwan Oh, and Piet C. de Groen. 2015. Polyp-Alert: Near real-time feedback during colonoscopy. Computer Methods and Programs in Biomedicine 120, 3 (2015), 164–179.