HKBU at MediaEval 2017 Medico: Medical Multimedia Task

Yang Liu1,2, Zhonglei Gu1, William K. Cheung1
1 Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, China
2 Institute of Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China
csygliu@comp.hkbu.edu.hk, cszlgu@comp.hkbu.edu.hk, william@comp.hkbu.edu.hk

ABSTRACT
In this paper, we describe our model designed for the automatic detection of diseases based on multimedia data collected in hospitals. Specifically, a two-stage learning strategy is designed to predict the diseases. In the first stage, a dimensionality reduction method called bidirectional marginal Fisher analysis (BMFA) is proposed to project the original data to a low-dimensional space in which the key discriminant information is well preserved. In the second stage, a multi-class support vector machine (SVM) is applied in the low-dimensional space for detection. Experimental results demonstrate the efficiency of the designed model.

Copyright held by the owner/author(s).
MediaEval’17, 13-15 September 2017, Dublin, Ireland

1 INTRODUCTION
We have witnessed an increasing trend of applying multimedia processing and analysis methods, such as computer vision and medical image processing, to assist the diagnosis, detection and interpretation of medical abnormalities [7]. While a number of successful applications related to healthcare decision support exist, predictive analytics is still considered challenging for some specific kinds of medical multimedia data, such as endoscopy and ultrasound images, because of the high complexity of these medical tasks [8]. Moreover, most existing methods make use of a limited amount of information. Possibly useful information sources, such as sensory and temporal information, have mostly not been considered [5, 10, 11].

The 2017 MediaEval Medico Task aims to improve the detection and localization of abnormalities by designing an integrated approach. This approach combines information from both videos and images with other sensory information and the assistance of medical experts. The dataset used in this task consists of 8,000 gastrointestinal (GI) tract images that were annotated and verified by medical doctors to provide the ground truth. More details about the task requirements and the dataset can be found in [8].

To perform efficient detection of diseases, we design a two-stage learning strategy. In the first stage, a manifold learning method called bidirectional marginal Fisher analysis (BMFA) is proposed to learn a compact representation of the original data in which the key discriminant information is well preserved. In the second stage, a multi-class support vector machine (SVM) is applied to the compact representation of each data point for detection.

2 METHOD
2.1 Bidirectional Marginal Fisher Analysis
We are given the data matrix $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n]$, where $n$ denotes the number of data points in the training set and $\mathbf{x}_i \in \mathbb{R}^D$ is the high-dimensional feature vector of the $i$-th data point. Each point has a label $l_i \in \{1, 2, \ldots, 8\}$, one of the eight possible classes, and $\mathbf{l} = [l_1, l_2, \ldots, l_n]$ collects the true labels of all data points. The proposed BMFA aims to learn a subspace of the original feature space in which the marginal discriminant information is well preserved.

A manifold learning algorithm typically involves the definition of a neighborhood graph. The effectiveness of the algorithm very often depends on how well this graph preserves the local topological properties of the data manifold. To obtain a robust neighborhood graph, we construct it with the following two-way connection criterion [4]. We connect $\mathbf{x}_i$ and $\mathbf{x}_j$ only if $\mathbf{x}_i$ is one of the $K$ nearest neighbors of $\mathbf{x}_j$ and $\mathbf{x}_j$ is also one of the $K$ nearest neighbors of $\mathbf{x}_i$. This criterion adopts the “and” hypothesis: two data points are connected if and only if each of them is a neighbor of the other. The within-class adjacency matrix $\mathbf{A}^w$ and the between-class adjacency matrix $\mathbf{A}^b$ are then defined as follows:

$$\mathbf{A}^w_{ij} = \begin{cases} \exp\left(-\dfrac{\|\mathbf{x}_i - \mathbf{x}_j\|_F^2}{2\sigma}\right), & \text{if } j \in \mathcal{N}_i \text{ and } i \in \mathcal{N}_j \text{ and } l_i = l_j, \\ 0, & \text{otherwise,} \end{cases} \tag{1}$$

$$\mathbf{A}^b_{ij} = \begin{cases} \exp\left(-\dfrac{\|\mathbf{x}_i - \mathbf{x}_j\|_F^2}{2\sigma}\right), & \text{if } j \in \mathcal{N}_i \text{ and } i \in \mathcal{N}_j \text{ and } l_i \neq l_j, \\ 0, & \text{otherwise,} \end{cases} \tag{2}$$

Here $\mathcal{N}_i$ denotes the index set of the $K$ nearest neighbors of $\mathbf{x}_i$. The bandwidth is empirically set to $\sigma = \sum_{i=1}^{n} \|\mathbf{x}_i - \mathbf{x}_{iK}\|_F^2 / n$, where $\mathbf{x}_{iK}$ is the $K$-th nearest neighbor of $\mathbf{x}_i$.

BMFA aims to learn a transformation matrix $\mathbf{W} \in \mathbb{R}^{D \times d}$ that maximizes the following objective function:

$$\mathbf{W} = \arg\max_{\mathbf{W}} \frac{\sum_{i,j=1}^{n} \mathbf{A}^b_{ij} \|\mathbf{W}^T\mathbf{x}_i - \mathbf{W}^T\mathbf{x}_j\|_F^2}{\sum_{i,j=1}^{n} \mathbf{A}^w_{ij} \|\mathbf{W}^T\mathbf{x}_i - \mathbf{W}^T\mathbf{x}_j\|_F^2}. \tag{3}$$

For any symmetric adjacency matrix $\mathbf{A}$ with Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{A}$, where $\mathbf{D}_{ii} = \sum_j \mathbf{A}_{ij}$, we have $\sum_{i,j} \mathbf{A}_{ij}\|\mathbf{W}^T\mathbf{x}_i - \mathbf{W}^T\mathbf{x}_j\|_F^2 = 2\,\mathrm{tr}(\mathbf{W}^T\mathbf{X}\mathbf{L}\mathbf{X}^T\mathbf{W})$. Using this identity, the above optimization problem can be relaxed to the following ratio trace problem:

$$\mathbf{W} = \arg\max_{\mathbf{W}} \mathrm{tr}\left( (\mathbf{W}^T\mathbf{S}_w\mathbf{W})^{-1}\, \mathbf{W}^T\mathbf{S}_b\mathbf{W} \right), \tag{4}$$

In Eq. (4), $\mathbf{S}_b = \mathbf{X}\mathbf{L}_b\mathbf{X}^T$ and $\mathbf{S}_w = \mathbf{X}\mathbf{L}_w\mathbf{X}^T$. Here $\mathbf{L}_b = \mathbf{D}_b - \mathbf{A}^b$ and $\mathbf{L}_w = \mathbf{D}_w - \mathbf{A}^w$ are the Laplacian matrices [1, 3]. $\mathbf{D}_b$ and $\mathbf{D}_w$ are diagonal matrices defined as $(\mathbf{D}_b)_{ii} = \sum_{j=1}^{n} \mathbf{A}^b_{ij}$ and $(\mathbf{D}_w)_{ii} = \sum_{j=1}^{n} \mathbf{A}^w_{ij}$ ($i = 1, \ldots, n$). The optimization problem in Eq. (4) can be solved by the generalized eigendecomposition $\mathbf{S}_b\mathbf{w} = \lambda\,\mathbf{S}_w\mathbf{w}$. The columns of $\mathbf{W}$ are the eigenvectors associated with the $d$ largest eigenvalues.

Table 1: Official evaluation results (provided by the organizers) of the proposed model on the Kvasir dataset. Each run was submitted to both subtasks.

Run                        Recall   Specificity   Precision   Accuracy   F1       MCC      R_k statistic
Run 1 for both subtasks    0.6975   0.9568        0.6975      0.9244     0.6975   0.6543   0.6571
Run 2 for both subtasks    0.7028   0.9575        0.7028      0.9257     0.7028   0.6603   0.6626
Run 3 for both subtasks    0.6890   0.9556        0.6890      0.9223     0.6890   0.6446   0.6453
Run 4 for both subtasks    0.6988   0.9570        0.6988      0.9247     0.6988   0.6557   0.6585
Run 5 for both subtasks    0.6918   0.9560        0.6918      0.9229     0.6918   0.6477   0.6483
For each high-dimensional data point $\mathbf{x}_i$, we can then obtain its low-dimensional representation by $\mathbf{y}_i = \mathbf{W}^T\mathbf{x}_i$.

2.2 Support Vector Classification
To predict diseases, we apply the multi-class support vector machine (SVM) to the low-dimensional representation of each data point. Specifically, we employ the one-against-all strategy for multi-class classification and train a $\nu$-SVM [9] for each class.

3 RESULTS
In this section, we report the experimental settings and the evaluation results. The Kvasir dataset [6] is used to evaluate the performance of the proposed model. The training set contains 4,000 samples (500 for each of the eight classes), and the dimension of the original feature space is 1,185. We employ the same setting for both the detection subtask and the efficient detection subtask; that is, the five runs of the detection subtask are identical to those of the efficient detection subtask. Specifically:

• For Run 1 of both subtasks, we randomly select 1,000 data samples (125 for each class) for training and project the original data to the 8-dimensional subspace via BMFA;
• For Run 2 of both subtasks, we randomly select 1,000 data samples for training and project the original data to the 9-dimensional subspace via BMFA;
• For Run 3 of both subtasks, we randomly select 800 data samples (100 for each class) for training and project the original data to the 9-dimensional subspace via BMFA;
• For Run 4 of both subtasks, we randomly select 1,000 data samples for training and project the original data to the 7-dimensional subspace via BMFA;
• For Run 5 of both subtasks, we randomly select 800 data samples for training and project the original data to the 7-dimensional subspace via BMFA.

For classification, we use the LIBSVM toolbox [2]. The $\nu$-SVC with a linear kernel is selected, and the parameters are set to $\nu = 0.2$ and cost = 1.

Table 1 shows the official evaluation results (provided by the task organizers) of the proposed model on the Kvasir dataset. Across the five runs, accuracy ranges from 0.9223 (Run 3) to 0.9257 (Run 2), and MCC ranges from 0.6446 (Run 3) to 0.6603 (Run 2). We consider this good performance, given that the task is an eight-class classification problem. Moreover, only a portion of the available data (800 or 1,000 of the 4,000 training samples) is utilized for training. This indicates that the learned subspace is able to capture the discriminant information of the dataset. In addition, reducing the original data to a very low-dimensional subspace (7 to 9 dimensions) makes the learning process more efficient.

Beyond the overall performance, we analyze the contribution of each dimension of the original feature space. The contribution of the $i$-th dimension is defined as $\mathrm{Contribution}_i = \sum_j \lambda_j |w_{ij}|$. In this definition, $\lambda_j$ denotes the $j$-th eigenvalue, $w_{ij}$ denotes the $(i, j)$-th element of $\mathbf{W}$, and $|\cdot|$ denotes the absolute value operator. Figure 1 shows that the edge histogram and auto color correlogram features contribute more than the others, which indicates their importance in medical image classification.

Figure 1: Contribution of each individual feature.

4 CONCLUSION
This paper described the model we designed for disease detection based on multimedia data. A novel dimensionality reduction algorithm dubbed bidirectional marginal Fisher analysis (BMFA) was presented to extract the discriminant information from the original feature space. An SVM was then applied in the low-dimensional subspace for multi-class classification. Experimental results on the Kvasir dataset validated the effectiveness and efficiency of the proposed model.

ACKNOWLEDGMENTS
This work was supported in part by the National Natural Science Foundation of China under Grant 61503317, and in part by the Faculty Research Grant of Hong Kong Baptist University (HKBU) under Project FRG2/16-17/032.
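The per-dimension contribution score used in the Results section, $\mathrm{Contribution}_i = \sum_j \lambda_j |w_{ij}|$, is a single matrix-vector product. The sketch below illustrates it under assumptions, and the following are hypothetical:

- the function names;
- the toy $\mathbf{W}$ and eigenvalues;
- the mapping of descriptors to index ranges, since the paper does not give the layout of its 1,185-dimensional feature vector.

Scores are summed within each group to obtain a per-feature figure like Figure 1. That aggregation is an assumption, not something the paper states.

```python
import numpy as np


def dimension_contribution(W, lam):
    """Contribution_i = sum_j lam_j * |w_ij| for every original dimension i."""
    return np.abs(W) @ lam


def group_contribution(contrib, groups):
    """Sum per-dimension scores inside each (hypothetical) feature group.

    `groups` maps a descriptor name to a slice of the original feature vector.
    Summing rather than averaging is an assumption of this sketch.
    """
    return {name: float(contrib[sl].sum()) for name, sl in groups.items()}


# Toy projection (D = 4, d = 2) and eigenvalues; these are not values from the paper.
W = np.array([[0.5, -1.0], [0.0, 2.0], [-3.0, 0.0], [1.0, 1.0]])
lam = np.array([2.0, 0.5])
scores = dimension_contribution(W, lam)           # [1.5, 1.0, 6.0, 2.5]
groups = {"descriptor_a": slice(0, 2), "descriptor_b": slice(2, 4)}  # hypothetical layout
print(group_contribution(scores, groups))         # {'descriptor_a': 2.5, 'descriptor_b': 8.5}
```

Summing favors descriptors that occupy many dimensions, while averaging within each group would not. The absolute scores also depend on how the eigenvectors in $\mathbf{W}$ are normalized and on the scale of the raw features, which the paper leaves unspecified.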
REFERENCES
[1] M. Belkin and P. Niyogi. 2001. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. In Advances in Neural Information Processing Systems 14 (NIPS). 585–591.
[2] Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 3 (2011), 27:1–27:27.
[3] X. He and P. Niyogi. 2003. Locality Preserving Projections. In Advances in Neural Information Processing Systems 16 (NIPS). 153–160.
[4] Yang Liu, Yan Liu, and Keith C. C. Chan. 2011. Ordinal regression via manifold learning. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI). 398–403.
[5] Konstantin Pogorelov, Sigrun Losada Eskeland, Thomas de Lange, Carsten Griwodz, Kristin Ranheim Randel, Håkon Kvale Stensland, Duc-Tien Dang-Nguyen, Concetto Spampinato, Dag Johansen, Michael Riegler, and Pål Halvorsen. 2017. A Holistic Multimedia System for Gastrointestinal Tract Disease Detection. In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSYS). 112–123.
[6] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato, Duc-Tien Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt, Michael Riegler, and Pål Halvorsen. 2017. Kvasir: A Multi-Class Image Dataset for Computer Aided Gastrointestinal Disease Detection. In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSYS). 164–169.
[7] Michael Riegler, Mathias Lux, Carsten Griwodz, Concetto Spampinato, Thomas de Lange, Sigrun L. Eskeland, Konstantin Pogorelov, Wallapak Tavanapong, Peter T. Schmidt, Cathal Gurrin, Dag Johansen, Håvard Johansen, and Pål Halvorsen. 2016. Multimedia and Medicine: Teammates for Better Disease Detection and Survival. In Proceedings of the 2016 ACM on Multimedia Conference (ACMMM). 968–977.
[8] Michael Riegler, Konstantin Pogorelov, Pål Halvorsen, Carsten Griwodz, Thomas de Lange, Kristin Ranheim Randel, Sigrun Losada Eskeland, Duc-Tien Dang-Nguyen, Mathias Lux, and Concetto Spampinato. 2017. Multimedia for Medicine: The Medico Task at MediaEval 2017. In Working Notes Proceedings of the MediaEval 2017 Workshop.
[9] Bernhard Schölkopf, Alex J. Smola, Robert C. Williamson, and Peter L. Bartlett. 2000. New Support Vector Algorithms. Neural Computation 12, 5 (2000), 1207–1245.
[10] Y. Wang, W. Tavanapong, J. Wong, J. Oh, and P. C. de Groen. 2011. Computer-aided detection of retroflexion in colonoscopy. In Proceedings of the 24th International Symposium on Computer-Based Medical Systems (CBMS). 1–6.
[11] Yi Wang, Wallapak Tavanapong, Johnny Wong, Jung Hwan Oh, and Piet C. de Groen. 2015. Polyp-Alert: Near real-time feedback during colonoscopy. Computer Methods and Programs in Biomedicine 120, 3 (2015), 164–179.