Weighted Discriminant Embedding: Discriminant Subspace Learning for Imbalanced Medical Data Classification

Tobey H. Ko (1), Zhonglei Gu (2), Yang Liu (2,3)
(1) Department of Industrial and Manufacturing Systems Engineering, University of Hong Kong, HKSAR, China
(2) Department of Computer Science, Hong Kong Baptist University, HKSAR, China
(3) Institute of Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China
tobeyko@hku.hk, csygliu@comp.hkbu.edu.hk, cszlgu@comp.hkbu.edu.hk

Copyright held by the owner/author(s). MediaEval’18, 29-31 October 2018, Sophia Antipolis, France

ABSTRACT

This working notes paper introduces a model for the automatic prediction of diseases based on multimedia data collected in hospitals. To perform automatic disease prediction efficiently while using as little training data as possible, we develop a two-stage learning strategy. It first performs weighted discriminant embedding (WDE) to project the original data to a low-dimensional feature subspace, and then applies the cost-sensitive nearest neighbor (CS-NN) method in the learned subspace for disease prediction. The proposed approach is evaluated on the MediaEval 2018 Medico Multimedia Task.

1 INTRODUCTION

Aiming at improving the efficiency of detecting medical abnormalities in machine intelligence assisted medical diagnosis, while using as little information as possible, the MediaEval 2018 Medico Multimedia Task [3] seeks to design an integrated approach that assists medical experts' decision-making using a combination of video and image information, as well as other sensory information. In this paper, a two-stage learning strategy is introduced to facilitate efficient detection of diseases using multimedia and sensory information. The first stage is a dimensionality reduction process that projects the original data to a low-dimensional feature representation using weighted discriminant embedding (WDE), which improves the efficiency of the learning process while preserving the key discriminant information of the original data. Then, the cost-sensitive nearest neighbor (CS-NN) method is employed to make the prediction in the learned subspace.

2 WEIGHTED DISCRIMINANT EMBEDDING

Let $\mathcal{X}$ be the training set: $\mathcal{X} = \{(\mathbf{x}_1, l_1), \cdots, (\mathbf{x}_n, l_n)\}$, where $\mathbf{x}_i \in \mathbb{R}^D$ $(i = 1, \cdots, n)$ denotes the feature representation of the $i$-th sample, $l_i \in \{1, \cdots, C\}$ denotes the label of $\mathbf{x}_i$, $n$ denotes the number of data samples in the set, $C$ denotes the number of classes, and $D$ denotes the original dimension of the data. Given the training set, weighted discriminant embedding (WDE) aims to learn a transformation matrix $\mathbf{W} \in \mathbb{R}^{D \times d}$ $(d \le D)$ that projects the original high-dimensional data to a low-dimensional subspace $\mathcal{Z} = \mathbb{R}^d$ in which the weighted discriminant information is preserved.

In this year's Medico task, the numbers of samples in the different classes are highly imbalanced. To enhance the algorithm's power to make correct detections on rarer classes, we expect that data samples belonging to the same class, especially a rarer class, should be as close to each other as possible in the learned subspace, while nearby data samples from different classes, again especially rarer classes, should be separated from each other as much as possible in the learned subspace.

To minimize the weighted intra-class scatter, we present the following objective:

$$\mathbf{W} = \arg\min_{\mathbf{W}} \; tr\Big( \sum_{i,j=1}^{n} A_{ij} \mathbf{W}^T (\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T \mathbf{W} \Big), \tag{1}$$

where $A_{ij} = (I_i + I_j)/2$ if $l_i = l_j$, and 0 otherwise. Here $I_i$ indicates the importance of class $l_i$ and is defined using the entropy-based formulation [2]:

$$I_i = -\frac{(1 - p_i)^2}{p_i} \log(p_i), \tag{2}$$

where $p_i$ denotes the proportion of class $l_i$ in the dataset. In Eq. (2), a small proportion indicates high importance. Eq. (1) can be rewritten as:

$$\mathbf{W} = \arg\min_{\mathbf{W}} \; tr(\mathbf{W}^T \mathbf{L}_A \mathbf{W}), \tag{3}$$

where $\mathbf{L}_A$ is a Laplacian matrix [1] defined as $\mathbf{L}_A = \mathbf{D}_A - \mathbf{A}$, with $\mathbf{D}_A$ being a diagonal matrix defined as $(D_A)_{ii} = \sum_{j=1}^{n} (A)_{ij}$ $(i = 1, \cdots, n)$. Strictly, with the data matrix $\mathbf{X} = [\mathbf{x}_1, \cdots, \mathbf{x}_n] \in \mathbb{R}^{D \times n}$, the sum in Eq. (1) equals $2\mathbf{X}\mathbf{L}_A\mathbf{X}^T$, so $\mathbf{L}_A$ in Eq. (3) stands for this $D \times D$ matrix up to a constant factor; the same reading applies to $\mathbf{L}_B$ in Eq. (5).

Similarly, we define the following objective function to maximize the weighted inter-class scatter:

$$\mathbf{W} = \arg\max_{\mathbf{W}} \; tr\Big( \sum_{i,j=1}^{n} B_{ij} \mathbf{W}^T (\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T \mathbf{W} \Big), \tag{4}$$

where $B_{ij} = N_{ij}(I_i + I_j)/2$ if $l_i \neq l_j$, and 0 otherwise. Here $N_{ij} = \exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / 2\sigma^2)$ measures the closeness between two data samples. Eq. (4) can be rewritten as:

$$\mathbf{W} = \arg\max_{\mathbf{W}} \; tr(\mathbf{W}^T \mathbf{L}_B \mathbf{W}), \tag{5}$$

where $\mathbf{L}_B = \mathbf{D}_B - \mathbf{B}$, with $\mathbf{D}_B$ being a diagonal matrix defined as $(D_B)_{ii} = \sum_{j=1}^{n} (B)_{ij}$ $(i = 1, \cdots, n)$.

We integrate Eqs. (3) and (5) to form a unified objective function of WDE:

$$\mathbf{W} = \arg\max_{\mathbf{W}} \; tr\left( \frac{\mathbf{W}^T \mathbf{L}_B \mathbf{W}}{\mathbf{W}^T \mathbf{L}_A \mathbf{W}} \right). \tag{6}$$

The optimal $\mathbf{W}$ that maximizes the objective function in Eq. (6) is composed of the normalized eigenvectors corresponding to the $d$ largest eigenvalues of the following eigen-decomposition problem:

$$\mathbf{L}_B \mathbf{w} = \lambda \mathbf{L}_A \mathbf{w}. \tag{7}$$

A high-dimensional data sample $\mathbf{x}_i$ can then be mapped to the subspace by $\mathbf{y}_i = \mathbf{W}^T \mathbf{x}_i$.

3 RESULTS AND ANALYSIS

To evaluate our approach, we test its performance on the MediaEval 2018 Medico Multimedia Task. The task contains both a development set (with 5,293 samples) and a test set (with 8,740 samples). For each sample, we use six types of features: the 168-D JCD feature; the 18-D Tamura feature; the 33-D ColorLayout feature; the 80-D EdgeHistogram feature; the 256-D AutoColorCorrelogram feature; and the 630-D PHOG feature. The total dimension is 1,185.

We participate in two subtasks: 1) Classification of diseases and findings; and 2) Fast and efficient classification. For both subtasks, we submit five runs.

∙ For Run 1 (on both subtasks), we use all the data from the development set for training;
∙ For Run 2 (on both subtasks), we randomly select 50% of the data for each class from the development set for training;
∙ For Run 3 (on both subtasks), we randomly select 50% of the data for each class from the development set, together with the remaining data in the “out-of-patient” and “instruments” classes, for training;
∙ For Run 4 (on both subtasks), we randomly select 25% of the data for each class from the development set for training;
∙ For Run 5 (on both subtasks), we randomly select 25% of the data for each class from the development set, together with the remaining data in the “out-of-patient” and “instruments” classes, for training.

In the training stage, we use the training data to learn the transformation matrix $\mathbf{W}$ via WDE. We set $\sigma = 1$ and the subspace dimension $d = 50$. In the test stage, we use the obtained $\mathbf{W}$ to map both training and test data to the 50-D subspace, and then use the cost-sensitive nearest neighbor (CS-NN) method for the final classification in the learned subspace, where the cost of misclassifying the data of class $c$ $(c = 1, \cdots, C)$ to other classes is defined as $cost_c = n/n_c$, with $n$ and $n_c$ being the total number of training data and the number of data in class $c$, respectively.

Tables 1 and 2 report the results of our approach on subtask 1 and subtask 2, respectively. Although the accuracy looks good, the overall performance is far from satisfactory, as the results on the other four criteria (recall, precision, F1 score, and Rk) are relatively low. The reason might be that the proposed WDE is a linear mapping method, which is not sufficient to capture the complex discriminant information embedded in the high-dimensional feature space. This motivates us to consider extending our method to the nonlinear case to improve the performance. Furthermore, by comparing the performance on Run 2 (Run 4) with that on Run 3 (Run 5), we observe that even when we use all the data from the minority classes (i.e., the “out-of-patient” and “instruments” classes), the performance does not improve consistently. The reason might be that the number of data in these two classes is too small to represent the “real” distribution of the classes. One possible solution is to employ oversampling techniques to reasonably and faithfully generate samples for the minority classes.

Table 1: Results of our approach on the first subtask of MediaEval 2018 Medico Multimedia Task.

         Recall   Precision   Accuracy   F1 Score   Rk
Run 1    0.5001   0.4917      0.9471     0.4830     0.5357
Run 2    0.4415   0.4294      0.9384     0.4251     0.4612
Run 3    0.3947   0.3670      0.9320     0.3728     0.4035
Run 4    0.3553   0.3333      0.9256     0.3324     0.3511
Run 5    0.3019   0.2814      0.9186     0.2812     0.2918

Table 2: Results of our approach on the second subtask of MediaEval 2018 Medico Multimedia Task.

         Recall   Precision   Accuracy   F1 Score   Rk
Run 1    0.5005   0.4917      0.9471     0.4830     0.5357
Run 2    0.4181   0.3857      0.9337     0.4251     0.4193
Run 3    0.4259   0.4085      0.9350     0.4040     0.4348
Run 4    0.3430   0.3107      0.9231     0.3135     0.3293
Run 5    0.3257   0.3053      0.9227     0.3057     0.3246

4 CONCLUSION

In this paper, we propose a subspace learning method called weighted discriminant embedding (WDE), which aims to discover the discriminant subspace for imbalanced datasets. After dimensionality reduction, the cost-sensitive nearest neighbor method is used for classification. We plan to extend our work in two directions. First, we will generalize our approach to the nonlinear case to enhance its data representation ability. Second, we will incorporate oversampling methods into our approach to strengthen it for imbalanced learning problems.

ACKNOWLEDGMENTS

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61503317, in part by the General Research Fund (GRF) from the Research Grant Council (RGC) of Hong Kong SAR under Project HKBU12202417, and in part by the SZSTI Grant with the Project Code JCYJ20170307161544087.

REFERENCES

[1] M. Belkin and P. Niyogi. 2003. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Comput. 15, 6 (2003), 1373–1396.
[2] T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár. 2017. Focal Loss for Dense Object Detection. In 2017 IEEE International Conference on Computer Vision (ICCV). 2999–3007.
[3] K. Pogorelov, M. Riegler, P. Halvorsen, T. de Lange, K. R. Randel, D.-T. Dang-Nguyen, M. Lux, and O. Ostroukhova. 2018. Medico Multimedia Task at MediaEval 2018. In Proceedings of the MediaEval 2018 Workshop. CEUR-WS, Sophia Antipolis, France, 29-31 October 2018.
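
As a supplementary illustration of Sections 2 and 3, the two-stage procedure (WDE followed by CS-NN) might be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names `class_importance`, `wde_fit`, and `cs_nn_predict`, and the parameters `reg` and `k`, are hypothetical. The sketch assumes samples are stored as rows, forms the D x D matrices X^T L X from the Laplacians so that the generalized eigenproblem of Eq. (7) is well defined for D-dimensional vectors w, and adds a small ridge term so that the intra-class matrix is invertible. The text defines only the cost cost_c = n/n_c, not how it enters the decision; the cost-weighted vote among the k nearest neighbors used here is one plausible reading.

```python
import numpy as np


def class_importance(labels):
    """Per-sample importance from Eq. (2): I = -((1 - p)^2 / p) * log(p)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()                      # class proportions
    imp = -((1.0 - p) ** 2 / p) * np.log(p)        # rare class -> large value
    return imp[np.searchsorted(classes, labels)]   # broadcast to samples


def wde_fit(X, labels, d, sigma=1.0, reg=1e-6):
    """Learn W (D x d) from X (n x D, rows are samples) and integer labels.

    `reg` is a hypothetical ridge factor, not part of the paper.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, D = X.shape
    imp = class_importance(labels)
    pair_w = 0.5 * (imp[:, None] + imp[None, :])   # (I_i + I_j) / 2
    same = labels[:, None] == labels[None, :]

    # Heat-kernel closeness N_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    N = np.exp(-dist2 / (2.0 * sigma ** 2))

    A = np.where(same, pair_w, 0.0)                # intra-class weights, Eq. (1)
    B = np.where(same, 0.0, N * pair_w)            # inter-class weights, Eq. (4)
    L_A = np.diag(A.sum(axis=1)) - A
    L_B = np.diag(B.sum(axis=1)) - B

    # Assumption: D x D scatter forms of the Laplacians (samples as rows).
    S_A = X.T @ L_A @ X
    S_B = X.T @ L_B @ X
    S_A = S_A + reg * (np.trace(S_A) / D) * np.eye(D)

    # Solve S_B w = lambda S_A w by Cholesky whitening of S_A.
    C = np.linalg.cholesky(S_A)
    C_inv = np.linalg.inv(C)
    M = C_inv @ S_B @ C_inv.T
    M = 0.5 * (M + M.T)                            # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(vals)[::-1][:d]               # d largest eigenvalues
    W = C_inv.T @ vecs[:, top]
    return W / np.linalg.norm(W, axis=0, keepdims=True)


def cs_nn_predict(Y_train, labels, Y_test, k=5):
    """Cost-weighted k-NN vote with cost_c = n / n_c (decision rule assumed)."""
    Y_train = np.asarray(Y_train, dtype=float)
    Y_test = np.asarray(Y_test, dtype=float)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    cost = counts.sum() / counts
    d2 = ((Y_test[:, None, :] - Y_train[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]             # indices of k nearest samples
    nn_cls = np.searchsorted(classes, labels[nn])  # their class indices
    votes = np.zeros((Y_test.shape[0], classes.size))
    for c in range(classes.size):
        votes[:, c] = cost[c] * (nn_cls == c).sum(axis=1)
    return classes[np.argmax(votes, axis=1)]


# Toy usage on synthetic, imbalanced data (illustrative only).
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0.0, 0.3, (40, 6)),
                   rng.normal(0.0, 0.3, (8, 6)) + 1.0])
y_toy = np.array([0] * 40 + [1] * 8)
W_toy = wde_fit(X_toy, y_toy, d=2, sigma=1.0)
pred_toy = cs_nn_predict(X_toy @ W_toy, y_toy, X_toy @ W_toy, k=5)
```

With samples as rows, the mapping y_i = W^T x_i becomes `X @ W`. Under the assumed voting rule, a single rare-class neighbor can outweigh several majority-class neighbors, which is how the cost n/n_c biases predictions toward minority classes.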