=Paper=
{{Paper
|id=Vol-1984/Mediaeval_2017_paper_20
|storemode=property
|title=Predicting Media Interestingness via Biased Discriminant Embedding and Supervised Manifold Regression
|pdfUrl=https://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_20.pdf
|volume=Vol-1984
|authors=Yang Liu,Zhonglei Gu,Tobey H. Ko
|dblpUrl=https://dblp.org/rec/conf/mediaeval/LiuGK17
}}
==Predicting Media Interestingness via Biased Discriminant Embedding and Supervised Manifold Regression==
Yang Liu¹·², Zhonglei Gu¹, Tobey H. Ko³

¹ Department of Computer Science, Hong Kong Baptist University, HKSAR, China

² Institute of Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China

³ Department of Industrial and Manufacturing Systems Engineering, University of Hong Kong, HKSAR, China

csygliu@comp.hkbu.edu.hk, cszlgu@comp.hkbu.edu.hk, tobeyko@hku.hk

Copyright held by the owner/author(s). MediaEval'17, 13-15 September 2017, Dublin, Ireland.

===ABSTRACT===
In this paper, we describe our model for the automatic prediction of media interestingness. Specifically, a two-stage learning framework is proposed. In the first stage, supervised dimensionality reduction is employed to discover the key discriminant information embedded in the original feature space: we present a new algorithm dubbed biased discriminant embedding (BDE) to extract discriminant features with discrete labels, and use supervised manifold regression (SMR) to extract discriminant features with continuous labels. In the second stage, an SVM is utilized for prediction. Experimental results validate the effectiveness of our approaches.

===1 INTRODUCTION===
Predicting the interestingness of multimedia content has long been studied in the psychology community [1, 6, 7]. More recently, we have witnessed an explosion of multimedia content due to the accessibility of low-cost multimedia creation tools; the automatic prediction of media interestingness has thus started to attract attention in the computer science community because of its many useful applications for content providers, marketing, and managerial decision-makers.

In this paper, we propose to use dimensionality reduction to extract low-dimensional features for the MediaEval 2017 Predicting Media Interestingness Task. Specifically, we propose a new algorithm called biased discriminant embedding (BDE) for discrete labels, and utilize supervised manifold regression (SMR) [4] for continuous labels.

===2 DIMENSIONALITY REDUCTION===

====2.1 Biased Discriminant Embedding====
Given the data matrix <math>\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n]</math>, where <math>\mathbf{x}_i \in \mathbb{R}^D</math> denotes the feature vector of the <math>i</math>-th image or video, and the label vector <math>\mathbf{l} = [l_1, l_2, \ldots, l_n]</math>, where <math>l_i \in \{0, 1\}</math> denotes the corresponding label of <math>\mathbf{x}_i</math> (1 for interesting and 0 for non-interesting), biased discriminant embedding (BDE) aims to learn a <math>D \times d</math> transformation matrix <math>\mathbf{W}</math> that maximizes the biased discriminant information in the reduced subspace. The motivation for the bias is that, in media interestingness prediction, we are probably more interested in the interesting class than in the non-interesting one. The objective function of BDE is given as follows:

<math>\mathbf{W} = \arg\max_{\mathbf{W}} \operatorname{tr}\!\left(\frac{\mathbf{W}^T \mathbf{S}_b \mathbf{W}}{\mathbf{W}^T \mathbf{S}_w \mathbf{W}}\right), \qquad (1)</math>

where <math>\mathbf{S}_w = \sum_{i,j=1}^{n} (A_{ij} \times l_i \times l_j)(\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T</math> denotes the biased within-class scatter, <math>\mathbf{S}_b = \sum_{i,j=1}^{n} (A_{ij} \times |l_i - l_j|)(\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T</math> denotes the biased between-class scatter, and <math>A_{ij} = \exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / 2t)</math> measures the closeness between two data samples <math>\mathbf{x}_i</math> and <math>\mathbf{x}_j</math>. The optimization problem can be solved by generalized eigen-decomposition.

====2.2 Supervised Manifold Regression====
Supervised manifold regression (SMR) [4] aims to find a latent subspace in which two data points are close to each other if they possess similar interestingness levels. The objective function of SMR is given as follows:

<math>\mathbf{W} = \arg\min_{\mathbf{W}} \sum_{i,j=1}^{n} \left\|\mathbf{W}^T (\mathbf{x}_i - \mathbf{x}_j)\right\|^2 \cdot \left(\alpha A_{ij} + (1 - \alpha) B_{ij}\right), \qquad (2)</math>

where <math>B_{ij} = |l_i - l_j|</math> measures the similarity between the interestingness level of <math>\mathbf{x}_i</math> and that of <math>\mathbf{x}_j</math>.

For each high-dimensional data point <math>\mathbf{x}_i</math>, we can obtain its low-dimensional representation by <math>\mathbf{y}_i = \mathbf{W}^T \mathbf{x}_i</math>. Then we apply an SVM to <math>\mathbf{y}_i</math> for interestingness prediction.

===3 EXPERIMENTS===
For each image data sample, we construct a 1299-D feature vector by selecting features from the feature set provided by the task organizers, comprising 128-D color histogram features, 300-D denseSIFT features, 512-D GIST features, 300-D hog2×2 features, and 59-D LBP features. For the video data, we treat each frame as a separate image and compute the average and standard deviation over all frames in the shot, yielding a 2598-D feature vector for each video. We normalize each dimension of the training data to the range [0, 1] by <math>\hat{x} = \frac{x - x_{min}}{x_{max} - x_{min}}</math> before dimensionality reduction, where <math>x_{min}</math> and <math>x_{max}</math> denote the minimum and maximum values in the corresponding dimension, respectively. Details of the dataset can be found in [3].

For Run 1 of the image data, we use the normalized 1299-D feature vector as the input of the SVM. For Runs 2-5 of the image data, we reduce the original data to 23-D, 25-D, 26-D, and 27-D subspaces via BDE (for discrete labels) and SMR (for continuous labels), respectively. For Run 1 of the video data, we use the normalized 2598-D feature vector as the input of the SVM. For Runs 2-5 of the video data, we likewise reduce the original data to 23-D, 25-D, 26-D, and 27-D subspaces via BDE (for discrete labels) and SMR (for continuous labels), respectively.

To predict the binary interestingness labels, we use ν-SVC [5] with an RBF kernel. We set ν = 0.1, and gamma = 100 for image data and 64 for video data. To predict the continuous interestingness level, we use ε-SVR [2] with an RBF kernel. We set cost = 1, ε = 0.01, and γ = 1/D.

'''Table 1: MAP@10 and MAP of the proposed model.'''
{| class="wikitable"
|-
! rowspan="2" |
! colspan="2" | Images
! colspan="2" | Videos
|-
! MAP@10 !! MAP !! MAP@10 !! MAP
|-
| Run 1 || 0.1184 || 0.2812 || 0.0556 || 0.1813
|-
| Run 2 || 0.1320 || 0.2916 || 0.0468 || 0.1761
|-
| Run 3 || 0.1332 || 0.2898 || 0.0468 || 0.1761
|-
| Run 4 || 0.1315 || 0.2884 || 0.0463 || 0.1742
|-
| Run 5 || 0.1369 || 0.2910 || 0.0445 || 0.1746
|}

Table 1 reports the evaluation results of the proposed model as provided by the task organizers. For image data, the reduced features perform better than the original ones, which indicates that the subspaces learned by BDE and SMR capture important information in terms of media interestingness. For video data, the performance of the reduced features is slightly worse than that of the original ones. The reason might be that video data are more complex than image data, so such a low-dimensional representation cannot fully capture the key discriminant information embedded in the original space.

We further analyze the contribution of each dimension in the original feature space. The contribution of the <math>d</math>-th dimension is defined as <math>Contribution_d = \sum_{k} \lambda_k |w_{dk}|</math>, where <math>\lambda_k</math> denotes the <math>k</math>-th eigenvalue, <math>w_{dk}</math> denotes the <math>(d,k)</math>-th element of <math>\mathbf{W}</math>, and <math>|\cdot|</math> denotes the absolute value operator.

Figure 1: Contribution of each individual feature in the image/video discrete/continuous prediction tasks. (a) BDE on image data; (b) SMR on image data; (c) BDE on video data; (d) SMR on video data.

From Figures 1(a) and 1(c), we observe that the color histogram and LBP features contribute more than the others, while the GIST features contribute the least in the discrete prediction task. In continuous prediction (Figures 1(b) and 1(d)), the color histogram and GIST features contribute the most among the five feature sets.

===4 DISCUSSION AND OUTLOOK===
This paper introduces our model for media interestingness prediction. For future work, we aim to improve the performance of video interestingness prediction by incorporating temporal information from the video. Moreover, as the ground-truth interestingness labels are provided by human beings, they generally vary with each individual and are somewhat subjective. We are therefore particularly interested in refining the human-labeled ground truth (especially for the continuous case) via machine learning technologies.

===ACKNOWLEDGMENTS===
This work was supported in part by the National Natural Science Foundation of China under Grant 61503317, and in part by the Faculty Research Grant of Hong Kong Baptist University (HKBU) under Project FRG2/16-17/032.

===REFERENCES===
[1] Daniel E. Berlyne. 1960. Conflict, Arousal and Curiosity. McGraw-Hill.

[2] Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A Library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology 2(3), 27:1-27:27.

[3] C.-H. Demarty, M. Sjöberg, B. Ionescu, T.-T. Do, M. Gygli, and N. Q. K. Duong. 2017. MediaEval 2017 Predicting Media Interestingness Task. In Proc. of the MediaEval 2017 Workshop, Dublin, Ireland, Sept. 13-15, 2017.

[4] Y. Liu, Z. Gu, and Y.-M. Cheung. 2016. Supervised Manifold Learning for Media Interestingness Prediction. In Proc. of the MediaEval 2016 Workshop, Hilversum, Netherlands, Oct. 20-21, 2016.

[5] Bernhard Schölkopf, Alex J. Smola, Robert C. Williamson, and Peter L. Bartlett. 2000. New Support Vector Algorithms. Neural Computation 12(5), 1207-1245.

[6] Paul J. Silvia. 2006. Exploring the Psychology of Interest. Oxford University Press.

[7] Craig Smith and Phoebe Ellsworth. 1985. Patterns of cognitive appraisal in emotion. Journal of Personality and Social Psychology 48(4), 813-838.
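The BDE objective above reduces to a generalized eigenproblem on the two biased scatter matrices. The following is a minimal NumPy/SciPy sketch on synthetic data, not the authors' code; the synthetic dataset, the bandwidth t, the target dimension, and the small ridge added to keep the within-class scatter invertible are all our own assumptions.

```python
# Illustrative sketch of biased discriminant embedding (BDE): build the
# biased within/between-class scatters and solve S_b w = lambda S_w w.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))          # n=40 samples, D=10 features (synthetic)
l = rng.integers(0, 2, size=40)        # binary labels: 1 interesting, 0 not
t = 1.0                                # closeness bandwidth (assumed)
d = 3                                  # target dimensionality (assumed)

# Pairwise closeness A_ij = exp(-||x_i - x_j||^2 / (2t))
diff = X[:, None, :] - X[None, :, :]
A = np.exp(-np.sum(diff**2, axis=2) / (2 * t))

# Pair weights: within-class uses l_i * l_j (both interesting),
# between-class uses |l_i - l_j| (labels differ)
w_within = A * np.outer(l, l)
w_between = A * np.abs(l[:, None] - l[None, :])

def scatter(weights):
    # S = sum_ij w_ij (x_i - x_j)(x_i - x_j)^T
    S = np.zeros((X.shape[1], X.shape[1]))
    for i in range(X.shape[0]):
        for j in range(X.shape[0]):
            dij = X[i] - X[j]
            S += weights[i, j] * np.outer(dij, dij)
    return S

S_w = scatter(w_within)
S_b = scatter(w_between)

# Generalized eigen-decomposition; keep the d largest eigenvectors
# (the 1e-6 ridge keeping S_w positive definite is our own addition)
vals, vecs = eigh(S_b, S_w + 1e-6 * np.eye(S_w.shape[0]))
W = vecs[:, np.argsort(vals)[::-1][:d]]
Y = X @ W                              # low-dimensional representations
print(Y.shape)                         # (40, 3)
```

The trace-ratio objective is approximated here by the standard ratio-trace relaxation that generalized eigen-decomposition solves; this is a common simplification, not necessarily the exact solver the authors used.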
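The SMR minimization can be written in graph-Laplacian form, since sum_ij C_ij ||W^T(x_i - x_j)||^2 = 2 tr(W^T X^T L X W) for pair weights C. Below is a sketch under that identity; the orthonormality constraint W^T W = I used to make the minimization well-posed, along with the synthetic data and the value of alpha, are our own assumptions and are not stated in the paper.

```python
# Sketch of supervised manifold regression (SMR), Eq. (2), via the
# graph Laplacian of the combined pair weights alpha*A + (1-alpha)*B.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))              # n=40 samples, D=10 (synthetic)
levels = rng.uniform(0, 1, size=40)        # continuous interestingness l_i
alpha, t, d = 0.5, 1.0, 3                  # assumed hyperparameters

diff = X[:, None, :] - X[None, :, :]
A = np.exp(-np.sum(diff**2, axis=2) / (2 * t))   # closeness A_ij
B = np.abs(levels[:, None] - levels[None, :])    # label term B_ij = |l_i - l_j|
C = alpha * A + (1 - alpha) * B                  # combined pair weights

# sum_ij C_ij ||W^T(x_i - x_j)||^2 = 2 tr(W^T X^T L X W), with L = D - C
L = np.diag(C.sum(axis=1)) - C
M = X.T @ L @ X

# Under the assumed constraint W^T W = I, the minimizer is given by the
# d eigenvectors of M with the smallest eigenvalues
vals, vecs = eigh(M)
W = vecs[:, :d]
Y = X @ W                                  # low-dimensional representations
print(Y.shape)                             # (40, 3)
```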
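The per-dimension min-max normalization used before dimensionality reduction can be sketched in a few lines; the toy array below is our own example, not the task data.

```python
# Minimal sketch of per-dimension min-max normalization to [0, 1]:
# x_hat = (x - x_min) / (x_max - x_min), computed column by column.
import numpy as np

X_train = np.array([[1.0, 10.0],
                    [3.0, 20.0],
                    [2.0, 40.0]])

x_min = X_train.min(axis=0)            # per-dimension minimum
x_max = X_train.max(axis=0)            # per-dimension maximum
X_norm = (X_train - x_min) / (x_max - x_min)
print(X_norm[:, 0])                    # [0.  1.  0.5]
```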
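The two predictors described above (ν-SVC with ν = 0.1 for binary labels, ε-SVR with cost = 1, ε = 0.01, γ = 1/D for continuous levels) can be sketched with scikit-learn standing in for LIBSVM; the synthetic features below are a stand-in for the reduced BDE/SMR representations, and this is an illustration of the stated settings, not the authors' pipeline.

```python
# Sketch of the prediction stage: nu-SVC for binary interestingness and
# epsilon-SVR for the continuous level, both with RBF kernels.
import numpy as np
from sklearn.svm import NuSVC, SVR

rng = np.random.default_rng(2)
D = 23                                    # e.g. the 23-D reduced subspace
Y = rng.normal(size=(60, D))              # stand-in low-dimensional features
labels = rng.integers(0, 2, size=60)      # binary interestingness labels
levels = rng.uniform(0, 1, size=60)       # continuous interestingness levels

# nu = 0.1 and gamma = 100 follow the image-data settings in the paper
clf = NuSVC(nu=0.1, kernel="rbf", gamma=100).fit(Y, labels)
# cost = 1, epsilon = 0.01, gamma = 1/D follow the SVR settings in the paper
reg = SVR(C=1.0, epsilon=0.01, gamma=1.0 / D, kernel="rbf").fit(Y, levels)

print(clf.predict(Y[:3]).shape, reg.predict(Y[:3]).shape)
```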
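The per-dimension contribution score Contribution_d = sum_k lambda_k |w_dk| used in the feature analysis is a one-line computation once W and the eigenvalues are available; the eigenvalues and projection matrix below are hypothetical stand-ins for the learned BDE/SMR quantities.

```python
# Sketch of the contribution analysis: weight each entry of |W| by its
# component's eigenvalue and sum over components, giving one score per
# original feature dimension.
import numpy as np

rng = np.random.default_rng(3)
D, d = 10, 3
W = rng.normal(size=(D, d))        # learned D x d projection (stand-in)
lam = np.array([3.0, 2.0, 1.0])    # eigenvalues of the d kept components

contribution = np.abs(W) @ lam     # shape (D,): Contribution_d per dimension
top = int(np.argmax(contribution)) # most influential original feature dim
print(contribution.shape, top)
```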