Towards Learning Emotional Subspace

Tobey H. Ko1, Zhonglei Gu2, Tiantian He3, Yang Liu2,4
1 Department of Industrial and Manufacturing Systems Engineering, University of Hong Kong, HKSAR, China
2 Department of Computer Science, Hong Kong Baptist University, HKSAR, China
3 Department of Computing, The Hong Kong Polytechnic University, HKSAR, China
4 Institute of Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China
tobeyko@hku.hk, cszlgu@comp.hkbu.edu.hk, tiantian.he@outlook.com, csygliu@comp.hkbu.edu.hk

Copyright held by the owner/author(s). MediaEval'18, 29-31 October 2018, Sophia Antipolis, France

ABSTRACT
We introduce a model designed to predict the emotional impact of movies through affective video content analysis. Specifically, our approach utilizes a two-stage learning framework, which first conducts subspace learning using emotion preserving embedding (EPE) or biased discriminant embedding (BDE) to uncover the informative subspace from the original feature space according to the continuous or discrete emotional labels, respectively, and then carries out the prediction utilizing the support vector machine (SVM). Experimentation on a movie dataset validates the effectiveness of our learning framework.

1 INTRODUCTION
The Emotional Impact of Movies Task in MediaEval 2018 aims at developing approaches that automatically and accurately predict the emotional impact of movie content when that content, containing a certain stimulus (induced valence, induced arousal, or induced fear), is shown to a general audience. Successful approaches to this task are expected to lead to automatic video emotion discriminators capable of identifying movie content that may induce harmful emotions. Approaches proposed for the task are trained and evaluated using the LIRIS-ACCEDE dataset (liris-accede.ec-lyon.fr) [1], which offers a collection of 160 professionally made and amateur movies shared under the Creative Commons license, of which 44 are selected and annotated with their respective fear, valence, and arousal labels. More details of the task requirements and the data description can be found in the task paper [4].

In this paper, a two-stage learning framework is introduced for automatic prediction of the emotional impact of movie content. In order to learn an accurate feature representation of the induced emotions in movie content, the learning framework first projects the original data to a learned low-dimensional feature subspace using dimensionality reduction techniques, then conducts prediction on the learned subspace using classification techniques. Specifically, the dimensionality reduction is carried out using emotion preserving embedding (EPE) to learn the subspace for induced arousal and induced valence, whereas the biased discriminant embedding algorithm (BDE) [5] is used to learn the subspace for induced fear in movie content. On the learned low-dimensional feature subspace, we employ the classical support vector regression and classification techniques, as they are efficient and effective, to predict the induced affective emotion of the movie content in both a continuous and a discrete manner.

2 LEARNING EMOTIONAL SUBSPACE
2.1 Emotion Preserving Embedding
EPE is proposed to learn the subspace for the continuous arousal and valence labels. Given the training set $\mathcal{X} = \{(\mathbf{x}_1, \mathbf{l}_1), (\mathbf{x}_2, \mathbf{l}_2), \ldots, (\mathbf{x}_n, \mathbf{l}_n)\}$, where $\mathbf{x}_i \in \mathbb{R}^D$ $(i = 1, \cdots, n)$ is the feature vector of the $i$-th movie and $\mathbf{l}_i = [a_i, v_i]^T$ is the corresponding label vector containing the arousal label $a_i$ and the valence label $v_i$, EPE aims to learn a $D \times d$ transformation matrix $\mathbf{W}$ that maps $\mathbf{x}_i$ $(i = 1, \cdots, n)$ to a low-dimensional subspace in which the emotion information and the manifold structure of the dataset are well preserved. To achieve this goal, EPE optimizes the following objective function:

$$\mathbf{W} = \arg\min_{\mathbf{W}} \sum_{i,j=1}^{n} \|\mathbf{W}^T(\mathbf{x}_i - \mathbf{x}_j)\|^2 \cdot \big(\alpha S_{ij} + (1-\alpha) N_{ij}\big), \tag{1}$$

where $S_{ij} = \exp(-\|\mathbf{l}_i - \mathbf{l}_j\|^2 / 2\sigma^2)$ measures the label similarity of $\mathbf{x}_i$ and $\mathbf{x}_j$, $N_{ij} = \exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / 2\sigma^2)$ measures the closeness between $\mathbf{x}_i$ and $\mathbf{x}_j$, and $\alpha \in [0, 1]$ is the parameter balancing the emotion information and the manifold structure. Eq. (1) can be equivalently rewritten as follows:

$$\mathbf{W} = \arg\min_{\mathbf{W}} \mathrm{tr}(\mathbf{W}^T \mathbf{X} \mathbf{L} \mathbf{X}^T \mathbf{W}), \tag{2}$$

where $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] \in \mathbb{R}^{D \times n}$ is the data matrix, $\mathbf{L} = \mathbf{D} - \mathbf{A}$ is the $n \times n$ Laplacian matrix [2], and $\mathbf{D}$ is a diagonal matrix defined as $D_{ii} = \sum_{j=1}^{n} A_{ij}$ $(i = 1, \ldots, n)$, where $A_{ij} = \alpha S_{ij} + (1-\alpha) N_{ij}$. The optimal $\mathbf{W}$ can then be obtained by finding the eigenvectors corresponding to the smallest eigenvalues of the following eigen-decomposition problem:

$$\mathbf{X} \mathbf{L} \mathbf{X}^T \mathbf{w} = \lambda \mathbf{w}. \tag{3}$$

After obtaining $\mathbf{W}$, we can obtain the low-dimensional representation of $\mathbf{x}_i$ by $\mathbf{y}_i = \mathbf{W}^T \mathbf{x}_i$.

2.2 Biased Discriminant Embedding
BDE is a subspace learning algorithm that we proposed for the same task last year [5]. It aims to learn the subspace for the binary fear labels. In this scenario, each data sample $\mathbf{x}_i$ is associated with a binary label $l_i \in \{0, 1\}$, with 1 for fear and 0 otherwise. BDE aims to maximize the biased discriminant information in the learned subspace. As mentioned in [5], the so-called biased discrimination is designed to emphasize the importance of the fear class. The objective function of BDE is given as follows:

$$\mathbf{W} = \arg\max_{\mathbf{W}} \mathrm{tr}\left(\frac{\mathbf{W}^T \mathbf{S}_b \mathbf{W}}{\mathbf{W}^T \mathbf{S}_w \mathbf{W}}\right), \tag{4}$$

where $\mathbf{S}_w = \sum_{i,j=1}^{n} (N_{ij} \times l_i \times l_j)(\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T$ and $\mathbf{S}_b = \sum_{i,j=1}^{n} (N_{ij} \times |l_i - l_j|)(\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T$ denote the biased within-class and between-class scatters, respectively. The optimal $\mathbf{W}$ can then be obtained by finding the eigenvectors corresponding to the largest eigenvalues of the following generalized eigen-decomposition problem:

$$\mathbf{S}_b \mathbf{w} = \lambda \mathbf{S}_w \mathbf{w}. \tag{5}$$

3 RESULTS AND ANALYSIS
In this section, we evaluate the performance of our approach on the MediaEval 2018 Emotional Impact of Movies Task. There are 93337 and 26600 frames in the development set and the test set, respectively. We use 11 types of features to construct the original feature vector for each frame, i.e., 1583-D Auto Color Correlogram (ACC), 256-D Color and Edge Directivity Descriptor (CEDD), 144-D Color Layout (CL), 33-D Edge Histogram (EH), 80-D Fuzzy Color and Texture Histogram (FCTH), 192-D Gabor, 60-D Joint descriptor joining CEDD and FCTH in one histogram (JCD), 168-D Scalable Color (SC), 256-D Tamura, 64-D Local Binary Patterns (LBP), and 18-D VGG16 fc6 layer (FC6). The total dimension of the original feature space is therefore 2854.

For valence/arousal prediction, we use EPE to learn the transformation matrix $\mathbf{W}$ from the development set and use $\mathbf{W}$ to project the $D$-dimensional development and test data ($D = 2854$) to the $d$-dimensional subspace. We set $d = 4, 5, 9, 10$ for Runs 1, 2, 3, 4, respectively. We set $\alpha = 0.5$ in our experiment to weight the emotion information and the manifold structure equally. Then we train the $\nu$-SVR [6] on the $d$-dimensional development set and apply the trained model for prediction on the $d$-dimensional test set. For SVR, we use the RBF kernel and the default settings recommended by libsvm [3]: $\nu = 0.5$ and $\gamma = 1/d$.

For fear prediction, we use BDE to learn $\mathbf{W}$. As in the previous experiment, we set $d = 4, 5, 9, 10$ for Runs 1, 2, 3, 4, respectively. Then we train the $\nu$-SVC [6] on the $d$-dimensional development set and apply the trained model for classification on the $d$-dimensional test set. Similarly, we use the RBF kernel and the default settings recommended by libsvm [3]: $\nu = 0.5$ and $\gamma = 1/d$.

Table 1: Results of arousal prediction on the MediaEval 2018 Emotional Impact of Movies Task.

        Run 1    Run 2    Run 3    Run 4
MSE     0.1493   0.1574   0.1608   0.1623
PCC     0.0828   0.0650   0.0487   0.0255

Table 2: Results of valence prediction on the MediaEval 2018 Emotional Impact of Movies Task.

        Run 1    Run 2    Run 3    Run 4
MSE     0.1016   0.1089   0.1089   0.1076
PCC     0.0499   0.0164   0.0872   0.1142

Table 3: Results of fear prediction on the MediaEval 2018 Emotional Impact of Movies Task.

                          Run 1    Run 2    Run 3    Run 4
Intersection over Union   0.1052   0.0612   0.0360   0.0196

Tables 1-3 present the results of our approach, from which several observations can be derived. First, in Table 1, Run 1 ($d = 4$) performs the best while Run 4 ($d = 10$) performs the worst. Moreover, the performance drops steadily as the dimensionality of the subspace increases: MSE rises from 0.1493 to 0.1623 and PCC falls from 0.0828 to 0.0255. This indicates that the arousal information may be embedded in a very low-dimensional subspace, and thus further increasing the dimensionality of the subspace may have an adverse effect on the prediction of induced arousal. However, the results in Table 2 do not clearly indicate which subspace dimensionality is best for valence prediction: Run 1 ($d = 4$) attains the lowest MSE, whereas Run 4 ($d = 10$) attains the highest PCC. The reason might be that we have not yet discovered the optimal dimension of the subspace for valence. Further investigation is needed to determine how an optimal dimension for the learned subspace can be obtained. From Table 3, we can see that the performance of our method on fear prediction is unsatisfactory. A possible reason is the high imbalance between the fear class and the non-fear class, which makes the traditional learning mechanism inefficient, even though we have made some effort to model the class imbalance during subspace learning.

4 CONCLUSION
This paper describes our approach to predicting the emotional impact of movies and validates the approach on the MediaEval 2018 Emotional Impact of Movies Task. Future work will proceed in two directions. First, we are interested in exploring how to build a joint learning mechanism for both arousal and valence, as these two emotional dimensions are related to each other. Second, we will investigate more effective ways to model the class imbalance in subspace learning and the subsequent classification, especially for the extremely imbalanced cases.

ACKNOWLEDGMENTS
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61503317, in part by the General Research Fund (GRF) from the Research Grants Council (RGC) of Hong Kong SAR under Project HKBU12202417, and in part by the SZSTI Grant with the Project Code JCYJ20170307161544087.

REFERENCES
[1] Y. Baveye, E. Dellandréa, C. Chamaret, and L. Chen. 2015. LIRIS-ACCEDE: A Video Database for Affective Content Analysis. IEEE Transactions on Affective Computing 6, 1 (Jan 2015), 43–55.
[2] M. Belkin and P. Niyogi. 2001. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. In Advances in Neural Information Processing Systems 14 (NIPS). 585–591.
[3] C.-C. Chang and C.-J. Lin. 2011. LIBSVM: A Library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology 2, 3 (2011), 27:1–27:27.
[4] E. Dellandréa, M. Huigsloot, L. Chen, Y. Baveye, and M. Sjöberg. 2018. The MediaEval 2018 Emotional Impact of Movies Task. In MediaEval 2018 Workshop.
[5] Y. Liu, Z. Gu, and T. Ko. 2017. HKBU at MediaEval 2017 Emotional Impact of Movies Task. In MediaEval 2017 Workshop.
[6] B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett. 2000. New Support Vector Algorithms. Neural Computation 12, 5 (2000), 1207–1245.
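
ILLUSTRATIVE SKETCH OF EPE
The EPE procedure of Section 2.1 (Eqs. (1)-(3)) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names `pairwise_sq_dists` and `epe_fit` are hypothetical, the bandwidth value `sigma=1.0` is a placeholder (the paper does not report the value of sigma), the toy data are synthetic, and the orthonormal W returned here follows from solving Eq. (3) as an ordinary symmetric eigenproblem. Feature vectors are stored as rows, so `X.T @ L @ X` plays the role of X L X^T in the paper's column convention.

```python
import numpy as np


def pairwise_sq_dists(M):
    """Squared Euclidean distances between the rows of M (shape n x k)."""
    sq = np.sum(M * M, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (M @ M.T)
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off


def epe_fit(X, labels, d, alpha=0.5, sigma=1.0):
    """Hypothetical helper: learn a D x d projection following Eqs. (1)-(3).

    X      : (n, D) array, one feature vector per row.
    labels : (n, 2) array of [arousal, valence] labels.
    sigma  : kernel bandwidth; its value here is an assumption.
    """
    S = np.exp(-pairwise_sq_dists(labels) / (2.0 * sigma ** 2))  # label similarity S_ij
    N = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))       # feature closeness N_ij
    A = alpha * S + (1.0 - alpha) * N                            # combined affinity A_ij
    L = np.diag(A.sum(axis=1)) - A                               # graph Laplacian L = D - A
    M = X.T @ L @ X                                              # D x D matrix of Eq. (3)
    M = 0.5 * (M + M.T)                                          # remove numerical asymmetry
    _, eigvecs = np.linalg.eigh(M)                               # eigenvalues in ascending order
    return eigvecs[:, :d]                                        # eigenvectors of the d smallest


# Toy usage on synthetic data (not the LIRIS-ACCEDE features).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))                # 30 samples, 8-D features
labels = rng.uniform(-1.0, 1.0, size=(30, 2))
W = epe_fit(X, labels, d=4, alpha=0.5)
Y = X @ W                                   # rows are y_i = W^T x_i
```

On real 2854-D features, `sigma` would need to be matched to the scale of the pairwise distances; otherwise the closeness term can underflow to zero. The projected rows of `Y` are what a subsequent regressor (the paper uses nu-SVR with an RBF kernel) would be trained on.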