Towards Learning Emotional Subspace

Tobey H. Ko

tobeyko@hku.hk 2

Zhonglei Gu

Tiantian He

tiantian.he@outlook.com 1

Yang Liu

csygliu@comp.hkbu.edu.hk 0 3

0 Department of Computer Science, Hong Kong Baptist University, HKSAR, China

1 Department of Computing, The Hong Kong Polytechnic University, HKSAR, China

2 Department of Industrial and Manufacturing Systems Engineering, University of Hong Kong, HKSAR, China

3 Institute of Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China

2018

29 31

We introduce a model designed to predict the emotional impact of movies through affective video content analysis. Specifically, our approach uses a two-stage learning framework. The first stage conducts subspace learning, using emotion preserving embedding (EPE) for continuous emotional labels or biased discriminant embedding (BDE) for discrete emotional labels, to uncover an informative subspace of the original feature space. The second stage carries out the prediction with a support vector machine (SVM). Experiments on a movie dataset validate the effectiveness of our learning framework.

INTRODUCTION

The Emotional Impact of Movies Task in MediaEval 2018 aims at developing approaches that automatically and accurately predict the emotional impact of movie content on a general audience, where the impact is measured as induced valence, induced arousal, or induced fear. A successful solution to this task is expected to lead to an automatic video emotion discriminator capable of identifying movie content that may induce harmful emotions. Approaches proposed for the task are trained and evaluated on the LIRIS-ACCEDE dataset (liris-accede.ec-lyon.fr) [1], which offers a collection of 160 professionally made and amateur movies shared under the Creative Commons license, 44 of which are selected and annotated with fear, valence, and arousal labels. More details of the task requirements and the data description can be found in the task paper [4].

In this paper, we introduce a two-stage learning framework for automatic prediction of the emotional impact of movie content. To learn an accurate feature representation of the induced emotions, the framework first projects the original data onto a learned low-dimensional feature subspace using dimensionality reduction techniques, and then conducts prediction in the learned subspace. Specifically, emotion preserving embedding (EPE) is used to learn the subspace for induced arousal and induced valence, whereas the biased discriminant embedding (BDE) algorithm [5] is used to learn the subspace for induced fear. In the learned low-dimensional subspace, we employ classical support vector regression and classification, which are efficient and effective, to predict the induced emotion in a continuous and a discrete manner, respectively.

LEARNING EMOTIONAL SUBSPACE

Emotion Preserving Embedding

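EPE is proposed to learn the subspace for the continuous arousal and valence labels. We are given the training set $\mathcal{X} = \{(\mathbf{x}_1, \mathbf{l}_1), (\mathbf{x}_2, \mathbf{l}_2), \ldots, (\mathbf{x}_n, \mathbf{l}_n)\}$, where $\mathbf{x}_i \in \mathbb{R}^{D}$ ($i = 1, \cdots, n$) is the feature vector of the $i$-th movie and $\mathbf{l}_i = [a_i, v_i]$ is the corresponding label vector containing the arousal label $a_i$ and the valence label $v_i$. EPE aims to learn a $D \times d$ transformation matrix $\mathbf{W}$ to map $\mathbf{x}_i$ ($i = 1, \cdots, n$) to a low-dimensional subspace, where the emotion information and manifold structure of the dataset can be well preserved. To achieve this goal, EPE optimizes the following objective function:

$$\mathbf{W} = \arg\min_{\mathbf{W}} \sum_{i,j=1}^{n} \|\mathbf{W}^{T}(\mathbf{x}_i - \mathbf{x}_j)\|^{2} \cdot \left(\alpha S^{l}_{ij} + (1-\alpha) S^{x}_{ij}\right), \tag{1}$$

where $S^{l}_{ij} = \exp(-\|\mathbf{l}_i - \mathbf{l}_j\|^{2}/2\sigma^{2})$ measures the label similarity of $\mathbf{x}_i$ and $\mathbf{x}_j$, $S^{x}_{ij} = \exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^{2}/2\sigma^{2})$ measures the closeness between $\mathbf{x}_i$ and $\mathbf{x}_j$, and $\alpha \in [0, 1]$ is the parameter balancing the emotion information and the manifold structure. Eq. (1) could be equivalently rewritten as follows:

$$\mathbf{W} = \arg\min_{\mathbf{W}} \operatorname{tr}\left(\mathbf{W}^{T}\mathbf{X}\mathbf{L}\mathbf{X}^{T}\mathbf{W}\right), \tag{2}$$

where $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] \in \mathbb{R}^{D \times n}$ is the data matrix, $\mathbf{L} = \mathbf{D} - \mathbf{A}$ is the $n \times n$ Laplacian matrix [2], and $\mathbf{D}$ is a diagonal matrix defined as $\mathbf{D}_{ii} = \sum_{j=1}^{n} A_{ij}$ ($i = 1, \ldots, n$), where $A_{ij} = \alpha S^{l}_{ij} + (1-\alpha) S^{x}_{ij}$. Then the optimal $\mathbf{W}$ can be obtained by finding the eigenvectors corresponding to the smallest eigenvalues of the following eigen-decomposition problem:

$$\mathbf{X}\mathbf{L}\mathbf{X}^{T}\mathbf{w} = \lambda \mathbf{w}. \tag{3}$$

After obtaining $\mathbf{W}$, we can obtain the low-dimensional representation of $\mathbf{x}_i$ by $\mathbf{y}_i = \mathbf{W}^{T}\mathbf{x}_i$.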
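A minimal NumPy sketch of the EPE steps in Eqs. (1)-(3) is given below for illustration. It is not the implementation used in the experiments: the function names `pairwise_sq_dists` and `epe_fit`, the bandwidth value `sigma = 1.0`, and the random toy data are all assumptions made for this example. The sketch also checks numerically that, for a symmetric affinity matrix, the pairwise sum in Eq. (1) equals twice the trace in Eq. (2), so both have the same minimizer.

```python
import numpy as np

def pairwise_sq_dists(Z):
    """Squared Euclidean distances between the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    # Clip tiny negative values caused by floating-point round-off.
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)

def epe_fit(X, labels, d, alpha=0.5, sigma=1.0):
    """X: (D, n) data matrix, labels: (n, 2) [arousal, valence].

    Returns the (D, d) projection W, the affinity A, and the Laplacian L.
    sigma is an assumed bandwidth; the source does not state its value.
    """
    S_l = np.exp(-pairwise_sq_dists(labels) / (2.0 * sigma ** 2))  # label similarity
    S_x = np.exp(-pairwise_sq_dists(X.T) / (2.0 * sigma ** 2))     # feature closeness
    A = alpha * S_l + (1.0 - alpha) * S_x
    L = np.diag(A.sum(axis=1)) - A                                 # L = D - A
    M = X @ L @ X.T
    M = (M + M.T) / 2.0                                            # enforce symmetry
    _, eigvecs = np.linalg.eigh(M)                                 # eigenvalues ascending
    return eigvecs[:, :d], A, L                                    # d smallest eigenvalues

# Toy data (hypothetical): n = 20 samples with D = 6 features, projected to d = 2.
rng = np.random.default_rng(0)
n, D, d = 20, 6, 2
X = rng.normal(size=(D, n))
labels = rng.uniform(-1.0, 1.0, size=(n, 2))

W, A, L = epe_fit(X, labels, d)
Y = W.T @ X                                                        # y_i = W^T x_i

# Numerical check: pairwise objective of Eq. (1) versus trace form of Eq. (2).
diffs = Y[:, :, None] - Y[:, None, :]                              # diffs[:, i, j] = y_i - y_j
pairwise_cost = np.sum(A * np.sum(diffs ** 2, axis=0))
trace_cost = np.trace(W.T @ X @ L @ X.T @ W)
```

Because `np.linalg.eigh` returns orthonormal eigenvectors, the learned `W` satisfies $\mathbf{W}^{T}\mathbf{W} = \mathbf{I}$, which rules out the trivial zero solution of the minimization.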

Biased Discriminant Embedding

BDE is a subspace learning algorithm that we proposed for the same task last year [5]. It aims to learn the subspace for the binary fear labels. In this scenario, each data sample $\mathbf{x}_i$ is associated with a binary label $l_i \in \{0, 1\}$, with 1 for fear and 0 otherwise. BDE aims to maximize the biased discriminant information in the learned subspace. As mentioned in [5], the so-called biased discrimination is designed to emphasize the importance of the fear class. The objective function of BDE is given as follows:

$$\mathbf{W} = \arg\max_{\mathbf{W}} \operatorname{tr}\left(\frac{\mathbf{W}^{T}\mathbf{S}_b\mathbf{W}}{\mathbf{W}^{T}\mathbf{S}_w\mathbf{W}}\right), \tag{4}$$

where $\mathbf{S}_w = \sum_{i,j=1}^{n}(N_{ij} \times l_i \times l_j)(\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^{T}$ and $\mathbf{S}_b = \sum_{i,j=1}^{n}(N_{ij} \times |l_i - l_j|)(\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^{T}$ denote the biased within-class and between-class scatters, respectively, and $N_{ij}$ is a pairwise weight (see [5]).

The optimal $\mathbf{W}$ then can be obtained by finding the eigenvectors corresponding to the largest eigenvalues of the following generalized eigen-decomposition problem:

$$\mathbf{S}_b\mathbf{w} = \lambda \mathbf{S}_w\mathbf{w}. \tag{5}$$
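The generalized eigenproblem above can be sketched in NumPy as follows. This is an illustrative example under stated assumptions rather than the authors' code: the pairwise weight is not defined in this text, so the sketch assumes a symmetric weight matrix and defaults it to all ones; the function name `bde_fit`, the small ridge term `reg` added to the within-class scatter for numerical stability, and the toy data are likewise hypothetical.

```python
import numpy as np

def bde_fit(X, y, d, N=None, reg=1e-6):
    """X: (D, n) data matrix, y: (n,) binary labels (1 = fear, 0 = otherwise).

    N: (n, n) symmetric pairwise weights; all ones if None (an assumption).
    Returns W (D, d), the d largest generalized eigenvalues, S_w, and S_b.
    """
    D, n = X.shape
    y = np.asarray(y, dtype=float)
    if N is None:
        N = np.ones((n, n))
    within = N * np.outer(y, y)                      # nonzero only for fear-fear pairs
    between = N * np.abs(y[:, None] - y[None, :])    # nonzero only for mixed pairs

    def scatter(Wt):
        # For symmetric Wt: sum_ij Wt_ij (x_i - x_j)(x_i - x_j)^T = 2 X (diag(Wt 1) - Wt) X^T
        lap = np.diag(Wt.sum(axis=1)) - Wt
        return 2.0 * X @ lap @ X.T

    S_w = scatter(within) + reg * np.eye(D)          # ridge keeps S_w positive definite
    S_b = scatter(between)

    # Reduce S_b w = lambda S_w w to a symmetric problem with S_w = C C^T.
    C = np.linalg.cholesky(S_w)
    C_inv = np.linalg.inv(C)
    M = C_inv @ S_b @ C_inv.T
    M = (M + M.T) / 2.0
    vals, vecs = np.linalg.eigh(M)                   # eigenvalues ascending
    order = np.argsort(vals)[::-1][:d]               # keep the d largest
    W = C_inv.T @ vecs[:, order]                     # map back: w = C^{-T} u
    return W, vals[order], S_w, S_b

# Toy data (hypothetical): 30 samples with 5 features, the first 10 labeled as fear.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 30))
y = np.array([1] * 10 + [0] * 20)

W, vals, S_w, S_b = bde_fit(X, y, d=2)
Y = W.T @ X                                          # 2-D representation of each sample
```

With all-ones weights, the within-class scatter only involves pairs of fear samples, which is how the bias toward the fear class enters this sketch.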

RESULTS AND ANALYSIS

In this section, we evaluate the performance of our approach on the MediaEval 2018 Emotional Impact of Movies Task. There are 93337 and 26600 frames in the development set and the test set, respectively. We use 11 types of features to construct the original feature vector for each frame, i.e., 1583-D Auto Color Correlogram (ACC), 256-D Color and Edge Directivity Descriptor (CEDD), 144-D Color Layout (CL), 33-D Edge Histogram (EH), 80-D Fuzzy Color and Texture Histogram (FCTH), 192-D Gabor, 60-D Joint descriptor joining CEDD and FCTH in one histogram (JCD), 168-D Scalable Color (SC), 256-D Tamura, 64-D Local Binary Patterns (LBP), and 18-D VGG16 fc6 layer (FC6). The total dimension of the original feature space is therefore 2854.
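As a quick consistency check on the stated total, the per-descriptor dimensions can be summed and the position of each descriptor inside the concatenated frame vector derived. The snippet below is only an illustration: the dimensions are copied from the list in the text, while the concatenation order (taken here to be the listing order) and the variable names are assumptions.

```python
# Per-descriptor dimensions as listed in the text; the order is assumed.
feature_dims = {
    "ACC": 1583,
    "CEDD": 256,
    "CL": 144,
    "EH": 33,
    "FCTH": 80,
    "Gabor": 192,
    "JCD": 60,
    "SC": 168,
    "Tamura": 256,
    "LBP": 64,
    "FC6": 18,
}

# Total dimension of the concatenated feature vector.
total_dim = sum(feature_dims.values())

# Half-open [start, end) index range of each descriptor in the concatenated vector.
offsets = {}
start = 0
for name, dim in feature_dims.items():
    offsets[name] = (start, start + dim)
    start += dim

print(total_dim)
```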

For valence/arousal prediction, we use EPE to learn the transformation matrix $\mathbf{W}$ from the development set and use $\mathbf{W}$ to project the $D$-dimensional development and test data ($D = 2854$) to the $d$-dimensional subspace. We set $d = 4, 5, 9, 10$ for Runs 1, 2, 3, 4, respectively. We set $\alpha = 0.5$ in our experiment to equally consider the emotion information and the manifold structure. Then we train the $\nu$-SVR [6] on the $d$-dimensional development set and apply the trained model for prediction on the $d$-dimensional test set. For SVR, we use the RBF kernel and the default settings recommended by libsvm [3]: $\nu = 0.5$ and $\gamma = 1/d$.
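To make the kernel setting concrete, the sketch below evaluates the RBF kernel $K(\mathbf{u}, \mathbf{v}) = \exp(-\gamma\|\mathbf{u} - \mathbf{v}\|^{2})$ on projected data with the kernel width set to one over the subspace dimension. It only illustrates the kernel computation, not the SVR training itself; the function name `rbf_kernel`, the array names, and the random stand-in data are assumptions.

```python
import numpy as np

def rbf_kernel(Y1, Y2, gamma):
    """K[i, j] = exp(-gamma * ||Y1[i] - Y2[j]||^2); rows are samples."""
    sq = (np.sum(Y1 ** 2, axis=1)[:, None]
          + np.sum(Y2 ** 2, axis=1)[None, :]
          - 2.0 * Y1 @ Y2.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip round-off negatives

d = 4                       # subspace dimension of Run 1
gamma = 1.0 / d             # kernel width: one over the number of features

# Random stand-ins for projected development and test samples (hypothetical).
rng = np.random.default_rng(0)
Y_dev = rng.normal(size=(8, d))
Y_test = rng.normal(size=(3, d))

K_dev = rbf_kernel(Y_dev, Y_dev, gamma)     # (8, 8) kernel matrix used for training
K_test = rbf_kernel(Y_test, Y_dev, gamma)   # (3, 8) kernel values used for prediction
```

Tying the kernel width to the subspace dimension means that each run, with its own dimension, uses a different kernel width.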

For fear prediction, we use BDE to learn $\mathbf{W}$. Similar to the previous experiment, we set $d = 4, 5, 9, 10$ for Runs 1, 2, 3, 4, respectively. Then we train the $\nu$-SVC [6] on the $d$-dimensional development set and apply the trained model for classification on the $d$-dimensional test set. Similarly, we use the RBF kernel and the default settings recommended by libsvm [3]: $\nu = 0.5$ and $\gamma = 1/d$.

[...]tion may embed in a very low-dimensional subspace, and thus [...] However, the results in Table 2 do not clearly indicate how valence prediction performance depends on the dimensionality of the learned subspace. The reason might be that we have not yet discovered the optimal subspace dimension for valence. Further investigation is needed to determine how an optimal dimension for the learned subspace can be obtained. From Table 3, we can see that the performance of our method on fear prediction is unsatisfactory. A possible reason is the high imbalance between the fear class and the non-fear class, which makes the traditional learning mechanism inefficient, even though we have made some effort to model the class imbalance during subspace learning.

CONCLUSION

This paper describes our approach for predicting the emotional impact of movies and validates it on the MediaEval 2018 Emotional Impact of Movies Task. Future work will proceed in two directions. First, we are interested in exploring how to build a joint learning mechanism for both arousal and valence, as these two emotional dimensions are related to each other. Second, we will investigate more effective ways to model the class imbalance in subspace learning and the subsequent classification, especially for extremely imbalanced cases.

ACKNOWLEDGMENTS

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61503317, in part by the General Research Fund (GRF) from the Research Grant Council (RGC) of Hong Kong SAR under Project HKBU12202417, and in part by the SZSTI Grant with the Project Code JCYJ20170307161544087.

[1] Y. Baveye, E. Dellandréa, C. Chamaret, and L. Chen. 2015. LIRIS-ACCEDE: A Video Database for Affective Content Analysis. IEEE Transactions on Affective Computing 6, 1 (Jan 2015), 43-55.

[2] M. Belkin and P. Niyogi. 2001. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. In Advances in Neural Information Processing Systems 14 (NIPS). 585-591.

[3] Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 3 (2011), 27:1-27:27.

[4] E. Dellandréa, M. Huigsloot, L. Chen, Y. Baveye, and M. Sjöberg. 2018. The MediaEval 2018 Emotional Impact of Movies Task. In MediaEval 2018 Workshop.

[5] Y. Liu, Z. Gu, and T. H. Ko. 2017. HKBU at MediaEval 2017 Emotional Impact of Movies Task. In MediaEval 2017 Workshop.

[6] Bernhard Schölkopf, Alex J. Smola, Robert C. Williamson, and Peter L. Bartlett. 2000. New Support Vector Algorithms. Neural Comput. 12, 5 (2000), 1207-1245.