<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Emotional Impact of Movies Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yang Liu</string-name>
          <email>csygliu@comp.hkbu.edu.hk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhonglei Gu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tobey H. Ko</string-name>
          <email>tobeyko@hku.hk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Hong Kong Baptist University</institution>
          ,
          <addr-line>HKSAR</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Industrial and Manufacturing Systems Engineering, University of Hong Kong</institution>
          ,
          <addr-line>HKSAR</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Research and Continuing Education, Hong Kong Baptist University</institution>
          ,
          <addr-line>Shenzhen</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>In this paper, we describe our model designed for the automatic prediction of the emotional impact of movies. Specifically, a two-stage learning framework is proposed. First, dimensionality reduction techniques are employed to discover the key emotion information embedded in the original feature space: we use the classical method principal component analysis (PCA) and a new algorithm, biased discriminant embedding (BDE), to learn the subspace. After dimensionality reduction, SVMs are utilized for prediction. Experimental results validate the effectiveness of our approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        In the 2017 Emotional Impact of Movies Task, participants are asked
to predict the expected emotional impact of movie content, that is,
the response of a general audience to a given stimulus [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]: induced valence, induced arousal, or induced fear, predicted
from movie clip segments. The dataset used in this task is the
LIRIS-ACCEDE dataset (liris-accede.ec-lyon.fr), which contains
videos from a set of 160 professionally made and amateur movies,
shared under Creative Commons licenses that allow redistribution [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
More details of the task requirements as well as the dataset
description can be found in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        In this paper, we propose a two-stage learning framework
to predict the emotional impact of movies. First, we use
dimensionality reduction to project the original data into a
low-dimensional subspace in which the key emotional
information is well preserved. Specifically, we use principal
component analysis (PCA) to extract features for arousal and
valence prediction, and propose a new algorithm called biased
discriminant embedding (BDE) to extract features for fear
prediction. After dimensionality reduction, we use support
vector regression and classification [
        <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
        ] for continuous and
discrete predictions, respectively.
      </p>
    </sec>
    <sec id="sec-2">
      <title>OUR MODEL</title>
    </sec>
    <sec id="sec-3">
      <title>Feature Extraction</title>
      <p>Principal Component Analysis. Given the data
matrix X = [x_1, x_2, ..., x_n], where x_i \in R^d denotes the feature
vector of the i-th data point, principal component analysis
(PCA) aims to learn a d \times r transformation matrix W that
maximizes the following objective:</p>
      <p>W = \arg\max_W \operatorname{tr}\left( W^\top (X - \bar{x}\mathbf{1}^\top)(X - \bar{x}\mathbf{1}^\top)^\top W \right), (1)</p>
      <p>where \bar{x} = \sum_{i=1}^{n} x_i / n and \mathbf{1} denotes the n \times 1 vector with
all entries being 1. The optimization problem in Eq. (1) can
be solved by standard eigen-decomposition.</p>
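      <p>As a concrete illustration, the following is a minimal sketch of the Eq. (1) solver (not the authors' released code; the column-per-sample layout and variable names are our assumptions):</p>
      <preformat>
import numpy as np

def pca(X, r):
    """X: d x n data matrix (one column per sample); r: target dimension."""
    # Center the data: X - x_bar 1^T
    Xc = X - X.mean(axis=1, keepdims=True)
    # Scatter matrix (X - x_bar 1^T)(X - x_bar 1^T)^T from Eq. (1)
    S = Xc @ Xc.T
    # eigh returns eigenvalues in ascending order, so reverse them
    eigvals, eigvecs = np.linalg.eigh(S)
    W = eigvecs[:, ::-1][:, :r]      # d x r projection matrix
    return W, eigvals[::-1][:r]      # leading eigenvalues, reused later

# Projected (reduced) data: Z = W.T @ X, an r x n matrix
      </preformat>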
      <p>Biased Discriminant Embedding. Given the data
matrix X = [x_1, x_2, ..., x_n] and the label vector l = [l_1, l_2, ..., l_n],
where l_i \in \{0, 1\} denotes the corresponding label of x_i (1 for
fear and 0 otherwise), biased discriminant embedding (BDE)
aims to maximize the biased discriminant information in the
reduced subspace. The motivation for the biased discriminant
formulation is that in fear prediction one is typically more
interested in the fear class than in the non-fear one.</p>
      <p>The objective function of BDE is given as follows:</p>
      <p>W = \arg\max_W \operatorname{tr}\left( (W^\top S_w W)^{-1} (W^\top S_b W) \right), (2)</p>
      <p>S_w = \sum_{i,j=1}^{n} (l_i \, l_j \, A_{ij})(x_i - x_j)(x_i - x_j)^\top, (3)</p>
      <p>S_b = \sum_{i,j=1}^{n} (|l_i - l_j| \, A_{ij})(x_i - x_j)(x_i - x_j)^\top, (4)</p>
      <p>where S_w denotes the biased within-class scatter, S_b denotes
the biased between-class scatter, and A_{ij} = \exp(-\|x_i - x_j\|^2 / \sigma^2)
measures the closeness between the two data samples x_i and x_j.
The optimization problem can be solved by generalized
eigen-decomposition.</p>
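      <p>The sketch below (ours, following the reconstruction in Eqs. (2)-(4); sigma, the output dimension r, and the small regularizer are assumed hyperparameters) solves BDE via generalized eigen-decomposition:</p>
      <preformat>
import numpy as np
from scipy.linalg import eigh

def bde(X, l, r=2, sigma=1.0):
    """X: d x n data matrix; l: binary labels (1 = fear, 0 = non-fear)."""
    d, n = X.shape
    l = np.asarray(l, dtype=float)
    # Pairwise affinities A_ij = exp(-||x_i - x_j||^2 / sigma^2)
    sq = (X**2).sum(0)[:, None] + (X**2).sum(0)[None, :] - 2 * X.T @ X
    A = np.exp(-sq / sigma**2)
    Ww = np.outer(l, l) * A                   # l_i l_j A_ij (fear pairs)
    Wb = np.abs(l[:, None] - l[None, :]) * A  # |l_i - l_j| A_ij (mixed pairs)
    # sum_ij w_ij (x_i - x_j)(x_i - x_j)^T = 2 X (D - W) X^T (Laplacian identity)
    Sw = 2 * X @ (np.diag(Ww.sum(1)) - Ww) @ X.T
    Sb = 2 * X @ (np.diag(Wb.sum(1)) - Wb) @ X.T
    # Generalized eigenproblem S_b w = lambda S_w w; regularize S_w for stability
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(d))
    return vecs[:, ::-1][:, :r]               # r leading directions
      </preformat>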
    </sec>
    <sec id="sec-4">
      <title>Emotion Prediction</title>
      <p>
        Support Vector Regression. For predicting the arousal
and valence values, we use \nu-SVR [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to train two regressors
separately. The dual problem that \nu-SVR aims to solve is:
      </p>
      <p>\min_{\alpha, \alpha^*} \; \tfrac{1}{2} (\alpha - \alpha^*)^\top K (\alpha - \alpha^*) + z^\top (\alpha - \alpha^*)</p>
      <p>s.t. \; e^\top (\alpha - \alpha^*) = 0, \quad e^\top (\alpha + \alpha^*) \le C\nu,</p>
      <p>0 \le \alpha_i, \alpha_i^* \le C/n, \quad i = 1, ..., n,</p>
      <p>where K is the kernel matrix, z is the vector of target values,
and e is the vector of all ones. The prediction for a newly
arriving vector y is:</p>
      <p>f(y) = \sum_{i=1}^{n} (\alpha_i^* - \alpha_i) K(x_i, y) + b.</p>
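      <p>In practice this dual is handled by an off-the-shelf solver. A minimal usage sketch with scikit-learn's LIBSVM-based NuSVR (our tooling choice, with synthetic stand-in data; [2] is the underlying library):</p>
      <preformat>
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
Z = rng.random((100, 50))   # PCA-reduced segment features (synthetic)
y = rng.random(100)         # induced arousal or valence scores (synthetic)

reg = NuSVR(nu=0.5, kernel='rbf', gamma=1.0 / Z.shape[1])
reg.fit(Z, y)
print(reg.predict(Z[:5]))   # continuous predictions f(y) as above
      </preformat>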
      <p>
        Support Vector Classification. To predict the binary
fear labels, we use \nu-SVC [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The dual problem that \nu-SVC aims to solve is:
      </p>
      <p>\min_{\alpha} \; \tfrac{1}{2} \alpha^\top Q \alpha</p>
      <p>s.t. \; 0 \le \alpha_i \le 1/n, \quad e^\top \alpha \ge \nu, \quad l^\top \alpha = 0,</p>
      <p>where Q_{ij} = l_i l_j K(x_i, x_j). The predicted label of a new
vector y is given by the sign of the decision function:</p>
      <p>f(y) = \operatorname{sgn}\left( \sum_{i=1}^{n} l_i \alpha_i K(x_i, y) + b \right).</p>
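      <p>A matching sketch for the fear classifier (again ours, on synthetic data; the 2-D inputs mimic the BDE subspace used in Subtask 2 below):</p>
      <preformat>
import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(0)
Z = rng.random((200, 2))     # BDE-reduced features (synthetic)
l = rng.integers(0, 2, 200)  # 1 = fear, 0 = non-fear (synthetic)

clf = NuSVC(nu=0.1, kernel='rbf', gamma=1.0 / Z.shape[1])
clf.fit(Z, l)
print(clf.predict(Z[:5]))    # sign of the decision function above
      </preformat>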
    </sec>
    <sec id="sec-5">
      <title>RESULTS</title>
      <p>In this section, we report our experimental settings and the
evaluation results. For each 1-second segment, we construct a
1,271-D feature set, including 256-D Auto Color Correlogram
(acc) features, 144-D Color and Edge Directivity Descriptor
(cedd) features, 33-D Color Layout (cl) features, 80-D Edge
Histogram (eh) features, 192-D Fuzzy Color and Texture
Histogram (fcth) features, 60-D Gabor (gabor) features, 168-D
Joint descriptor joining CEDD and FCTH in one histogram
(jcd) features, 256-D Local Binary Patterns (lbp) features,
64-D Scalable Color (sc) features, and 18-D Tamura (tamura)
features.</p>
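      <p>As dimension bookkeeping, the per-second descriptor blocks concatenate to 1,271-D, and ten consecutive seconds stack into the 12,710-D segment vector used below (the stacking helper is our sketch; the actual extractors are not named in the original):</p>
      <preformat>
import numpy as np

BLOCKS = {'acc': 256, 'cedd': 144, 'cl': 33, 'eh': 80, 'fcth': 192,
          'gabor': 60, 'jcd': 168, 'lbp': 256, 'sc': 64, 'tamura': 18}
assert sum(BLOCKS.values()) == 1271   # per-second feature dimension

def segment_vector(per_second_features):
    """Stack 10 one-second 1,271-D vectors into one 12,710-D vector."""
    assert len(per_second_features) == 10
    return np.concatenate(per_second_features)
      </preformat>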
      <p>Subtask 1: For Run 1, we use the original 12,710-D feature
set (10 seconds) as the input. For Runs 2-5, we use PCA to
reduce the original feature set to 50-D, 80-D, 57-D, and
40-D subspaces, respectively. \nu-SVR with an RBF kernel is then
used for prediction. We set \nu = 0.5 and \gamma = 1/d, where d is
the input feature dimension.</p>
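      <p>Read as a pipeline, a Run 2-style configuration might look as follows (a sketch with scikit-learn's PCA standing in for the Eq. (1) solver; taking d as the reduced dimension for the RBF width is our reading):</p>
      <preformat>
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
X = rng.random((300, 12710))   # 10-second segment features (synthetic)
y = rng.random(300)            # arousal or valence annotations (synthetic)

run2 = make_pipeline(PCA(n_components=50),
                     NuSVR(nu=0.5, kernel='rbf', gamma=1.0 / 50))
run2.fit(X, y)                 # reduce to 50-D, then regress
      </preformat>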
      <p>Subtask 2: For Run 1, we use the original 12,710-D feature
set (10 seconds) as the input. For Runs 2-5, we use BDE to
reduce the original feature set to 2-D subspaces. \nu-SVC
with an RBF kernel is then used for prediction. We set \nu = 0.1
and \gamma = 1/d.</p>
      <p>Table 1 reports the evaluation results (arousal MSE and r,
valence MSE and r, and fear accuracy, precision, recall, and F1
for Runs 1-5) provided by the task organizers. We can see that
the performance in the low-dimensional subspace is generally
worse than that in the original feature space. The reason might
be that the dimensionality of the original feature space is so
high that such a low-dimensional subspace cannot fully capture
the discriminant information embedded in the original data.</p>
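      <p>The metrics in Table 1 can be computed with standard tooling (a sketch with scipy/scikit-learn, our choice; the y_*/f_* arrays are synthetic stand-ins for predictions and ground truth):</p>
      <preformat>
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import (mean_squared_error, accuracy_score,
                             precision_score, recall_score, f1_score)

rng = np.random.default_rng(0)
y_true, y_pred = rng.random(50), rng.random(50)
mse = mean_squared_error(y_true, y_pred)   # arousal/valence MSE
r, _ = pearsonr(y_true, y_pred)            # arousal/valence Pearson r

f_true, f_pred = rng.integers(0, 2, 50), rng.integers(0, 2, 50)
acc = accuracy_score(f_true, f_pred)       # fear accuracy
prec = precision_score(f_true, f_pred)     # fear precision
rec = recall_score(f_true, f_pred)         # fear recall
f1 = f1_score(f_true, f_pred)              # fear F1
      </preformat>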
      <p>In addition to the overall performance, we analyze the
contribution of each dimension in the original feature
space. The contribution of the j-th dimension is defined as
c_j = \sum_{k} \lambda_k |W_{jk}|, where \lambda_k denotes the k-th
eigenvalue, W_{jk} denotes the (j, k)-th element of W, and |\cdot|
denotes the absolute value operator. From Figure 1 we can see
that the acc feature makes important contributions to both
the arousal/valence and fear prediction tasks, which indicates
its importance in emotional discriminant analysis.</p>
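      <p>The contribution score is straightforward to compute from the eigen-decomposition output (a sketch under our variable naming; summing c_j over each descriptor's index range gives the per-block totals plotted in Figure 1):</p>
      <preformat>
import numpy as np

def contributions(W, eigvals):
    """c_j = sum_k lambda_k |W_jk| for each original dimension j.

    W: d x r loading matrix; eigvals: the r leading eigenvalues.
    """
    return np.abs(W) @ np.asarray(eigvals)   # length-d vector of scores
      </preformat>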
    </sec>
    <sec id="sec-6">
      <title>DISCUSSION AND OUTLOOK</title>
      <p>This paper introduces our model designed for predicting
the emotional impact of movies. To extract a compact
representation of the original feature set, dimensionality
reduction is utilized. We then use SVMs for prediction. In
future work, we are interested in analyzing the relation
between arousal/valence and fear, which could help in
understanding emotional impact more deeply. Moreover, as the
ground-truth emotion labels are provided by human beings, they
generally vary across individuals and are somewhat subjective.
We are therefore particularly interested in refining the
human-labeled ground truth via machine learning technologies.</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was supported in part by the National Natural
Science Foundation of China under Grant 61503317, and in
part by the Faculty Research Grant of Hong Kong Baptist
University (HKBU) under Project FRG2/16-17/032.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Baveye</surname>
          </string-name>
          , E. Dellandréa, C. Chamaret, and
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>LIRIS-ACCEDE: A Video Database for Affective Content Analysis</article-title>
          .
          <source>IEEE Transactions on Affective Computing</source>
          <volume>6</volume>
          ,
          <issue>1</issue>
          (Jan
          <year>2015</year>
          ),
          <fpage>43</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Chih-Chung</given-names>
            <surname>Chang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chih-Jen</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>LIBSVM: A library for support vector machines</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>2</volume>
          ,
          <issue>3</issue>
          (
          <year>2011</year>
          ),
          <fpage>27:1</fpage>
          -
          <lpage>27:27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Emmanuel</given-names>
            <surname>Dellandrea</surname>
          </string-name>
          , Martijn Huigsloot, Liming Chen, Yoann Baveye, and
          <string-name>
            <given-names>Mats</given-names>
            <surname>Sjoberg</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>The MediaEval 2017 Emotional Impact of Movies Task</article-title>
          .
          <source>MediaEval 2017 Workshop</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanjalic</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Extracting moods from pictures and sounds: towards truly personalized TV</article-title>
          .
          <source>IEEE Signal Processing Magazine</source>
          <volume>23</volume>
          ,
          <issue>2</issue>
          (March
          <year>2006</year>
          ),
          <fpage>90</fpage>
          -
          <lpage>100</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alex J.</given-names>
            <surname>Smola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Robert C.</given-names>
            <surname>Williamson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Peter L.</given-names>
            <surname>Bartlett</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>New Support Vector Algorithms</article-title>
          .
          <source>Neural Comput</source>
          .
          <volume>12</volume>
          ,
          <issue>5</issue>
          (
          <year>2000</year>
          ),
          <fpage>1207</fpage>
          -
          <lpage>1245</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>