Towards Learning Emotional Subspace
Tobey H. Ko (1), Zhonglei Gu (2), Tiantian He (3), Yang Liu (2,4)
(1) Department of Industrial and Manufacturing Systems Engineering, University of Hong Kong, HKSAR, China
(2) Department of Computer Science, Hong Kong Baptist University, HKSAR, China
(3) Department of Computing, The Hong Kong Polytechnic University, HKSAR, China
(4) Institute of Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China
tobeyko@hku.hk, cszlgu@comp.hkbu.edu.hk, tiantian.he@outlook.com, csygliu@comp.hkbu.edu.hk

ABSTRACT
We introduce a model designed to predict the emotional impact of movies through affective video content analysis. Specifically, our approach uses a two-stage learning framework. It first conducts subspace learning, using emotion preserving embedding (EPE) for continuous emotional labels or biased discriminant embedding (BDE) for discrete ones, to uncover an informative subspace of the original feature space, and then carries out the prediction with a support vector machine (SVM). Experiments on a movie dataset validate the effectiveness of our learning framework.

1 INTRODUCTION
The Emotional Impact of Movies Task in MediaEval 2018 aims to develop approaches that automatically and accurately predict the emotional impact that movie content has on a general audience, in terms of induced valence, induced arousal, and induced fear. A successful solution to this task is expected to yield an automatic video emotion discriminator capable of identifying movie content that may induce harmful emotions. Approaches proposed for the task are trained and evaluated using the LIRIS-ACCEDE dataset (liris-accede.ec-lyon.fr) [1], which offers a collection of 160 professionally made and amateur movies shared under the Creative Commons license, of which 44 are selected and annotated with their respective fear, valence, and arousal labels. More details of the task requirements and the data description can be found in the task paper [4].

In this paper, a two-stage learning framework is introduced for automatic prediction of the emotional impact of movie content. In order to learn an accurate feature representation of the induced emotions in movie content, the framework first projects the original data onto a learned low-dimensional feature subspace using dimensionality reduction techniques, and then conducts prediction on the learned subspace using regression or classification techniques. Specifically, the dimensionality reduction is performed using emotion preserving embedding (EPE) to learn the subspace for induced arousal and induced valence, whereas the biased discriminant embedding (BDE) algorithm [5] is used to learn the subspace for induced fear. On the learned low-dimensional feature subspace, we employ the classical support vector regression and classification techniques, as they are efficient and effective, to predict the induced emotion of the movie content in both a continuous and a discrete manner.

2 LEARNING EMOTIONAL SUBSPACE
2.1 Emotion Preserving Embedding
EPE is proposed to learn the subspace for the continuous arousal and valence labels. Given the training set $\mathcal{X} = \{(\mathbf{x}_1, \mathbf{l}_1), (\mathbf{x}_2, \mathbf{l}_2), ..., (\mathbf{x}_n, \mathbf{l}_n)\}$, where $\mathbf{x}_i \in \mathbb{R}^D$ ($i = 1, \cdots, n$) is the feature vector of the $i$-th movie and $\mathbf{l}_i = [a_i, v_i]^T$ is the corresponding label vector containing the arousal label $a_i$ and the valence label $v_i$, EPE aims to learn a $D \times d$ transformation matrix $\mathbf{W}$ that maps $\mathbf{x}_i$ ($i = 1, \cdots, n$) to a low-dimensional subspace in which the emotion information and the manifold structure of the dataset are well preserved. To achieve this goal, EPE optimizes the following objective function:

$$\mathbf{W} = \arg\min_{\mathbf{W}} \sum_{i,j=1}^{n} \|\mathbf{W}^T(\mathbf{x}_i - \mathbf{x}_j)\|^2 \cdot \big(\alpha S_{ij} + (1-\alpha) N_{ij}\big), \tag{1}$$

where $S_{ij} = \exp(-\|\mathbf{l}_i - \mathbf{l}_j\|^2 / 2\sigma^2)$ measures the label similarity of $\mathbf{x}_i$ and $\mathbf{x}_j$, $N_{ij} = \exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / 2\sigma^2)$ measures the closeness between $\mathbf{x}_i$ and $\mathbf{x}_j$, and $\alpha \in [0, 1]$ is the parameter balancing the emotion information and the manifold structure. Up to a constant factor that does not affect the minimizer, Eq. (1) can be equivalently rewritten as follows:

$$\mathbf{W} = \arg\min_{\mathbf{W}} \operatorname{tr}(\mathbf{W}^T \mathbf{X} \mathbf{L} \mathbf{X}^T \mathbf{W}), \tag{2}$$

where $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, ..., \mathbf{x}_n] \in \mathbb{R}^{D \times n}$ is the data matrix, $\mathbf{L} = \mathbf{D} - \mathbf{A}$ is the $n \times n$ Laplacian matrix [2], and $\mathbf{D}$ is a diagonal matrix defined as $D_{ii} = \sum_{j=1}^{n} A_{ij}$ ($i = 1, ..., n$), where $A_{ij} = \alpha S_{ij} + (1 - \alpha) N_{ij}$. The optimal $\mathbf{W}$ can then be obtained by finding the eigenvectors corresponding to the $d$ smallest eigenvalues of the following eigen-decomposition problem:

$$\mathbf{X} \mathbf{L} \mathbf{X}^T \mathbf{w} = \lambda \mathbf{w}. \tag{3}$$

After obtaining $\mathbf{W}$, we can obtain the low-dimensional representation of $\mathbf{x}_i$ by $\mathbf{y}_i = \mathbf{W}^T \mathbf{x}_i$.
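The EPE steps of Eqs. (1)-(3) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used for the submitted runs: the function names (`pairwise_sq_dists`, `epe_fit`, `epe_objective`), the default $\sigma = 1$, and the random toy data are all assumptions introduced here, and a single $\sigma$ is shared by both kernels as in the definitions of $S_{ij}$ and $N_{ij}$.

```python
import numpy as np


def pairwise_sq_dists(P):
    """Squared Euclidean distances between the rows of P (shape (n, k))."""
    gram = P @ P.T
    sq_norms = np.diag(gram)
    # Clip tiny negative values caused by floating-point round-off.
    return np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram, 0.0)


def epe_fit(X, labels, d, alpha=0.5, sigma=1.0):
    """Sketch of EPE. X is (D, n) with samples as columns; labels is (n, 2)
    holding [arousal, valence]. sigma=1.0 is an assumed value."""
    S = np.exp(-pairwise_sq_dists(labels) / (2.0 * sigma**2))  # label similarity
    N = np.exp(-pairwise_sq_dists(X.T) / (2.0 * sigma**2))     # feature closeness
    A = alpha * S + (1.0 - alpha) * N                          # combined affinity
    L = np.diag(A.sum(axis=1)) - A                             # Laplacian L = D - A
    M = X @ L @ X.T
    M = (M + M.T) / 2.0                                        # enforce symmetry
    _, eigvecs = np.linalg.eigh(M)                             # ascending eigenvalues
    W = eigvecs[:, :d]                                         # d smallest, Eq. (3)
    return W, A, L


def epe_objective(W, X, A):
    """Direct evaluation of the double sum in Eq. (1)."""
    Y = W.T @ X                                # (d, n) projected samples
    diffs = Y[:, :, None] - Y[:, None, :]      # (d, n, n) pairwise differences
    return float(np.sum(A * np.sum(diffs**2, axis=0)))


# Toy data (assumed, for illustration only): D = 6 features, n = 20 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 20))
labels = rng.uniform(-1.0, 1.0, size=(20, 2))
W, A, L = epe_fit(X, labels, d=2)
Y = W.T @ X  # low-dimensional representation y_i = W^T x_i
```

Because `np.linalg.eigh` returns orthonormal eigenvectors, the columns of `W` satisfy $\mathbf{W}^T\mathbf{W} = \mathbf{I}$, which rules out the trivial solution $\mathbf{W} = \mathbf{0}$. For a symmetric affinity matrix, the double sum in Eq. (1) equals twice the trace in Eq. (2), which `epe_objective` makes easy to check numerically.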
2.2 Biased Discriminant Embedding
BDE is a subspace learning algorithm that we proposed for last year's edition of the same task [5]. It aims to learn the subspace for the binary fear labels. In this scenario, each data sample $\mathbf{x}_i$ is associated with a binary label $l_i \in \{0, 1\}$, with 1 for fear and 0 otherwise. BDE aims to maximize the biased discriminant information in the learned subspace. As mentioned in [5], the so-called biased discrimination is designed to emphasize the importance of the fear class. The objective function of BDE is given as follows:

$$\mathbf{W} = \arg\max_{\mathbf{W}} \operatorname{tr}\left(\frac{\mathbf{W}^T \mathbf{S}_b \mathbf{W}}{\mathbf{W}^T \mathbf{S}_w \mathbf{W}}\right), \tag{4}$$

where $\mathbf{S}_w = \sum_{i,j=1}^{n} (N_{ij} \times l_i \times l_j)(\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T$ and $\mathbf{S}_b = \sum_{i,j=1}^{n} (N_{ij} \times |l_i - l_j|)(\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^T$ denote the biased within-class and between-class scatters, respectively.

The optimal $\mathbf{W}$ can then be obtained by finding the eigenvectors corresponding to the $d$ largest eigenvalues of the following generalized eigen-decomposition problem:

$$\mathbf{S}_b \mathbf{w} = \lambda \mathbf{S}_w \mathbf{w}. \tag{5}$$

3 RESULTS AND ANALYSIS
In this section, we evaluate the performance of our approach on the MediaEval 2018 Emotional Impact of Movies Task. There are 93337 and 26600 frames in the development set and the test set, respectively. We use 11 types of features to construct the original feature vector for each frame, i.e., 1583-D Auto Color Correlogram (ACC), 256-D Color and Edge Directivity Descriptor (CEDD), 144-D Color Layout (CL), 33-D Edge Histogram (EH), 80-D Fuzzy Color and Texture Histogram (FCTH), 192-D Gabor, 60-D Joint descriptor joining CEDD and FCTH in one histogram (JCD), 168-D Scalable Color (SC), 256-D Tamura, 64-D Local Binary Patterns (LBP), and 18-D VGG16 fc6 layer (FC6). The total dimension of the original feature space is therefore 2854.

For valence/arousal prediction, we use EPE to learn the transformation matrix $\mathbf{W}$ from the development set and use $\mathbf{W}$ to project the $D$-dimensional development and test data ($D = 2854$) to the $d$-dimensional subspace. We set $d = 4, 5, 9, 10$ for Runs 1, 2, 3, 4, respectively. We set $\alpha = 0.5$ in our experiment to give equal weight to the emotion information and the manifold structure. We then train the $\nu$-SVR [6] on the $d$-dimensional development set and apply the trained model for prediction on the $d$-dimensional test set. For SVR, we use the RBF kernel and the default settings recommended by LIBSVM [3]: $\nu = 0.5$ and $\gamma = 1/d$.

For fear prediction, we use BDE to learn $\mathbf{W}$. As in the previous experiment, we set $d = 4, 5, 9, 10$ for Runs 1, 2, 3, 4, respectively. We then train the $\nu$-SVC [6] on the $d$-dimensional development set and apply the trained model for classification on the $d$-dimensional test set. Similarly, we use the RBF kernel and the default settings recommended by LIBSVM [3]: $\nu = 0.5$ and $\gamma = 1/d$.

Tables 1-3 present the results of our approach, from which several observations can be drawn. First, in Table 1, Run 1 ($d = 4$) performs the best while Run 4 ($d = 10$) performs the worst. Moreover, the performance drops as the dimensionality of the subspace increases. This indicates that the arousal information may be embedded in a very low-dimensional subspace, and thus further increasing the dimensionality of the subspace may have an adverse effect on the prediction of induced arousal. However, the results in Table 2 show no clear relationship between valence prediction performance and the dimensionality of the learned subspace. The reason might be that we have not yet discovered the optimal dimension of the subspace for valence. Further investigation is needed to determine how an optimal dimension for the learned subspace can be obtained. From Table 3, we can see that the performance of our method on fear prediction is unsatisfactory. A possible reason is the high imbalance between the fear class and the non-fear class, which makes the traditional learning mechanism inefficient, even though we have made some effort to model the class imbalance during subspace learning.

Table 1: Results of arousal prediction on the MediaEval 2018 Emotional Impact of Movies Task.

         Run 1    Run 2    Run 3    Run 4
MSE      0.1493   0.1574   0.1608   0.1623
PCC      0.0828   0.0650   0.0487   0.0255

Table 2: Results of valence prediction on the MediaEval 2018 Emotional Impact of Movies Task.

         Run 1    Run 2    Run 3    Run 4
MSE      0.1016   0.1089   0.1089   0.1076
PCC      0.0499   0.0164   0.0872   0.1142

Table 3: Results of fear prediction on the MediaEval 2018 Emotional Impact of Movies Task.

                       Run 1    Run 2    Run 3    Run 4
Intersection / Union   0.1052   0.0612   0.0360   0.0196

4 CONCLUSION
This paper describes our approach to predicting the emotional impact of movies and validates it on the MediaEval 2018 Emotional Impact of Movies Task. Future work will proceed in two directions. First, we are interested in exploring how to build a joint learning mechanism for both arousal and valence, as these two emotional dimensions are related to each other. Second, we will investigate more effective ways to model the class imbalance in subspace learning and the subsequent classification, especially for extremely imbalanced cases.

ACKNOWLEDGMENTS
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61503317, in part by the General Research Fund (GRF) from the Research Grants Council (RGC) of Hong Kong SAR under Project HKBU12202417, and in part by the SZSTI Grant with the Project Code JCYJ20170307161544087.

Copyright held by the owner/author(s).
MediaEval’18, 29-31 October 2018, Sophia Antipolis, France
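As a supplementary illustration, the BDE step of Section 2.2 (the biased scatters and the generalized eigenproblem of Eq. (5)) can be sketched with NumPy as follows. This is a sketch under stated assumptions, not the code used for the submitted runs: the function name `bde_fit`, the default $\sigma = 1$, the small ridge term `reg` added to the within-class scatter to keep it invertible, and the toy data are all introduced here for illustration. The generalized problem is solved by Cholesky whitening, which is one standard technique among several.

```python
import numpy as np


def bde_fit(X, y, d, sigma=1.0, reg=1e-6):
    """Sketch of BDE. X is (D, n) with samples as columns; y is (n,) with
    1 for fear and 0 otherwise. sigma and reg are assumed values."""
    D = X.shape[0]
    gram = X.T @ X
    sq_norms = np.diag(gram)
    dist2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram, 0.0)
    N = np.exp(-dist2 / (2.0 * sigma**2))            # closeness N_ij

    w_within = N * np.outer(y, y)                    # nonzero for fear-fear pairs
    w_between = N * np.abs(y[:, None] - y[None, :])  # nonzero for mixed pairs

    def scatter(weights):
        # For symmetric weights, sum_ij w_ij (xi - xj)(xi - xj)^T
        # equals 2 X (Dg - weights) X^T, with Dg the diagonal of row sums.
        lap = np.diag(weights.sum(axis=1)) - weights
        return 2.0 * X @ lap @ X.T

    Sb = scatter(w_between)
    Sw = scatter(w_within) + reg * np.eye(D)         # ridge term: an assumption

    # Solve Sb w = lambda Sw w via Cholesky whitening: with Sw = C C^T and
    # w = C^{-T} u, the problem becomes (C^{-1} Sb C^{-T}) u = lambda u.
    C = np.linalg.cholesky(Sw)
    C_inv = np.linalg.inv(C)
    M = C_inv @ Sb @ C_inv.T
    M = (M + M.T) / 2.0                              # enforce symmetry
    vals, vecs = np.linalg.eigh(M)                   # ascending eigenvalues
    vals = vals[::-1][:d]                            # d largest eigenvalues
    W = C_inv.T @ vecs[:, ::-1][:, :d]               # matching eigenvectors
    return W, vals, Sb, Sw


# Toy data (assumed, for illustration only): D = 4 features, n = 40 samples,
# with alternating fear / non-fear labels.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(4, 40))
y_toy = np.array([1.0, 0.0] * 20)
W_bde, lam, Sb, Sw = bde_fit(X_toy, y_toy, d=2)
Y_toy = W_bde.T @ X_toy  # projected samples
```

The weights in `w_within` vanish unless both samples belong to the fear class, which is how the within-class scatter of Section 2.2 emphasizes that class. On real data with very few fear samples, the within-class scatter can be rank-deficient, which is why the sketch adds the ridge term before the Cholesky factorization.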


REFERENCES
 [1] Y. Baveye, E. Dellandréa, C. Chamaret, and L. Chen. 2015.
     LIRIS-ACCEDE: A Video Database for Affective Content
     Analysis. IEEE Transactions on Affective Computing 6, 1
     (Jan 2015), 43–55.
 [2] M. Belkin and P. Niyogi. 2001. Laplacian Eigenmaps and
     Spectral Techniques for Embedding and Clustering. In
     Advances in Neural Information Processing Systems 14 (NIPS).
     585–591.
 [3] C.-C. Chang and C.-J. Lin. 2011. LIBSVM: A library for
     support vector machines. ACM Transactions on Intelligent
     Systems and Technology 2, 3 (2011), 27:1–27:27.
 [4] E. Dellandréa, M. Huigsloot, L. Chen, Y. Baveye, and M.
     Sjöberg. 2018. The MediaEval 2018 Emotional Impact of
     Movies Task. In MediaEval 2018 Workshop.
 [5] Y. Liu, Z. Gu, and T. Ko. 2017. HKBU at MediaEval 2017
     Emotional Impact of Movies Task. In MediaEval 2017
     Workshop.
 [6] B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L.
     Bartlett. 2000. New Support Vector Algorithms. Neural
     Computation 12, 5 (2000), 1207–1245.