<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Learning Emotional Subspace</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tobey H. Ko</string-name>
          <email>tobeyko@hku.hk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhonglei Gu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tiantian He</string-name>
          <email>tiantian.he@outlook.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yang Liu</string-name>
          <email>csygliu@comp.hkbu.edu.hk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Hong Kong Baptist University</institution>
          ,
          <addr-line>HKSAR</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computing, The Hong Kong Polytechnic University</institution>
          ,
          <addr-line>HKSAR</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Industrial and Manufacturing Systems Engineering, University of Hong Kong</institution>
          ,
          <addr-line>HKSAR</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Institute of Research and Continuing Education, Hong Kong Baptist University</institution>
          ,
          <addr-line>Shenzhen</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>29</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>We introduce a model designed to predict the emotional impact of movies through affective video content analysis. Specifically, our approach uses a two-stage learning framework: it first conducts subspace learning using emotion preserving embedding (EPE) or biased discriminant embedding (BDE) to uncover an informative subspace of the original feature space according to the continuous or discrete emotional labels, respectively, and then carries out prediction using a support vector machine (SVM). Experiments on a movie dataset validate the effectiveness of our learning framework.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        The Emotional Impact of Movies Task in MediaEval 2018
aims to develop approaches that automatically and accurately
predict the emotional impact of movie content on a general
audience, where the impact is measured as induced valence,
induced arousal, or induced fear. A successful solution to this
task is expected to yield an automatic video emotion
discriminator capable of identifying movie content that may
induce harmful emotions. Approaches proposed for the task are
trained and evaluated using the LIRIS-ACCEDE dataset
(liris-accede.eclyon.fr) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which offers a collection of 160 professionally made
and amateur movies shared under the Creative Commons
license, 44 of which are selected and annotated
with their respective fear, valence, and arousal labels. More
details of the task requirements and the data description can
be found in the task paper [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        In this paper, we introduce a two-stage learning framework
for automatic prediction of the emotional impact of movie
content. To learn an accurate feature representation of the
emotions induced by movie content, the framework first projects
the original data onto a learned low-dimensional feature
subspace using dimensionality reduction, and then conducts
prediction in the learned subspace. Specifically, we use
emotion preserving embedding (EPE) to learn the subspace for
induced arousal and induced valence, and the biased
discriminant embedding (BDE) algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to learn the
subspace for induced fear. In the learned low-dimensional
subspace, we employ classical support vector regression and
classification, which are efficient and effective, to predict
the induced emotions of the movie content in a continuous and a
discrete manner, respectively.
      </p>
    </sec>
    <sec id="sec-2">
      <title>LEARNING EMOTIONAL SUBSPACE: Emotion Preserving Embedding</title>
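      <p>EPE is proposed to learn the subspace for the continuous
arousal and valence labels. We are given the training set
<inline-formula><tex-math>\mathcal{X} = \{(\mathbf{x}_1, \mathbf{l}_1), (\mathbf{x}_2, \mathbf{l}_2), \ldots, (\mathbf{x}_n, \mathbf{l}_n)\}</tex-math></inline-formula>,
where <inline-formula><tex-math>\mathbf{x}_i \in \mathbb{R}^{D}</tex-math></inline-formula> (<inline-formula><tex-math>i = 1, \ldots, n</tex-math></inline-formula>)
is the feature vector of the <inline-formula><tex-math>i</tex-math></inline-formula>-th movie and <inline-formula><tex-math>\mathbf{l}_i = [a_i, v_i]</tex-math></inline-formula> is the
corresponding label vector containing the arousal label <inline-formula><tex-math>a_i</tex-math></inline-formula> and
the valence label <inline-formula><tex-math>v_i</tex-math></inline-formula>. EPE aims to learn a <inline-formula><tex-math>D \times d</tex-math></inline-formula>
transformation matrix <inline-formula><tex-math>\mathbf{W}</tex-math></inline-formula> that maps <inline-formula><tex-math>\mathbf{x}_i</tex-math></inline-formula> (<inline-formula><tex-math>i = 1, \ldots, n</tex-math></inline-formula>) to a low-dimensional
subspace in which the emotion information and the manifold
structure of the dataset are well preserved. To achieve this
goal, EPE optimizes the following objective function:
        <disp-formula>
          <label>(1)</label>
          <tex-math>\mathbf{W} = \arg\min_{\mathbf{W}} \sum_{i,j=1}^{n} \lVert \mathbf{W}^{T}(\mathbf{x}_i - \mathbf{x}_j) \rVert^{2} \cdot \left( \alpha S_{ij} + (1 - \alpha) N_{ij} \right),</tex-math>
        </disp-formula>
      </p>
      <p>
        where <inline-formula><tex-math>S_{ij} = \exp(-\lVert \mathbf{l}_i - \mathbf{l}_j \rVert^{2} / 2\sigma^{2})</tex-math></inline-formula> measures the label
similarity of <inline-formula><tex-math>\mathbf{x}_i</tex-math></inline-formula> and <inline-formula><tex-math>\mathbf{x}_j</tex-math></inline-formula>, <inline-formula><tex-math>N_{ij} = \exp(-\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2} / 2\sigma^{2})</tex-math></inline-formula> measures
the closeness between <inline-formula><tex-math>\mathbf{x}_i</tex-math></inline-formula> and <inline-formula><tex-math>\mathbf{x}_j</tex-math></inline-formula>, and <inline-formula><tex-math>\alpha \in [0, 1]</tex-math></inline-formula> is the
parameter balancing the emotion information and the manifold
structure. Eq. (1) can be equivalently rewritten as follows:
        <disp-formula>
          <label>(2)</label>
          <tex-math>\mathbf{W} = \arg\min_{\mathbf{W}} \operatorname{tr}\left( \mathbf{W}^{T} \mathbf{X} \mathbf{L} \mathbf{X}^{T} \mathbf{W} \right),</tex-math>
        </disp-formula>
      </p>
      <p>
        where <inline-formula><tex-math>\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] \in \mathbb{R}^{D \times n}</tex-math></inline-formula> is the data matrix, <inline-formula><tex-math>\mathbf{L} = \mathbf{D} - \mathbf{A}</tex-math></inline-formula>
is the <inline-formula><tex-math>n \times n</tex-math></inline-formula> Laplacian matrix [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and <inline-formula><tex-math>\mathbf{D}</tex-math></inline-formula> is a diagonal
matrix defined as <inline-formula><tex-math>D_{ii} = \sum_{j=1}^{n} A_{ij}</tex-math></inline-formula> (<inline-formula><tex-math>i = 1, \ldots, n</tex-math></inline-formula>), where
<inline-formula><tex-math>A_{ij} = \alpha S_{ij} + (1 - \alpha) N_{ij}</tex-math></inline-formula>. Then the optimal <inline-formula><tex-math>\mathbf{W}</tex-math></inline-formula> can be obtained
by finding the eigenvectors corresponding to the smallest
eigenvalues of the following eigen-decomposition problem:
        <disp-formula>
          <label>(3)</label>
          <tex-math>\mathbf{X} \mathbf{L} \mathbf{X}^{T} \mathbf{w} = \lambda \mathbf{w}.</tex-math>
        </disp-formula>
      </p>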
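      <p>The step from the pairwise objective to the trace form is stated without proof. A brief sketch of why the two agree is given here, under the assumption that the combined weight matrix <inline-formula><tex-math>\mathbf{A}</tex-math></inline-formula> (with entries <inline-formula><tex-math>A_{ij}</tex-math></inline-formula> multiplying each pairwise distance) is symmetric, which holds for Gaussian weights of pairwise distances. With <inline-formula><tex-math>D_{ii} = \sum_{j} A_{ij}</tex-math></inline-formula> and <inline-formula><tex-math>\mathbf{L} = \mathbf{D} - \mathbf{A}</tex-math></inline-formula>:
        <disp-formula>
          <tex-math><![CDATA[
\begin{aligned}
\sum_{i,j=1}^{n} A_{ij} \lVert \mathbf{W}^{T}(\mathbf{x}_i - \mathbf{x}_j) \rVert^{2}
  &= \sum_{i,j=1}^{n} A_{ij} (\mathbf{x}_i - \mathbf{x}_j)^{T} \mathbf{W}\mathbf{W}^{T} (\mathbf{x}_i - \mathbf{x}_j) \\
  &= 2 \sum_{i=1}^{n} D_{ii}\, \mathbf{x}_i^{T} \mathbf{W}\mathbf{W}^{T} \mathbf{x}_i
     - 2 \sum_{i,j=1}^{n} A_{ij}\, \mathbf{x}_i^{T} \mathbf{W}\mathbf{W}^{T} \mathbf{x}_j \\
  &= 2 \operatorname{tr}\left( \mathbf{W}^{T} \mathbf{X} \mathbf{D} \mathbf{X}^{T} \mathbf{W} \right)
     - 2 \operatorname{tr}\left( \mathbf{W}^{T} \mathbf{X} \mathbf{A} \mathbf{X}^{T} \mathbf{W} \right) \\
  &= 2 \operatorname{tr}\left( \mathbf{W}^{T} \mathbf{X} \mathbf{L} \mathbf{X}^{T} \mathbf{W} \right).
\end{aligned}
          ]]></tex-math>
        </disp-formula>
        The constant factor 2 does not change the minimizer. The text does not state a normalization constraint; taking eigenvectors as the solution is consistent with assuming an orthonormality constraint such as <inline-formula><tex-math>\mathbf{W}^{T}\mathbf{W} = \mathbf{I}</tex-math></inline-formula>, which rules out the trivial solution <inline-formula><tex-math>\mathbf{W} = \mathbf{0}</tex-math></inline-formula>.
      </p>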
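      <p>After obtaining <inline-formula><tex-math>\mathbf{W}</tex-math></inline-formula>, we can obtain the low-dimensional
representation of <inline-formula><tex-math>\mathbf{x}_i</tex-math></inline-formula> by <inline-formula><tex-math>\mathbf{y}_i = \mathbf{W}^{T} \mathbf{x}_i</tex-math></inline-formula>.</p>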
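      <p>The EPE steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: both similarity terms are taken to be Gaussian kernels sharing one bandwidth sigma (its value is not given in the text), samples are stored as rows (so the code's X.T @ L @ X equals the product of the data matrix, the Laplacian, and the transposed data matrix in the column convention), and the names epe_fit and pairwise_sq_dists are hypothetical.</p>
      <preformat><![CDATA[
```python
import numpy as np


def pairwise_sq_dists(M):
    """Squared Euclidean distances between the rows of M."""
    sq = np.sum(M * M, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (M @ M.T)
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off


def epe_fit(X, labels, d, alpha=0.5, sigma=1.0):
    """Sketch of EPE.

    X: (n, D) array, one sample per row.
    labels: (n, 2) array of [arousal, valence] per sample.
    Returns W with shape (D, d).
    sigma is an ASSUMED bandwidth shared by both kernels.
    """
    S = np.exp(-pairwise_sq_dists(labels) / (2.0 * sigma ** 2))  # label similarity
    N = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))       # feature closeness
    A = alpha * S + (1.0 - alpha) * N        # combined pairwise weight
    L = np.diag(A.sum(axis=1)) - A           # graph Laplacian L = D - A
    M = X.T @ L @ X                          # row convention for X L X^T
    M = 0.5 * (M + M.T)                      # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    return eigvecs[:, :d]                    # eigenvectors of the d smallest


# Synthetic placeholder data, only to show the shapes involved.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 12))
labels = rng.uniform(-1.0, 1.0, size=(60, 2))
W = epe_fit(X, labels, d=4)
Y = X @ W  # row i holds the low-dimensional representation of sample i
```
]]></preformat>
      <p>For the frame counts reported in this paper, the dense n-by-n weight matrix used in this sketch would be very large, so a practical implementation would likely need a sparse or subsampled neighborhood graph; the text does not say how this was handled.</p>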
    </sec>
    <sec id="sec-3">
      <title>Biased Discriminant Embedding</title>
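      <p>
        BDE is a subspace learning algorithm that we proposed
for the same task last year [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It aims to learn the
subspace for the binary fear labels. In this scenario, each
data sample <inline-formula><tex-math>\mathbf{x}_i</tex-math></inline-formula> is associated with a binary label <inline-formula><tex-math>l_i \in \{0, 1\}</tex-math></inline-formula>,
with 1 for fear and 0 otherwise. BDE aims to maximize
the biased discriminant information in the learned subspace.
As mentioned in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the so-called biased discrimination is
designed to emphasize the importance of the fear class. The
objective function of BDE is given as follows:
        <disp-formula>
          <label>(4)</label>
          <tex-math>\mathbf{W} = \arg\max_{\mathbf{W}} \operatorname{tr}\left( \frac{\mathbf{W}^{T} \mathbf{S}_{b} \mathbf{W}}{\mathbf{W}^{T} \mathbf{S}_{w} \mathbf{W}} \right),</tex-math>
        </disp-formula>
        where <inline-formula><tex-math>\mathbf{S}_{w} = \sum_{i,j=1}^{n} (N_{ij} \times l_i \times l_j)(\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^{T}</tex-math></inline-formula> and
<inline-formula><tex-math>\mathbf{S}_{b} = \sum_{i,j=1}^{n} (N_{ij} \times |l_i - l_j|)(\mathbf{x}_i - \mathbf{x}_j)(\mathbf{x}_i - \mathbf{x}_j)^{T}</tex-math></inline-formula> denote the
biased within-class and between-class scatters, respectively.
      </p>
      <p>The optimal <inline-formula><tex-math>\mathbf{W}</tex-math></inline-formula> can then be obtained by finding the
eigenvectors corresponding to the largest eigenvalues of the
following generalized eigen-decomposition problem:
        <disp-formula>
          <label>(5)</label>
          <tex-math>\mathbf{S}_{b} \mathbf{w} = \lambda \mathbf{S}_{w} \mathbf{w}.</tex-math>
        </disp-formula>
      </p>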
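      <p>A rough NumPy sketch of the BDE computation follows, offered as an illustration rather than the authors' code. Several points are assumptions: the pairwise weight inside both scatters is taken to be a Gaussian closeness with bandwidth sigma (this section does not define it; see [5]), a small ridge term reg is added to the within-class scatter so that it is invertible, samples are stored as rows, and the names bde_scatters and bde_fit are hypothetical. The generalized eigenproblem is solved by Cholesky whitening of the within-class scatter.</p>
      <preformat><![CDATA[
```python
import numpy as np


def bde_scatters(X, y, sigma=1.0):
    """Biased scatters. X: (n, D), one sample per row; y: (n,) binary, 1 = fear."""
    y = np.asarray(y, dtype=float)
    sq = np.sum(X * X, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    N = np.exp(-d2 / (2.0 * sigma ** 2))              # ASSUMED form of the pairwise weight
    weights_w = N * np.outer(y, y)                    # nonzero only for fear-fear pairs
    weights_b = N * np.abs(y[:, None] - y[None, :])   # nonzero only across classes

    def scatter(Wt):
        # For symmetric Wt:
        # sum_ij Wt_ij (x_i - x_j)(x_i - x_j)^T = 2 X^T (Dg - Wt) X
        lap = np.diag(Wt.sum(axis=1)) - Wt
        return 2.0 * (X.T @ lap @ X)

    return scatter(weights_w), scatter(weights_b)


def bde_fit(X, y, d, sigma=1.0, reg=1e-6):
    """Return (W, eigenvalues) for the d largest generalized eigenvalues."""
    Sw, Sb = bde_scatters(X, y, sigma)
    Sw = Sw + reg * np.eye(X.shape[1])   # ASSUMED ridge term, not from the paper
    C = np.linalg.cholesky(Sw)           # Sw = C C^T
    C_inv = np.linalg.inv(C)
    M = C_inv @ Sb @ C_inv.T             # whitened between-class scatter
    M = 0.5 * (M + M.T)                  # symmetrize against round-off
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1][:d]   # indices of the d largest eigenvalues
    return C_inv.T @ vecs[:, order], vals[order]


# Synthetic placeholder data: 30 "fear" samples shifted away from 50 others.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 6))
y = np.zeros(80)
y[:30] = 1.0
X[:30] += 1.0
W, lam = bde_fit(X, y, d=2)
Z = X @ W  # projected samples, one per row
```
]]></preformat>
      <p>Because the within-class weights vanish unless both samples carry the fear label, only the fear class is pulled together in the subspace, which is one way to read the bias toward the fear class described above.</p>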
    </sec>
    <sec id="sec-4">
      <title>RESULTS AND ANALYSIS</title>
      <p>In this section, we evaluate the performance of our approach
on the MediaEval 2018 Emotional Impact of Movies Task.
There are 93337 and 26600 frames in the development set
and the test set, respectively. We use 11 types of features to
construct the original feature vector for each frame, i.e.,
1583-D Auto Color Correlogram (ACC), 256-D Color and Edge
Directivity Descriptor (CEDD), 144-D Color Layout (CL),
33-D Edge Histogram (EH), 80-D Fuzzy Color and Texture
Histogram (FCTH), 192-D Gabor, 60-D Joint descriptor
joining CEDD and FCTH in one histogram (JCD),
168-D Scalable Color (SC), 256-D Tamura, 64-D Local Binary
Patterns (LBP), and 18-D VGG16 fc6 layer (FC6). The total
dimension of the original feature space is therefore 2854.</p>
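      <p>As a small illustration of how the per-frame feature vector could be assembled, the sketch below concatenates the 11 descriptors in a fixed order and checks each size against the dimensionalities listed above. The dictionary layout and the function name build_frame_vector are hypothetical, and the descriptors here are zero-filled placeholders rather than real features.</p>
      <preformat><![CDATA[
```python
import numpy as np

# Dimensionalities exactly as listed in the text above.
FEATURE_DIMS = {
    "ACC": 1583, "CEDD": 256, "CL": 144, "EH": 33, "FCTH": 80, "Gabor": 192,
    "JCD": 60, "SC": 168, "Tamura": 256, "LBP": 64, "FC6": 18,
}


def build_frame_vector(features):
    """Concatenate per-frame descriptors in a fixed order after size checks."""
    parts = []
    for name, dim in FEATURE_DIMS.items():
        vec = np.asarray(features[name], dtype=float).ravel()
        if vec.size != dim:
            raise ValueError(f"{name}: expected {dim} values, got {vec.size}")
        parts.append(vec)
    return np.concatenate(parts)


# Placeholder descriptors for a single frame.
dummy = {name: np.zeros(dim) for name, dim in FEATURE_DIMS.items()}
x = build_frame_vector(dummy)
print(x.shape)  # (2854,)
```
]]></preformat>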
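      <p>
        For valence/arousal prediction, we use EPE to learn the
transformation matrix <inline-formula><tex-math>\mathbf{W}</tex-math></inline-formula> from the development set and
use <inline-formula><tex-math>\mathbf{W}</tex-math></inline-formula> to project the <inline-formula><tex-math>D</tex-math></inline-formula>-dimensional development and test
data (<inline-formula><tex-math>D = 2854</tex-math></inline-formula>) to the <inline-formula><tex-math>d</tex-math></inline-formula>-dimensional subspace. We set
<inline-formula><tex-math>d = 4, 5, 9, 10</tex-math></inline-formula> for Runs 1, 2, 3, 4, respectively. We set <inline-formula><tex-math>\alpha = 0.5</tex-math></inline-formula>
in our experiment to weigh the emotion information
and the manifold structure equally. Then we train the <inline-formula><tex-math>\nu</tex-math></inline-formula>-SVR [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
on the <inline-formula><tex-math>d</tex-math></inline-formula>-dimensional development set and apply the trained
model for prediction on the <inline-formula><tex-math>d</tex-math></inline-formula>-dimensional test set. For SVR,
we use the RBF kernel and the default settings recommended by
libsvm [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: <inline-formula><tex-math>\nu = 0.5</tex-math></inline-formula> and <inline-formula><tex-math>\gamma = 1/d</tex-math></inline-formula>.
      </p>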
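      <p>The regression stage can be sketched with scikit-learn, whose NuSVR class wraps libsvm. This is an assumption about tooling made for illustration only; the paper reports using libsvm, not scikit-learn. The data below are synthetic placeholders standing in for the projected development and test sets, and fitting one model per target (arousal or valence) is likewise assumed, since support vector regression predicts a single output.</p>
      <preformat><![CDATA[
```python
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
d = 4  # subspace dimension, as in Run 1

# Placeholder d-dimensional data; in the paper these would be the projected frames.
Y_dev = rng.normal(size=(200, d))
t_dev = 0.5 * Y_dev[:, 0] + 0.1 * rng.normal(size=200)  # placeholder valence targets
Y_test = rng.normal(size=(50, d))

# nu=0.5 and an RBF kernel; gamma="auto" sets gamma to 1 / n_features, i.e. 1 / d here.
model = NuSVR(kernel="rbf", nu=0.5, gamma="auto")
model.fit(Y_dev, t_dev)
pred = model.predict(Y_test)
```
]]></preformat>
      <p>Note that scikit-learn's own default is gamma="scale", so gamma="auto" has to be requested explicitly to match the 1/d setting described above.</p>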
      <p>
        For fear prediction, we use BDE to learn <inline-formula><tex-math>\mathbf{W}</tex-math></inline-formula>. Similar to
the previous experiment, we set <inline-formula><tex-math>d = 4, 5, 9, 10</tex-math></inline-formula> for Runs 1,
2, 3, 4, respectively. Then we train the <inline-formula><tex-math>\nu</tex-math></inline-formula>-SVC [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] on the
<inline-formula><tex-math>d</tex-math></inline-formula>-dimensional development set and apply the trained model
for classification on the <inline-formula><tex-math>d</tex-math></inline-formula>-dimensional test set. As before, we
use the RBF kernel and the default settings recommended by
libsvm [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: <inline-formula><tex-math>\nu = 0.5</tex-math></inline-formula> and <inline-formula><tex-math>\gamma = 1/d</tex-math></inline-formula>.
      </p>
      <p>[...]tion may embed in a very low-dimensional subspace, and thus [...]
However, the results in Table 2 do not clearly indicate how
valence prediction performance depends on the
dimensionality of the learned subspace. The reason might be
that we have not yet discovered the optimal dimension of the
subspace for valence. Further investigation is needed to
determine how an optimal dimension for the learned subspace
can be obtained. From Table 3, we can see that the
performance of our method on fear prediction is
unsatisfactory. A possible reason is the high imbalance between the fear
class and the non-fear class, which makes the traditional learning
mechanism inefficient, even though we have made some effort
to model the class imbalance during subspace learning.
      </p>
    </sec>
    <sec id="sec-5">
      <title>CONCLUSION</title>
      <p>This paper describes our approach to predicting the
emotional impact of movies and validates the approach on
the MediaEval 2018 Emotional Impact of Movies Task.
Future work will proceed in two directions. First, we are
interested in exploring how to build a
joint learning mechanism for both arousal and valence, as
these two emotional dimensions are related to each other.
Second, we will investigate more effective ways to model
the class imbalance in subspace learning and the subsequent
classification, especially for extremely imbalanced cases.</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was supported in part by the National Natural
Science Foundation of China (NSFC) under Grant 61503317, in
part by the General Research Fund (GRF) from the Research
Grants Council (RGC) of Hong Kong SAR under Project
HKBU12202417, and in part by the SZSTI Grant with the
Project Code JCYJ20170307161544087.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Baveye</surname>
          </string-name>
          , E. Dellandréa, C. Chamaret, and
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>LIRIS-ACCEDE: A Video Database for Affective Content Analysis</article-title>
          .
          <source>IEEE Transactions on Affective Computing</source>
          <volume>6</volume>
          ,
          <issue>1</issue>
          (Jan
          <year>2015</year>
          ),
          <fpage>43</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Belkin</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Niyogi</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering</article-title>
          .
          <source>In Advances in Neural Information Processing Systems 14 (NIPS)</source>
          .
          <fpage>585</fpage>
          -
          <lpage>591</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Chih-Chung</given-names>
            <surname>Chang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chih-Jen</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>LIBSVM: A library for support vector machines</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>2</volume>
          ,
          <issue>3</issue>
          (
          <year>2011</year>
          ),
          <fpage>27:1</fpage>
          -
          <lpage>27:27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Dellandrea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huigsloot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Baveye</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sjoberg</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>The MediaEval 2018 Emotional Impact of Movies Task</article-title>
          .
          <source>In Mediaeval 2018 Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ko</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>HKBU at MediaEval 2017 Emotional Impact of Movies Task</article-title>
          .
          <source>In Mediaeval 2017 Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alex J.</given-names>
            <surname>Smola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Robert C.</given-names>
            <surname>Williamson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Peter L.</given-names>
            <surname>Bartlett</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>New Support Vector Algorithms</article-title>
          .
          <source>Neural Comput</source>
          .
          <volume>12</volume>
          ,
          <issue>5</issue>
          (
          <year>2000</year>
          ),
          <fpage>1207</fpage>
          -
          <lpage>1245</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>