<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Emotional Impact of Movies Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yang Liu</string-name>
          <email>csygliu@comp.hkbu.edu.hk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhonglei Gu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tobey H. Ko</string-name>
          <email>tobeyko@hku.hk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Hong Kong Baptist University</institution>
          ,
          <addr-line>HKSAR</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Industrial and Manufacturing Systems Engineering, University of Hong Kong</institution>
          ,
          <addr-line>HKSAR</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Research and Continuing Education, Hong Kong Baptist University</institution>
          ,
          <addr-line>Shenzhen</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>In this paper, we describe our model designed for the automatic prediction of the emotional impact of movies. Specifically, a two-stage learning framework is proposed. First, dimensionality reduction techniques are employed to discover the key emotion information embedded in the original feature space: we use the classical method principal component analysis (PCA) and a new algorithm, biased discriminant embedding (BDE), to learn the subspace. After dimensionality reduction, SVMs are utilized for prediction. Experimental results validate the effectiveness of our approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        In the 2017 Emotional Impact of Movies Task, participants are asked
to predict the expected emotional impact of movie content, that is,
the response of a general audience to a given stimulus [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]: induced valence, induced arousal, or induced fear, predicted
from movie clip segments. The dataset used in this task is the
LIRIS-ACCEDE dataset (liris-accede.ec-lyon.fr), which contains
videos from a set of 160 professionally made and amateur movies,
shared under Creative Commons licenses that allow redistribution [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
More details of the task requirements as well as the dataset
description can be found in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        In this paper, we propose a two-stage learning framework
to predict the emotional impact of movies. First, we use
dimensionality reduction to project the original data into a
low-dimensional subspace in which the key emotional
information is well preserved. Specifically, we use principal
component analysis (PCA) to extract features for arousal and
valence prediction, and propose a new algorithm called biased
discriminant embedding (BDE) to extract features for fear
prediction. After dimensionality reduction, we use support
vector regression and classification [
        <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
        ] for continuous and
discrete predictions, respectively.
      </p>
    </sec>
    <sec id="sec-2">
      <title>OUR MODEL</title>
    </sec>
    <sec id="sec-3">
      <title>Feature Extraction</title>
      <p>Principal Component Analysis. Given the data
matrix X = [x_1, x_2, ..., x_n], where x_i \in R^d denotes the feature
vector of the i-th data point, principal component analysis
(PCA) aims to learn a d \times r transformation matrix W that
maximizes the following objective:</p>
      <p>W = \arg\max_W \operatorname{tr}\left( W^\top (X - \bar{x}\mathbf{1}^\top)(X - \bar{x}\mathbf{1}^\top)^\top W \right), (1)</p>
      <p>where \bar{x} = \sum_{i=1}^{n} x_i / n and \mathbf{1} denotes the n \times 1 vector with
all entries being 1. The optimization problem in Eq. (1) can
be solved by standard eigen-decomposition.</p>
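      <p>As a concrete illustration, the following is a minimal sketch of the Eq. (1) solver (not the authors' released code; the column-per-sample layout and variable names are our assumptions):</p>
      <preformat>
import numpy as np

def pca(X, r):
    """X: d x n data matrix (one column per sample); r: target dimension."""
    # Center the data: X - x_bar 1^T
    Xc = X - X.mean(axis=1, keepdims=True)
    # Scatter matrix (X - x_bar 1^T)(X - x_bar 1^T)^T from Eq. (1)
    S = Xc @ Xc.T
    # eigh returns eigenvalues in ascending order, so reverse them
    eigvals, eigvecs = np.linalg.eigh(S)
    W = eigvecs[:, ::-1][:, :r]      # d x r projection matrix
    return W, eigvals[::-1][:r]      # leading eigenvalues, reused later

# Projected (reduced) data: Z = W.T @ X, an r x n matrix
      </preformat>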
      <p>Biased Discriminant Embedding. Given the data
matrix X = [x_1, x_2, ..., x_n] and the label vector l = [l_1, l_2, ..., l_n],
where l_i \in \{0, 1\} denotes the corresponding label of x_i (1 for
fear and 0 otherwise), biased discriminant embedding (BDE)
aims to maximize the biased discriminant information in the
reduced subspace. The motivation for the biased discriminant
formulation is that in fear prediction one is typically more
interested in the fear class than in the non-fear one.</p>
      <p>The objective function of BDE is given as follows:</p>
      <p>W = \arg\max_W \operatorname{tr}\left( (W^\top S_w W)^{-1} (W^\top S_b W) \right), (2)</p>
      <p>S_w = \sum_{i,j=1}^{n} (l_i \, l_j \, A_{ij})(x_i - x_j)(x_i - x_j)^\top, (3)</p>
      <p>S_b = \sum_{i,j=1}^{n} (|l_i - l_j| \, A_{ij})(x_i - x_j)(x_i - x_j)^\top, (4)</p>
      <p>where S_w denotes the biased within-class scatter, S_b denotes
the biased between-class scatter, and A_{ij} = \exp(-\|x_i - x_j\|^2 / \sigma^2)
measures the closeness between the two data samples x_i and x_j.
The optimization problem can be solved by generalized
eigen-decomposition.</p>
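      <p>The sketch below (ours, following the reconstruction in Eqs. (2)-(4); sigma, the output dimension r, and the small regularizer are assumed hyperparameters) solves BDE via generalized eigen-decomposition:</p>
      <preformat>
import numpy as np
from scipy.linalg import eigh

def bde(X, l, r=2, sigma=1.0):
    """X: d x n data matrix; l: binary labels (1 = fear, 0 = non-fear)."""
    d, n = X.shape
    l = np.asarray(l, dtype=float)
    # Pairwise affinities A_ij = exp(-||x_i - x_j||^2 / sigma^2)
    sq = (X**2).sum(0)[:, None] + (X**2).sum(0)[None, :] - 2 * X.T @ X
    A = np.exp(-sq / sigma**2)
    Ww = np.outer(l, l) * A                   # l_i l_j A_ij (fear pairs)
    Wb = np.abs(l[:, None] - l[None, :]) * A  # |l_i - l_j| A_ij (mixed pairs)
    # sum_ij w_ij (x_i - x_j)(x_i - x_j)^T = 2 X (D - W) X^T (Laplacian identity)
    Sw = 2 * X @ (np.diag(Ww.sum(1)) - Ww) @ X.T
    Sb = 2 * X @ (np.diag(Wb.sum(1)) - Wb) @ X.T
    # Generalized eigenproblem S_b w = lambda S_w w; regularize S_w for stability
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(d))
    return vecs[:, ::-1][:, :r]               # r leading directions
      </preformat>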
    </sec>
    <sec id="sec-4">
      <title>Emotion Prediction</title>
      <p>
        Support Vector Regression. For predicting the arousal
and valence values, we use \nu-SVR [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to train two regressors
separately. The dual problem that \nu-SVR aims to solve is:
      </p>
      <p>\min_{\alpha, \alpha^*} \; \tfrac{1}{2} (\alpha - \alpha^*)^\top K (\alpha - \alpha^*) + z^\top (\alpha - \alpha^*)</p>
      <p>s.t. \; e^\top (\alpha - \alpha^*) = 0, \quad e^\top (\alpha + \alpha^*) \le C\nu,</p>
      <p>0 \le \alpha_i, \alpha_i^* \le C/n, \quad i = 1, ..., n,</p>
      <p>where K is the kernel matrix, z is the vector of target values,
and e is the vector of all ones. The prediction for a newly
arriving vector y is:</p>
      <p>f(y) = \sum_{i=1}^{n} (\alpha_i^* - \alpha_i) K(x_i, y) + b.</p>
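      <p>In practice this dual is handled by an off-the-shelf solver. A minimal usage sketch with scikit-learn's LIBSVM-based NuSVR (our tooling choice, with synthetic stand-in data; [2] is the underlying library):</p>
      <preformat>
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
Z = rng.random((100, 50))   # PCA-reduced segment features (synthetic)
y = rng.random(100)         # induced arousal or valence scores (synthetic)

reg = NuSVR(nu=0.5, kernel='rbf', gamma=1.0 / Z.shape[1])
reg.fit(Z, y)
print(reg.predict(Z[:5]))   # continuous predictions f(y) as above
      </preformat>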
      <p>
        Support Vector Classification. To predict the binary
fear labels, we use \nu-SVC [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The dual problem that \nu-SVC aims to solve is:
      </p>
      <p>\min_{\alpha} \; \tfrac{1}{2} \alpha^\top Q \alpha</p>
      <p>s.t. \; 0 \le \alpha_i \le 1/n, \quad e^\top \alpha \ge \nu, \quad l^\top \alpha = 0,</p>
      <p>where Q_{ij} = l_i l_j K(x_i, x_j). The predicted label of a new
vector y is given by the sign of the decision function:</p>
      <p>f(y) = \operatorname{sgn}\left( \sum_{i=1}^{n} l_i \alpha_i K(x_i, y) + b \right).</p>
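      <p>A matching sketch for the fear classifier (again ours, on synthetic data; the 2-D inputs mimic the BDE subspace used in Subtask 2 below):</p>
      <preformat>
import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(0)
Z = rng.random((200, 2))     # BDE-reduced features (synthetic)
l = rng.integers(0, 2, 200)  # 1 = fear, 0 = non-fear (synthetic)

clf = NuSVC(nu=0.1, kernel='rbf', gamma=1.0 / Z.shape[1])
clf.fit(Z, l)
print(clf.predict(Z[:5]))    # sign of the decision function above
      </preformat>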
    </sec>
    <sec id="sec-5">
      <title>RESULTS</title>
      <p>In this section, we report our experimental settings and the
evaluation results. For each 1-second segment, we construct a
1,271-D feature set, including 256-D Auto Color Correlogram
(acc) features, 144-D Color and Edge Directivity Descriptor
(cedd) features, 33-D Color Layout (cl) features, 80-D Edge
Histogram (eh) features, 192-D Fuzzy Color and Texture
Histogram (fcth) features, 60-D Gabor (gabor) features, 168-D
Joint descriptor joining CEDD and FCTH in one histogram
(jcd) features, 256-D Local Binary Patterns (lbp) features,
64-D Scalable Color (sc) features, and 18-D Tamura (tamura)
features.</p>
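      <p>As dimension bookkeeping, the per-second descriptor blocks concatenate to 1,271-D, and ten consecutive seconds stack into the 12,710-D segment vector used below (the stacking helper is our sketch; the actual extractors are not named in the original):</p>
      <preformat>
import numpy as np

BLOCKS = {'acc': 256, 'cedd': 144, 'cl': 33, 'eh': 80, 'fcth': 192,
          'gabor': 60, 'jcd': 168, 'lbp': 256, 'sc': 64, 'tamura': 18}
assert sum(BLOCKS.values()) == 1271   # per-second feature dimension

def segment_vector(per_second_features):
    """Stack 10 one-second 1,271-D vectors into one 12,710-D vector."""
    assert len(per_second_features) == 10
    return np.concatenate(per_second_features)
      </preformat>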
      <p>Subtask 1: For Run 1, we use the original 12,710-D feature
set (10 seconds) as the input. For Runs 2-5, we use PCA to
reduce the original feature set to 50-D, 80-D, 57-D, and
40-D subspaces, respectively. \nu-SVR with an RBF kernel is then
used for prediction. We set \nu = 0.5 and \gamma = 1/d, where d is
the input feature dimension.</p>
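      <p>Read as a pipeline, a Run 2-style configuration might look as follows (a sketch with scikit-learn's PCA standing in for the Eq. (1) solver; taking d as the reduced dimension for the RBF width is our reading):</p>
      <preformat>
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
X = rng.random((300, 12710))   # 10-second segment features (synthetic)
y = rng.random(300)            # arousal or valence annotations (synthetic)

run2 = make_pipeline(PCA(n_components=50),
                     NuSVR(nu=0.5, kernel='rbf', gamma=1.0 / 50))
run2.fit(X, y)                 # reduce to 50-D, then regress
      </preformat>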
      <p>Subtask 2: For Run 1, we use the original 12,710-D feature
set (10 seconds) as the input. For Runs 2-5, we use BDE to
reduce the original feature set to 2-D subspaces. \nu-SVC
with an RBF kernel is then used for prediction. We set \nu = 0.1
and \gamma = 1/d.</p>
      <p>Table 1 reports the evaluation results (arousal MSE and r,
valence MSE and r, and fear accuracy, precision, recall, and F1
for Runs 1-5) provided by the task organizers. We can see that
the performance in the low-dimensional subspace is generally
worse than that in the original feature space. The reason might
be that the dimensionality of the original feature space is so
high that such a low-dimensional subspace cannot fully capture
the discriminant information embedded in the original data.</p>
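      <p>The metrics in Table 1 can be computed with standard tooling (a sketch with scipy/scikit-learn, our choice; the y_*/f_* arrays are synthetic stand-ins for predictions and ground truth):</p>
      <preformat>
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import (mean_squared_error, accuracy_score,
                             precision_score, recall_score, f1_score)

rng = np.random.default_rng(0)
y_true, y_pred = rng.random(50), rng.random(50)
mse = mean_squared_error(y_true, y_pred)   # arousal/valence MSE
r, _ = pearsonr(y_true, y_pred)            # arousal/valence Pearson r

f_true, f_pred = rng.integers(0, 2, 50), rng.integers(0, 2, 50)
acc = accuracy_score(f_true, f_pred)       # fear accuracy
prec = precision_score(f_true, f_pred)     # fear precision
rec = recall_score(f_true, f_pred)         # fear recall
f1 = f1_score(f_true, f_pred)              # fear F1
      </preformat>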
      <p>In addition to the overall performance, we analyze the
contribution of each dimension in the original feature
space. The contribution of the j-th dimension is defined as
c_j = \sum_{k} \lambda_k |W_{jk}|, where \lambda_k denotes the k-th
eigenvalue, W_{jk} denotes the (j, k)-th element of W, and |\cdot|
denotes the absolute value operator. From Figure 1 we can see
that the acc feature makes important contributions to both
the arousal/valence and fear prediction tasks, which indicates
its importance in emotional discriminant analysis.</p>
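      <p>The contribution score is straightforward to compute from the eigen-decomposition output (a sketch under our variable naming; summing c_j over each descriptor's index range gives the per-block totals plotted in Figure 1):</p>
      <preformat>
import numpy as np

def contributions(W, eigvals):
    """c_j = sum_k lambda_k |W_jk| for each original dimension j.

    W: d x r loading matrix; eigvals: the r leading eigenvalues.
    """
    return np.abs(W) @ np.asarray(eigvals)   # length-d vector of scores
      </preformat>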
    </sec>
    <sec id="sec-6">
      <title>DISCUSSION AND OUTLOOK</title>
      <p>This paper introduces our model designed for predicting
the emotional impact of movies. To extract a compact
representation of the original feature set, dimensionality
reduction is utilized. We then use SVMs for prediction. In
future work, we are interested in analyzing the relation
between arousal/valence and fear, which could help in
understanding emotional impact more deeply. Moreover, as the
ground-truth emotion labels are provided by human beings, they
generally vary across individuals and are somewhat subjective.
We are therefore particularly interested in refining the
human-labeled ground truth via machine learning technologies.</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was supported in part by the National Natural
Science Foundation of China under Grant 61503317, and in
part by the Faculty Research Grant of Hong Kong Baptist
University (HKBU) under Project FRG2/16-17/032.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Baveye</surname>
          </string-name>
          , E. Dellandréa, C. Chamaret, and
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>LIRIS-ACCEDE: A Video Database for Affective Content Analysis</article-title>
          .
          <source>IEEE Transactions on Affective Computing</source>
          <volume>6</volume>
          ,
          <issue>1</issue>
          (Jan
          <year>2015</year>
          ),
          <fpage>43</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Chih-Chung</given-names>
            <surname>Chang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chih-Jen</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>LIBSVM: A library for support vector machines</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>2</volume>
          ,
          <issue>3</issue>
          (
          <year>2011</year>
          ),
          <fpage>27:1</fpage>
          -
          <lpage>27:27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Emmanuel</given-names>
            <surname>Dellandrea</surname>
          </string-name>
          , Martijn Huigsloot, Liming Chen, Yoann Baveye, and
          <string-name>
            <given-names>Mats</given-names>
            <surname>Sjoberg</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>The MediaEval 2017 Emotional Impact of Movies Task</article-title>
          .
          <source>MediaEval 2017 Workshop</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanjalic</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Extracting moods from pictures and sounds: towards truly personalized TV</article-title>
          .
          <source>IEEE Signal Processing Magazine</source>
          <volume>23</volume>
          ,
          <issue>2</issue>
          (March
          <year>2006</year>
          ),
          <fpage>90</fpage>
          -
          <lpage>100</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alex J.</given-names>
            <surname>Smola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Robert C.</given-names>
            <surname>Williamson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Peter L.</given-names>
            <surname>Bartlett</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>New Support Vector Algorithms</article-title>
          .
          <source>Neural Comput</source>
          .
          <volume>12</volume>
          ,
          <issue>5</issue>
          (
          <year>2000</year>
          ),
          <fpage>1207</fpage>
          -
          <lpage>1245</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>