<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mining Emotional Features of Movies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yang Liu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhonglei Gu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yu Zhang</string-name>
          <email>zhangyu@cse.ust.hk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yan Liu</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AAOO Tech Limited</institution>
          ,
          <addr-line>Hong Kong SAR</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of CSE, Hong Kong University of Science and Technology</institution>
          ,
          <addr-line>Hong Kong SAR</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Computer Science, Hong Kong Baptist University</institution>
          ,
          <addr-line>Hong Kong SAR</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Department of Computing, The Hong Kong Polytechnic University</institution>
          ,
          <addr-line>Hong Kong SAR</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Institute of Research and Continuing Education, Hong Kong Baptist University</institution>
          ,
          <addr-line>Shenzhen</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>20</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>In this paper, we present an algorithm designed for mining emotional features of movies. The algorithm, dubbed Arousal-Valence Discriminant Preserving Embedding (AV-DPE), is proposed to extract the intrinsic features embedded in movies that are essentially differentiating in both arousal and valence directions. After dimensionality reduction, we use a neural network and a support vector regressor to make the final prediction. Experimental results show that the extracted features can capture most of the discriminant information in movie emotions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Affective multimedia content analysis aims to
automatically recognize and analyze the emotions evoked by
multimedia data such as images, music, and videos. It has
many real-world applications, such as image search, movie
recommendation, and music classification [3, 7–9, 11–14].</p>
      <p>In the 2016 Emotional Impact of Movies Task,
participants are required to design algorithms to predict the
arousal and valence values of the given movies automatically.
The dataset used in this task is the LIRIS-ACCEDE dataset
(liris-accede.ec-lyon.fr). It contains videos from a set of
160 professionally made and amateur movies, shared under
the Creative Commons licenses that allow redistribution [2].
More details of the task requirements as well as the dataset
description can be found in [5, 10].</p>
      <p>In this paper, we perform both global and continuous
emotion prediction via a proposed supervised
dimensionality reduction algorithm called Arousal-Valence Discriminant
Preserving Embedding (AV-DPE), which learns compact
representations of the original data. After obtaining the
low-dimensional features, we use a neural network and a support
vector regressor to predict the emotion values.</p>
    </sec>
    <sec id="sec-2">
      <title>PROPOSED METHOD</title>
      <p>In order to derive the intrinsic factors in movies that
convey or evoke emotions along the arousal and valence
dimensions, we propose a supervised feature extraction
algorithm dubbed Arousal-Valence Discriminant Preserving
Embedding (AV-DPE) to map the original high-dimensional
representations into a low-dimensional feature subspace, in
which the data with similar A-V values are close to each
other, while the data with different A-V values are far away
from each other.</p>
      <p>Let $x \in \mathbb{R}^{D}$ be the high-dimensional feature vector
of the movie, and $y = [y^{(1)}, y^{(2)}]$ be the corresponding
emotion label vector, where $y^{(1)}$ and $y^{(2)}$ denote the arousal
value and valence value, respectively. Given the training
set $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, AV-DPE aims at learning a
transformation matrix $U = [u_1, \ldots, u_d] \in \mathbb{R}^{D \times d}$ which is able
to project the original $D$-dimensional data to an intrinsically
low-dimensional subspace $\mathcal{Z} = \mathbb{R}^{d}$.</p>
      <p>In order to describe the similarity between data samples,
we define the following adjacency scatter matrix:
$$S_a = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} (x_i - x_j)(x_i - x_j)^{T}, \tag{1}$$
where $A_{ij}$ denotes the similarity between the $i$-th and $j$-th
data points. In our formulation, we use the inner
product between the corresponding label vectors associated
with $x_i$ and $x_j$. To further normalize the similarity values
into the interval $[0, 1]$, we define the normalized adjacency
matrix $\hat{A}$, where
$$\hat{A}_{ij} = \langle \hat{y}_i, \hat{y}_j \rangle = \left\langle \frac{y_i}{\|y_i\|}, \frac{y_j}{\|y_j\|} \right\rangle. \tag{2}$$
The normalized adjacency scatter matrix is then defined as:
$$\hat{S}_a = \sum_{i=1}^{n} \sum_{j=1}^{n} \hat{A}_{ij} (x_i - x_j)(x_i - x_j)^{T}. \tag{3}$$</p>
      <p>Similarly, we define the normalized discriminant scatter
matrix to characterize the dissimilarity between data points:
$$\hat{S}_d = \sum_{i=1}^{n} \sum_{j=1}^{n} \hat{D}_{ij} (x_i - x_j)(x_i - x_j)^{T}, \tag{4}$$
where we simply define $\hat{D}_{ij} = 1 - \hat{A}_{ij}$.</p>
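      <p>As an illustration only, the normalized adjacency matrix and the two scatter matrices defined above could be computed as in the following NumPy sketch. The function names, the row-wise data layout (one sample per row of X, one label vector per row of Y), and the use of the graph-Laplacian identity in place of the explicit double sum are assumptions of this sketch, not details given in the paper.</p>
      <preformat><![CDATA[
```python
import numpy as np


def normalized_adjacency(Y):
    """Inner product of unit-normalized label vectors (rows of Y)."""
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    Y_hat = Y / norms  # assumption: no label vector is all zeros
    return Y_hat @ Y_hat.T


def weighted_scatter(X, W):
    """Return sum_ij W_ij (x_i - x_j)(x_i - x_j)^T for the rows x_i of X.

    For a symmetric weight matrix W, the double sum equals
    2 * X^T (diag(W 1) - W) X, which avoids an explicit loop.
    """
    laplacian = np.diag(W.sum(axis=1)) - W
    return 2.0 * X.T @ laplacian @ X


def avdpe_scatter_matrices(X, Y):
    """Return the normalized adjacency and discriminant scatter matrices."""
    A_hat = normalized_adjacency(Y)
    D_hat = 1.0 - A_hat
    return weighted_scatter(X, A_hat), weighted_scatter(X, D_hat)
```
]]></preformat>
      <p>With non-negative arousal and valence labels, every entry of the normalized adjacency matrix computed this way lies in [0, 1]; labels of mixed sign would give entries in [-1, 1].</p>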
      <p>In order to maximize the distance between data points
with different labels while minimizing the distance between
data points with similar labels, the objective function of
AV-DPE is formulated as follows:</p>
      <p>$$U = \arg\max_{U} \operatorname{tr}\left( (U^{T} \hat{S}_a U)^{\dagger} \, U^{T} \hat{S}_d U \right), \tag{5}$$</p>
      <p>where $\operatorname{tr}(\cdot)$ denotes the matrix trace operation and $(\cdot)^{\dagger}$
denotes the Moore-Penrose pseudoinverse [6]. The
optimization problem in Eq. (5) can be solved by
standard matrix decomposition techniques [6].</p>
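      <p>The paper does not state which decomposition is used. One common way to handle a trace objective of this form, shown here purely as a hedged sketch, is to take the eigenvectors of the pseudoinverse of the adjacency scatter matrix times the discriminant scatter matrix that belong to the d largest eigenvalues. The function names below are hypothetical, and the eigenvector route is an assumption rather than the authors' documented procedure.</p>
      <preformat><![CDATA[
```python
import numpy as np


def avdpe_projection(S_a, S_d, d):
    """Return a D x d matrix whose columns are the eigenvectors of
    pinv(S_a) @ S_d with the d largest eigenvalues (assumed solver)."""
    M = np.linalg.pinv(S_a) @ S_d
    eigvals, eigvecs = np.linalg.eig(M)  # M is generally not symmetric
    order = np.argsort(-eigvals.real)[:d]
    return eigvecs[:, order].real


def trace_objective(U, S_a, S_d):
    """Evaluate tr((U^T S_a U)^+ U^T S_d U) for a candidate projection U."""
    return np.trace(np.linalg.pinv(U.T @ S_a @ U) @ (U.T @ S_d @ U))
```
]]></preformat>
      <p>Given a data matrix X with one sample per row, the reduced features would then be obtained as X @ U.</p>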
    </sec>
    <sec id="sec-3">
      <title>EXPERIMENTS</title>
      <p>In this section, we report the experimental settings and
the evaluation results.</p>
      <p>Global emotion prediction: we construct a 34-D
feature set, including alpha, asymmetry_env, colorfulness,
colorRawEnergy, colorStrength, compositionalBalance,
cutLength, depthOfField, entropyComplexity, flatness,
globalActivity, hueCount, lightning, maxSaliencyCount,
medianLightness, minEnergy, nbFades, nbSceneCuts,
nbWhiteFrames, saliencyDisparity, spatialEdgeDistributionArea,
wtf_max2stdratio_{1-12}, and zcr. Note that all of the above
features are provided by the task organizers.</p>
      <p>Run #1: We use the original 34-D features as the
input, and then use a function fitting neural network
[1] with 100 nodes in the hidden layer for prediction.
The Levenberg-Marquardt backpropagation function
is used in training.</p>
      <p>Run #2: We use the original 34-D features as input,
and then use ν-support vector regression (ν-SVR)
for prediction. In ν-SVR, the RBF kernel is utilized
with the default settings from LIBSVM [4], i.e., cost =
1, ν = 0.5, and γ set to the reciprocal of the
number of feature dimensions.</p>
      <p>Run #3: We first use the proposed AV-DPE to reduce
the original feature space to a 10-D subspace. Then
we use the neural network for prediction. The setting
of the neural network is the same as that in Run #1.</p>
      <p>Run #4: We first use the proposed AV-DPE to reduce
the original feature space to a 10-D subspace. Then
we use ν-SVR for prediction. The setting of ν-SVR
is the same as that in Run #2.</p>
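      <p>For readers who want to reproduce the ν-SVR configuration of Run #2 and Run #4 outside LIBSVM, the following sketch uses scikit-learn's NuSVR, which wraps LIBSVM. Training one regressor per emotion dimension and the helper names are assumptions of this sketch; the paper only states the kernel and the default parameter values.</p>
      <preformat><![CDATA[
```python
import numpy as np
from sklearn.svm import NuSVR


def make_nu_svr():
    """RBF nu-SVR with cost = 1 and nu = 0.5.

    gamma="auto" makes scikit-learn use 1 / n_features, i.e. the
    reciprocal of the number of feature dimensions.
    """
    return NuSVR(kernel="rbf", C=1.0, nu=0.5, gamma="auto")


def fit_av_regressors(X_train, Y_train):
    """Fit one regressor per label column (assumed: arousal, valence)."""
    return [make_nu_svr().fit(X_train, Y_train[:, k])
            for k in range(Y_train.shape[1])]


def predict_av(models, X):
    """Stack the per-dimension predictions into an n x 2 array."""
    return np.column_stack([m.predict(X) for m in models])
```
]]></preformat>
      <p>For Run #4, the same functions would be applied to the 10-D features produced by AV-DPE instead of the original 34-D features.</p>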
      <p>Continuous emotion prediction: we downsample each
video frame to 64 × 36 pixels. As a result, we have a 6912-D
(64 × 36 × 3) feature vector of RGB values for each frame.</p>
      <p>Run #1: We use the original 6912-D features as the
input, and then use the neural network for prediction.
The setting of the neural network is the same as that in
Run #1 of global emotion prediction.</p>
      <p>Run #2: We use the original 6912-D features as the
input, and then use ν-SVR for prediction. The
setting of ν-SVR is the same as that in Run #2 of
global emotion prediction.</p>
      <p>Run #3: We first use the proposed AV-DPE to reduce
the original high-dimensional feature space to a
100-D subspace. Then we use the neural network for
prediction. The setting of the neural network is the same
as that in Run #1 of global emotion prediction.</p>
      <p>Run #4: We first use the proposed AV-DPE to reduce
the original high-dimensional feature space to a
100-D subspace. Then we use ν-SVR for prediction.
The setting of ν-SVR is the same as that in Run #2
of global emotion prediction.</p>
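      <p>The 6912-D frame representation used in the continuous runs follows from 64 × 36 pixels with three RGB channels each. The sketch below shows one possible way to build it. Block averaging as the downsampling method and the function names are assumptions, since the paper does not say how the frames were resized.</p>
      <preformat><![CDATA[
```python
import numpy as np


def block_mean_downsample(frame, out_h=36, out_w=64):
    """Downsample an H x W x 3 frame by averaging non-overlapping blocks.

    Assumes H and W are integer multiples of the target size.
    """
    frame = np.asarray(frame, dtype=np.float64)
    h, w, c = frame.shape
    if h % out_h or w % out_w:
        raise ValueError("frame size must be a multiple of the target size")
    bh, bw = h // out_h, w // out_w
    return frame.reshape(out_h, bh, out_w, bw, c).mean(axis=(1, 3))


def frame_to_feature(frame):
    """Flatten a downsampled RGB frame into a single feature vector."""
    return block_mean_downsample(frame).reshape(-1)
```
]]></preformat>
      <p>For a 360 × 640 RGB frame, frame_to_feature returns a vector of length 36 × 64 × 3 = 6912.</p>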
      <p>Table 1 and Table 2 report the results of our system. From
the tables we can see that, after dimensionality reduction,
the performance of the reduced features (Run #3 and Run
#4) is generally worse than that of the original features
(Run #1 and Run #2), which indicates that the emotion
information in movies is relatively complex, and thus we may
not be able to fully describe it using just a few dimensions.
However, considering that the dimension of the reduced
features is much lower than that of the original features
(10 versus 34 for global prediction and 100 versus 6912 for
continuous prediction), we can still conclude that the learned
subspace preserves rich discriminant information of the
original feature space.</p>
      <p>Moreover, from both tables we can observe that the
neural network is more robust than SVR after
dimensionality reduction. A possible reason is that,
besides its discriminant ability, the neural network with
a hidden layer has better representation ability of the
original data than SVR, which is also of great importance
in supervised learning tasks.</p>
    </sec>
    <sec id="sec-4">
      <title>CONCLUSIONS</title>
      <p>In this working notes paper, we have proposed a
dimensionality reduction method to extract the emotional
features from movies. By minimizing the distance between
data points with similar emotion levels and maximizing
the distance between data points with different emotion
levels simultaneously, the learned subspace keeps most of the
discriminant information and gives relatively robust results
in both global and continuous emotion prediction tasks.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The authors would like to thank the reviewer for the helpful
comments. This work was supported in part by the National
Natural Science Foundation of China under Grant 61503317.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] http://www.mathworks.com/help/nnet/ref/fitnet.html?requestedDomain=cn.mathworks.com.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Baveye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Dellandrea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chamaret</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Liris-accede: A video database for affective content analysis</article-title>
          .
          <source>IEEE Transactions on Affective Computing</source>
          ,
          <volume>6</volume>
          (
          <issue>1</issue>
          ):
          <fpage>43</fpage>
          –
          <lpage>55</lpage>
          ,
          <month>Jan</month>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Canini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Benini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Leonardi</surname>
          </string-name>
          .
          <article-title>Affective recommendation of movies based on selected connotative features</article-title>
          .
          <source>IEEE Transactions on Circuits and Systems for Video Technology</source>
          ,
          <volume>23</volume>
          (
          <issue>4</issue>
          ):
          <fpage>636</fpage>
          –
          <lpage>647</lpage>
          ,
          <month>April</month>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.-C.</given-names>
            <surname>Chang</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>LIBSVM: A library for support vector machines</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          ,
          <volume>2</volume>
          :
          <fpage>27:1</fpage>
          –
          <lpage>27:27</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Dellandrea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Baveye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sjoberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Chamaret</surname>
          </string-name>
          .
          <article-title>The mediaeval 2016 emotional impact of movies task</article-title>
          .
          <source>In Mediaeval 2016 Workshop</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Golub</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. F.</given-names>
            <surname>Van Loan</surname>
          </string-name>
          .
          <source>Matrix Computations (3rd Ed.)</source>
          . Johns Hopkins University Press, Baltimore, MD, USA,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. C. C.</given-names>
            <surname>Chan</surname>
          </string-name>
          .
          <article-title>What strikes the strings of your heart? – multi-label dimensionality reduction for music emotion analysis via brain imaging</article-title>
          .
          <source>IEEE Transactions on Autonomous Mental Development</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <fpage>176</fpage>
          –
          <lpage>188</lpage>
          ,
          <month>Sept</month>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Hua</surname>
          </string-name>
          .
          <article-title>What strikes the strings of your heart? – feature mining for music emotion analysis</article-title>
          .
          <source>IEEE Transactions on Affective Computing</source>
          ,
          <volume>6</volume>
          (
          <issue>3</issue>
          ):
          <fpage>247</fpage>
          –
          <lpage>260</lpage>
          ,
          <month>July</month>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          .
          <article-title>Advisor: Personalized video soundtrack recommendation by late fusion with heuristic rankings</article-title>
          .
          <source>In Proceedings of the 22nd ACM International Conference on Multimedia</source>
          , pages
          <fpage>607</fpage>
          –
          <lpage>616</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sjoberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Baveye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. L.</given-names>
            <surname>Quang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
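          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Dellandrea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Demarty</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>The mediaeval 2015 affective impact of movies task</article-title>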
          .
          <source>In Mediaeval 2015 Workshop</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Sourina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          .
          <article-title>Real-time eeg-based emotion recognition for music therapy</article-title>
          .
          <source>Journal on Multimodal User Interfaces</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>27</fpage>
          –
          <lpage>35</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Xie</surname>
          </string-name>
          .
          <article-title>Modeling emotion influence in image social networks</article-title>
          .
          <source>IEEE Transactions on Affective Computing</source>
          ,
          <volume>6</volume>
          (
          <issue>3</issue>
          ):
          <fpage>286</fpage>
          –
          <lpage>297</lpage>
          ,
          <month>July</month>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yadati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Katti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kankanhalli</surname>
          </string-name>
          .
          <article-title>Cavva: Computational affective video-in-video advertising</article-title>
          .
          <source>IEEE Transactions on Multimedia</source>
          ,
          <volume>16</volume>
          (
          <issue>1</issue>
          ):
          <fpage>15</fpage>
          –
          <lpage>23</lpage>
          ,
          <month>Jan</month>
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Q.</given-names>
            <surname>Tian</surname>
          </string-name>
          .
          <article-title>Affective visualization and retrieval for music video</article-title>
          .
          <source>IEEE Transactions on Multimedia</source>
          ,
          <volume>12</volume>
          (
          <issue>6</issue>
          ):
          <fpage>510</fpage>
          –
          <lpage>522</lpage>
          ,
          <month>Oct</month>
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>