<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning Memorability Preserving Subspace for Predicting Media Memorability</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yang Liu</string-name>
          <email>csygliu@comp.hkbu.edu.hk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhonglei Gu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tobey H. Ko</string-name>
          <email>tobeyko@hku.hk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Hong Kong Baptist University</institution>
          ,
          <addr-line>Hong Kong SAR</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Industrial and Manufacturing Systems Engineering, The University of Hong Kong</institution>
          ,
          <addr-line>Hong Kong SAR</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>HKBU Institute of Research and Continuing Education</institution>
          ,
          <addr-line>Shenzhen</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>29</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>This paper describes our approach to the MediaEval 2018 Predicting Media Memorability Task. First, a subspace learning method called Memorability Preserving Embedding (MPE) is proposed to learn a discriminative subspace from the original feature space according to the memorability scores. Then a Support Vector Regressor (SVR) is applied in the learned subspace for memorability prediction. The results show that SVR can achieve good performance even in a very low-dimensional subspace, which implies that the subspace learned by MPE preserves important memorability information. Moreover, the results indicate that short-term memorability is more predictable than long-term memorability.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Predicting media memorability plays a key role in many
real-world applications such as media retrieval and
recommendation, and has attracted much attention recently [
        <xref ref-type="bibr" rid="ref1 ref10 ref11 ref12 ref14 ref4 ref6 ref9">1, 4, 6, 9–12, 14</xref>
        ]. The MediaEval 2018 Predicting Media Memorability
Task aims to seek solutions to the problem of predicting
how memorable a video will be [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Specifically, given a set
of training video data (each data sample is associated with
its visual features and the corresponding memorability
score), the participants are asked to build a model using the
training data and utilize the trained model to predict the
memorability score of test data.
      </p>
      <p>
        Images and videos often have very high dimensionality,
which poses computational challenges for analysis tasks.
To solve the memorability prediction task efficiently,
we propose a supervised subspace learning
method called Memorability Preserving Embedding (MPE).
We learn a subspace rather than predicting directly in the
original feature space because we believe that most of the
discriminative information in high-dimensional media data is
embedded in a relatively low-dimensional subspace, and that
discovering such a subspace could improve prediction performance.
MPE therefore learns a transformation matrix
that projects the high-dimensional training data
onto a low-dimensional subspace in which the memorability
information and the manifold structure of the dataset are well
preserved. In the test stage, we use the learned
transformation matrix to map the test data onto the subspace and apply
a Support Vector Regressor (SVR) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] in the subspace for
final memorability prediction.
      </p>
    </sec>
    <sec id="sec-2">
      <title>MEMORABILITY PRESERVING EMBEDDING</title>
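      <p>
        Given the training set $\mathcal{X} = \{(\mathbf{x}_1, l_1), (\mathbf{x}_2, l_2), \ldots, (\mathbf{x}_n, l_n)\}$,
with $\mathbf{x}_i \in \mathbb{R}^{D}$ ($i = 1, \ldots, n$) being the visual feature vector
of the $i$-th video and $l_i \in [0, 1]$ being the corresponding
memorability score, MPE aims to learn a $D \times d$ transformation
matrix $\mathbf{W}$ that maps each $\mathbf{x}_i$ ($i = 1, \ldots, n$) to a low-dimensional
subspace in which the memorability information and manifold
structure of the dataset are well preserved. To achieve
this goal, MPE optimizes the following objective function:
      </p>
      <p>$$\mathbf{W} = \arg\min_{\mathbf{W}} \sum_{i,j=1}^{n} \left\| \mathbf{W}^{T}(\mathbf{x}_i - \mathbf{x}_j) \right\|^{2} \cdot \left( \alpha S^{l}_{ij} + (1 - \alpha) S^{x}_{ij} \right), \tag{1}$$</p>
      <p>
        where $S^{l}_{ij} = \exp\left(-(l_i - l_j)^{2} / 2\sigma^{2}\right)$ measures the similarity
between the memorability score of $\mathbf{x}_i$ and that of $\mathbf{x}_j$,
$S^{x}_{ij} = \exp\left(-\|\mathbf{x}_i - \mathbf{x}_j\|^{2} / 2\sigma^{2}\right)$ measures the closeness
between $\mathbf{x}_i$ and $\mathbf{x}_j$, and $\alpha \in [0, 1]$ is the parameter balancing the memorability
information and the manifold structure.
      </p>
      <p>Up to a constant factor, Eq. (1) can be equivalently rewritten as follows:</p>
      <p>$$\mathbf{W} = \arg\min_{\mathbf{W}} \operatorname{tr}\left( \mathbf{W}^{T} \mathbf{X} \mathbf{L} \mathbf{X}^{T} \mathbf{W} \right), \tag{2}$$</p>
      <p>
        where $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] \in \mathbb{R}^{D \times n}$ is the data matrix,
$\mathbf{L} = \mathbf{D} - \mathbf{A}$ is the $n \times n$ Laplacian matrix [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and $\mathbf{D}$ is a diagonal
matrix defined as $D_{ii} = \sum_{j=1}^{n} A_{ij}$ ($i = 1, \ldots, n$), where
$A_{ij} = \alpha S^{l}_{ij} + (1 - \alpha) S^{x}_{ij}$. The optimal $\mathbf{W}$ can then be obtained
by finding the eigenvectors corresponding to the $d$ smallest
eigenvalues of the following eigen-decomposition problem:
      </p>
      <p>$$\mathbf{X} \mathbf{L} \mathbf{X}^{T} \mathbf{w} = \lambda \mathbf{w}. \tag{3}$$</p>
      <p>After obtaining $\mathbf{W}$, for each high-dimensional data sample
$\mathbf{x}$ in the development and test sets, we obtain its
low-dimensional representation as $\mathbf{y} = \mathbf{W}^{T} \mathbf{x}$. We then apply
SVR to $\mathbf{y}$ for memorability prediction.</p>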
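      <p>The procedure above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names mpe_affinity and mpe_fit are hypothetical, samples are stored one per row (the transpose of the data matrix in the text), the same kernel width is used for both similarity terms, the n by n affinity matrix is built densely, and the columns of the projection are assumed to be orthonormal, which is what a symmetric eigen-solver returns.</p>
      <preformat>
```python
import numpy as np


def mpe_affinity(X, scores, alpha=0.5, sigma=1.0):
    """Combined affinity A = alpha * S_l + (1 - alpha) * S_x for the rows of X."""
    # Pairwise squared differences between memorability scores.
    score_diff2 = (scores[:, None] - scores[None, :]) ** 2
    # Pairwise squared Euclidean distances between feature vectors.
    sq_norms = np.sum(X ** 2, axis=1)
    feat_dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    feat_dist2 = np.maximum(feat_dist2, 0.0)  # clip tiny negatives from rounding
    S_l = np.exp(-score_diff2 / (2.0 * sigma ** 2))  # score similarity
    S_x = np.exp(-feat_dist2 / (2.0 * sigma ** 2))   # feature closeness
    return alpha * S_l + (1.0 - alpha) * S_x


def mpe_fit(X, scores, d, alpha=0.5, sigma=1.0):
    """Return a D x d matrix W; X holds one sample per row (n x D)."""
    A = mpe_affinity(X, scores, alpha, sigma)
    L = np.diag(A.sum(axis=1)) - A      # graph Laplacian L = D - A
    M = X.T @ L @ X                     # equals X L X^T in the column layout of the text
    M = (M + M.T) / 2.0                 # remove asymmetry caused by rounding
    _, eigvecs = np.linalg.eigh(M)      # eigenvalues are returned in ascending order
    return eigvecs[:, :d]               # eigenvectors of the d smallest eigenvalues


rng = np.random.default_rng(0)
X_demo = rng.random((50, 8))            # 50 synthetic samples with D = 8
s_demo = rng.random(50)                 # synthetic scores in [0, 1]
W_demo = mpe_fit(X_demo, s_demo, d=3)
Y_demo = X_demo @ W_demo                # y = W^T x for every sample
print(W_demo.shape, Y_demo.shape)       # (8, 3) (50, 3)
```
      </preformat>
      <p>On this synthetic input the weighted sum in Eq. (1) equals twice the trace in Eq. (2), which is why both share the same minimizer.</p>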
    </sec>
    <sec id="sec-4">
      <title>RESULTS AND ANALYSIS</title>
      <p>
        In this section, we report our experimental results on the
MediaEval 2018 Predicting Media Memorability Task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Specifically, we participate in two subtasks: 1) short-term
memorability subtask and 2) long-term memorability subtask.
      </p>
      <p>
        We use both video-specialized features and image features,
which are provided by the task, to construct the original
feature space. For the video features, we use the 101-D C3D
feature vector. For the image features, we use the 122-D
local binary pattern (LBP) feature vector and the 768-D
color histogram feature vector. We select these features because
they have demonstrated good performance in visual analysis
tasks [
        <xref ref-type="bibr" rid="ref15 ref5 ref8">5, 8, 15</xref>
        ]. For each video, the first, middle, and
last frames are selected as representatives of the
video, so the total dimension of the original feature space is
$D = 101 + 3 \times (122 + 768) = 2771$.
      </p>
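      <p>As a quick check of the dimension count, the following sketch assembles one video's feature vector from placeholder arrays. The helper name build_feature_vector and the concatenation order (C3D first, then LBP and color histogram for each frame) are assumptions made for illustration; the text does not specify an order.</p>
      <preformat>
```python
import numpy as np

C3D_DIM, LBP_DIM, HIST_DIM, N_FRAMES = 101, 122, 768, 3


def build_feature_vector(c3d, frame_lbp, frame_hist):
    """Concatenate a C3D vector with per-frame LBP and color histogram vectors."""
    parts = [np.asarray(c3d, dtype=float)]
    for lbp, hist in zip(frame_lbp, frame_hist):
        parts.append(np.asarray(lbp, dtype=float))
        parts.append(np.asarray(hist, dtype=float))
    return np.concatenate(parts)


# Placeholder features for one video with three representative frames.
x = build_feature_vector(
    np.zeros(C3D_DIM),
    [np.zeros(LBP_DIM)] * N_FRAMES,
    [np.zeros(HIST_DIM)] * N_FRAMES,
)
print(x.shape)  # (2771,)
```
      </preformat>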
      <p>
        We use all 8000 video samples in the development
set for training. Before subspace learning, we normalize the
values of the different features to [0, 1]. For the MPE method, we
set $\alpha = 0.5$ and $\sigma = 1$.
      </p>
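      <p>One common way to scale features to [0, 1] is per-feature min-max normalization, sketched below. The text does not say which statistics were used, so fitting the minimum and range on the development set only, clipping test values that fall outside that range, and the helper names fit_min_max and apply_min_max are all assumptions.</p>
      <preformat>
```python
import numpy as np


def fit_min_max(X_train):
    """Return the per-feature minimum and range of the training matrix."""
    lo = X_train.min(axis=0)
    span = X_train.max(axis=0) - lo
    span[span == 0] = 1.0            # constant features are mapped to 0
    return lo, span


def apply_min_max(X, lo, span):
    """Scale X with training statistics and clip the result to [0, 1]."""
    return np.clip((X - lo) / span, 0.0, 1.0)


rng = np.random.default_rng(0)
X_dev = rng.normal(size=(20, 5)) * 10.0   # synthetic development features
X_dev[:, 2] = 3.0                         # one constant feature
lo, span = fit_min_max(X_dev)
X_dev_norm = apply_min_max(X_dev, lo, span)
X_test_norm = apply_min_max(rng.normal(size=(4, 5)) * 10.0, lo, span)
```
      </preformat>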
      <p>
        ∙ For Run 1, we set the reduced dimension $d = 4$. Then
we learn the $D \times d$ (i.e., $2771 \times 4$ in this case)
transformation matrix $\mathbf{W}$ via MPE using the development
set, and use $\mathbf{W}$ to map both development and
test data onto the 4-D subspace. Finally, we train
the $\nu$-SVR [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] using the development set in the 4-D
subspace and employ the trained $\nu$-SVR model to
predict the memorability scores of the test data in
the same subspace. We use the RBF kernel and set
$\nu = 0.5$ and $\gamma = 1/d$ [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
∙ For Run 2, we set the reduced dimension $d = 5$.
∙ For Run 3, we set the reduced dimension $d = 9$.
∙ For Run 4, we set the reduced dimension $d = 10$.
      </p>
      <p>The remaining procedure and the parameter setting
in Runs 2, 3, and 4 are the same as those in Run 1.</p>
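      <p>The regression step of the four runs can be approximated with scikit-learn's NuSVR, which wraps LIBSVM. This is a sketch under assumptions rather than the submitted system: the embeddings and scores are random stand-ins, the helper name train_nu_svr is hypothetical, the kernel width is taken as one over the subspace dimension, and the cost parameter is left at the library default because the text does not state it.</p>
      <preformat>
```python
import numpy as np
from sklearn.svm import NuSVR


def train_nu_svr(Y_train, scores, nu=0.5):
    """Fit an RBF-kernel nu-SVR on d-dimensional embeddings (one row per video)."""
    d = Y_train.shape[1]
    # gamma = 1/d mirrors the LIBSVM default of 1 / number of features.
    model = NuSVR(kernel="rbf", nu=nu, gamma=1.0 / d)
    return model.fit(Y_train, scores)


rng = np.random.default_rng(0)
predictions = {}
for d in (4, 5, 9, 10):                  # reduced dimensions of Runs 1 to 4
    Y_dev = rng.random((200, d))         # stand-in for development embeddings
    s_dev = rng.random(200)              # stand-in for memorability scores
    Y_test = rng.random((20, d))         # stand-in for test embeddings
    model = train_nu_svr(Y_dev, s_dev)
    predictions[d] = model.predict(Y_test)
```
      </preformat>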
      <p>Table 1 shows the performance (in terms of Spearman
correlation and MSE) of our approach. We make two observations.
First, the results (both Spearman and MSE) on the short-term subtask are
better than those on the long-term subtask, which indicates
that short-term memorability is more predictable than
long-term memorability. Second, comparing
Runs 1 and 2 ($d = 4, 5$) with Runs 3 and 4 ($d = 9, 10$),
we notice that Runs 1 and 2 are better than Runs 3 and
4 in terms of Spearman correlation and comparable in terms of
MSE. This may imply that most of the discriminative
information is embedded in a very low-dimensional subspace
and that adding more dimensions does not necessarily improve
performance.</p>
      <p>[Table 1: Spearman correlation and MSE on the short-term and long-term subtasks; first row label $d = 4$.]</p>
      <p>To further validate the effectiveness of subspace learning,
we compare the performance of SVR on the learned subspace
with that on the original 2771-D space using the development
set. We use 5-fold cross-validation and average the results.
The Spearman coefficient and MSE in Table 2 show that the
performance on the original space is slightly worse than that
on the learned subspaces, supporting our assumption that the
original high-dimensional space may contain redundant or
even noisy information, and that reducing the dimensionality with
supervised information could improve the subsequent learning
performance. However, the results in terms of the Spearman
coefficient are far from satisfactory. The reason might be that
MPE is a linear mapping method, which is not sufficient
to capture the complex discriminant information embedded
in the high-dimensional feature space. This motivates us
to consider extending our method to the nonlinear case to
improve the performance.</p>
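      <p>A 5-fold cross-validation that reports the averaged Spearman correlation and MSE could look like the sketch below. It is illustrative only: the function name cross_validate is hypothetical, the data are synthetic, the folds are shuffled with a fixed seed, and only the regressor is refit in each fold, since the text does not say whether the subspace was relearned per fold.</p>
      <preformat>
```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.model_selection import KFold
from sklearn.svm import NuSVR


def cross_validate(X, scores, n_splits=5, seed=0):
    """Return the mean Spearman correlation and mean MSE over the folds."""
    d = X.shape[1]
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rhos, mses = [], []
    for train_idx, val_idx in folds.split(X):
        model = NuSVR(kernel="rbf", nu=0.5, gamma=1.0 / d)
        model.fit(X[train_idx], scores[train_idx])
        pred = model.predict(X[val_idx])
        rho, _ = spearmanr(scores[val_idx], pred)   # rank correlation on held-out fold
        rhos.append(rho)
        mses.append(np.mean((scores[val_idx] - pred) ** 2))
    return float(np.mean(rhos)), float(np.mean(mses))


rng = np.random.default_rng(0)
X_cv = rng.random((100, 4))                       # stand-in for 4-D embeddings
s_cv = 0.1 + 0.8 * X_cv[:, 0]                     # synthetic scores in [0.1, 0.9]
mean_rho, mean_mse = cross_validate(X_cv, s_cv)
```
      </preformat>
      <p>Running the same function on the original features and on each learned subspace gives the kind of side-by-side comparison described above.</p>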
    </sec>
    <sec id="sec-5">
      <title>CONCLUSION</title>
      <p>This paper describes our approach designed for memorability
prediction. A subspace learning method, MPE, is proposed to
learn the subspace that preserves the memorability
information. After that, SVR is utilized for memorability prediction
in the learned subspace. The results on the MediaEval 2018
Predicting Media Memorability Task validate the
effectiveness of our approach. Our future work will focus on exploring
the physical meaning of the learned subspace, as this could
improve the interpretability of our approach. Moreover, we
plan to generalize our method to the nonlinear case to
enhance its data representation ability.</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was supported in part by the National Natural
Science Foundation of China (NSFC) under Grant 61503317
and in part by the General Research Fund (GRF) from the
Research Grants Council (RGC) of Hong Kong SAR under
Project HKBU12202417.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Baveye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cohendet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perreira Da Silva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Le Callet</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Deep Learning for Image Memorability Prediction: The Emotional Bias</article-title>
          .
          <source>In Proceedings of the 24th ACM International Conference on Multimedia (MM '16)</source>
          . ACM, New York, NY, USA,
          <fpage>491</fpage>
          -
          <lpage>495</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.-C.</given-names>
            <surname>Chang</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>LIBSVM: A library for support vector machines</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>2</volume>
          ,
          <issue>3</issue>
          (
          <year>2011</year>
          ),
          <fpage>27:1</fpage>
          -
          <lpage>27:27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cohendet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Demarty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. Q. K.</given-names>
            <surname>Duong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sjoberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.-T.</given-names>
            <surname>Do</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>MediaEval 2018: Predicting Media Memorability</article-title>
          .
          <source>In Proceedings of the MediaEval 2018 Workshop</source>
          . CEUR-WS, Sophia Antipolis, France, 29-31 October 2018.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cohendet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yadati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. Q. K.</given-names>
            <surname>Duong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Demarty</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Annotating, Understanding, and Predicting Long-term Video Memorability</article-title>
          .
          <source>In Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval (ICMR '18)</source>
          . ACM, New York, NY, USA,
          <fpage>178</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Ferman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Tekalp</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Mehrotra</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Robust color histogram descriptors for video segment retrieval and identification</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          <volume>11</volume>
          ,
          <issue>5</issue>
          (
          <year>2002</year>
          ),
          <fpage>497</fpage>
          -
          <lpage>508</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Learning Computational Models of Video Memorability from fMRI Brain Imaging</article-title>
          .
          <source>IEEE Transactions on Cybernetics</source>
          <volume>45</volume>
          ,
          <issue>8</issue>
          (Aug
          <year>2015</year>
          ),
          <fpage>1692</fpage>
          -
          <lpage>1703</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Niyogi</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Locality Preserving Projections</article-title>
          .
          <source>In Advances in Neural Information Processing Systems 16 (NIPS)</source>
          .
          <fpage>153</fpage>
          -
          <lpage>160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ardabilian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Local Binary Patterns and Its Application to Facial Image Analysis: A Survey</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)</source>
          <volume>41</volume>
          ,
          <issue>6</issue>
          (
          <year>2011</year>
          ),
          <fpage>765</fpage>
          -
          <lpage>781</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Isola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Oliva</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Understanding the Intrinsic Memorability of Images</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          24, J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger (Eds.). Curran Associates, Inc.,
          <fpage>2429</fpage>
          -
          <lpage>2437</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Isola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Oliva</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>What Makes a Photograph Memorable?</article-title>
          <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>
          <volume>36</volume>
          ,
          <issue>7</issue>
          (
          July
          <year>2014</year>
          ),
          <fpage>1469</fpage>
          -
          <lpage>1482</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Raju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Oliva</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Understanding and Predicting Image Memorability at a Large Scale</article-title>
          .
          <source>In 2015 IEEE International Conference on Computer Vision</source>
          (ICCV).
          <fpage>2390</fpage>
          -
          <lpage>2398</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xiong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Predicting Image Memorability by Multi-view Adaptive Regression</article-title>
          .
          <source>In Proceedings of the 23rd ACM International Conference on Multimedia (MM '15)</source>
          . ACM, New York, NY, USA,
          <fpage>1147</fpage>
          -
          <lpage>1150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Scholkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Smola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Williamson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. L.</given-names>
            <surname>Bartlett</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>New Support Vector Algorithms</article-title>
          .
          <source>Neural Comput</source>
          .
          <volume>12</volume>
          ,
          <issue>5</issue>
          (
          <year>2000</year>
          ),
          <fpage>1207</fpage>
          -
          <lpage>1245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shekhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Singal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kedia</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Shetty</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Show and Recall: Learning What Makes Videos Memorable</article-title>
          .
          <source>In 2017 IEEE International Conference on Computer Vision Workshops (ICCVW)</source>
          .
          <fpage>2730</fpage>
          -
          <lpage>2739</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bourdev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Torresani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Paluri</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Learning Spatiotemporal Features with 3D Convolutional Networks</article-title>
          .
          <source>In Proceedings of the 2015 IEEE International Conference on Computer Vision</source>
          (ICCV).
          <fpage>4489</fpage>
          -
          <lpage>4497</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>