=Paper=
{{Paper
|id=Vol-2283/MediaEval_18_paper_33
|storemode=property
|title=Learning Memorability Preserving Subspace for Predicting Media Memorability
|pdfUrl=https://ceur-ws.org/Vol-2283/MediaEval_18_paper_33.pdf
|volume=Vol-2283
|authors=Yang Liu,Zhonglei Gu,Tobey H. Ko
|dblpUrl=https://dblp.org/rec/conf/mediaeval/LiuGK18
}}
==Learning Memorability Preserving Subspace for Predicting Media Memorability==
Yang Liu^{1,2}, Zhonglei Gu^1, Tobey H. Ko^3

^1 Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, P.R. China
^2 HKBU Institute of Research and Continuing Education, Shenzhen, P.R. China
^3 Department of Industrial and Manufacturing Systems Engineering, The University of Hong Kong, Hong Kong SAR, P.R. China

csygliu@comp.hkbu.edu.hk, cszlgu@comp.hkbu.edu.hk, tobeyko@hku.hk

ABSTRACT

This paper describes our approach designed for the MediaEval 2018 Predicting Media Memorability Task. First, a subspace learning method called Memorability Preserving Embedding (MPE) is proposed to learn a discriminative subspace from the original feature space according to the memorability scores. Then a Support Vector Regressor (SVR) is applied to the learned subspace for memorability prediction. The prediction performance demonstrates that SVR can achieve good results even in a very low-dimensional subspace, which implies that the subspace learned by MPE is capable of preserving important memorability information. Moreover, the results indicate that short-term memorability is more predictable than long-term memorability.

1 INTRODUCTION

Predicting media memorability plays a key role in many real-world applications such as media retrieval and recommendation, and has attracted much attention recently [1, 4, 6, 9-12, 14]. The MediaEval 2018 Predicting Media Memorability Task seeks solutions to the problem of predicting how memorable a video will be [3]. Specifically, given a set of training video data (each sample is associated with its visual features and the corresponding memorability score), participants are asked to build a model using the training data and to use the trained model to predict the memorability scores of the test data.

Images and videos often have very high dimensionality, which brings computational challenges to analysis tasks. To solve the memorability prediction task in an efficient way, we propose in this paper a supervised subspace learning method called Memorability Preserving Embedding (MPE). The motivation for designing such a subspace learning method, rather than directly performing the prediction, is that we believe most of the discriminative information of the high-dimensional media data is actually embedded in a relatively low-dimensional subspace, and that discovering such a subspace could enhance the prediction performance. Therefore, the proposed MPE aims to learn a transformation matrix that projects the high-dimensional training data to a low-dimensional subspace in which the memorability information and the manifold structure of the dataset are well preserved. In the test stage, we use the learned transformation matrix to map the test data to the subspace, and apply a Support Vector Regressor (SVR) [13] in the subspace for the final memorability prediction.

2 MEMORABILITY PRESERVING EMBEDDING

Given the training set X = {(x_1, m_1), (x_2, m_2), ..., (x_N, m_N)}, with x_i ∈ R^D (i = 1, ..., N) being the visual feature vector of the i-th video and m_i ∈ [0, 1] being the corresponding memorability score, MPE aims to learn a D×d transformation matrix W that maps x_i (i = 1, ..., N) to a low-dimensional subspace in which the memorability information and the manifold structure of the dataset are well preserved. To achieve this goal, MPE optimizes the following objective function:

W = \arg\min_{W} \sum_{i,j=1}^{N} \| W^T (x_i - x_j) \|^2 \cdot \big( \alpha S_{ij} + (1 - \alpha) C_{ij} \big),   (1)

where S_{ij} = \exp(-(m_i - m_j)^2 / 2\sigma^2) measures the similarity between the memorability score of x_i and that of x_j, C_{ij} = \exp(-\| x_i - x_j \|^2 / 2\sigma^2) measures the closeness between x_i and x_j, and α ∈ [0, 1] is the parameter balancing the memorability information against the manifold structure.

Eq. (1) can be equivalently rewritten as

W = \arg\min_{W} \mathrm{tr}(W^T X L X^T W),   (2)

where X = [x_1, x_2, ..., x_N] ∈ R^{D×N} is the data matrix, L = D − A is the N×N Laplacian matrix [7], and D is the diagonal matrix defined by D_{ii} = \sum_{j=1}^{N} A_{ij} (i = 1, ..., N), with A_{ij} = α S_{ij} + (1 − α) C_{ij}. The optimal W is then obtained from the eigenvectors corresponding to the smallest eigenvalues of the eigen-decomposition problem

X L X^T w = \lambda w.   (3)

After obtaining W, for each high-dimensional data sample x_i in the development and test sets, we obtain its low-dimensional representation by y_i = W^T x_i, and then apply SVR to y_i for memorability prediction.
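The training procedure of Eqs. (1)-(3) reduces to building the affinity matrix A, forming the graph Laplacian, and solving a symmetric eigenproblem. Below is a minimal NumPy sketch of this procedure; it is our own illustration, not the authors' code, and it assumes the full N×N affinity matrices fit in memory (for the task's 8000 samples a chunked distance computation would be preferable):

```python
import numpy as np

def mpe(X, m, d, alpha=0.5, sigma=1.0):
    """Sketch of Memorability Preserving Embedding (MPE).

    X : (D, N) data matrix, one feature vector per column
    m : (N,) memorability scores in [0, 1]
    d : target subspace dimension
    Returns the (D, d) transformation matrix W.
    """
    D, N = X.shape
    # S_ij: Gaussian similarity between memorability scores
    S = np.exp(-(m[:, None] - m[None, :]) ** 2 / (2 * sigma ** 2))
    # C_ij: Gaussian closeness between feature vectors
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    C = np.exp(-sq / (2 * sigma ** 2))
    A = alpha * S + (1 - alpha) * C          # combined affinity of Eq. (1)
    L = np.diag(A.sum(axis=1)) - A           # Laplacian L = D - A
    M = X @ L @ X.T                          # D x D matrix of Eq. (2)
    # Eigenvectors of the d smallest eigenvalues solve Eq. (3);
    # eigh returns eigenvalues in ascending order.
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, :d]
```

With the paper's setting, one would call `mpe(X, m, d=4, alpha=0.5, sigma=1.0)` and project any sample with `y = W.T @ x`.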
Copyright held by the owner/author(s). MediaEval'18, 29-31 October 2018, Sophia Antipolis, France

Table 1: The performance (in terms of Spearman correlation and MSE) of our approach on the test set of the MediaEval 2018 Predicting Media Memorability Task.

                  Run 1    Run 2    Run 3    Run 4
                  (d = 4)  (d = 5)  (d = 9)  (d = 10)
Spearman  Long    0.0774   0.0962   0.0647   0.0634
          Short   0.1332   0.1268   0.0656   0.0717
MSE       Long    0.0214   0.0214   0.0213   0.0213
          Short   0.0082   0.0080   0.0078   0.0079

Table 2: The performance (in terms of Spearman correlation and MSE) of our approach on the development set of the MediaEval 2018 Predicting Media Memorability Task.

                  d = 4    d = 5    d = 9    d = 10   D (= 2771)
Spearman  Long    0.1422   0.1514   0.1654   0.1675   0.1414
          Short   0.3047   0.3059   0.3065   0.3070   0.2946
MSE       Long    0.0212   0.0212   0.0211   0.0210   0.0211
          Short   0.0061   0.0061   0.0061   0.0061   0.0062

3 RESULTS AND ANALYSIS

In this section, we report our experimental results on the MediaEval 2018 Predicting Media Memorability Task [3].
Specifically, we participated in two subtasks: 1) the short-term memorability subtask and 2) the long-term memorability subtask.

We use both video-specific features and image features, which are provided by the task, to construct the original feature space. For the video features, we use the 101-D C3D feature vector. For the image features, we use the 122-D local binary pattern (LBP) feature vector and the 768-D color histogram feature vector. We select these features because they have demonstrated good performance in visual analysis tasks [5, 8, 15]. For each video, the first, the median, and the last frames are selected as representatives of the video, so the total dimension of the original feature space is D = 101 + 3 × (122 + 768) = 2771.

We use all 8000 video samples in the development set for training. Before subspace learning, we normalize the values of the different features to [0, 1]. For the MPE method, we set α = 0.5 and σ = 1.
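The feature construction and normalization above can be sketched as follows. The function names and input shapes are our own assumptions for illustration; the task itself distributes the C3D, LBP, and color-histogram features as precomputed files:

```python
import numpy as np

def build_feature_vector(c3d, frame_feats):
    """Concatenate the per-video features into one 2771-D vector.

    c3d         : (101,) C3D feature for the whole video
    frame_feats : list of (lbp, hist) pairs for the first, median,
                  and last frames, with lbp (122,) and hist (768,)
    """
    parts = [np.asarray(c3d)]
    for lbp, hist in frame_feats:
        parts.append(np.asarray(lbp))
        parts.append(np.asarray(hist))
    x = np.concatenate(parts)        # 101 + 3 * (122 + 768) = 2771 dims
    assert x.shape == (2771,)
    return x

def minmax_normalize(X):
    """Scale each feature dimension of X (N, D) to [0, 1], as done
    before subspace learning; constant dimensions map to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span
```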
* For Run 1, we set the reduced dimension d = 4. We learn the D×d (i.e., 2771×4 in this case) transformation matrix W via MPE using the development set, and use W to map both the development and the test data onto the 4-D subspace. Finally, we train a ν-SVR [13] on the development set in the 4-D subspace and employ the trained ν-SVR model to predict the memorability scores of the test data in the same subspace. We use the RBF kernel and set ν = 0.5 and γ = 1/D [2].
* For Run 2, we set the reduced dimension d = 5.
* For Run 3, we set the reduced dimension d = 9.
* For Run 4, we set the reduced dimension d = 10.

The remaining procedure and the parameter settings in Runs 2, 3, and 4 are the same as those in Run 1.
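A single run of the procedure above can be sketched with scikit-learn's NuSVR, which exposes the ν and γ parameters used here. The wrapper below is our own illustration, with W assumed to come from the MPE step:

```python
import numpy as np
from sklearn.svm import NuSVR

def run_prediction(X_dev, m_dev, X_test, W):
    """One run of the pipeline: project with W, then fit nu-SVR.

    X_dev  : (N_dev, D) normalized development features
    m_dev  : (N_dev,) memorability scores
    X_test : (N_test, D) normalized test features
    W      : (D, d) MPE transformation matrix
    """
    Y_dev, Y_test = X_dev @ W, X_test @ W       # map to the d-D subspace
    # RBF kernel with nu = 0.5 and gamma = 1/D, as in the paper [2, 13]
    svr = NuSVR(kernel="rbf", nu=0.5, gamma=1.0 / X_dev.shape[1])
    svr.fit(Y_dev, m_dev)
    return svr.predict(Y_test)
```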
Table 1 shows the performance (in terms of Spearman correlation and MSE) of our approach. From the results, we have several observations. First, the results (in both Spearman and MSE) on the short-term subtask are better than those on the long-term subtask, which indicates that short-term memorability is more predictable than long-term memorability. Besides, comparing runs 1 and 2 (d = 4, 5) with runs 3 and 4 (d = 9, 10), we notice that runs 1 and 2 are better in terms of Spearman and comparable in terms of MSE. This may imply that most of the discriminative information is embedded in a very low-dimensional subspace, and that adding more dimensions does not necessarily improve the performance.

To further validate the effectiveness of subspace learning, we compare the performance of SVR on the learned subspaces with that on the original 2771-D space using the development set. We use 5-fold cross-validation and average the results. The Spearman coefficients and MSE in Table 2 show that the performance on the original space is slightly worse than that on the learned subspaces, supporting our assumption that the original high-dimensional space may contain redundant or even noisy information, and that reducing the dimensionality with supervised information can improve the subsequent learning performance. However, the results in terms of the Spearman coefficient are still far from satisfactory. The reason might be that MPE is a linear mapping method, which is not sufficient to capture the complex discriminant information embedded in the high-dimensional feature space. This motivates us to consider extending our method to the nonlinear case to improve the performance.

4 CONCLUSION

This paper describes our approach to memorability prediction. A subspace learning method, MPE, is proposed to learn a subspace that preserves the memorability information. SVR is then applied in the learned subspace for memorability prediction. The results on the MediaEval 2018 Predicting Media Memorability Task validate the effectiveness of our approach. Our future work will focus on exploring the physical meaning of the learned subspace, as this could improve the interpretability of our approach. Moreover, we plan to generalize our method to the nonlinear scenario to enhance its data representation ability.

ACKNOWLEDGMENTS

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61503317 and in part by the General Research Fund (GRF) from the Research Grants Council (RGC) of Hong Kong SAR under Project HKBU12202417.

REFERENCES

[1] Y. Baveye, R. Cohendet, M. Perreira Da Silva, and P. Le Callet. 2016. Deep Learning for Image Memorability Prediction: The Emotional Bias. In Proceedings of the 24th ACM International Conference on Multimedia (MM '16). ACM, New York, NY, USA, 491–495.
[2] C.-C. Chang and C.-J. Lin. 2011. LIBSVM: A Library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology 2, 3 (2011), 27:1–27:27.
[3] R. Cohendet, C.-H. Demarty, N. Q. K. Duong, M. Sjoberg, B. Ionescu, and T.-T. Do. 2018. MediaEval 2018: Predicting Media Memorability. In Proceedings of the MediaEval 2018 Workshop. CEUR-WS, Sophia Antipolis, France.
[4] R. Cohendet, K. Yadati, N. Q. K. Duong, and C.-H. Demarty. 2018. Annotating, Understanding, and Predicting Long-term Video Memorability. In Proceedings of the 2018 ACM International Conference on Multimedia Retrieval (ICMR '18). ACM, New York, NY, USA, 178–186.
[5] A. M. Ferman, A. M. Tekalp, and R. Mehrotra. 2002. Robust Color Histogram Descriptors for Video Segment Retrieval and Identification. IEEE Transactions on Image Processing 11, 5 (2002), 497–508.
[6] J. Han, C. Chen, L. Shao, X. Hu, J. Han, and T. Liu. 2015. Learning Computational Models of Video Memorability from fMRI Brain Imaging. IEEE Transactions on Cybernetics 45, 8 (2015), 1692–1703.
[7] X. He and P. Niyogi. 2003. Locality Preserving Projections. In Advances in Neural Information Processing Systems 16 (NIPS). 153–160.
[8] D. Huang, C. Shan, M. Ardabilian, Y. Wang, and L. Chen. 2011. Local Binary Patterns and Its Application to Facial Image Analysis: A Survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 41, 6 (2011), 765–781.
[9] P. Isola, D. Parikh, A. Torralba, and A. Oliva. 2011. Understanding the Intrinsic Memorability of Images. In Advances in Neural Information Processing Systems 24. Curran Associates, Inc., 2429–2437.
[10] P. Isola, J. Xiao, D. Parikh, A. Torralba, and A. Oliva. 2014. What Makes a Photograph Memorable? IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 7 (2014), 1469–1482.
[11] A. Khosla, A. S. Raju, A. Torralba, and A. Oliva. 2015. Understanding and Predicting Image Memorability at a Large Scale. In 2015 IEEE International Conference on Computer Vision (ICCV). 2390–2398.
[12] H. Peng, K. Li, B. Li, H. Ling, W. Xiong, and W. Hu. 2015. Predicting Image Memorability by Multi-view Adaptive Regression. In Proceedings of the 23rd ACM International Conference on Multimedia (MM '15). ACM, New York, NY, USA, 1147–1150.
[13] B. Scholkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett. 2000. New Support Vector Algorithms. Neural Computation 12, 5 (2000), 1207–1245.
[14] S. Shekhar, D. Singal, H. Singh, M. Kedia, and A. Shetty. 2017. Show and Recall: Learning What Makes Videos Memorable. In 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). 2730–2739.
[15] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. 2015. Learning Spatiotemporal Features with 3D Convolutional Networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). 4489–4497.