=Paper=
{{Paper
|id=Vol-1984/Mediaeval_2017_paper_13
|storemode=property
|title=MIC-TJU in MediaEval 2017 Emotional Impact of Movies Task
|pdfUrl=https://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_13.pdf
|volume=Vol-1984
|authors=Yun Yi,Hanli Wang,Jiangchuan Wei
|dblpUrl=https://dblp.org/rec/conf/mediaeval/YiWW17
}}
==MIC-TJU in MediaEval 2017 Emotional Impact of Movies Task==
Yun Yi¹,², Hanli Wang²,*, Jiangchuan Wei²

¹ Department of Mathematics and Computer Science, Gannan Normal University, Ganzhou 341000, China
² Department of Computer Science and Technology, Tongji University, Shanghai 201804, China

ABSTRACT

To predict the emotional impact and fear of movies, we propose a framework which employs four audio-visual features. In particular, we utilize features extracted by the methods of motion keypoint trajectory and convolutional neural networks to depict the visual information, and extract a global and a local audio feature to describe the audio cues. An early fusion strategy is employed to combine the vectors of these features. Then, linear support vector regression and support vector machines are used to learn the affective models. The experimental results show that the combination of these features obtains promising performance.

1 INTRODUCTION

The 2017 Emotional Impact of Movies task is a challenging task which contains two subtasks (i.e., valence-arousal prediction and fear prediction). A brief introduction to this challenge has been given in [3]. In this paper, we mainly introduce the system architecture and algorithms used in our framework, and discuss the evaluation results.

2 FRAMEWORK

The key components of the proposed framework are shown in Fig. 1, and the highlights of our framework are introduced below.

[Figure 1: Key components of the proposed framework.]

2.1 Feature Extraction

In this framework, we evaluate four features: the EmoBase10 feature [5], the Mel-Frequency Cepstral Coefficients (MFCC) feature [4], the Motion Keypoint Trajectory (MKT) feature [15], and the Convolutional Networks (ConvNets) feature [12, 14].

2.1.1 MFCC Feature. In affective content analysis, the audio modality is essential, and MFCC is a well-known local audio feature. The time window of MFCC is set to 32 ms, with a 50% overlap between two adjacent windows. To improve performance, we append the delta and double-delta of the 20-dimensional vectors to the original MFCC vector, so a 60-dimensional MFCC vector is generated. We apply Principal Component Analysis (PCA) to reduce the dimension of this local feature, and use the Fisher Vector (FV) model [10] to represent a whole audio file by a signature vector. The cluster number of the Gaussian Mixture Model (GMM) is set to 512, and the signed square root and L2 norm are utilized to normalize the vectors. In our experiments, we use the toolbox provided by [4] to calculate the MFCC vectors.

2.1.2 EmoBase10 Feature. To depict audio information, we extract the EmoBase10 feature [5, 11], which is a global, high-level audio feature. As suggested by [5, 11], the default parameters are utilized to extract the 1,582-dimensional EmoBase10 vector. This vector results from: (1) 21 functionals applied to 34 Low-Level Descriptors (LLD) and the 34 corresponding delta coefficients, (2) 19 functionals applied to the 4 pitch-based LLD and their 4 delta coefficient contours, and (3) the number of pitch onsets and the total duration of the input [5, 11]. Then, the
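As a sanity check, the stated decomposition of the 1,582-dimensional EmoBase10 vector in Sec. 2.1.2 can be verified arithmetically:

```python
# Dimensionality of the EmoBase10 vector, per the breakdown in Sec. 2.1.2.
lld_part = 21 * (34 + 34)   # 21 functionals on 34 LLDs and their 34 delta coefficients
pitch_part = 19 * (4 + 4)   # 19 functionals on 4 pitch-based LLDs and their 4 deltas
extras = 2                  # number of pitch onsets + total duration of the input
total = lld_part + pitch_part + extras
print(total)  # 1582
```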
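The 60-dimensional MFCC construction and the FV normalization of Sec. 2.1.1 can be sketched as follows. This is a minimal NumPy illustration, not the toolbox of [4]: the helper names `append_deltas` and `normalize_fv` are ours, and simple finite-difference deltas stand in for the toolbox's delta regression.

```python
import numpy as np

def append_deltas(mfcc):
    """Stack delta and double-delta coefficients onto a (20, T) MFCC
    matrix, yielding the 60-dimensional per-frame vectors of Sec. 2.1.1.
    Finite-difference deltas are used here purely for illustration."""
    delta = np.gradient(mfcc, axis=1)        # first-order delta
    delta2 = np.gradient(delta, axis=1)      # double-delta
    return np.vstack([mfcc, delta, delta2])  # shape: (60, T)

def normalize_fv(fv):
    """Signed square root followed by L2 normalization, as applied to
    the Fisher Vector signature in Sec. 2.1.1."""
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)
```

With a K-component GMM over d-dimensional PCA-reduced frames, the FV signature (mean and variance gradients) has dimension 2Kd, so K = 512 yields one fixed-length vector per audio file regardless of its duration.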