A Regression Approach to Movie Rating Prediction using Multimedia Content and Metadata

Hossein A. Rahmani (srahmani@znu.ac.ir), University of Zanjan
Yashar Deldjoo (deldjooy@acm.org), Politecnico di Bari
Markus Schedl (markus.schedl@jku.at), Johannes Kepler University

This paper presents the submission of the team MASlab-ZNU to the MMRecSys movie recommendation task, as part of MediaEval 2019. The task involved predicting the average movie rating, the standard deviation of ratings, and the number of ratings from audio and visual features extracted from trailers and from the associated metadata. We model rating prediction as a regression problem and employ several learning models: ridge regression (RR), support vector regression (SVR), a shallow neural network (SNN), and a deep neural network (DNN). The results of a fairly large number of experiments on various models and features indicate that the combination of DNN and tag features produces the best results for predicting avgRating and stdRating, while for numRating (popularity) the combination of RR and tag features outperforms all other combinations by a large margin.

INTRODUCTION AND RELATED WORK

The global revenue of the media and entertainment (M&E) market in 2017 was approximately 2 trillion US dollars. The four main verticals of the industry in the US are film (40%), music (6%), book publishing (12%), and video games (8%) [1,2]. The movie industry not only has a large cultural and sociological impact on people but also occupies an important part of the M&E business market. Producing a new movie means that the company is betting on this movie's success. Being able to predict in advance whether a movie will be successful is therefore crucial and calls for machine learning techniques. Producers and investors can use such predictions to decide whether to produce similar movies.

The paper at hand describes the solution by the team MASlab-ZNU for the movie recommendation subtask of the 2019 Multimedia for Recommender System Task: MovieREC and NewsREEL at MediaEval [6]. The task involved using audio and visual features extracted from the trailers of the corresponding movies in the MMTF-14K dataset [4], together with metadata, to predict three scores: avgRating (representing user appreciation and dis-appreciation of the content), stdRating (characterizing agreement and disagreement between user opinions/ratings), and numRating (reflecting the popularity of movies). Audio and visual features have been used to solve various tasks in movie recommendation; see, e.g., [3,5].

In the context of movie popularity prediction, Szabo et al. [10] predict the future popularity of a video from the number of previous views on YouTube, using predictive models based on linear regression. Pinto et al. [9] extend the work by Szabo et al. by proposing a multivariate linear model and sampling the number of views at regular time slots. More recently, Moghaddam et al. [8] address the problem of movie popularity prediction using visual features of movie trailers.

EXPERIMENTS

This section presents the experiments carried out to address the prediction task. For performance evaluation, we randomly split the original train set into two non-overlapping subsets, using 80% of the data for training and 20% for validation. We refer to these subsets as trainSet and validSet throughout the paper. In a preprocessing step, we normalize all features in the dataset using min-max normalization. We perform a hyperparameter search and report all results under the best setting. We model the task in question as regression and use the following regression models:

• Linear model using Ridge Regression (RR): We use linear regression to serve as a simple yet standard approach to model the relation between dependent (prediction scores) and independent variables (features) in a linear fashion [7]. The model is given by $y = \theta^T x$, where $x$ and $y$ represent feature vectors and scores, respectively, and $\theta$ contains the linear model coefficients estimated by RR:

$$\min_{\theta} \; \frac{1}{2} \lVert y - \theta^T x \rVert_2^2 + \lambda \lVert \theta \rVert_2^2$$

where $\ell_2$ regularization is applied to avoid overfitting of the coefficients.

• Support Vector Regression (SVR): SVR is the regression variant of the Support Vector Machine (SVM). SVR makes use of a nonlinear transformation function to map the input features to a high-dimensional feature space. The prediction is given by

$$y = \sum_{i=1}^{n} (a_i - a_i^*) \, K(x_i, x) + b$$

where $K(x, x') = \exp(-\lVert x - x' \rVert^2)$ is the Radial Basis Function (RBF) kernel, $b$ is the intercept, and $a_i$ and $a_i^*$ are Lagrange multipliers.
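The two formulas above can be made concrete with a minimal sketch in plain Python. It is an illustration under stated assumptions, not the implementation used for the submitted runs: the function names, the toy data, and the dual coefficients passed to `svr_predict` are hypothetical, and the RR objective is extended from a single sample to a data matrix $X$ with one row per movie. Setting the gradient of the RR objective to zero gives the linear system $(X^T X + 2\lambda I)\,\theta = X^T y$, which the sketch solves directly.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs stay untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ridge(rows, targets, lam):
    """Minimize 0.5 * ||y - X theta||^2 + lam * ||theta||^2.

    The zero-gradient condition is (X^T X + 2 * lam * I) theta = X^T y.
    """
    d = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) + (2.0 * lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve_linear_system(gram, xty)


def rbf_kernel(u, v):
    """K(u, v) = exp(-||u - v||^2), the kernel form given in the text."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(support_vectors, dual_diffs, intercept, x):
    """y = sum_i (a_i - a_i^*) K(x_i, x) + b, with dual_diffs[i] = a_i - a_i^*."""
    return sum(d * rbf_kernel(sv, x)
               for sv, d in zip(support_vectors, dual_diffs)) + intercept


# Hypothetical toy data generated exactly by y = 2 * x1 + 1 * x2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 1.0, 3.0, 5.0]
theta = fit_ridge(X, y, lam=0.0)  # no regularization recovers [2.0, 1.0]
print(theta)

# Hypothetical dual coefficients; a real SVR solver would estimate them.
print(svr_predict([[0.0, 0.0], [1.0, 1.0]], [0.7, -0.2], 0.1, [0.5, 0.5]))
```

Increasing `lam` shrinks the coefficients toward zero, which is the overfitting control mentioned for RR. The sketch only evaluates the SVR prediction function; estimating the multipliers requires solving the SVR dual problem, which is left to a dedicated solver.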

RESULTS AND ANALYSIS

The regression results using the proposed approaches are presented in Table 1.

Predicting average ratings: As can be seen in Table 1, all audio and visual features perform very similarly to each other. Their results are also close to those of the genre descriptor, e.g., 0.1487 vs. 0.1466. However, the tag feature is the best feature for predicting the average ratings. This shows that user-generated tags assigned to movies contain semantic information that is well correlated with the ratings given to movies by users. One can also observe that the DNN model is the best model for predicting the average ratings, though it is only marginally better than the SNN model.

Predicting standard deviation of ratings: The results for the standard deviation of ratings show that the audio-visual features and genre metadata perform very similarly. Again, the best feature for predicting the standard deviation of ratings is tag metadata, used with the DNN learning model. Results averaged over features indicate that the DNN is also the best learning model overall for this score.

Predicting number of ratings: For the number of ratings, all audio-visual features and genre metadata yield very similar results, while the tag feature performs best. The tag feature is substantially superior to all other features: it reduces the RMSE by about 55% compared to the second best feature, genre (0.0232 vs. 0.0515 for tag vs. genre). More interestingly, the best model for predicting the number of ratings is the simple RR learning model, followed by SNN, and not DNN.
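As a quick check of the arithmetic behind the "about 55%" figure, the following minimal sketch computes RMSE and the relative RMSE reduction. The two RMSE values are the ones quoted in the text for numRating; the function names are illustrative only.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers the error relative to `baseline`."""
    return (baseline - improved) / baseline


# RMSE values quoted in the text for numRating: genre vs. tag features.
genre_rmse, tag_rmse = 0.0515, 0.0232
reduction = relative_reduction(genre_rmse, tag_rmse)
print(f"relative RMSE reduction: {reduction:.1%}")
```

The reduction is (0.0515 - 0.0232) / 0.0515, which is roughly 0.55, in line with the figure stated above.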

CONCLUSION

This paper reports the method used by team MASlab-ZNU for the Multimedia for Recommender Systems task at MediaEval 2019 [6].

The results of experiments using different regression approaches are promising and show the efficacy of audio and visual content in comparison with genre metadata. Overall, however, it is the tag feature that provides the best prediction quality in all experimental cases.

• Shallow Neural Network (SNN): To model the relationship between features in a non-linear way, we also apply a neural network to predict the scores. Here, we consider a simple (shallow) neural network model. The SNN has one hidden layer with 24 neurons and the Rectified Linear Unit (ReLU) as activation function. For the output layer, we use the Sigmoid activation function.

• Deep Neural Network (DNN): This method is similar to the SNN, but it has more hidden layers to capture deeper relations between features. In our deep model, we have 3 hidden layers: the first layer has 128 neurons with ReLU as activation function; the second layer has 64 neurons and uses a Sigmoid activation function; the third one has 32 neurons and uses ReLU.
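The layer layouts of the SNN and DNN described above can be sketched as a plain forward pass. This is a minimal illustration, not the trained models: the input dimensionality `N_FEATURES`, the weight initialization, and the helper names are hypothetical, training is omitted, and the Sigmoid output layer of the DNN is an assumption carried over from the SNN, since the text does not state the DNN's output activation.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (the initialization is an assumption)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_features, hidden, rng):
    """`hidden` lists (width, activation) pairs; a 1-unit sigmoid output is appended."""
    layers, n_in = [], n_features
    for width, activation in hidden + [(1, sigmoid)]:
        weights, biases = init_layer(n_in, width, rng)
        layers.append((weights, biases, activation))
        n_in = width
    return layers


def forward(layers, x):
    """Propagate one feature vector through all layers and return the scalar score."""
    for weights, biases, activation in layers:
        x = [activation(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x[0]


rng = random.Random(0)
N_FEATURES = 10  # hypothetical input size; the real value depends on the feature set

# SNN: one hidden layer with 24 ReLU neurons.
snn = build_network(N_FEATURES, [(24, relu)], rng)
# DNN: hidden layers of 128 (ReLU), 64 (Sigmoid), and 32 (ReLU) neurons.
dnn = build_network(N_FEATURES, [(128, relu), (64, sigmoid), (32, relu)], rng)

x = [rng.random() for _ in range(N_FEATURES)]  # stands in for min-max normalized features
print(forward(snn, x), forward(dnn, x))
```

With a Sigmoid output, every prediction lies strictly between 0 and 1, so a model of this shape can only fit targets that are scaled to that range.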

Table 1: Results of regression in terms of RMSE, calculated on the validation set. The first and second best results for each score are shown in bold and italic, respectively.

Regarding the comparison of learning models, we can see [...] RR are the best predictors, in most cases generating the best performance for each feature, while SVR is the worst. The final submitted runs are selected based on the ones performing best on the validSet, which are highlighted in bold in Table 1.

MediaEval'19, 27-29 October 2019, Sophia Antipolis, France

REFERENCES

[1] Markets Report Media and Entertainment 2018. 2017. 2018. 2018-12-27.

[2] Media and Entertainment Industry Overview 2018. 2018. 2018-12-27.

[3] Yashar Deldjoo, Mihai Gabriel Constantin, Hamid Eghbal-Zadeh, Bogdan Ionescu, Markus Schedl, and Paolo Cremonesi. 2018. Audio-visual encoding of multimedia content for enhancing movie recommendations. In Proc. of the 12th ACM Conference on Recommender Systems, RecSys 2018, Vancouver, BC, Canada, October 2-7, 2018.

[4] Yashar Deldjoo, Mihai Gabriel Constantin, Bogdan Ionescu, Markus Schedl, and Paolo Cremonesi. 2018. MMTF-14K: a multifaceted movie trailer feature dataset for recommendation and retrieval. In Proceedings of the 9th ACM Multimedia Systems Conference, MMSys 2018, Amsterdam, The Netherlands, June 12-15, 2018.

[5] Yashar Deldjoo, Maurizio Ferrari Dacrema, Mihai Gabriel Constantin, Hamid Eghbal-Zadeh, Stefano Cereda, Markus Schedl, Bogdan Ionescu, and Paolo Cremonesi. 2019. Movie genome: alleviating new item cold start in movie recommendation. User Model. User-Adapt. Interact. 29, 2 (2019).

[6] Yashar Deldjoo, Benjamin Kille, Markus Schedl, Andreas Lommatzsch, and Jialie Shen. 2019. The 2019 Multimedia for Recommender System Task: MovieREC and NewsREEL at MediaEval. In Working Notes Proceedings of the MediaEval 2019 Workshop, Sophia Antipolis, France, 27-29 October 2019.

[7] Jinna Lv, Wu Liu, Meng Zhang, He Gong, Bin Wu, and Huadong Ma. 2017. Multi-feature fusion for predicting social media popularity. In Proceedings of the 25th ACM international conference on Multimedia. ACM.

[8] Farshad B. Moghaddam, Mehdi Elahi, Reza Hosseini, Christoph Trattner, and Marko Tkalcic. 2019. Predicting Movie Popularity and Ratings with Visual Features. In 14th International Workshop On Semantic And Social Media Adaptation And Personalization.

[9] Henrique Pinto, Jussara M. Almeida, and Marcos A. Gonçalves. 2013. Using early view patterns to predict the popularity of youtube videos. In Proceedings of the sixth ACM international conference on Web search and data mining. ACM.

[10] Gabor Szabo and Bernardo A. Huberman. 2010. Predicting the Popularity of Online Content. Commun. ACM 53, 8 (2010).