A Regression Approach to Movie Rating Prediction using Multimedia Content and Metadata

Hossein A. Rahmani (University of Zanjan, srahmani@znu.ac.ir)
Yashar Deldjoo (Politecnico di Bari, deldjooy@acm.org)
Markus Schedl (Johannes Kepler University, markus.schedl@jku.at)

Copyright 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). MediaEval'19, 27-29 October 2019, Sophia Antipolis, France.

ABSTRACT
This paper presents the submission of the team MASlab-ZNU to the MMRecSys movie recommendation task, part of MediaEval 2019. The task involved predicting three quantities using audio and visual features extracted from trailers, together with the associated metadata:
• the average movie rating,
• the standard deviation of ratings, and
• the number of ratings.
We model rating prediction as a regression problem and employ several learning models: ridge regression (RR), support vector regression (SVR), a shallow neural network (SNN), and a deep neural network (DNN). Experiments over a fairly large number of model and feature combinations indicate that DNN combined with tag features produces the best results for avgRating and stdRating. For numRating (popularity), RR combined with tag features outperforms all other competitors by a large margin.

1 INTRODUCTION AND RELATED WORK
The global revenue of the media and entertainment (M&E) market in 2017 was approximately 2 trillion US dollars. The four main verticals of the industry in the US are film (40%), music (6%), book publishing (12%), and video games (8%) [1, 2]. The movie industry has a large cultural and sociological impact on people, and it also occupies an important share of the M&E business market. Producing a new movie means that the company is betting on that movie's success. Being able to predict whether a movie will be successful is therefore crucial and calls for machine learning techniques. Producers and investors can use such predictions to decide whether to produce similar movies.

The paper at hand describes the solution of team MASlab-ZNU for the movie recommendation subtask of the 2019 Multimedia for Recommender System Task: MovieREC and NewsREEL at MediaEval [6]. The task involved combining metadata with audio and visual features extracted from the trailers of the corresponding movies, taken from the MMTF-14K dataset [4]. These inputs are used to predict three scores:
• avgRating, representing user appreciation or disappreciation of the content;
• stdRating, characterizing agreement and disagreement between user opinions/ratings;
• numRating, reflecting the popularity of movies.
Audio and visual features have been used to solve various tasks in movie recommendation; see, e.g., [3, 5].

In the context of popularity prediction, Szabo et al. [10] predict the future popularity of a video from its number of previous views on YouTube, using predictive models based on linear regression. Pinto et al. [9] extend the work of Szabo et al. by proposing a multivariate linear model and sampling the number of views at regular time slots. More recently, Moghaddam et al. [8] address movie popularity prediction using visual features of movie trailers.

2 EXPERIMENTS
This section presents the experiments carried out to address the prediction task. For performance evaluation, we randomly split the original training set into two non-overlapping subsets: 80% training data and 20% validation data. We refer to these subsets as trainSet and validSet throughout the paper. In a preprocessing step, we normalize all features in the dataset using min-max normalization. We perform a hyperparameter search and report all results under the best setting. We model the task as regression and use the following regression models:

• Linear model using Ridge Regression (RR): We use linear regression as a simple yet standard approach to model, in a linear fashion, the relation between the dependent variables (prediction scores) and the independent variables (features) [7]. The model is given by $y = \theta^\top \mathbf{x}$, where $\mathbf{x}$ is the feature vector, $y$ the score, and $\theta$ the vector of linear coefficients. RR estimates $\theta$ as $\min_{\theta} \frac{1}{2}\|y - \theta^\top \mathbf{x}\|_2^2 + \lambda\|\theta\|_2^2$, where the $\ell_2$ regularization term is applied to avoid overfitting of the coefficients.
• Support Vector Regression (SVR): SVR is the regression variant of the Support Vector Machine (SVM). It uses a nonlinear kernel to map the input features into a high-dimensional feature space. The prediction is given by $y = \sum_{i=1}^{n} (a_i - a_i^*)\, K(\mathbf{x}_i, \mathbf{x}) + b$. Here $K(\mathbf{x}, \mathbf{x}') = \exp(-\|\mathbf{x} - \mathbf{x}'\|^2)$ is the Radial Basis Function (RBF) kernel, $b$ is the intercept, and $a_i$ and $a_i^*$ are Lagrange multipliers.
• Shallow Neural Network (SNN): To model the relationship between features in a non-linear way, we also apply a neural network to predict the scores. Here we consider a simple (shallow) model with one hidden layer of 24 neurons and the Rectified Linear Unit (ReLU) activation function. The output layer uses the Sigmoid activation function.
• Deep Neural Network (DNN): This model is similar to the SNN, but it has more hidden layers to capture deeper relations between features. Our deep model has 3 hidden layers. The first has 128 neurons with ReLU activation, the second has 64 neurons with Sigmoid activation, and the third has 32 neurons with ReLU activation.
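The evaluation protocol above can be sketched in plain Python as follows. It covers the random 80/20 split into trainSet and validSet, min-max normalization, ridge regression and RMSE scoring (RMSE is the metric reported in Table 1). This is a minimal illustration under stated assumptions, not the authors' code.

The assumptions are:
• The helper names (train_valid_split, fit_min_max, apply_min_max, ridge_fit, predict, rmse) and the seed are hypothetical.
• The min-max statistics are fitted on trainSet only, which the paper does not specify.
• The closed form follows the RR objective as written above. Setting the gradient of $\frac{1}{2}\|y - X\theta\|^2 + \lambda\|\theta\|^2$ to zero gives $(X^\top X + 2\lambda I)\theta = X^\top y$.
• Like $y = \theta^\top \mathbf{x}$, the model has no intercept.

```python
import random


def train_valid_split(n_items, train_frac=0.8, seed=42):
    """Randomly split item indices into non-overlapping train/validation lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_frac * n_items))
    return indices[:cut], indices[cut:]


def fit_min_max(rows):
    """Per-feature (min, max); assumed here to be computed on trainSet only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, stats):
    """Scale each feature to [0, 1] with the given stats; constant features map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, stats)]
        for row in rows
    ]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Minimise 1/2 * ||y - X theta||^2 + lam * ||theta||^2 (no intercept).

    The zero-gradient condition is (X^T X + 2 * lam * I) theta = X^T y.
    """
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    for i in range(d):
        xtx[i][i] += 2.0 * lam  # factor 2 comes from the lam * ||theta||^2 term
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(theta, X):
    """Linear prediction y = theta^T x for every row x of X."""
    return [sum(t * v for t, v in zip(theta, row)) for row in X]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5
```

One plausible realization of the hyperparameter search mentioned above runs as follows:
1. Fit the min-max statistics on trainSet and apply them to both subsets.
2. Call ridge_fit on the scaled trainSet for each candidate λ.
3. Keep the λ with the lowest validSet RMSE.

To obtain an intercept, append a constant feature of 1 to every row; note that this bias is then also penalized. The SVR, SNN and DNN models would normally be built with a machine learning library and are not sketched here.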
3 RESULTS AND ANALYSIS
Table 1 presents the regression results of the proposed approaches. Among the learning models, DNN and RR are the best predictors, in most cases giving the best performance for each feature, while SVR is the worst. The final submitted runs were selected as those performing best on validSet; they are highlighted in bold in Table 1.

Table 1: Results of regression in terms of RMSE, calculated on the validation set. The first and second best results for each score are shown in bold and italic, respectively. Genre and Tag are metadata features, BLF and I-Vec are audio features, and AVF and Deep are visual features.

Score      Regressor  Genre   Tag     BLF     I-Vec   AVF     Deep    Average
avgRating  RR         0.1401  0.1440  0.1495  0.1482  0.1499  0.1520  0.1472
avgRating  SVR        0.1466  0.1462  0.1518  0.1588  0.1523  0.1523  0.1513
avgRating  SNN        0.1396  0.1457  0.1492  0.1487  0.1496  0.1483  0.1468
avgRating  DNN        0.1409  0.1370  0.1491  0.1509  0.1501  0.1509  0.1464
stdRating  RR         0.1410  0.1445  0.1438  0.1421  0.1430  0.1472  0.1436
stdRating  SVR        0.1487  0.1396  0.1476  0.1464  0.1495  0.1495  0.1468
stdRating  SNN        0.1411  0.1433  0.1429  0.1420  0.1435  0.1423  0.1425
stdRating  DNN        0.1411  0.1356  0.1429  0.1429  0.1428  0.1429  0.1413
numRating  RR         0.0515  0.0232  0.0518  0.0523  0.0524  0.0528  0.0473
numRating  SVR        0.1892  0.1142  0.1752  0.1656  0.1954  0.1952  0.1461
numRating  SNN        0.0533  0.0272  0.0524  0.0526  0.0525  0.0523  0.0483
numRating  DNN        0.0529  0.0529  0.0526  0.0527  0.0527  0.0528  0.0527

Predicting average ratings: As Table 1 shows, all audio and visual features perform very similarly to each other. Their performance is also close to that of the genre descriptor, e.g., 0.1487 vs. 0.1466. However, tag features are the best features for predicting average ratings. This indicates that user-generated tags assigned to movies contain semantic information that is well correlated with the ratings users give to movies. The DNN is the best model for predicting average ratings, although it is only marginally better than the SNN (average RMSE 0.1464 vs. 0.1468).

Predicting standard deviation of ratings: For the standard deviation of ratings, the audio-visual features and genre metadata again yield very similar results. The best feature is once more tag metadata, used with the DNN model (RMSE 0.1356). Averaged over all features, the DNN is also the best learning model for this score.

Predicting number of ratings: For the number of ratings, all audio-visual features and genre metadata yield very similar results; only the tag feature stands out, with the best performance. Its superiority over all other features is substantial. It reduces the RMSE by about 55% compared with the second-best feature, genre (0.0232 vs. 0.0515 with RR). More interestingly, the best model for predicting the number of ratings is the simple RR model, followed by SNN, rather than DNN.

4 CONCLUSION
This paper reports the method used by team MASlab-ZNU for the Multimedia for Recommender Systems task at MediaEval 2019 [6]. The results of experiments with different regression approaches are promising. They show that audio and visual content is effective, performing on par with genre metadata. Overall, however, the tag feature provides the best prediction quality for all three scores.

REFERENCES
[1] 2018. 2017 Top Markets Report Media and Entertainment. https://www.trade.gov/topmarkets/pdf/Top%20Markets%20Media%20and%20Entertinment%202017.pdf. (2018). Accessed: 2018-12-27.
[2] 2018. Media and Entertainment Industry Overview. https://investmentbank.com/media-and-entertainment-industry-overview/. (2018). Accessed: 2018-12-27.
[3] Yashar Deldjoo, Mihai Gabriel Constantin, Hamid Eghbal-Zadeh, Bogdan Ionescu, Markus Schedl, and Paolo Cremonesi. 2018. Audio-visual encoding of multimedia content for enhancing movie recommendations. In Proc. of the 12th ACM Conference on Recommender Systems, RecSys 2018, Vancouver, BC, Canada, October 2-7, 2018. 455–459.
[4] Yashar Deldjoo, Mihai Gabriel Constantin, Bogdan Ionescu, Markus Schedl, and Paolo Cremonesi. 2018. MMTF-14K: a multifaceted movie trailer feature dataset for recommendation and retrieval. In Proceedings of the 9th ACM Multimedia Systems Conference, MMSys 2018, Amsterdam, The Netherlands, June 12-15, 2018. 450–455.
[5] Yashar Deldjoo, Maurizio Ferrari Dacrema, Mihai Gabriel Constantin, Hamid Eghbal-zadeh, Stefano Cereda, Markus Schedl, Bogdan Ionescu, and Paolo Cremonesi. 2019. Movie genome: alleviating new item cold start in movie recommendation. User Model. User-Adapt. Interact. 29, 2 (2019), 291–343.
[6] Yashar Deldjoo, Benjamin Kille, Markus Schedl, Andreas Lommatzsch, and Jialie Shen. 2019. The 2019 Multimedia for Recommender System Task: MovieREC and NewsREEL at MediaEval. In Working Notes Proceedings of the MediaEval 2019 Workshop, Sophia Antipolis, France, 27-29 October 2019.
[7] Jinna Lv, Wu Liu, Meng Zhang, He Gong, Bin Wu, and Huadong Ma. 2017. Multi-feature fusion for predicting social media popularity. In Proceedings of the 25th ACM International Conference on Multimedia. ACM, 1883–1888.
[8] Farshad B Moghaddam, Mehdi Elahi, Reza Hosseini, Christoph Trattner, and Marko Tkalcic. 2019. Predicting Movie Popularity and Ratings with Visual Features. In 14th International Workshop On Semantic And Social Media Adaptation And Personalization.
[9] Henrique Pinto, Jussara M Almeida, and Marcos A Gonçalves. 2013. Using early view patterns to predict the popularity of YouTube videos. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. ACM, 365–374.
[10] Gabor Szabo and Bernardo A Huberman. 2010. Predicting the Popularity of Online Content. Commun. ACM 53, 8 (2010), 80–88.