<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Regression Approach to Movie Rating Prediction using Multimedia Content and Metadata</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hossein A. Rahmani</string-name>
          <email>srahmani@znu.ac.ir</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yashar Deldjoo</string-name>
          <email>deldjooy@acm.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Schedl</string-name>
          <email>markus.schedl@jku.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Johannes Kepler University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Politecnico di Bari</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Zanjan</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>27</fpage>
      <lpage>29</lpage>
      <abstract>
        <p>This paper presents the submission of the team MASlab-ZNU to the MMRecSys movie recommendation task, part of MediaEval 2019. The task involved predicting the average rating, the standard deviation of ratings, and the number of ratings of movies using audio and visual features extracted from trailers together with the associated metadata. We model rating prediction as a regression problem and employ different learning models for the prediction task, including ridge regression (RR), support vector regression (SVR), a shallow neural network (SNN), and a deep neural network (DNN). The results of a fairly large number of experiments on various models and features indicate that the combination of DNN and tag features produces the best results for predicting avgRating and stdRating, whereas for numRating (popularity) the combination of RR and tag features outperforms the other competitors by a large margin.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION AND RELATED WORK</title>
      <p>
        The global revenue obtained from the media and entertainment
(M&amp;E) market in 2017 was approximately 2 trillion US dollars. The
four main verticals of the industry in the US include: film (40%),
music (6%), book publishing (12%) and video games (8%) [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
The movie industry not only has a large cultural and sociological
impact on people but also occupies an important part of the business
market in the M&amp;E industry. Producing a new movie means that the
company is betting on this movie’s success. Being able to predict
whether a movie will be successful is therefore crucial and calls
for machine learning techniques. Producers and investors can use
such predictions to decide whether to produce similar movies.
      </p>
      <p>
        The paper at hand describes the solution by the team
MASlab-ZNU for the movie recommendation subtask of the 2019
Multimedia for Recommender System Task: MovieREC and NewsREEL at
MediaEval [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The task involved using audio and visual
features extracted from the trailers of the corresponding movies in
the MMTF-14K dataset [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], together with metadata, to predict three scores:
avgRating (representing users' appreciation or dislike of
the content), stdRating (characterizing agreement and
disagreement between user ratings), and numRating (reflecting the
popularity of movies). Audio and visual features have been used
for various tasks in movie recommendation; see, e.g., [
        <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
        ].
      </p>
      <p>
        In the context of movie popularity prediction, Szabo and Huberman [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
predict the future popularity of a video from the number of
previous views on YouTube, using predictive models based on linear
regression. Pinto et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] extend the work by Szabo and Huberman by
proposing a multivariate linear model and sampling the number of
views at regular time intervals. More recently, Moghaddam et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] address
the problem of movie popularity prediction using visual features of
movie trailers.
      </p>
    </sec>
    <sec id="sec-2">
      <title>EXPERIMENTS</title>
      <p>
        This section presents the experiments carried out to address
the prediction task. For performance evaluation, we randomly
split the original training set into two non-overlapping subsets,
using 80% as training data and 20% as validation data. We refer to
these subsets as trainSet and validSet throughout the paper. In a
preprocessing step, we normalize all features in the dataset using
min-max normalization. We perform a hyperparameter search and
report all results under the best setting. We model the task
as a regression problem and use the following regression models:
        <list list-type="bullet">
          <list-item>
            <p>Linear model using Ridge Regression (RR): We use
linear regression as a simple yet standard
approach to model the relation between the dependent variables
(prediction scores) and the independent variables (features) in a
linear fashion [
              <xref ref-type="bibr" rid="ref7">7</xref>
              ]. The model is given by <inline-formula><tex-math>y = \theta^{T} x</tex-math></inline-formula>, where <inline-formula><tex-math>x</tex-math></inline-formula>
and <inline-formula><tex-math>y</tex-math></inline-formula> represent the feature vector and the score, respectively, and
<inline-formula><tex-math>\theta</tex-math></inline-formula> contains the linear model coefficients, which RR estimates by solving
              <disp-formula><tex-math>\min_{\theta} \; \frac{1}{2} \lVert y - \theta^{T} x \rVert_2^2 + \lambda \lVert \theta \rVert_2^2</tex-math></disp-formula>
where <inline-formula><tex-math>\ell_2</tex-math></inline-formula> regularization is
applied to avoid overfitting of the coefficients.</p>
          </list-item>
          <list-item>
            <p>Support Vector Regression (SVR): SVR is the regression
variant of the Support Vector Machine (SVM). SVR uses
a nonlinear transformation to map the input
features to a high-dimensional feature space; the prediction is given by
              <disp-formula><tex-math>y = \sum_{i=1}^{n} (a_i - a_i^{*}) \, K(x_i, x) + b</tex-math></disp-formula>
where <inline-formula><tex-math>K(x, x') = \exp(-\lVert x - x' \rVert^2)</tex-math></inline-formula>
is the Radial Basis Function (RBF) kernel, <inline-formula><tex-math>b</tex-math></inline-formula> is the intercept,
and <inline-formula><tex-math>a_i</tex-math></inline-formula> and <inline-formula><tex-math>a_i^{*}</tex-math></inline-formula> are Lagrange multipliers.</p>
          </list-item>
          <list-item>
            <p>Shallow Neural Network (SNN): To model the
relationship between features in a non-linear way, we also apply a
neural network to predict the scores. Here, we consider a
simple (shallow) neural network model. The SNN has one hidden
layer with 24 neurons and the Rectified Linear Unit (ReLU)
as activation function. For the output layer, we use the
sigmoid activation function.</p>
          </list-item>
          <list-item>
            <p>Deep Neural Network (DNN): This method is similar to
the SNN but has more hidden layers to capture deeper
relations between features. Our deep model has 3
hidden layers: the first layer has 128 neurons with ReLU as
activation function; the second layer has 64 neurons and
uses a sigmoid activation function; the third has 32
neurons and uses ReLU.</p>
          </list-item>
        </list>
      </p>
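      <p>The evaluation protocol and the RR model described in this section can be illustrated with a minimal Python sketch. It is not the implementation used for the submitted runs: the data are synthetic, the grid of regularization values is hypothetical, and all function names (min_max_normalize, split_80_20, fit_ridge, and so on) are introduced here for illustration only. The sketch assumes the ridge objective is summed over all training examples, in which case setting its gradient to zero gives the linear system (XᵀX + 2λI)θ = Xᵀy, solved below with a small pure-Python elimination routine.</p>
      <preformat><![CDATA[
```python
import math
import random


def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def split_80_20(n_items, seed=0):
    """Random, non-overlapping 80%/20% index split (trainSet / validSet)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = int(0.8 * n_items)
    return indices[:cut], indices[cut:]


def solve(a, b):
    """Solve the linear system a * t = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(xs, ys, lam):
    """Minimize 1/2 * sum_i (y_i - theta.x_i)^2 + lam * ||theta||^2."""
    d = len(xs[0])
    gram = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    for i in range(d):
        gram[i][i] += 2.0 * lam  # contribution of the l2 penalty
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    return solve(gram, xty)


def predict(theta, x):
    return sum(t * v for t, v in zip(theta, x))


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Synthetic stand-in for the real features and scores (assumption).
rng = random.Random(42)
features = [[rng.uniform(0, 10), rng.uniform(-5, 5)] for _ in range(50)]
normalized = min_max_normalize(features)
scores = [0.3 * x[0] + 0.5 * x[1] + rng.gauss(0, 0.01) for x in normalized]

train_idx, valid_idx = split_80_20(len(normalized))
train_x = [normalized[i] for i in train_idx]
train_y = [scores[i] for i in train_idx]
valid_x = [normalized[i] for i in valid_idx]
valid_y = [scores[i] for i in valid_idx]

# Hyperparameter search over a small, hypothetical grid of lambda values.
best = None
for lam in (0.0, 0.01, 0.1, 1.0):
    theta = fit_ridge(train_x, train_y, lam)
    err = rmse(valid_y, [predict(theta, x) for x in valid_x])
    if best is None or err < best[0]:
        best = (err, lam, theta)
best_rmse, best_lam, best_theta = best
print(f"best lambda={best_lam}, validation RMSE={best_rmse:.4f}")
```
]]></preformat>
      <p>In this sketch the value of λ is chosen by the lowest RMSE on validSet, mirroring the practice of reporting results under the best setting; the min-max statistics are computed over the whole dataset, as stated in the preprocessing step.</p>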
    </sec>
    <sec id="sec-3">
      <title>RESULTS AND ANALYSIS</title>
      <p>The regression results using the proposed approaches are presented
in Table 1. Regarding the comparison of learning models, we can see
that DNN and RR are the best predictors, in most cases generating
the best performance for each feature, while SVR is the worst. The
final submitted runs are those performing
best on the validSet, which are highlighted in bold in Table 1.</p>
      <p>[Table 1: regression results for avgRating, stdRating, and numRating, each reported for the RR, SVR, SNN, and DNN models.]</p>
      <p>Predicting average ratings: As can be seen in Table 1, all audio
and visual features perform very similarly to each
other, and by a close margin their performance is also similar to
that of the genre descriptor, e.g., 0.1487 vs. 0.1466. However, the
tag feature is the best feature for predicting the average
ratings. This result shows that user-generated tags assigned to
movies contain semantic information that is well correlated with
the ratings given to movies by users. One can also observe that the DNN
model is the best model for predicting the average ratings, though it is
only marginally better than the SNN model.</p>
      <p>Predicting standard deviation of ratings: The results for
the standard deviation of ratings show that the audio-visual features
and genre metadata yield very similar results. Again, the
best feature for predicting the standard deviation of ratings is tag
metadata, used with the DNN learning model. Average results indicate that
the DNN model is the best learning model for predicting the standard
deviation of ratings.</p>
      <p>Predicting number of ratings: For the number of
ratings, apart from the tag feature, which performs best,
the audio-visual features and genre metadata
yield very similar results. The tag feature is substantially
better than all other features: it
reduces the RMSE by about 55% compared to the second-best feature,
genre (0.0232 vs. 0.0515). More interestingly,
the best model for predicting the number of ratings is the
simple RR learning model, followed by SNN, rather than DNN.</p>
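      <p>The relative reduction quoted for the tag feature can be checked with a short calculation. The sketch below uses only the two RMSE values given in the text; the helper name relative_reduction is introduced here for illustration.</p>
      <preformat><![CDATA[
```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers an error relative to `baseline`."""
    return (baseline - improved) / baseline


genre_rmse = 0.0515  # second-best feature (genre), as quoted in the text
tag_rmse = 0.0232    # best feature (tag), as quoted in the text

reduction = relative_reduction(genre_rmse, tag_rmse)
print(f"RMSE reduction: {reduction:.1%}")  # prints "RMSE reduction: 55.0%"
```
]]></preformat>
      <p>That is, (0.0515 − 0.0232) / 0.0515 ≈ 0.55, consistent with the reduction of about 55% reported for numRating.</p>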
    </sec>
    <sec id="sec-4">
      <title>CONCLUSION</title>
      <p>
        This paper reports the method used by team MASlab-ZNU for the
Multimedia for Recommender Systems task at MediaEval 2019 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Results of experiments using different regression approaches are
promising and show the efficacy of audio and visual content in
comparison with genre metadata, but overall it is the tag feature
that provides the best prediction quality in all experimental cases.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <year>2018</year>
          .
          <source>2017 Top Markets Report Media and Entertainment</source>
          . https://www.trade.gov/topmarkets/pdf/Top%20Markets%20Media%20and%20Entertinment%202017.pdf. (2018). Accessed: 2018-12-27.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <year>2018</year>
          .
          <article-title>Media and Entertainment Industry Overview</article-title>
          . https://investmentbank.com/media-and-entertainment-industry-overview/. (2018). Accessed: 2018-12-27.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Yashar</given-names>
            <surname>Deldjoo</surname>
          </string-name>
          , Mihai Gabriel Constantin, Hamid Eghbal-Zadeh, Bogdan Ionescu, Markus Schedl, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Cremonesi</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Audio-visual encoding of multimedia content for enhancing movie recommendations</article-title>
          .
          <source>In Proc. of the 12th ACM Conference on Recommender Systems, RecSys 2018, Vancouver, BC, Canada, October 2-7, 2018</source>
          .
          <fpage>455</fpage>
          -
          <lpage>459</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Yashar</given-names>
            <surname>Deldjoo</surname>
          </string-name>
          , Mihai Gabriel Constantin, Bogdan Ionescu, Markus Schedl, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Cremonesi</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>MMTF-14K: a multifaceted movie trailer feature dataset for recommendation and retrieval</article-title>
          .
          <source>In Proceedings of the 9th ACM Multimedia Systems Conference, MMSys 2018, Amsterdam, The Netherlands, June 12-15, 2018</source>
          .
          <fpage>450</fpage>
          -
          <lpage>455</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Yashar</given-names>
            <surname>Deldjoo</surname>
          </string-name>
          , Maurizio Ferrari Dacrema, Mihai Gabriel Constantin, Hamid Eghbal-zadeh, Stefano Cereda, Markus Schedl, Bogdan Ionescu, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Cremonesi</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Movie genome: alleviating new item cold start in movie recommendation</article-title>
          .
          <source>User Model. User-Adapt. Interact</source>
          .
          <volume>29</volume>
          ,
          <issue>2</issue>
          (
          <year>2019</year>
          ),
          <fpage>291</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Yashar</given-names>
            <surname>Deldjoo</surname>
          </string-name>
          , Benjamin Kille, Markus Schedl, Andreas Lommatzsch, and
          <string-name>
            <given-names>Jialie</given-names>
            <surname>Shen</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>The 2019 Multimedia for Recommender System Task: MovieREC and NewsREEL at MediaEval</article-title>
          .
          <source>In Working Notes Proceedings of the MediaEval 2019 Workshop</source>
          , Sophia Antipolis, France, 27-29 October 2019.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Jinna</given-names>
            <surname>Lv</surname>
          </string-name>
          , Wu Liu, Meng Zhang, He Gong,
          <string-name>
            <given-names>Bin</given-names>
            <surname>Wu</surname>
          </string-name>
          , and Huadong Ma.
          <year>2017</year>
          .
          <article-title>Multi-feature fusion for predicting social media popularity</article-title>
          .
          <source>In Proceedings of the 25th ACM international conference on Multimedia</source>
          . ACM,
          <fpage>1883</fpage>
          -
          <lpage>1888</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Farshad B</given-names>
            <surname>Moghaddam</surname>
          </string-name>
          , Mehdi Elahi, Reza Hosseini, Christoph Trattner, and
          <string-name>
            <given-names>Marko</given-names>
            <surname>Tkalcic</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Predicting Movie Popularity and Ratings with Visual Features</article-title>
          .
          <source>In 14th International Workshop On Semantic And Social Media Adaptation And Personalization.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Henrique</given-names>
            <surname>Pinto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jussara M</given-names>
            <surname>Almeida</surname>
          </string-name>
          , and Marcos A Gonçalves.
          <year>2013</year>
          .
          <article-title>Using early view patterns to predict the popularity of youtube videos</article-title>
          .
          <source>In Proceedings of the sixth ACM international conference on Web search and data mining</source>
          . ACM,
          <fpage>365</fpage>
          -
          <lpage>374</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Gabor</given-names>
            <surname>Szabo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bernardo A</given-names>
            <surname>Huberman</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Predicting the Popularity of Online Content</article-title>
          .
          <source>Commun. ACM</source>
          <volume>53</volume>
          ,
          <issue>8</issue>
          (
          <year>2010</year>
          ),
          <fpage>80</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>