UNIFESP at MediaEval 2016: Predicting Media Interestingness Task

Jurandy Almeida
GIBIS Lab, Institute of Science and Technology, Federal University of São Paulo – UNIFESP
12247-014, São José dos Campos, SP – Brazil
jurandy.almeida@unifesp.br

ABSTRACT
This paper describes the approach proposed by UNIFESP for the MediaEval 2016 Predicting Media Interestingness Task, video subtask only. The proposed approach is based on combining learning-to-rank algorithms to predict the interestingness of videos from their visual content.

1. INTRODUCTION
Current solutions for predicting the interestingness of video data are usually based on learning-to-rank strategies. Most of this research has focused on using a single machine-learned ranker. Recently, combining the individual predictions of a set of machine-learned rankers has been established as an effective way to improve classification performance [9].

This paper presents an approach for predicting the interestingness of videos that relies on different learning-to-rank strategies for processing visual content. For that, a simple, yet effective, histogram of motion patterns (HMP) [1] is used for processing visual information. Then, a simple majority voting scheme [8] is used for combining machine-learned rankers and predicting the interestingness of videos. This work was developed for the MediaEval 2016 Predicting Media Interestingness Task, video subtask only, whose goal is to automatically select the video segments that are most interesting to a common viewer, using features derived from the audio-visual content or associated textual information. Details about the data, task, and evaluation are described in [4].

2. PROPOSED APPROACH
Measuring the degree of interestingness of a video is a challenging task. For that, the strategy proposed by Jiang et al. [6] was adopted. It relies on training a model to compare the interestingness of video pairs: given two videos, the system indicates the more interesting one.

Roughly speaking, the basic idea is to use machine learning algorithms to learn a ranking function based on features extracted from training data, and then apply it to features extracted from testing data. The proposed approach predicts the interestingness of videos by combining learning-to-rank algorithms and exploiting only visual information.

2.1 Visual Features
Instead of using the keyframe visual features provided by the organizers, a simple and fast algorithm known as the histogram of motion patterns (HMP) [1] was used to encode visual properties. It captures video movement through the transitions between frames. For each frame of an input sequence, motion features are extracted from the video stream. After that, each feature is encoded as a unique pattern representing its spatio-temporal configuration. Finally, those patterns are accumulated to form a normalized histogram.

2.2 Learning to Rank Strategies
In this work, the extracted features were classified with the following methods:

Ranking SVM [7]. A pairwise ranking method that uses the traditional SVM classifier to learn a ranking function. For that, each query and its possible results are mapped to a feature space. Next, a rank is associated with each point in this space. Finally, an SVM classifier is used to find an optimal separating hyperplane between those points based on their ranks.

RankNet [2]. A pairwise ranking method that relies on a probabilistic model. Pairwise rankings are transformed into probability distributions, enabling the use of probability distribution metrics as cost functions, which can then be minimized by standard optimization algorithms. The authors formulate this cost function using a neural network whose learning rate is controlled with gradient descent steps.

RankBoost [5]. A pairwise ranking method that relies on boosting. Initially, each possible result for a given query is mapped to a feature space in which each dimension indicates the relative ranking of an individual pair of results, i.e., whether one result is ranked below or above the other. The ranking problem is thus formulated as a binary classification problem. Next, a set of weak rankers is trained iteratively. At each iteration, the pairs are re-weighted so that the weight of wrongly ranked pairs is increased whereas the weight of correctly ranked pairs is decreased. Finally, all the weak rankers are combined into a final ranking function.
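The pairwise reduction that these rankers share can be made concrete. The sketch below is purely illustrative (it is not the SVMrank or RankLib implementation): per-query relevance grades are turned into labeled feature-difference vectors that any binary classifier can consume.

```python
import numpy as np

def pairwise_transform(X, y):
    """Turn per-query feature vectors X (n x d) with relevance grades y (n,)
    into labeled difference vectors for a binary classifier:
    label +1 means the first item of the pair should rank above the second."""
    diffs, labels = [], []
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            if y[i] == y[j]:
                continue  # equal grades carry no ordering information
            diffs.append(X[i] - X[j])
            labels.append(1 if y[i] > y[j] else -1)
    return np.array(diffs), np.array(labels)

# Toy example: 3 results for one query, 2-dimensional features.
X = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
y = np.array([2, 1, 0])  # higher grade = more interesting
D, L = pairwise_transform(X, y)
print(D.shape)  # (3, 2): pairs (0,1), (0,2), (1,2)
print(L)        # [1 1 1]
```

A linear classifier trained on such differences yields a weight vector that scores unseen items, which is essentially how a linear Ranking SVM produces rankings.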
ListNet [3]. An extension of RankNet that, instead of using pairwise rankings, considers all possible results for a given query as a single instance, making it possible to capture and exploit the intrinsic structure of the data.

Majority Voting [8]. The simplest method for combining the output of a set of classifiers. It assigns to a given result the most common class among those assigned by all the classifiers.

Copyright is held by the author/owner(s). MediaEval 2016 Workshop, Oct. 20-21, 2016, Hilversum, Netherlands.

3. EXPERIMENTS & RESULTS
Five different runs were submitted for the video subtask, configured as shown in Table 1. As the proposed approach relies on combining different learning-to-rank algorithms, one of the runs considers a fusion of machine-learned rankers. For comparison purposes, each machine-learned ranker in isolation was evaluated in the other runs.
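Run 5 fuses the rankers' binary decisions with the majority voting rule of Section 2.2. A minimal sketch follows; the tie-breaking toward the negative class is an assumption for illustration, as the paper does not specify tie handling.

```python
from collections import Counter

def majority_vote(decisions):
    """Combine the binary decisions (one per classifier) for a single
    video segment; ties are broken toward 'not interesting' (class 0),
    an arbitrary choice made here for illustration."""
    counts = Counter(decisions)
    return 1 if counts[1] > counts[0] else 0

# Five rankers voting on one video segment.
print(majority_vote([1, 0, 1, 1, 0]))  # 1 (three of five say interesting)
print(majority_vote([0, 0, 1, 1, 0]))  # 0
```

Because every ranker gets equal weight, a strong ranker can be outvoted by several weak ones, which is consistent with the fusion results reported below.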
All these approaches were calibrated through 4-fold cross-validation on the development data.

Table 1: Configurations of runs.
Run   Learning-to-Rank Strategy
1     Ranking SVM
2     RankNet
3     RankBoost
4     ListNet
5     Majority Voting

The development data was used for training and is composed of 5,054 video segments from 52 movie trailers. Each video segment was represented by an HMP. Notice that only the visual content was considered, ignoring audio information and textual metadata. The extracted features were then used as input to train the aforementioned machine-learned rankers. The SVMrank package1 [7] was used for running Ranking SVM. The RankLib package2 was used for running RankNet, RankBoost, and ListNet. Ranking SVM was configured with a linear kernel; RankNet, RankBoost, and ListNet were configured with their default parameter settings. Next, the trained rankers were used to predict the rankings of the test video segments.
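For reference, SVMrank consumes training data in the SVMlight ranking format, one line per example. The helper below is an illustrative sketch of emitting such a line; the feature values and query id are hypothetical.

```python
def to_svmrank_line(rank, qid, features):
    """Format one training example in the SVMlight/SVMrank input format:
    '<target> qid:<qid> <index>:<value> ...' with 1-based feature indices."""
    feats = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    return f"{rank} qid:{qid} {feats}"

# Hypothetical segment of trailer (query) 1 with a 3-dimensional feature vector.
print(to_svmrank_line(2, 1, [0.1, 0.0, 0.9]))
# 2 qid:1 1:0.1 2:0.0 3:0.9
```

Pairwise constraints are then formed by SVMrank only between examples sharing the same qid, which matches the per-trailer grouping used here.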
The rankings associated with the video segments of a same movie trailer were normalized using z-score normalization. After that, a thresholding method was applied to transform the normalized rankings into binary decisions. It was found empirically that better results were obtained when a video segment is classified as interesting if its normalized rank is greater than 0.7; otherwise, it is classified as non-interesting. Finally, the binary decisions of all the rankers are combined using a majority voting scheme, producing the final classification. The effectiveness of each strategy was assessed using Mean Average Precision (MAP).

Table 2 presents the results obtained on the development data. Observe that the performance of the different learning-to-rank algorithms in isolation is similar, with a small advantage for Ranking SVM. By analyzing the confidence intervals, it can be noticed that the results achieved by the fusion of all the machine-learned rankers seem promising.

Table 2: Results obtained on the development data.
             Conf. Interval (95%)
Run   Avg.   min.    max.
1     15.19  13.99   16.38
2     13.82  12.09   15.55
3     14.67  12.93   16.42
4     13.32  12.06   14.57
5     14.71  12.69   16.73

Table 3 presents the results obtained by the official submitted runs on 2,342 video segments from 26 movie trailers of the test data. Observe that the best results were achieved by a learning-to-rank algorithm in isolation, more specifically Ranking SVM. It can be noticed that the fusion of all the machine-learned rankers did not improve the overall performance. One of the reasons for those results is the strategy adopted for combining learning-to-rank algorithms, in which all of them are treated equally.

Table 3: Results of the official submitted runs.
Run   Learning-to-Rank Strategy   MAP (%)
1     Ranking SVM                 18.15
2     RankNet                     16.17
3     RankBoost                   16.17
4     ListNet                     16.56
5     Majority Voting             16.53

Figure 1 presents the Average Precision (AP) per movie trailer achieved in each of the submitted runs. Although the MAP obtained for the fusion of all the machine-learned rankers is not superior to each of them in isolation, the obtained results show the potential of the idea. Notice that Ranking SVM provides the best results for 8 movie trailers, RankNet was the best for 8 movie trailers, and RankBoost performs better than both of them for 7 movie trailers. This clearly indicates that those learning-to-rank algorithms provide complementary information that can be combined by fusion techniques to produce better results.

[Figure 1: AP per movie trailer (video-52 through video-77) achieved in each run, comparing Ranking SVM, RankNet, RankBoost, ListNet, and Majority Voting.]
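The per-trailer normalization and thresholding step described in Section 3 can be sketched as follows; this is an illustrative reading of the procedure (population z-score per trailer, fixed 0.7 cutoff), not the authors' exact code.

```python
import numpy as np

def binarize_rankings(scores, threshold=0.7):
    """Z-score normalize the ranking scores of the segments of one
    movie trailer, then flag as interesting (1) the segments whose
    normalized score exceeds the threshold (0.7, found empirically
    in the paper)."""
    scores = np.asarray(scores, dtype=float)
    z = (scores - scores.mean()) / scores.std()  # population std (ddof=0)
    return (z > threshold).astype(int)

# Hypothetical ranking scores for five segments of one trailer.
scores = [0.2, 0.5, 0.9, 1.8, 0.3]
print(binarize_rankings(scores))  # [0 0 0 1 0]
```

Normalizing within each trailer makes the 0.7 cutoff comparable across trailers whose raw ranking scores live on different scales.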
4. CONCLUSIONS
The proposed approach explored only visual properties. Different learning-to-rank strategies were considered, including a fusion of all of them. The obtained results demonstrate that the proposed approach is promising. Future work includes investigating a smarter strategy for combining learning-to-rank algorithms and considering other information sources to include more features semantically related to the visual content.

5. ACKNOWLEDGMENTS
The author would like to thank CAPES, CNPq, and FAPESP (grant #2016/06441-7) for funding.

1 https://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html (As of September 2016)
2 https://sourceforge.net/p/lemur/wiki/RankLib/ (As of September 2016)

6. REFERENCES
[1] J. Almeida, N. J. Leite, and R. S. Torres. Comparison of video sequences with histograms of motion patterns. In ICIP, pages 3673–3676, 2011.
[2] C. J. C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. N. Hullender. Learning to rank using gradient descent. In ICML, pages 89–96, 2005.
[3] Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, and H. Li. Learning to rank: from pairwise approach to listwise approach. In ICML, pages 129–136, 2007.
[4] C.-H. Demarty, M. Sjöberg, B. Ionescu, T.-T. Do, H. Wang, N. Q. K. Duong, and F. Lefebvre. MediaEval 2016 Predicting Media Interestingness Task. In Proc. of the MediaEval 2016 Workshop, Hilversum, Netherlands, Oct. 20–21, 2016.
[5] Y. Freund, R. D. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933–969, 2003.
[6] Y.-G. Jiang, Y. Wang, R. Feng, X. Xue, Y. Zheng, and H. Yang. Understanding and predicting interestingness of videos. In AAAI, pages 1113–1119, 2013.
[7] T. Joachims. Training linear SVMs in linear time. In ACM SIGKDD, pages 217–226, 2006.
[8] L. Lam and C. Y. Suen. Application of majority voting to pattern recognition: an analysis of its behavior and performance. IEEE Trans. Systems, Man, and Cybernetics, Part A, 27(5):553–568, 1997.
[9] L. T. Li, D. C. G. Pedronette, J. Almeida, O. A. B. Penatti, R. T. Calumby, and R. S. Torres.
A rank aggregation framework for video multimodal geocoding. Multimedia Tools and Applications, 73(3):1323–1359, 2014.