<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Simula @ MediaEval 2016 Context of Experience Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Konstantin Pogorelov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Riegler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pål Halvorsen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carsten Griwodz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Simula Research Laboratory and University of Oslo</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>20</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>This paper presents our approach for the Context of Experience Task of the MediaEval 2016 Benchmark. We present different analyses of the given data using different subsets of the data sources and combinations of them. Our approach provides a baseline evaluation indicating that metadata-based approaches work well, but that visual features can also provide useful information for the given problem.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        In this paper, we present our solutions for the Context of
Experience Task: recommending videos suiting a watching
situation [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which is part of the MediaEval 2016
Benchmark. The main purpose of the Context of Experience task is to
explore multimedia content that is watched in a certain
situation. This situation can be seen as the context in which
the multimedia content is consumed. The use case for
the task is watching movies during a flight.
      </p>
      <p>
        The hypothesis is that watching movies during a speci c
context situation will change the preferences of the
viewers. This is related to similar hypotheses in the eld of
recommender systems as presented in for example [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]
where context is also an important in uencing factor.
Nevertheless, it is also closely related to the eld of quality of
experience [
        <xref ref-type="bibr" rid="ref4 ref8 ref9">9, 8, 4</xref>
        ] because the context during a ight, such
as loud noises and other distractions, can play an important
role for which movies viewers chose to watch.
      </p>
      <p>Participants of the Context of Experience task are asked
to classify a list of movies into two classes, namely
+goodonairplane and -goodonairplane. To tackle this
problem, we propose three different approaches. All three
methods use information extracted directly from the movies or
from the metadata describing the movies, in
combination with a machine-learning-based classifier. The
remainder of the paper is organized as follows. First, we
give a detailed explanation of our three approaches and
the classification algorithm that we used. This is followed
by a description of the experimental setup and the results.
Finally, we draw a conclusion.</p>
    </sec>
    <sec id="sec-2">
      <title>2. APPROACHES</title>
      <p>In this section, we describe our three proposed runs
in more detail. For all runs, we use the same classification
algorithm to obtain the final class.</p>
      <p>
        The classification algorithm that we used for all three
runs is the PART algorithm [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which is based on
PARTial decision trees. PART relies on decision lists and uses
a separate-and-conquer approach to create them. In each
iteration, PART builds a partial decision tree, finds the best
leaf in the tree, and turns it into a rule. This is repeated
until a best set of rules is
found for the given data. The advantage of PART is that
it is very simple. This simplicity is achieved by using rule-based
learning and decision finding that does not require
global optimization. A possible disadvantage of the
algorithm is that the rule sets are rather large compared to other
decision-based algorithms such as C4.5 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] or RIPPER [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Nevertheless, for our use case this is not important
because the dataset is rather small [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. For all our runs, we
use the WEKA machine learning library implementation of
PART with the provided (optimal) standard settings [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
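      <p>As an illustration, the following minimal Java sketch shows how such a run could be set up with the WEKA API. It is only a sketch of our setup: the ARFF file names and the assumption that the class attribute is the last one are hypothetical and not prescribed by the task dataset.</p>
      <preformat><![CDATA[
import weka.classifiers.rules.PART;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PartRun {
    public static void main(String[] args) throws Exception {
        // Load training and test data (hypothetical ARFF file names).
        Instances train = DataSource.read("coe-train.arff");
        Instances test = DataSource.read("coe-test.arff");
        // Assume the last attribute holds the class (+/- goodonairplane).
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        // Train PART with its standard settings.
        PART part = new PART();
        part.buildClassifier(train);

        // Predict the class of every test movie.
        for (int i = 0; i < test.numInstances(); i++) {
            Instance movie = test.instance(i);
            double label = part.classifyInstance(movie);
            System.out.println(test.classAttribute().value((int) label));
        }
    }
}
]]></preformat>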
    </sec>
    <sec id="sec-3">
      <title>2.1 Metadata</title>
      <p>For the metadata-only approach, we used only the metadata
provided with the task dataset. We limited the metadata to
the following attributes: rating, country, language, year,
runtime, Rotten Tomatoes score, IMDB score, Metacritic
score, and genre. We pre-processed and transformed rating,
language, country, and genre into numeric values for the
classification. The scores from the different movie
rating sites were normalized to a scale from 1.0 to 10.0. If
a value was missing in the dataset, we manually searched for
the information on the Internet and replaced it with what we
found. If we could not find ratings for all scoring services,
we used 5.0 (the average score) as the value.</p>
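      <p>The exact normalization formula is not prescribed by the task; the minimal sketch below illustrates the kind of pre-processing described above, assuming a simple linear mapping of the raw scales (e.g., 0-100 for Rotten Tomatoes and Metacritic, 0-10 for IMDB) onto the common 1.0 to 10.0 range, with 5.0 substituted for missing values.</p>
      <preformat><![CDATA[
// Minimal sketch of the metadata score pre-processing (assumed mapping).
public class ScoreNormalizer {
    static final double MISSING = 5.0; // average score used when a value is missing

    // Map a raw score in [0, rawMax] linearly onto the common 1.0-10.0 scale.
    static double normalize(Double rawScore, double rawMax) {
        if (rawScore == null) {
            return MISSING;
        }
        return 1.0 + (rawScore / rawMax) * 9.0;
    }

    public static void main(String[] args) {
        System.out.println(normalize(85.0, 100.0)); // Rotten Tomatoes 85% -> 8.65
        System.out.println(normalize(7.3, 10.0));   // IMDB 7.3 -> 7.57
        System.out.println(normalize(null, 100.0)); // missing value -> 5.0
    }
}
]]></preformat>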
    </sec>
    <sec id="sec-4">
      <title>2.2 Visual Information</title>
      <p>For the visual data, we downloaded the trailers from the
provided links and extracted all frames. From each frame, we
extracted different visual features and combined them into
one feature vector for the classification (with a dimension of
3,866 values).</p>
      <p>
        For the visual features, we decided to use several different
global features. The features that we used for this work are:
joint histogram, JPEG coefficient histogram, Tamura, fuzzy
opponent histogram, simple color histogram, fuzzy color
histogram, rotation-invariant local binary patterns, fuzzy color
and texture histogram, local binary patterns and opponent
histogram, PHOG, rank and opponent histogram, color
layout, CEDD, Gabor, opponent histogram, edge histogram,
scalable color, and JCD. All the features have been extracted
using the LIRE open source library [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. A detailed
description of all features can be found in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
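      <p>A minimal sketch of the per-frame extraction is shown below. It assumes the GlobalFeature interface of LIRE 1.0 (extract and getFeatureVector); the selected features, the exact package paths (which differ between LIRE versions), and the frame file name are illustrative assumptions.</p>
      <preformat><![CDATA[
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

import net.semanticmetadata.lire.imageanalysis.features.GlobalFeature;
import net.semanticmetadata.lire.imageanalysis.features.global.CEDD;
import net.semanticmetadata.lire.imageanalysis.features.global.EdgeHistogram;
import net.semanticmetadata.lire.imageanalysis.features.global.Tamura;

public class FrameFeatures {
    public static void main(String[] args) throws Exception {
        // One decoded trailer frame (hypothetical file name).
        BufferedImage frame = ImageIO.read(new File("frame_0001.png"));

        // A subset of the global features listed above.
        GlobalFeature[] extractors = { new CEDD(), new Tamura(), new EdgeHistogram() };

        // Extract each feature and concatenate them into one frame-level vector.
        double[][] parts = new double[extractors.length][];
        int length = 0;
        for (int i = 0; i < extractors.length; i++) {
            extractors[i].extract(frame);
            parts[i] = extractors[i].getFeatureVector();
            length += parts[i].length;
        }
        double[] frameVector = new double[length];
        int offset = 0;
        for (double[] part : parts) {
            System.arraycopy(part, 0, frameVector, offset, part.length);
            offset += part.length;
        }
        System.out.println("frame vector length: " + frameVector.length);
    }
}
]]></preformat>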
    </sec>
    <sec id="sec-5">
      <title>Metadata and</title>
    </sec>
    <sec id="sec-6">
      <title>Visual Information Combined</title>
      <p>
        For the final run, we combined the metadata with the
visual feature information. To combine the visual information
with the metadata, we first ran the classifier on the visual
information with a modification so that the output was not
binary but a probability for each class. This probability
is then added to the metadata as two additional features
(the probability of being negative or positive). The extended
feature vector is then used for finding the final class. This can
be seen as a kind of late fusion approach, which is generally
reported in the literature to perform better than early fusion [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
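      <p>A minimal sketch of this fusion step with the WEKA API is shown below. The variable names are hypothetical, the visual and metadata datasets are assumed to list the same movies in the same order, and we assume the class probabilities are obtained via distributionForInstance rather than any particular modification of PART.</p>
      <preformat><![CDATA[
import weka.classifiers.rules.PART;
import weka.core.Attribute;
import weka.core.Instances;

public class LateFusion {
    // Append the visual classifier's class probabilities to the metadata
    // instances as two extra attributes (late fusion).
    static Instances fuse(PART visualModel, Instances visual, Instances meta) throws Exception {
        // Insert the two new attributes just before the class attribute;
        // WEKA shifts the class index accordingly.
        int pos = meta.classIndex();
        meta.insertAttributeAt(new Attribute("visualProbNegative"), pos);
        meta.insertAttributeAt(new Attribute("visualProbPositive"), pos + 1);

        for (int i = 0; i < meta.numInstances(); i++) {
            // Per-class probability estimates from the visual-only run.
            double[] dist = visualModel.distributionForInstance(visual.instance(i));
            meta.instance(i).setValue(pos, dist[0]);
            meta.instance(i).setValue(pos + 1, dist[1]);
        }
        return meta; // train/classify PART on the extended feature vectors
    }
}
]]></preformat>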
    </sec>
    <sec id="sec-7">
      <title>3. EXPERIMENTAL SETUP</title>
      <p>The dataset provided by the task contains 318
movies in total, split into a training set and a test set. The
test set contains 223 movies. For each run, we
calculated the F1-score, precision, and recall.
For the trailers, only links were provided,
and we had to download them. Furthermore, the posters
of the movies were also provided, but we did not use them
in our approaches. Apart from the movies, we also used
the provided metadata. We did not collect any additional
data such as full-length movies, and we did not use the
pre-extracted visual, text, and audio features. The goal of
the task was, as mentioned before, to automatically
identify whether a movie is suitable to be watched during a flight
or not.</p>
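      <p>For reference, the reported metrics follow the standard definitions of precision, recall, and F1-score. The short sketch below, with made-up counts, shows how they are derived from the true/false positive and negative counts discussed next.</p>
      <preformat><![CDATA[
public class Metrics {
    public static void main(String[] args) {
        // Hypothetical counts for the +goodonairplane class.
        int tp = 80, fp = 20, fn = 10;

        double precision = tp / (double) (tp + fp); // 80 / 100 = 0.800
        double recall = tp / (double) (tp + fn);    // 80 / 90  = 0.889
        double f1 = 2 * precision * recall / (precision + recall);

        System.out.printf("P=%.3f R=%.3f F1=%.3f%n", precision, recall, f1);
    }
}
]]></preformat>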
      <p>We assessed three different methods, executed in three
runs. An overview of the conducted runs, with a short
description of each method, can be found in Table 3.
The organizers also provided
a baseline for comparison based on a simple random tree
algorithm (last row in the tables).</p>
      <p>Table 3 gives a detailed overview of the results in terms of
true positives, false positives, true negatives, and false
negatives achieved by our runs and the baseline. Table 3 also depicts
the official results of the task metrics for our runs and the
baseline. All three runs outperformed the baseline
significantly. R1, which used metadata and visual information at
the same time, had the lowest performance. This was
surprising to us, since we expected this approach to
perform best. A reason for the weak performance could be
the way in which we combine the different features. The
second best of our runs is R2, which uses metadata only. This
is not surprising, since metadata is well known for
performing well and, in general, better than content-based
classification. R3 was the best performing approach and even
outperformed the metadata approach, which was not expected.
It seems that for the use case of watching movies on a flight,
the visual features of the movie play an important role. The
reason could be that movies with brighter colors
are preferred. Nevertheless, we have to investigate this in
more detail before drawing a final conclusion.</p>
    </sec>
    <sec id="sec-8">
      <title>5. CONCLUSION</title>
      <p>This paper presented three approaches for the Context of
Experience task, which were able to classify movies into two
subsets according to whether they are suitable to be watched on an
airplane. The results and insights gained by evaluating our
different methods indicate that there is a difference between
what people would like to watch during a flight and in other
contexts, and that this difference is detectable to a certain extent by automatic
analysis of metadata and content-based information.</p>
      <p>Nevertheless, we clearly see the need to extend
the work by using multiple and larger datasets.
Additionally, it might be important to collect user opinions not through
crowdsourcing but from people who are actually travelling.</p>
    </sec>
    <sec id="sec-9">
      <title>6. ACKNOWLEDGMENT</title>
      <p>This work has been funded by the FRINATEK
project "Efficient Execution of Large Workloads on Elastic
Heterogeneous Resources" (EONS) (project number 231687)
of the Norwegian Research Council.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Cohen</surname>
          </string-name>
          .
          <article-title>Fast effective rule induction</article-title>
          .
          <source>In Proceedings of the Twelfth International Conference on Machine Learning</source>
          , pages
          <fpage>115</fpage>
          -
          <lpage>123</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Frank</surname>
          </string-name>
          and
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          .
          <article-title>Generating accurate rule sets without global optimization</article-title>
          . In J. Shavlik, editor,
          <source>Fifteenth International Conference on Machine Learning</source>
          , pages
          <fpage>144</fpage>
          -
          <lpage>151</lpage>
          . Morgan Kaufmann,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hall</surname>
          </string-name>
          , E. Frank,
          <string-name>
            <given-names>G.</given-names>
            <surname>Holmes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Reutemann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          .
          <article-title>The WEKA data mining software: an update</article-title>
          .
          <source>ACM SIGKDD explorations newsletter</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lebreton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Raake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Barkowsky</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. Le</given-names>
            <surname>Callet</surname>
          </string-name>
          .
          <article-title>Evaluating complex scales through subjective ranking</article-title>
          .
          <source>In Proc. of QoMEX</source>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lux</surname>
          </string-name>
          .
          <article-title>LIRE: Open source image retrieval in Java</article-title>
          .
          <source>In Proceedings of the 21st ACM international conference on Multimedia</source>
          , pages
          <fpage>843</fpage>
          -
          <lpage>846</lpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lux</surname>
          </string-name>
          and
          <string-name>
            <given-names>O.</given-names>
            <surname>Marques</surname>
          </string-name>
          .
          <article-title>Visual information retrieval using Java and LIRE</article-title>
          .
          <source>Synthesis Lectures on Information Concepts</source>
          ,
          <source>Retrieval, and Services</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>112</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Quinlan</surname>
          </string-name>
          .
          <source>C4.5: Programs for Machine Learning</source>
          . Elsevier,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Redi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , H. de Ridder, and
          <string-name>
            <given-names>I.</given-names>
            <surname>Heynderickx</surname>
          </string-name>
          .
          <article-title>How passive image viewers became active multimedia users</article-title>
          .
          <source>In Visual Signal Quality Assessment</source>
          . Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>U.</given-names>
            <surname>Reiter</surname>
          </string-name>
          , K. Brunnstrom,
          <string-name>
            <surname>K. De Moor</surname>
            , M.-
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Larabi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Pinheiro</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>You</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Zgank</surname>
          </string-name>
          .
          <article-title>Factors in uencing quality of experience</article-title>
          .
          <source>In Quality of Experience</source>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Spampinato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Griwodz</surname>
          </string-name>
          .
          <article-title>The MediaEval 2016 Context of Experience task: Recommending videos suiting a watching situation</article-title>
          .
          <source>In Proceedings of the MediaEval 2016 Workshop</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Spampinato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Markussen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Griwodz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Stensland</surname>
          </string-name>
          .
          <article-title>Right inflight?: A dataset for exploring the automatic prediction of movies suitable for a watching situation</article-title>
          .
          <source>In Proc. of MMSys. ACM</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Said</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Berkovsky</surname>
          </string-name>
          , and E. W. De Luca.
          <article-title>Putting things in context: Challenge on context-aware movie recommendation</article-title>
          .
          <source>In Proc. of CAMRa. ACM</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Said</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Berkovsky</surname>
          </string-name>
          , and E. W. De Luca.
          <article-title>Group recommendation in context</article-title>
          .
          <source>In Proc. of CAMRa. ACM</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Snoek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Worring</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Smeulders</surname>
          </string-name>
          .
          <article-title>Early versus late fusion in semantic video analysis</article-title>
          .
          <source>In Proceedings of the 13th annual ACM international conference on Multimedia</source>
          , pages
          <fpage>399</fpage>
          -
          <lpage>402</lpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>