<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Introduction to a Task on Context of Experience: Recommending Videos Suiting a Watching Situation</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Delft University of Technology</institution>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Michael Riegler</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Simula Research Laboratory</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Catania</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>We propose a Context of Experience task, whose aim is to explore the suitability of video content for watching in certain situations. Specifically, we look at the situation of watching movies on an airplane. As a viewing context, airplanes are characterized by small screens and distracting viewing conditions. We assume that movies have properties that make them more or less suitable to this context. We are interested in developing systems that are able to reproduce a general judgment of viewers about whether a given movie is a good movie to watch during a flight. We provide a data set including a list of movies and human judgments concerning their suitability for airplanes. The goal of the task is to use movie metadata and audio-visual features extracted from movie trailers in order to automatically reproduce these judgments. A basic classification system demonstrates the feasibility and viability of the task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>The challenge of the Context of Experience task is to
automatically predict viewers' judgments on whether video
content is suitable for a particular watching situation.
Ultimately, the aim is to build a recommender system that
provides viewers with recommendations of content for a given
context. Currently, the majority of work on video content
recommendation focuses on personal preferences, and
overlooks cases in which context might have a strong impact on
preference relatively independently of the personal tastes of
specific viewers. A particularly strong influence of context can
be expected in psychologically stressful or physically
uncomfortable situations.</p>
      <p>
        For our task, we choose one such situation, with which
a large number of people have quite frequent experience:
watching movies on an airplane. In this situation, a large
majority of viewers share a common goal, which we
consider to be a viewing intent. The goal is to pass time as
pleasantly and meaningfully as possible, while confined in
the small space of an airplane cabin, which is characterized
by a number of distractors. We take the large number of
websites discussing movies to watch on airplanes (e.g., [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ])
as evidence that this viewing intent is dominant among air
travellers. Although the scope of this task is limited to the
airplane scenario, we emphasize that the challenge of
Context of Experience is a much broader area of interest. Other
examples of stressful contexts where videos are becoming
increasingly important include hospital waiting rooms and
dentists' offices, where videos are shown during treatment.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. TASK DESCRIPTION</title>
      <p>
        For the task we provide the participants with a list of movies,
including links to descriptions and video trailers. The
assignment of the task is to classify each movie into the
+goodonairplane / -goodonairplane classes. The ground
truth of the task is derived from two sources: a list of movies
actually used by a major airline (KLM: http://www.klm.com/travel/no_en/prepare_for_travel/on_board/entertainment/onboard_movies.htm),
and user judgments on movies collected via a crowdsourcing
tool (https://crowdflower.com/). Task
participants should form their own hypothesis about what
is important for users viewing movies on an airplane, and
design an approach using appropriate features and a
classifier or decision function. Figure 1 gives an impression of
a screen commonly used on an airplane and the very
specific attributes regarding size and quality of the video. The
value of the task lies in understanding the ability of
content-based and metadata-based features to discriminate the kind
of movies that people would like to watch on small screens
under stressful or otherwise abnormal conditions. Since the
multimedia content that users watch on flights can influence
their well-being and overall experience, this task is related
to work on the quality of multimedia experience, for
example [
        <xref ref-type="bibr" rid="ref1 ref3 ref4 ref5 ref6">6, 5, 3, 4, 1</xref>
        ]. Apart from that, the task also includes the
area of user intent, since the intent of the users, i.e., why they
want to watch movies on the airplane, is a strong
influence on what they watch [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. Task participants are
provided with a collection of videos (trailers serve as
representatives of the movies because of copyright issues) and their
context, e.g., video URL, metadata, user votes, etc. In
addition, we provide different pre-extracted features,
including visual and audio features. The participants are
asked to develop methods that predict to which
class a video belongs, i.e., good or bad to watch
on an airplane.
      </p>
      <p>The task can be addressed by leveraging
techniques from multiple multimedia-related disciplines,
such as social computing (intent), machine learning
(classification), multimedia content analysis, multimodal fusion,
and crowdsourcing. We further hope that it will be
useful for content providers, since exploiting intent in
combination with user satisfaction could lead to more
sophisticated methods of providing a better
service to users.</p>
    </sec>
    <sec id="sec-3">
      <title>3. DATA SET</title>
      <p>The released data set includes titles and
links that allow participants to gather online metadata and
trailers for the movies. As already mentioned, we do not
provide the video files themselves because of copyright restrictions. Movies
are collected based on movie lists from a major international
airline, in our case, KLM Royal Dutch Airlines. The final
list of movies is a merged set of movies collected between
February and April 2015. The video data set contains both
positive and negative samples, where the negative
examples are carefully sampled from IMDB in order to create a
fair and representative negative class. The data set is split
into a training set and a test set. To collect user
judgments, we use an existing system that has been built
for the purpose of collecting user feedback of this sort. We
evaluate systems both with respect to the airline's choice
of movies and with respect to the crowd's choice of airline-suitable movies.
The labels collected by crowdsourcing are
considered the authoritative labels. To obtain them,
crowdworkers are asked to rank a small set of movies with respect
to how strongly they would like to watch each video on an
airplane. These rankings are then combined to create the class
for each movie in the training and test data.</p>
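The paper does not spell out how the rankings are combined into a class; the following is a minimal sketch of one plausible aggregation scheme (majority-style thresholding of per-worker votes), with invented movie IDs and a hypothetical `threshold` parameter:

```python
from collections import defaultdict

def aggregate_crowd_labels(votes, threshold=0.5):
    """Combine per-worker suitability votes into one binary class per movie.

    votes: list of (movie_id, liked_on_airplane) pairs, where the second
    element is True if the worker would watch the movie on a plane.
    Returns {movie_id: "+goodonairplane" or "-goodonairplane"}.
    """
    counts = defaultdict(lambda: [0, 0])  # movie_id -> [positive votes, total votes]
    for movie_id, liked in votes:
        counts[movie_id][0] += int(liked)
        counts[movie_id][1] += 1
    return {
        m: "+goodonairplane" if pos / total >= threshold else "-goodonairplane"
        for m, (pos, total) in counts.items()
    }

# Hypothetical votes from three crowdworkers on two movies.
votes = [("tt001", True), ("tt001", True), ("tt001", False),
         ("tt002", False), ("tt002", False), ("tt002", True)]
print(aggregate_crowd_labels(votes))
```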
      <p>Technical details. Overall, the data set contains 318
movies. Links to trailers are collected from IMDB and YouTube.
Participants are also allowed to collect their own data such
as full length movies, more metadata and user comments,
etc. The goal of systems that are developed to address this
task should be to automatically identify appropriate
content, i.e., whether a movie should be recommended for
being watched on an airplane or not. To achieve this goal,
the methods should not require manual or crowdsourced
input. The data set contains extracted visual, audio and text
features. Furthermore, we provide metadata collected from
IMDB including user comments. The visual features that
are provided are: Histogram of Oriented Gradients (HOG),
Color Moments, Local Binary Patterns (LBP) and Gray
Level Run Length Matrix. The audio descriptors are
MelFrequency Cepstral Coe cients (MFCC). The development
set contains 95 labelled movies. The test data contains 223
movies without labels.</p>
      <p>[Table 1: results of the four initial runs. Features used per run:
metadata + user ratings; only user ratings; only visual information;
only metadata.]</p>
      <p>Evaluation. For the evaluation we use the standard
metrics precision, recall and weighted F1-score. Negative and
positive classes in both data sets are balanced. Participants
are asked to submit a predicted class for each movie in the
test data set. The metrics are then calculated and provided
to the participants. For a transparent and fair procedure,
the labels used for the evaluation will be released together
with the results.</p>
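The weighted metrics are standard: per-class precision, recall and F1 are averaged, with each class weighted by its support. A minimal self-contained Python sketch (the actual evaluation tooling is not specified in the paper):

```python
def weighted_prf1(y_true, y_pred):
    """Support-weighted average precision, recall and F1 over the
    classes present in y_true."""
    classes = sorted(set(y_true))
    n = len(y_true)
    precision = recall = f1 = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        support = tp + fn
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / support if support else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        w = support / n  # weight each class by its share of the data
        precision += w * p_c
        recall += w * r_c
        f1 += w * f_c
    return precision, recall, f1

y_true = ["good", "good", "bad", "bad"]
y_pred = ["good", "bad", "bad", "bad"]
print(weighted_prf1(y_true, y_pred))
```

With balanced classes, as here, the weighted average reduces to the plain macro average.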
      <p>
        Initial results. To confirm the viability of the task, and to
show the possibilities opened by this data set, we carried
out some basic classification experiments. For the
classification we used the Weka library. As classifier we chose
the rule-based PART classifier. This classifier uses
separate-and-conquer to generate a decision list: it builds
a partial decision tree and uses the best leaves as rules [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Table 1 shows the results of our four initial
runs. For the evaluation we used the weighted average of
precision, recall and F1-score. The first run uses metadata
(language, year published, genre, country, runtime and age
rating) in combination with user ratings as input for the
classifier. This run is our best performer. It clearly
outperforms the naive baseline, which is 0.5 (precision, recall and
F1-score). The second run uses user ratings alone. This run
performs well on recall, but poorly on precision. This
implies that receiving certain user ratings is a necessary, but
not a sufficient, condition for being a movie that is good to
watch on an airplane. Taken together, the first two runs
confirm that the task is non-trivial, and that it is also viable.
The third run uses visual features. This run scores below
the naive baseline. However, the approach to visual
classification here was relatively simple. Additional exploratory
experiments, not reported here, revealed that visual features
can improve results when used in
combination with other features. Such combinations are interesting
for future work.
      </p>
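PART itself is Weka-specific; as a language-neutral illustration of the kind of ordered decision list such a rule learner produces, here is a hand-written Python sketch over the metadata features named above. The rules and thresholds are invented for illustration only, not the ones PART actually learned:

```python
def decision_list_predict(movie):
    """Apply an ordered rule list: the first matching rule fires,
    and a default class covers everything else.
    The rules below are hypothetical, for illustration only."""
    rules = [
        (lambda m: m["runtime"] > 180, "-goodonairplane"),    # very long films
        (lambda m: m["genre"] == "horror", "-goodonairplane"),
        (lambda m: m["user_rating"] >= 7.0, "+goodonairplane"),
    ]
    for condition, label in rules:
        if condition(movie):
            return label
    return "-goodonairplane"  # default class

print(decision_list_predict({"runtime": 95, "genre": "comedy", "user_rating": 7.8}))
```

The key property, matching PART's output, is that rules are evaluated in order, so each rule only has to cover cases not already handled above it.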
      <p>Finally, the last run confirms that metadata without user
ratings is able to yield performance above the naive baseline.
An information-gain-based analysis of all features ranked
genre, publication year, country, language and runtime as
the top five features.</p>
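For reference, the quantity behind that ranking is the reduction in label entropy from splitting on a feature. A minimal sketch for a discrete feature, with toy data invented for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the support-weighted entropy of the
    label subsets induced by each value of a discrete feature."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Toy example: genre perfectly separates the classes, so the gain is 1 bit.
genres = ["comedy", "comedy", "horror", "horror"]
labels = ["good", "good", "bad", "bad"]
print(information_gain(genres, labels))
```

Features are then simply sorted by this score; continuous features such as runtime are discretized first.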
    </sec>
    <sec id="sec-4">
      <title>4. SUMMARY</title>
      <p>The task is challenging due to the complex relationship
between the multimedia content and viewers' perceptions
and reception. We hope that this novel use case will inspire
researchers to investigate user intent and context of
experience. Understanding user intent and what users need in
order to have the best experience is an important emerging
topic in the area of multimedia research.</p>
    </sec>
    <sec id="sec-5">
      <title>ACKNOWLEDGMENT</title>
      <p>This work is funded by the Norwegian FRINATEK project
"EONS" (#231687) &amp; the EC project CrowdRec (#610594).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Borowiak</surname>
          </string-name>
          and
          <string-name>
            <given-names>U.</given-names>
            <surname>Reiter</surname>
          </string-name>
          .
          <article-title>Long duration audiovisual content: Impact of content type and impairment appearance on user quality expectations over time</article-title>
          .
          <source>In Quality of Multimedia Experience (QoMEX)</source>
          ,
          <source>2013 Fifth International Workshop on</source>
          , pages
          <volume>200</volume>
          –
          <fpage>205</fpage>
          . IEEE,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Frank</surname>
          </string-name>
          and
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          .
          <article-title>Generating accurate rule sets without global optimization</article-title>
          .
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lebreton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Raake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Barkowsky</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. Le</given-names>
            <surname>Callet</surname>
          </string-name>
          .
          <article-title>Evaluating complex scales through subjective ranking</article-title>
          .
          <source>In Quality of Multimedia Experience (QoMEX)</source>
          ,
          <source>2014 Sixth International Workshop on</source>
          , pages
          <volume>303</volume>
          –
          <fpage>308</fpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Rainer</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Timmerer</surname>
          </string-name>
          .
          <article-title>A quality of experience model for adaptive media playout</article-title>
          .
          <source>In Quality of Multimedia Experience (QoMEX)</source>
          ,
          <source>2014 Sixth International Workshop on</source>
          , pages
          <volume>177</volume>
          –
          <fpage>182</fpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Redi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>de Ridder</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Heynderickx</surname>
          </string-name>
          .
          <article-title>How passive image viewers became active multimedia users</article-title>
          .
          <source>In Visual Signal Quality Assessment</source>
          , pages
          <volume>31</volume>
          –
          <fpage>72</fpage>
          . Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>U.</given-names>
            <surname>Reiter</surname>
          </string-name>
          , K. Brunnstrom,
          <string-name>
            <surname>K. De Moor</surname>
            , M.-
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Larabi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Pinheiro</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>You</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Zgank</surname>
          </string-name>
          .
          <article-title>Factors influencing quality of experience</article-title>
          .
          <source>In Quality of Experience</source>
          , pages
          <volume>55</volume>
          –
          <fpage>72</fpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Calvet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Calvet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Griwodz</surname>
          </string-name>
          .
          <article-title>Exploitation of producer intent in relation to bandwidth and qoe for online video streaming services</article-title>
          .
          <source>In Proceedings of the 25th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video</source>
          , pages
          <volume>7</volume>
          –
          <fpage>12</fpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Soleymani</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Crowdsourcing for affective annotation of video: Development of a viewer-reported boredom corpus</article-title>
          .
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Tripinsurance</surname>
          </string-name>
          .
          <article-title>Best movies guide for airplanes</article-title>
          . http://www.tripinsurance.com/tips/guide-to-the-bestmoviestv-shows-to-watch-on-a-plane [last visited December 10,
          <year>2014</year>
          ].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>