<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jaeyoung Choi</string-name>
          <email>jaeyoung@icsi.berkeley.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudia Hauff</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olivier Van Laere</string-name>
          <email>oliviervanlaere@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bart Thomee</string-name>
          <email>bthomee@yahoo-inc.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Blueshift Labs</institution>
          ,
          <addr-line>San Francisco</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Delft University of Technology</institution>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>International Computer Science Institute</institution>
          ,
          <addr-line>Berkeley</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Yahoo Labs</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>The sixth edition of the Placing Task at MediaEval introduces two new sub-tasks: (1) locale-based placing, which emphasizes the need to move away from an evaluation purely based on latitude and longitude towards an entity-centered evaluation, and (2) mobility-based placing, which addresses predicting missing locations within a sequence of movements; the latter is a specific real-world use case that so far has received little attention within the research community. Two additional changes over the previous years are the introduction of open source organizer baselines for both sub-tasks shortly after the official data release, and the implementation of a live leaderboard, which allows the participants to gain insights into the effectiveness of their approaches compared to the official baselines and in relation to each other at an early stage, before the actual run submissions are due.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The Placing Task challenges participants to develop
techniques to automatically annotate photos and videos with
their geolocation using their visual content and/or textual
metadata. In particular, we would like participants to
extend and improve upon the contributions of participants
from previous editions, as well as of the research community
at large, e.g. [
        <xref ref-type="bibr" rid="ref1 ref10 ref3 ref5 ref7 ref8">7, 10, 3, 1, 5, 8</xref>
        ]. Although the Placing Task
has indeed been shown to be a "research catalyst" [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for
geoprediction of social multimedia, with each edition of the
task it becomes a greater challenge to alter the benchmark
sufficiently to allow and motivate participants to make
substantial changes to their frameworks and systems instead of
small technical ones. This year's introduction of organizer
baselines, a leaderboard and novel sub-tasks was
driven by this consideration.
      </p>
    </sec>
    <sec id="sec-2">
      <title>DATA</title>
      <p>
        This year's edition of the Placing Task was based on the
YFCC100M<fn id="fn1"><label>1</label><p>https://bit.ly/yfcc100md</p></fn> [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which to date is the largest social
multimedia collection that is publicly and freely available. The full
dataset consists of 100 million Flickr<fn id="fn2"><label>2</label><p>https://www.flickr.com</p></fn> Creative Commons<fn id="fn3"><label>3</label><p>https://www.creativecommons.org</p></fn>
licensed photos and videos with associated metadata.
Similar to last year's edition [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], we sampled a subset of the
YFCC100M for training and testing, see Table 1. The need for
two separate datasets arose from the task requirements
(described in Section 3). No user appeared in both the training
set and the test set. To minimize user and location
bias, each user was limited to contributing at most 250
photos and 50 videos, and no two photos/videos were included
that were taken by the same user less than 10 minutes apart. The
rather uncontrolled nature of the data (sampled from
longitudinal, large-scale, noisy and biased raw data) confronts
participants with additional challenges. To lower the
entrance barrier, we precomputed and provided participants
with fifteen visual and three aural features commonly used
in multimedia analysis for each of the media objects,
including SIFT, Gist, color and texture histograms for visual
analysis, and MFCC for audio analysis [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th rowspan="2">Sub-task</th>
              <th colspan="2">Training</th>
              <th colspan="2">Testing</th>
            </tr>
            <tr>
              <th>#Photos</th>
              <th>#Videos</th>
              <th>#Photos</th>
              <th>#Videos</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Locale-based placing sub-task</td>
              <td>4,672,382</td>
              <td />
              <td />
              <td />
            </tr>
            <tr>
              <td>Mobility-based placing sub-task</td>
              <td />
              <td />
              <td />
              <td />
            </tr>
          </tbody>
        </table>
      </table-wrap>
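      <p>The per-user sampling constraints described in this section can be illustrated with a short sketch. It is not the organizers' sampling code: the greedy chronological strategy, the function name sample_items and the (user, kind, timestamp) record layout are assumptions made for illustration. Only the limits of 250 photos, 50 videos and 10 minutes come from the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

# Limits taken from the task description.
MAX_PER_USER = {"photo": 250, "video": 50}
MIN_GAP_SECONDS = 10 * 60


def sample_items(items):
    """Greedily keep items per user in chronological order (assumed strategy).

    items: iterable of (user_id, kind, taken_unix_seconds) tuples,
           where kind is "photo" or "video" (hypothetical layout).
    """
    kept = []
    counts = defaultdict(lambda: defaultdict(int))
    last_kept_time = {}
    for user, kind, taken in sorted(items, key=lambda it: (it[0], it[2])):
        # Enforce the per-user cap for this media type.
        if counts[user][kind] >= MAX_PER_USER[kind]:
            continue
        # Skip items taken less than 10 minutes after the last kept item.
        last = last_kept_time.get(user)
        if last is not None and taken - last < MIN_GAP_SECONDS:
            continue
        kept.append((user, kind, taken))
        counts[user][kind] += 1
        last_kept_time[user] = taken
    return kept
```
]]></preformat>
      <p>The sketch measures the 10-minute gap against the last item that was kept, regardless of media type. Other readings of the constraint are possible and would need a different rule.</p>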
    </sec>
    <sec id="sec-3">
      <title>TASKS</title>
      <p>Locale-based sub-task: In this sub-task, participants were
given a hierarchy of places across the world, ranging from
neighborhoods and cities to regions, countries and continents. For
each photo and video, they were asked to pick the node (i.e.
a place) from the hierarchy in which they most confidently
believed it had been taken. While the ground truth locations
of the photos and videos were associated with the most
accurate nodes (i.e. the leaves) in the hierarchy, the participants
could express a reduced confidence in their location
estimates by selecting nodes at higher levels in the hierarchy.
If their confidence was sufficiently high, participants could
instead directly estimate the geographic coordinate of the
photo/video rather than choosing a node from the hierarchy.</p>
      <p>As our place hierarchy we used version 2.0 of the open
source GADM database, which contains the spatial
boundaries of the world's administrative areas. Because GADM only
contains data down to city level, we manually supplemented it
with neighborhood data for several cities obtained from the
geo-game ClickThatHood. In total, the hierarchy contains
221,458 leaf nodes that are spread across 253 countries. The
hierarchy has a maximum depth of 7 and an average depth
of 4.33, with each place being a variation of the general
hierarchy:
Country → State → Province → County → City → Neighborhood.
Due to the use of the hierarchy, only photos and videos taken
within any of the GADM boundaries were part of this
sub-task, and thus media captured in or above international
waters were excluded.</p>
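      <p>A place hierarchy of this shape can be pictured as a tree in which each node points to its parent. The sketch below is a toy illustration, not the task's data format: the node names, the artificial "World" root and the functions path_to_root, back_off and tree_distance are all assumptions. It shows how a prediction can be backed off to a coarser ancestor to express reduced confidence, and how two nodes can be compared by counting the edges between them.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List

# Hypothetical toy hierarchy (child -> parent); "World" is an artificial root.
PARENT: Dict[str, str] = {
    "Mission District": "San Francisco",
    "San Francisco": "California",
    "Berkeley": "California",
    "California": "United States",
    "United States": "World",
    "Delft": "Zuid-Holland",
    "Zuid-Holland": "Netherlands",
    "Netherlands": "World",
}


def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def back_off(node: str, levels: int, parent: Dict[str, str]) -> str:
    """Return the ancestor `levels` steps above `node`, stopping at the root."""
    path = path_to_root(node, parent)
    return path[min(levels, len(path) - 1)]


def tree_distance(a: str, b: str, parent: Dict[str, str]) -> int:
    """Count the edges on the path between two nodes of the same tree."""
    steps_from_a = {n: d for d, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:
            # First shared node is the lowest common ancestor.
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes do not share a common ancestor")
```
]]></preformat>
      <p>For example, tree_distance("Mission District", "Berkeley", PARENT) counts two steps up to California and one step down, and back_off("Mission District", 1, PARENT) yields the enclosing city.</p>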
      <p>Mobility-based sub-task: In this sub-task, participants
were given a sequence of photos taken in a certain city by
a specific user, of which not all photos were associated with
a geographic coordinate (e.g. the user took some photos
when GPS was temporarily unavailable). The participants
were asked to predict the locations of those photos with
missing coordinates. The nearly 150K training photos of
this sub-task were divided into 23,116 sequences, while the
approximately 33K test photos were separated into 5,119
sequences. In each test sequence, about 30%
of the coordinates were missing; these are the ones that
needed to be predicted.</p>
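      <p>One simple way to approach this sub-task, shown here only as an illustrative sketch and not as the organizers' baseline, is to interpolate linearly in time between the nearest photos with known coordinates. The function name fill_missing and the (timestamp, coordinate) record layout are assumptions. The sketch also assumes that a sequence stays within one city, so that interpolating latitude and longitude directly is acceptable.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Optional, Tuple

Coord = Tuple[float, float]             # (latitude, longitude) in degrees
Photo = Tuple[float, Optional[Coord]]   # (timestamp in seconds, coordinate or None)


def fill_missing(sequence: List[Photo]) -> List[Coord]:
    """Return one coordinate per photo, in chronological order.

    Known coordinates are passed through. Missing ones are linearly
    interpolated in time between the nearest known neighbours, or copied
    from the nearest known photo at the ends of the sequence.
    """
    photos = sorted(sequence, key=lambda p: p[0])
    known = [i for i, (_, coord) in enumerate(photos) if coord is not None]
    if not known:
        raise ValueError("sequence has no photo with a known coordinate")
    filled = []
    for i, (t, coord) in enumerate(photos):
        if coord is not None:
            filled.append(coord)
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if not before:
            filled.append(photos[after[0]][1])
        elif not after:
            filled.append(photos[before[-1]][1])
        else:
            t0, (lat0, lon0) = photos[before[-1]]
            t1, (lat1, lon1) = photos[after[0]]
            # Weight by elapsed time; fall back to the midpoint on equal timestamps.
            w = 0.5 if t1 == t0 else (t - t0) / (t1 - t0)
            filled.append((lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0)))
    return filled
```
]]></preformat>
      <p>Such an interpolation uses only timestamps and neighbouring coordinates. It ignores the textual metadata and the visual features that participants may also draw on.</p>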
    </sec>
    <sec id="sec-4">
      <title>RUNS</title>
      <p>Participants may submit up to five attempts ('runs') for
each sub-task. They can make use of the provided
metadata and precomputed features, as well as external resources
(e.g. gazetteers, dictionaries, Web corpora), depending on
the run type. We distinguish between the following five run
types:</p>
      <list list-type="simple">
        <list-item>
          <p>Run 1: Only provided textual metadata may be used.</p>
        </list-item>
        <list-item>
          <p>Run 2: Only provided visual &amp; aural features may be used.</p>
        </list-item>
        <list-item>
          <p>Run 3: Only provided textual metadata and visual &amp; aural features may be used.</p>
        </list-item>
        <list-item>
          <p>Runs 4–5: Everything is allowed, except for crawling the
exact items contained in the test set, or any items by
a test user taken within 24 hours before the first and
after the last timestamp of a photo sequence in the
mobility test set.</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-5">
      <title>EVALUATION</title>
      <p>
        For the locale-based sub-task, the evaluation metric is based
on a hierarchical distance between the ground truth node
and the predicted node or coordinate in the place hierarchy.
The mobility-based sub-task is evaluated according to the
familiar geographic distance-based metric, where for each
test item the distance is computed between the ground truth
coordinate and the estimated coordinate. One important
difference from past editions is that this year we measure
geographic distances with Karney's formula [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; this formula
is based on the assumption that the shape of the Earth is
an oblate spheroid, which produces more accurate distances
than methods such as the great-circle distance that assume
the shape of the Earth to be a sphere.
      </p>
      <p>
        As task organizers, we provided two open source baselines
to the participants, one for the locale<fn id="fn6"><label>6</label><p>http://bit.ly/1gsrmvx</p></fn> sub-task and one for
the mobility<fn id="fn7"><label>7</label><p>http://bit.ly/1K8vUy8</p></fn> sub-task. Additionally, we implemented a live
leaderboard that allowed participants to submit runs and
view their standing relative to others, as evaluated on
a representative development set (i.e. part of, but not the
complete, test set).
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ekambaram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gottlieb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sikora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramchandran</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Friedland</surname>
          </string-name>
          .
          <article-title>Human vs machine: establishing a human baseline for multimodal location estimation</article-title>
          .
          <source>In Proceedings of the ACM International Conference on Multimedia</source>
          , pages
          <fpage>867</fpage>
          –
          <lpage>876</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thomee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Friedland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Borth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Elizalde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gottlieb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Carrano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pearce</surname>
          </string-name>
          , et al.
          <article-title>The Placing Task: a large-scale geo-estimation challenge for social-media videos and images</article-title>
          .
          <source>In Proceedings of the ACM International Workshop on Geotagging and Its Applications in Multimedia</source>
          , pages
          <fpage>27</fpage>
          –
          <lpage>31</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Houben</surname>
          </string-name>
          .
          <article-title>Placing images on the world map: a microblog-based enrichment approach</article-title>
          .
          <source>In Proceedings of the ACM Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>691</fpage>
          –
          <lpage>700</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Karney</surname>
          </string-name>
          .
          <article-title>Algorithms for geodesics</article-title>
          .
          <source>Journal of Geodesy</source>
          ,
          <volume>87</volume>
          (
          <issue>1</issue>
          ):
          <fpage>43</fpage>
          –
          <lpage>55</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schmiedeke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Friedland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ekambaram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramchandran</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Sikora</surname>
          </string-name>
          .
          <article-title>A novel fusion method for integrating multiple modalities and knowledge for multimodal location estimation</article-title>
          .
          <source>In Proceedings of the ACM International Workshop on Geotagging and Its Applications in Multimedia</source>
          , pages
          <fpage>7</fpage>
          –
          <lpage>12</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thomee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Trevisiol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>van Laere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schockaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Serdyukov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Murdock</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Friedland</surname>
          </string-name>
          .
          <article-title>The benchmark as a research catalyst: charting the progress of geo-prediction for social multimedia</article-title>
          .
          <source>In Multimodal Location Estimation of Videos and Images</source>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rae</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Kelm</surname>
          </string-name>
          .
          <source>Working Notes for the Placing Task at MediaEval 2012</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Serdyukov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Murdock</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>van Zwol</surname>
          </string-name>
          .
          <article-title>Placing Flickr photos on a map</article-title>
          .
          <source>In Proceedings of the ACM Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>484</fpage>
          –
          <lpage>491</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Thomee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shamma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Friedland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Elizalde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Poland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Borth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>YFCC100M: The new data in multimedia research</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <year>2015</year>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Trevisiol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jegou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Delhumeau</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Gravier</surname>
          </string-name>
          .
          <article-title>Retrieving geo-location of videos with a divide &amp; conquer hierarchical multimodal approach</article-title>
          .
          <source>In Proceedings of the ACM International Conference on Multimedia Retrieval</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>8</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>