<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jaeyoung Choi</string-name>
          <email>jaeyoung@icsi.berkeley.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudia Hauff</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olivier Van Laere</string-name>
          <email>oliviervanlaere@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bart Thomee</string-name>
          <email>bthomee@google.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Blueshift Labs</institution>
          ,
          <addr-line>San Francisco, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Delft University of Technology</institution>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Google</institution>
          ,
          <addr-line>San Bruno, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>International Computer Science Institute</institution>
          ,
          <addr-line>Berkeley, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>20</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>The seventh edition of the Placing Task at MediaEval focuses on two challenges: (1) estimation-based placing, which addresses estimating the geographic location where a photo or video was taken, and (2) verification-based placing, which addresses verifying whether a photo or video was indeed taken at a pre-specified geographic location. As in the previous edition, we made the organizer baselines for both sub-tasks available as open source code, and published a live leaderboard that allows the participants to gain insights into the effectiveness of their approaches compared to the official baselines and in relation to each other at an early stage, before the actual run submissions are due.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The Placing Task challenges participants to develop
techniques to automatically determine where in the world
photos and videos were captured based on analyzing their
visual content and/or textual metadata, optionally augmented
with knowledge from external resources like gazetteers. In
particular, we aim for participants to improve upon
the contributions of participants from previous editions, as
well as of the research community at large, e.g. [
        <xref ref-type="bibr" rid="ref11 ref2 ref4 ref6 ref8 ref9">2, 4, 6, 8, 9, 11</xref>
        ]. Although the Placing Task has indeed been shown to
be a “research catalyst” [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for geo-prediction of social
multimedia, with each edition of the task it becomes a greater
challenge to alter the benchmark sufficiently to allow and
motivate participants to make substantial changes to their
frameworks and systems instead of small technical ones. The
introduction of the verification sub-task this year was driven
by this consideration, as it requires participants to integrate
a notion of confidence in their location predictions to decide
whether or not a photo or video was taken in a particular
country, state, city or neighborhood.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. DATA</title>
      <p>
        This year's edition of the Placing Task was once again based
on the YFCC100M [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which to date is the largest
publicly and freely available social multimedia collection, and
which can be obtained through the Yahoo Webscope
program<fn id="fn1"><label>1</label><p>https://bit.ly/yfcc100md</p></fn>. The full dataset consists of 100 million
Flickr<fn id="fn2"><label>2</label><p>https://www.flickr.com</p></fn> Creative Commons<sup>3</sup> licensed photos and videos with associated
metadata. Similar to last year's edition [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we sampled a
subset of the YFCC100M for training and testing, see Table 1.
No user appeared in both the training set and the test
set, and to minimize user and location bias, each user was
limited to contributing at most 250 photos and 50 videos,
and we excluded photos/videos that were taken by the same
user less than 10 minutes apart. We included both test sets
used in the Placing Tasks of 2014 and 2015 in this year's
test set, allowing us to assess how the location estimation
performance has improved over time.
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th></th>
              <th>#Photos</th>
              <th>#Videos</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Training</td>
              <td>4,991,679</td>
              <td>24,955</td>
            </tr>
            <tr>
              <td>Testing</td>
              <td>1,497,464</td>
              <td>29,934</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        The rather uncontrolled nature of the data (sampled from
longitudinal, large-scale, noisy and biased raw data)
confronts participants with additional challenges. To lower the
entrance barrier, we precomputed and provided participants
with fifteen visual and three aural features commonly used
in multimedia analysis for each of the media objects,
including SIFT, Gist, color and texture histograms for visual
analysis, and MFCC for audio analysis [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which together with
the original photo and video content are publicly and freely
available through the Multimedia Commons Initiative<sup>4</sup>. In
addition, several expansion packs have been released by the
creators of the YFCC100M dataset, such as detected visual
concepts and Exif metadata, which could prove useful for
the participants.
      </p>
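      <p>The per-user sampling constraints described in this section (at most 250 photos and 50 videos per user, and no two items by the same user less than 10 minutes apart) can be illustrated with a minimal sketch. The record layout (user_id, kind, taken_at), the function name and the greedy chronological strategy are assumptions made for illustration only; the text does not state how the organizers implemented the filter.</p>
      <preformat>
```python
from collections import defaultdict
from datetime import datetime, timedelta

MAX_PER_USER = {"photo": 250, "video": 50}  # caps stated in the text
MIN_GAP = timedelta(minutes=10)             # minimum spacing stated in the text


def sample_user_items(items):
    """Greedy per-user filter over (user_id, kind, taken_at) tuples.

    Assumption: items are visited in chronological order and an item is kept
    only if it was taken at least MIN_GAP after the user's last kept item.
    """
    kept = []
    last_kept = {}               # user_id -> capture time of the last kept item
    counts = defaultdict(int)    # (user_id, kind) -> number of items kept so far
    for user_id, kind, taken_at in sorted(items, key=lambda it: it[2]):
        if counts[(user_id, kind)] >= MAX_PER_USER[kind]:
            continue             # per-user cap reached for this media type
        previous = last_kept.get(user_id)
        if previous is not None and MIN_GAP > taken_at - previous:
            continue             # taken less than 10 minutes after the last kept item
        kept.append((user_id, kind, taken_at))
        last_kept[user_id] = taken_at
        counts[(user_id, kind)] += 1
    return kept
```
      </preformat>
      <p>Keeping training and test users disjoint would be a separate step, for example by assigning each user identifier to exactly one of the two sets before this filter is applied.</p>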
    </sec>
    <sec id="sec-3">
      <title>3. TASKS</title>
      <p>Estimation-based sub-task: In this sub-task,
participants were given a hierarchy of places across the world,
ranging across neighborhoods, cities, regions, countries and
continents. For each photo and video, they were asked to pick
a node (i.e. a place) from the hierarchy in which they most
confidently believed it had been taken. While the ground
truth locations of the photos and videos were associated
with their actual coordinates and thus in essence the most
accurate nodes (i.e. the leaves) in the hierarchy, the
participants could express a reduced confidence in their location
estimates by selecting nodes at higher levels in the hierarchy.</p>
      <sec id="sec-3-1">
        <fn-group>
          <fn id="fn3"><label>3</label><p>https://www.creativecommons.org</p></fn>
          <fn id="fn4"><label>4</label><p>http://www.mmcommons.org</p></fn>
        </fn-group>
        <p>If their confidence was sufficiently high, participants could
of course directly estimate the geographic coordinates of the
photo/video instead of choosing a node from the hierarchy.</p>
        <p>As our place hierarchy we used the Places expansion pack
of the YFCC100M dataset, in which each geotagged photo and
video is mapped to its corresponding place, following
a variation of the general hierarchy
Country → State → City → Neighborhood.
Because of the use of the hierarchy, only photos and videos that
were successfully reverse geocoded were included in this
sub-task, and thus media captured in or above international
waters were excluded.</p>
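        <p>One way a system might express reduced confidence by backing off to a higher node of the hierarchy is sketched below. The per-level confidence scores, the threshold and the function name are hypothetical, and the example place names are illustrative rather than taken from the Places expansion pack.</p>
        <preformat>
```python
def pick_node(path, confidences, threshold=0.5):
    """Return the deepest prefix of `path` whose confidence meets `threshold`.

    `path` holds one place name per level, ordered country, state, city,
    neighborhood; `confidences` holds a hypothetical score in [0, 1] per level.
    An empty tuple means that not even the country was predicted confidently.
    """
    chosen = ()
    for depth, confidence in enumerate(confidences, start=1):
        if threshold > confidence:
            break                    # stop descending once confidence drops
        chosen = tuple(path[:depth])
    return chosen


# Illustrative values only: the system is unsure about the neighborhood,
# so it answers at the city level instead.
node = pick_node(
    ("United States", "California", "San Francisco", "Mission District"),
    (0.95, 0.80, 0.60, 0.30),
)
print(" / ".join(node))  # United States / California / San Francisco
```
        </preformat>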
        <p>Verification-based sub-task: In this sub-task,
participants were given a photo or video and a place from the
hierarchy, and were asked to verify whether or not the media
item was really captured in the given place. In the test set,
we randomly switched the locations of 50% of the photos
and videos, requiring that each switched location differed
from the true one at least at the country level. Then, for 25% of the
media items we removed the neighborhood level and below, for
25% the city level and below, and for 25% the state level and
below, enabling us to assess how the level of the hierarchy
affects the verification quality of the participants' systems.</p>
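        <p>The construction of the verification test set can be sketched as follows. This is an illustration under stated assumptions, not the organizers' procedure: the function name, the (item_id, path) input layout, the fixed random seed and the round-robin assignment of truncation levels are all hypothetical.</p>
        <preformat>
```python
import random


def build_verification_set(items, seed=0):
    """Build (item_id, claimed_path, label) triples from (item_id, path) pairs.

    `path` is ordered (country, state, city, neighborhood). `label` is True
    when the claimed place is the item's true place.
    """
    rng = random.Random(seed)
    order = list(range(len(items)))
    rng.shuffle(order)
    switched = set(order[: len(order) // 2])   # 50% receive a wrong location
    examples = []
    for rank, index in enumerate(order):
        item_id, true_path = items[index]
        claimed, label = true_path, True
        if index in switched:
            # the replacement must differ at least at the country level
            foreign = [p for _, p in items if p[0] != true_path[0]]
            if foreign:
                claimed, label = rng.choice(foreign), False
        # one quarter keeps the full path; the others keep 3, 2 and 1 levels,
        # i.e. drop neighborhood, city or state "and below"
        keep = 4 - (rank % 4)
        examples.append((item_id, tuple(claimed[:keep]), label))
    return examples
```
        </preformat>
        <p>Because a switched location lies in a different country, its label stays correct however many lower levels are removed afterwards.</p>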
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. RUNS</title>
      <p>Participants may submit up to five attempts (‘runs’) for
each sub-task. They can make use of the provided
metadata and precomputed features, as well as external resources
(e.g. gazetteers, dictionaries, Web corpora), depending on
the run type. We distinguish between the following five run
types:</p>
      <list list-type="simple">
        <list-item><p>Run 1: Only provided textual metadata may be used.</p></list-item>
        <list-item><p>Run 2: Only provided visual &amp; aural features may be used.</p></list-item>
        <list-item><p>Run 3: Only provided textual metadata and the visual &amp; aural features may be used.</p></list-item>
        <list-item><p>Runs 4–5: Everything is allowed, except for crawling the
exact items contained in the test set.</p></list-item>
      </list>
    </sec>
    <sec id="sec-5">
      <title>5. EVALUATION</title>
      <p>
        For the estimation-based sub-task, the evaluation metric is
based on the geographic distance between the ground truth
coordinate and the predicted coordinate or place from the
hierarchy. Whenever a participant estimates a place from
the hierarchy, we substitute it by its geographic centroid.
We measure geographic distances with Karney's formula [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ];
this formula is based on the assumption that the shape of
the Earth is an oblate spheroid, which produces more
accurate distances than methods such as the great-circle distance
that assume the shape of the Earth to be a sphere. For the
verification-based sub-task, we measure the classification
accuracy.
      </p>
      <p>
        As task organizers, we provided two open source baselines<sup>5</sup>
to the participants, one for the estimation sub-task and one
for the verification sub-task. Additionally, we implemented
a live leaderboard that allowed participants to submit runs
and view their standing relative to others, as evaluated
on a representative development set (i.e. a part of the test
set, but not the complete test set).
      </p>
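      <p>The two evaluation measures can be illustrated with a minimal sketch. Note that the distance below is the spherical great-circle (haversine) approximation that the text contrasts with Karney's formula, used here only because it needs nothing beyond the standard library; an ellipsoidal geodesic, such as the one offered by the geographiclib package, would be required to reproduce the official metric. The function names and the centroid lookup table are hypothetical.</p>
      <preformat>
```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; the spherical assumption


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # clamp guards against rounding slightly above 1 for antipodal points
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def to_coordinate(prediction, centroids):
    """Replace a predicted place name by its centroid; pass coordinates through."""
    return centroids[prediction] if isinstance(prediction, str) else prediction


def accuracy(predicted, truth):
    """Fraction of verification decisions that match the ground truth."""
    correct = sum(1 for p, t in zip(predicted, truth) if p == t)
    return correct / len(truth)
```
      </preformat>
      <p>A spherical model is simpler to compute, whereas the spheroidal model is the more accurate of the two, as noted above.</p>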
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Van Laere</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Thomee</surname>
          </string-name>
          .
          <article-title>The Placing Task at MediaEval 2015</article-title>
          .
          <source>In Working Notes of the MediaEval Benchmarking Initiative for Multimedia Evaluation</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ekambaram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gottlieb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sikora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramchandran</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Friedland</surname>
          </string-name>
          .
          <article-title>Human vs machine: establishing a human baseline for multimodal location estimation</article-title>
          .
          <source>In Proceedings of the ACM International Conference on Multimedia</source>
          , pages
          <fpage>867</fpage>
          –
          <lpage>876</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thomee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Friedland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Borth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Elizalde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gottlieb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Carrano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pearce</surname>
          </string-name>
          , et al.
          <article-title>The Placing Task: a large-scale geo-estimation challenge for social-media videos and images</article-title>
          .
          <source>In Proceedings of the ACM International Workshop on Geotagging and Its Applications in Multimedia</source>
          , pages
          <fpage>27</fpage>
          –
          <lpage>31</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Houben</surname>
          </string-name>
          .
          <article-title>Placing images on the world map: a microblog-based enrichment approach</article-title>
          .
          <source>In Proceedings of the ACM Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>691</fpage>
          –
          <lpage>700</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Karney</surname>
          </string-name>
          .
          <article-title>Algorithms for geodesics</article-title>
          .
          <source>Journal of Geodesy</source>
          ,
          <volume>87</volume>
          (
          <issue>1</issue>
          ):
          <fpage>43</fpage>
          –
          <lpage>55</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schmiedeke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Friedland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ekambaram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramchandran</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Sikora</surname>
          </string-name>
          .
          <article-title>A novel fusion method for integrating multiple modalities and knowledge for multimodal location estimation</article-title>
          .
          <source>In Proceedings of the ACM International Workshop on Geotagging and Its Applications in Multimedia</source>
          , pages
          <fpage>7</fpage>
          –
          <lpage>12</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thomee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Trevisiol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Van Laere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schockaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Serdyukov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Murdock</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Friedland</surname>
          </string-name>
          .
          <article-title>The benchmark as a research catalyst: charting the progress of geo-prediction for social multimedia</article-title>
          .
          <source>In Multimodal Location Estimation of Videos and Images</source>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rae</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Kelm</surname>
          </string-name>
          .
          <source>Working Notes for the Placing Task at MediaEval 2012</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Serdyukov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Murdock</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>van Zwol</surname>
          </string-name>
          .
          <article-title>Placing Flickr photos on a map</article-title>
          .
          <source>In Proceedings of the ACM Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>484</fpage>
          –
          <lpage>491</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Thomee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shamma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Friedland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Elizalde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Poland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Borth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>YFCC100M: The new data in multimedia research</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>59</volume>
          (
          <issue>2</issue>
          ):
          <fpage>64</fpage>
          –
          <lpage>73</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Trevisiol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jegou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Delhumeau</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Gravier</surname>
          </string-name>
          .
          <article-title>Retrieving geo-location of videos with a divide &amp; conquer hierarchical multimodal approach</article-title>
          .
          <source>In Proceedings of the ACM International Conference on Multimedia Retrieval</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>8</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>