<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Popescu</surname>
            <given-names>Adrian</given-names>
          </name>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>CEA, LIST</institution>
          ,
          <addr-line>91190 Gif-sur-Yvette</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>CERTH-ITI</institution>
          ,
          <addr-line>Thermi-Thessaloniki</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>We describe the participation of the USEMP team in the Placing Task at MediaEval 2014. We submitted four textual runs which are inspired by CEA LIST's 2013 participation. Our entries are based on probabilistic place modeling but also exploit machine tag and/or user modeling. The best results were obtained when all these types of information are combined. The accuracy of automatic placing at 1 km reaches 0.235 when using only the training data provided by the organizers, and 0.441 with the use of external data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The goal of the task is to produce location estimates for a test set of 500,000 images and videos, using a set of approximately five million geotagged images and videos and their metadata for training. A full description of the challenge and of the associated dataset is provided in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Our runs were largely implemented with methods described in CEA LIST's participation at Placing Task 2013 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For this reason, after a short presentation of the methods, runs and obtained results, we focus on failure analysis.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. METHOD DESCRIPTION</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Probabilistic location models</title>
      <p>
        Language models were successfully introduced in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] as an alternative to gazetteer-based geolocation and were progressively improved in the following years. Test photos can be placed anywhere in the physical world, and the training data provided by the organizers is insufficient to build robust probabilistic models. To verify the assumption that better results are obtained with the use of more data, we exploited: (1) all geotagged metadata from the YFCC dataset (http://webscope.sandbox.yahoo.com/catalog.php?datatype=i&amp;did=67), after removing all test items, and (2) an additional set of 90 million geotagged metadata items from Flickr.
      </p>
      <p>
        Similarly to last year [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the surface of the earth was split into (nearly) rectangular cells of 0.01 degrees of latitude and longitude (approximately 1 km² in size). User counts were used instead of tag counts in order to mitigate the influence of bulk tagging. Both titles and tags were taken into account and are referred to as tags hereafter. Put simply, we computed the probability of a tag in a cell by dividing its user count in that cell by its total user count over all cells. Given a test item, we simply summed up the contributions of individual tags to find the most probable cell for that item. Finally, the photo was placed at the center of the cell.
      </p>
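      <p>As an illustration only, the cell-based model described above can be sketched as follows; the function names, data layout and scoring details are our own assumptions, not the exact implementation:</p>
      <preformat>
```python
from collections import defaultdict

CELL = 0.01  # cell size in degrees, as in the paper

def cell_id(lat, lon):
    """Map coordinates to a roughly 1 km^2 grid cell (0.01-degree bins)."""
    return (int(lat // CELL), int(lon // CELL))

def build_model(training):
    """training: iterable of (user_id, lat, lon, tags).
    Counts distinct users per (tag, cell) to mitigate bulk tagging."""
    users = defaultdict(lambda: defaultdict(set))  # tag -> cell -> user set
    for user, lat, lon, tags in training:
        c = cell_id(lat, lon)
        for tag in tags:
            users[tag][c].add(user)
    # p(cell | tag) = user count in that cell / total user count over all cells
    model = {}
    for tag, cells in users.items():
        total = sum(len(u) for u in cells.values())
        model[tag] = {c: len(u) / total for c, u in cells.items()}
    return model

def place(model, tags):
    """Sum per-tag cell probabilities; return the best cell's center and score."""
    scores = defaultdict(float)
    for tag in tags:
        for c, p in model.get(tag, {}).items():
            scores[c] += p
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return ((best[0] + 0.5) * CELL, (best[1] + 0.5) * CELL), scores[best]
```
      </preformat>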
    </sec>
    <sec id="sec-4">
      <title>2.2 Machine tag modeling</title>
      <p>
        The authors of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] show that machine tags can improve automatic geotagging quality. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] we proposed a machine tag processing method which models only machine tags that are strongly associated with locations (i.e. Foursquare, Lastfm and Upcoming entries), and we exploited it this year.
      </p>
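      <p>For illustration, a minimal sketch of one way such a machine tag model could work; the namespace prefixes, function names and the mean-coordinates heuristic are our own assumptions, not necessarily the method of the cited work:</p>
      <preformat>
```python
from collections import defaultdict

# Hypothetical list of location-bearing machine tag namespaces
PLACE_NAMESPACES = ("foursquare:", "lastfm:", "upcoming:")

def build_machine_tag_model(training):
    """training: iterable of (lat, lon, tags). Maps each place-related
    machine tag to the mean coordinates of its training occurrences."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # tag -> [lat sum, lon sum, n]
    for lat, lon, tags in training:
        for tag in tags:
            if tag.startswith(PLACE_NAMESPACES):
                s = sums[tag]
                s[0] += lat
                s[1] += lon
                s[2] += 1
    return {t: (s[0] / s[2], s[1] / s[2]) for t, s in sums.items()}

def machine_tag_location(model, tags):
    """Return the location of the first known machine tag, if any."""
    for tag in tags:
        if tag in model:
            return model[tag]
    return None
```
      </preformat>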
    </sec>
    <sec id="sec-5">
      <title>2.3 User modeling</title>
      <p>
        If images have no associated tags, or if these tags are not geographically discriminative, placing photos with probabilistic models is likely to fail. To overcome this problem, we exploited a simple user modeling technique [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] which computes the most probable cell of a user. Only photos taken at least 24 hours away from any of the user's test set images were exploited, in order to reduce the risk of learning from test data. We downloaded up to 500 geotagged images per user in order to determine her most probable cell.
      </p>
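      <p>The user model described above could be sketched as follows; this is an assumed layout (timestamps in seconds, an externally supplied cell function), not the exact implementation:</p>
      <preformat>
```python
from collections import Counter

DAY = 24 * 3600  # 24 hours in seconds

def user_cell(photos, test_times, cell_id, max_photos=500):
    """photos: list of (timestamp, lat, lon) for one user, capped at 500
    as in the paper; test_times: timestamps of that user's test-set items.
    Returns the user's most frequent grid cell, keeping only photos taken
    at least 24 hours away from every test item, to avoid learning from
    test data."""
    counts = Counter()
    for ts, lat, lon in photos[:max_photos]:
        if all(abs(ts - t) >= DAY for t in test_times):
            counts[cell_id(lat, lon)] += 1
    return counts.most_common(1)[0][0] if counts else None
```
      </preformat>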
    </sec>
    <sec id="sec-6">
      <title>2.4 Fusion</title>
      <p>We propose a late fusion scheme which was empirically derived from tests on a validation dataset. Since they are associated with precise locations or geolocated events, processed machine tags are very reliable and were used in priority. If there were no machine tags, location models were exploited to predict the most probable location of a set of tags. Finally, if there were no tags available, or if the prediction score was below a threshold, the photo was placed in the most probable cell of the user who uploaded it. The threshold for replacing location models with user models was determined empirically on the validation dataset: we exploited user models for the 30% of test images with the lowest placing scores.</p>
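      <p>The priority cascade above can be sketched as follows; the three component predictors are passed in as callables and all names are hypothetical stand-ins for the paper's components:</p>
      <preformat>
```python
def fuse(item, machine_tag_cell, place, user_cell_of, threshold):
    """Late fusion cascade: machine tags first, then the tag-based
    location model, then the uploader's user model as a fallback."""
    cell = machine_tag_cell(item)      # Foursquare/Lastfm/Upcoming tags
    if cell is not None:
        return cell
    coords, score = place(item)        # probabilistic location model
    if coords is not None and score >= threshold:
        return coords
    return user_cell_of(item)          # most probable cell of the user
```
      </preformat>
      <p>In the paper the threshold is not a fixed constant but is chosen on a validation set so that the user model takes over for the 30% of test images with the lowest placing scores.</p>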
    </sec>
    <sec id="sec-7">
      <title>3. RUNS</title>
      <p>We submitted the following runs: RUN1 exploited location models and machine tags from the training data provided by the organizers; RUN3 combined location models and machine tags from the entire geotagged YFCC dataset, after excluding test items; RUN4 exploited tags and user models; RUN5 exploited YFCC location models, machine tags and user models. We present the performance of the submitted runs in Table 1.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Performance of the submitted runs (P@X km).</p>
        </caption>
        <table>
          <thead>
            <tr><th>Run</th><th>P@1 km</th><th>P@10 km</th></tr>
          </thead>
          <tbody>
            <tr><td>RUN1</td><td>0.235</td><td>0.408</td></tr>
            <tr><td>RUN3</td><td>0.428</td><td>0.582</td></tr>
            <tr><td>RUN4</td><td>0.418</td><td>0.597</td></tr>
            <tr><td>RUN5</td><td>0.441</td><td>0.613</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        The best results were obtained when combining all types of available information. As expected, the largest contribution was due to location models. The large gap between RUN1 and the other runs confirms that the use of supplementary training data is very beneficial. The difference of precision at close range (P@0.1) between RUN3 and RUN4 confirms that machine tags are very useful for precise geolocation. Inversely, if larger errors are admitted, user models become more useful than machine tags. The combination of these types of cues in RUN5 gives the best performance for all precision ranges. The results obtained this year are in the same range as those we reported in 2013 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], confirming that our geolocation pipeline behaves consistently over different datasets.
      </p>
    </sec>
    <sec id="sec-8">
      <title>4. FAILURE ANALYSIS</title>
      <p>
        In addition to the submitted runs, we tested other configurations which gave lower results, and we briefly describe them here. We notably tried a combination of location models and gazetteer information in order to give a privileged role to toponyms such as administrative division names (i.e. countries, regions, cities). The addition of the gazetteer gave lower results compared to the sole use of location models. This negative result could be explained by the strong ambiguity which characterizes the geographic domain. As mentioned, we also tried to add a dataset of 90 million geotagged metadata items to the full YFCC training data. Contrary to existing literature [
        <xref ref-type="bibr" rid="ref2 ref4">4, 2</xref>
        ], the use of this supplementary dataset actually degraded the overall quality of the results. This negative result might indicate that probabilistic models reach saturation when too much metadata is available.
      </p>
      <p>In Figures 1 and 2, we present a visualization of geotagging performance for RUN1 and RUN5; the performance difference between the two runs is clearly reflected. Geotagging is precise in most European regions and worse elsewhere. Low performance can easily be explained by sparse data for Africa, Asia or South America. However, the imprecision is also high for the United States, the region of the world which concentrates the largest number of geotagged images. In this case, poor geotagging could be due to a very high ambiguity of place names. For instance, there are dozens of places called London or Paris in the US. If there is not enough disambiguating information associated with them in the annotations, photos tagged with these toponyms will be placed in Europe.</p>
    </sec>
    <sec id="sec-9">
      <title>5. FUTURE WORK</title>
      <p>Due to lack of time, we did not submit a visual run this year. While visual geotagging still lags well behind textual geotagging, it would be interesting to explore whether coordinates can be predicted accurately, at least for visually distinctive objects such as points of interest. Regarding text models, we would like to investigate in more depth why adding data from outside YFCC degrades performance. It would also be interesting to investigate ways to select reliable annotations before computing location models.</p>
    </sec>
    <sec id="sec-10">
      <title>6. ACKNOWLEDGMENT</title>
      <p>This work is supported by the USEMP FP7 project, partly
funded by the EC under contract number 611596.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          et al.
          <article-title>The Placing Task: A large-scale geo-estimation challenge for social-media videos and images</article-title>
          .
          <source>In Proc. of GeoMM'14</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu</surname>
          </string-name>
          .
          <article-title>CEA LIST's participation at MediaEval 2013 Placing Task</article-title>
          .
          <source>In MediaEval</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Serdyukov</surname>
          </string-name>
          et al.
          <article-title>Placing Flickr photos on a map</article-title>
          .
          <source>In Proc. of SIGIR</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Trevisiol</surname>
          </string-name>
          et al.
          <article-title>Retrieving geo-location of videos with a divide &amp; conquer hierarchical multimodal approach</article-title>
          .
          <source>In ICMR, pages 1-8</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>