USEMP at MediaEval Placing Task 2014

Adrian Popescu (1), Symeon Papadopoulos (2), Ioannis Kompatsiaris (2)
(1) CEA, LIST, 91190 Gif-sur-Yvette, France, adrian.popescu@cea.fr
(2) CERTH-ITI, Thermi-Thessaloniki, Greece, {papadop,ikom}@iti.gr

Copyright is held by the author/owner(s). MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain




ABSTRACT
We describe the participation of the USEMP team in the Placing Task at MediaEval 2014. We submitted four textual runs, which are inspired by CEA LIST's 2013 participation. Our entries are based on probabilistic place modeling, but also exploit machine tag and/or user modeling. The best results were obtained when all these types of information were combined. The accuracy of automatic geotagging at 1 km reaches 0.235 when using only the training data provided by the organizers, and 0.441 with the use of external data.
1. INTRODUCTION
The goal of the task is to produce location estimates for a test set of 500,000 images and videos, using a set of approximately five million geotagged images and videos and their metadata for training. A full description of the challenge and of the associated dataset is provided in [1]. Our runs were implemented largely using methods described in CEA LIST's participation at the Placing Task 2013 [2]. For this reason, after a short presentation of the methods, runs and obtained results, we focus on failure analysis.
2. METHOD DESCRIPTION

2.1 Probabilistic location models
Language models were successfully introduced in [3] as an alternative to gazetteer-based geolocation and have been progressively improved in the following years. Test photos can be placed anywhere in the physical world, and the training data provided by the organizers is insufficient for building robust probabilistic models. To verify the assumption that better results are obtained with the use of more data, we exploited: (1) all geotagged metadata from the YFCC dataset (http://webscope.sandbox.yahoo.com/catalog.php?datatype=i&did=67), after removing all test items, and (2) an additional set of ∼90 million geotagged metadata records from Flickr.
Similar to last year [2], the surface of the Earth was split into (nearly) rectangular cells of 0.01 degrees of latitude and longitude (approximately 1 km² in size). User counts were used instead of tag counts in order to mitigate the influence of bulk tagging. Both titles and tags were taken into account, and both are referred to as tags hereafter. Put simply, we computed the probability of a tag in a cell by dividing its user count in that cell by its total user count over all cells. Given a test item, we simply summed up the contributions of individual tags to find the most probable cell for that item. Finally, the photo was placed at the center of that cell.
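To make this concrete, the following is a minimal sketch of the grid-and-counting scheme just described, assuming the training metadata is available as (user, tags, latitude, longitude) records; all names and data structures are illustrative and not the implementation actually used for the runs.

```python
from collections import defaultdict

CELL_DEG = 0.01  # grid resolution in degrees of latitude/longitude (~1 km² cells)

def cell_of(lat, lon):
    """Map coordinates to a (nearly) rectangular 0.01-degree cell id."""
    return (int(lat // CELL_DEG), int(lon // CELL_DEG))

def build_model(records):
    """records: iterable of (user_id, tags, lat, lon) training items.
    Distinct-user counts per (tag, cell) mitigate the influence of bulk tagging."""
    users = defaultdict(lambda: defaultdict(set))  # tag -> cell -> set of users
    for user, tags, lat, lon in records:
        c = cell_of(lat, lon)
        for tag in tags:
            users[tag][c].add(user)
    model = {}
    for tag, by_cell in users.items():
        total = sum(len(u) for u in by_cell.values())
        # p(cell | tag) = user count of the tag in the cell / its total user count
        model[tag] = {c: len(u) / total for c, u in by_cell.items()}
    return model

def place(model, tags):
    """Sum the per-tag cell probabilities and return (most_probable_cell, score)."""
    scores = defaultdict(float)
    for tag in tags:
        for c, p in model.get(tag, {}).items():
            scores[c] += p
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best]  # the photo is placed at the center of `best`
```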
2.2 Machine tag modeling
The authors of [4] show that machine tags can improve automatic geotagging quality. In [2], we proposed a machine tag processing method that models only machine tags strongly associated with locations (i.e. Foursquare, Lastfm and Upcoming entries), and we exploited it again this year.
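As an illustration only: Flickr machine tags follow a namespace:predicate=value convention, so location-bearing entries can be selected by namespace. The namespace list below is inferred from the services named above and is an assumption, not a detail taken from [2].

```python
import re

# Flickr machine tags follow a namespace:predicate=value pattern,
# e.g. "foursquare:venue=4adcda0ef964a520af3f21e3"
MACHINE_TAG = re.compile(r"^([a-z0-9_]+):([a-z0-9_]+)=(.+)$", re.IGNORECASE)

# assumed location-bearing namespaces (Foursquare, Lastfm and Upcoming entries)
LOCATION_NAMESPACES = {"foursquare", "lastfm", "upcoming"}

def location_machine_tags(tags):
    """Keep only machine tags whose namespace points to a place/event service."""
    kept = []
    for tag in tags:
        m = MACHINE_TAG.match(tag)
        if m and m.group(1).lower() in LOCATION_NAMESPACES:
            kept.append((m.group(1).lower(), m.group(2).lower(), m.group(3)))
    return kept
```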
                                                                   tion models and machine tags from training data provided
2.3 User modeling
If images do not have associated tags, or if these tags are not geographically discriminative, placing photos with probabilistic models is likely to fail. To overcome this problem, we exploited a simple user modeling technique [2], which computes the most probable cell of a user. Only photos taken at least 24 hours away from any of the user's test set images were exploited, in order to reduce the risk of learning from test data. We downloaded up to 500 geotagged images per user in order to determine her most probable cell.
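A minimal sketch of this step, under the assumption that each downloaded photo carries a timestamp and coordinates; it reuses the hypothetical cell_of helper from the Section 2.1 sketch, and all other names are likewise illustrative.

```python
from collections import Counter

DAY_SECONDS = 24 * 3600

def user_cell(photos, test_timestamps):
    """photos: (timestamp, lat, lon) triples for up to 500 of a user's geotagged
    images; test_timestamps: upload times of the user's test set items.
    Returns the user's most probable cell, or None if nothing remains."""
    counts = Counter()
    for ts, lat, lon in photos:
        # discard photos taken within 24 hours of any test item (leakage guard)
        if any(abs(ts - t) < DAY_SECONDS for t in test_timestamps):
            continue
        counts[cell_of(lat, lon)] += 1  # cell_of from the sketch in Section 2.1
    return counts.most_common(1)[0][0] if counts else None
```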
2.4 Fusion
We propose a late fusion scheme that is empirically derived from tests on a validation dataset. Since they are associated with precise locations or geolocated events, processed machine tags are very reliable and were used with priority. If there were no machine tags, the location models were exploited to predict the most probable location for the set of tags. Finally, if no tags were available, or if the prediction score was below a threshold, the photo was placed in the most probable cell of the user who uploaded it. The threshold for replacing location models with user models was determined empirically on the validation dataset; we exploited user models for the 30% of test images that had the lowest placing scores.
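The cascade can be summarized as in the sketch below, which chains the illustrative helpers from the previous sketches (location_machine_tags, place, cell_of); SCORE_THRESHOLD stands in for the value tuned on the validation set, and machine_tag_coords is a hypothetical lookup table from processed machine tags to coordinates.

```python
SCORE_THRESHOLD = 0.0  # placeholder; the actual value was tuned on validation data
CELL_DEG = 0.01

def cell_center(cell):
    """Latitude/longitude of the center of a grid cell produced by cell_of."""
    i, j = cell
    return ((i + 0.5) * CELL_DEG, (j + 0.5) * CELL_DEG)

def geotag(tags, model, machine_tag_coords, user_best_cell):
    """Late fusion cascade: machine tags first, then location models,
    then the uploader's most probable cell as a fallback."""
    # 1. processed machine tags are the most reliable cue and are used with priority
    for mtag in location_machine_tags(tags):
        if mtag in machine_tag_coords:  # known place/event coordinates
            return machine_tag_coords[mtag]
    # 2. probabilistic location models over the photo's tags
    cell, score = place(model, tags)
    if cell is not None and score >= SCORE_THRESHOLD:
        return cell_center(cell)
    # 3. no tags, or a low-confidence prediction: fall back to the user model
    if user_best_cell is not None:
        return cell_center(user_best_cell)
    return cell_center(cell) if cell is not None else None
```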
3. RUNS
We submitted the following runs: RUN1 - exploited location models and machine tags from the training data provided by the organizers; RUN3 - combined location models and machine tags from the entire geotagged YFCC dataset, after excluding test items; RUN4 - exploited tags and user models; RUN5 - exploited YFCC location models, machine tags and user models. We present the performance of the submitted runs in Table 1.

Run   P@0.01 km   P@0.1 km   P@1 km   P@10 km   P@100 km   P@1000 km
#1    0.007       0.016      0.235    0.408     0.481      0.618
#3    0.026       0.043      0.428    0.582     0.644      0.753
#4    0           0.012      0.418    0.597     0.679      0.779
#5    0.026       0.043      0.441    0.613     0.691      0.787

Table 1: P@X - precision at X km.




The best results were obtained when combining all types of available information. As expected, the largest contribution was due to the location models. The large gap between RUN1 and the other runs confirms that the use of supplementary training data is very beneficial. The difference in precision at close range (P@0.1) between RUN3 and RUN4 confirms that machine tags are very useful for precise geolocation. Conversely, if larger errors are admitted, user models become more useful than machine tags. The combination of these types of cues in RUN5 gives the best performance across all precision ranges. The results obtained this year are in the same range as those we reported in 2013 [2], thus confirming that our geolocation pipeline behaves consistently across different datasets.

Figure 1: Average error plot for RUN1. Red/blue dots correspond to large errors/precise geotagging, respectively.

Figure 2: Average error plot for RUN5. Red/blue dots correspond to large errors/precise geotagging, respectively.
4. FAILURE ANALYSIS
In addition to the submitted runs, we tested other configurations, which gave lower results, and briefly describe them here. Notably, we tried a combination of location models and gazetteer information in order to give a privileged role to toponyms such as administrative division names (i.e. countries, regions, cities). The addition of the gazetteer gave lower results compared to the sole use of location models. This negative result could be explained by the strong ambiguity that characterizes the geographic domain. As mentioned, we also tried to add a dataset of ∼90 million geotagged metadata records to the full YFCC training data. Contrary to the existing literature [4, 2], the use of this supplementary dataset actually degraded the overall quality of the results. This negative result might indicate that probabilistic models reach saturation when too much metadata is available.
In Figures 1 and 2, we present a visualization of geotagging performance for RUN1 and RUN5; the performance difference between the two runs is clearly visible. Geotagging is precise in most European regions and worse in the other regions. The low performance is easily explained by sparse data for Africa, Asia and South America. However, the imprecision is also high for the United States, the region of the world that concentrates the largest number of geotagged images. In this case, poor geotagging could be due to the very high ambiguity of place names. For instance, there are dozens of places called London or Paris in the US. If there is not enough disambiguating information associated with them in the annotations, photos tagged with these toponyms will be placed in Europe.
5. FUTURE WORK
Due to lack of time, we did not submit a visual run this year. While visual geotagging still lags well behind textual geotagging, it would be interesting to explore whether coordinates can be predicted accurately, at least for visually distinctive objects such as Points of Interest. Regarding text models, we would like to investigate in more depth why adding more data from outside YFCC degrades performance. It would be equally interesting to investigate ways of selecting reliable annotations before computing the location models.

6. ACKNOWLEDGMENT
This work is supported by the USEMP FP7 project, partly funded by the EC under contract number 611596.

7. REFERENCES
[1] J. Choi et al. The Placing Task: A large-scale geo-estimation challenge for social-media videos and images. In Proc. of GeoMM '14.
[2] A. Popescu. CEA LIST's participation at MediaEval 2013 Placing Task. In MediaEval, 2013.
[3] P. Serdyukov et al. Placing Flickr photos on a map. In Proc. of SIGIR 2009.
[4] M. Trevisiol et al. Retrieving geo-location of videos with a divide & conquer hierarchical multimodal approach. In ICMR, pages 1-8, 2013.