=Paper=
{{Paper
|id=None
|storemode=property
|title=CEA LIST’s Participation at MediaEval 2013 Placing Task
|pdfUrl=https://ceur-ws.org/Vol-1043/mediaeval2013_submission_59.pdf
|volume=Vol-1043
|dblpUrl=https://dblp.org/rec/conf/mediaeval/Popescu13
}}
==CEA LIST’s Participation at MediaEval 2013 Placing Task==
Adrian Popescu
CEA, LIST, Vision & Content Engineering Laboratory, 91190 Gif-sur-Yvette, France.
adrian.popescu@cea.fr
1. ABSTRACT
At MediaEval Placing Task 2013 [2], we focused on improving our last year's participation in four directions: (1) exploit a larger geotagged dataset in order to improve the quality of a standard geolocation language model, (2) model machine tags, (3) estimate the geographicity of tags associated to geolocated photos and (4) exploit user cues in order to complement language models whenever the latter are likely to fail. The obtained results show that all modifications proposed this year have a positive effect. A "standard" model based only on the training data (cues (1)+(2)) has the poorest performance, with P@1km = 0.268, while P@1km = 0.434 when all cues are used.

Copyright is held by the author/owner(s). MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain.

2. INTRODUCTION
Language models were successfully introduced in [4] as an alternative to gazetteer-based geolocation and refined progressively in different editions of the MediaEval Placing Task. The best performing state-of-the-art systems combine language models and user modeling [5], [3]. The search space in the Placing Task is very wide (the physical world) and it is only partially covered by the training data provided by the organizers [2]. We complemented this training set with external Flickr data, from which we removed all test set items, in order to study the effect of dataset size. If properly modeled, machine tags give very precise information about a photo's location [5], and here we propose a method to exploit them in priority. The geographicity (i.e. the geographic intent) of textual annotations has been poorly studied, and we estimate the geographicity of individual tags using a spatial statistical technique. Finally, user modeling introduces a supplementary constraint, since we need to have user data available, but this condition is fulfilled for most social networks.

3. LANGUAGE MODELS
Similarly to last year [3], the surface of the Earth is split into (nearly) rectangular cells of 0.01 degrees of latitude and longitude (approximately 1 km in size). Since the proposed tag modeling is independent of the presence/absence of other tags, we kept only the tags that appear in test set #5 (262,000 test images), totaling over 155000 tags, in order to speed up computation. In order to estimate the effect of rare tags, and contrarily to last year, we considered all tags regardless of their user frequency. To mitigate the effects of bulk tagging, the cell tag probability is computed as the number of different users that used the tag in the cell divided by the overall tag's user count. We computed models based only on the internal training data provided by the organizers (around 8.5 million Flickr metadata pieces) [2], but also by adding an external training data set of 90 million items. The first model includes 78897 unique tags from the test set and the second contains 128488. The difference with the total number of tags from the test set indicates that an important number of Flickr tags are used by only one user and have little social relevance. Given a test item, we simply sum up the contributions of individual tags to find the most probable cell for that item. For each cell, we determine the most probable location by averaging the latitudes and longitudes of the photos in that cell, and we use these coordinates after the detection of the most probable cell.

4. MACHINE TAGS PROCESSING
Machine tags are associated to Flickr data either manually or automatically, and some of them give very precise geolocation information. Geotags (latitude and longitude triples) are an obvious type of machine tags that can be exploited. Since they are provided in a modified format (no information about the sign of the coordinates and no decimals), we learned their correlation with real coordinates from the internal and the external training sets. P@1km varies from 0.99 for foursquare to 0.97 for upcoming. We obtained the following coverage with internal and, respectively, external models: foursquare - 1604, respectively 6031 test items; geotags - 10954, respectively 13783 items; lastfm - 90, respectively 1347 items; upcoming - 292, respectively 955 items. Whenever a photo has associated machine tags, we exploit them instead of the standard language models; they are cascaded in descending order of the P@1km scores obtained on a part of the training set.

5. TAG GEOGRAPHICITY
Geographicity is a property that has already been studied [1] but is still a hot topic, especially for ambiguous and rare tags, which are targeted by our method. The objective here is to find a criterion that separates tags that are well localized from other tags and, consequently, to be able to estimate whether a photo can be geolocalized precisely or not. For instance, Cat is not spatially discriminant, Cambridge is discriminant but highly ambiguous, while Torre Agbar is spatially discriminant and appears in a single place. These differences should be reflected by the geographicity score, which is calculated by computing the probability of a tag to appear around its most probable cells from the language models. At most 10% of the top cells (i.e. cells with the most photos in them), but no more than five, are retained as seeds, with a minimum distance of 50 km between two seeds. Then we compute the probability of the tag to appear within a radius of 15 km around the seeds. Several cells are retained in order to deal with ambiguous tags, such as Cambridge. The radius is chosen so as to cover tags whose geographical span goes from very localized to city scale, which are exploitable to localize items with city-scale precision.
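The seed-based geographicity computation described above can be sketched as follows. This is only an illustration under our own assumptions: the paper gives no code, the function and parameter names are ours, and the seed position is taken as the mean of a cell's photo coordinates.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in km between two (lat, lon) points.
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def geographicity(photo_coords, cell_size=0.01, max_seeds=5,
                  top_frac=0.10, min_seed_dist_km=50.0, radius_km=15.0):
    """Score in [0, 1]: fraction of a tag's photos lying within
    radius_km of the tag's seed cells (top cells by photo count,
    at most max_seeds, at least min_seed_dist_km apart)."""
    # Bin the tag's photos into the ~0.01-degree grid.
    cells = {}
    for lat, lon in photo_coords:
        key = (int(lat // cell_size), int(lon // cell_size))
        cells.setdefault(key, []).append((lat, lon))
    # Rank cells by photo count; keep at most 10% of them, max five seeds.
    ranked = sorted(cells.values(), key=len, reverse=True)
    limit = min(max_seeds, max(1, int(top_frac * len(ranked))))
    seeds = []
    for group in ranked:
        centroid = (sum(p[0] for p in group) / len(group),
                    sum(p[1] for p in group) / len(group))
        if all(haversine_km(*centroid, *s) >= min_seed_dist_km for s in seeds):
            seeds.append(centroid)
        if len(seeds) == limit:
            break
    near = sum(1 for p in photo_coords
               if any(haversine_km(*p, *s) <= radius_km for s in seeds))
    return near / len(photo_coords)
```

A tag like Torre Agbar, whose photos cluster in one place, scores close to 1, while a worldwide tag like cat scores low because most of its photos fall outside the 15 km radius of the few retained seeds.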
Table 1: Geographicity score vs. geolocation precision in 0.2 intervals.

Decile    0.2      0.4      0.6      0.8      1
Items     28163    23732    22463    16880    122971
P@1km     0.005    0.02     0.051    0.169    0.334
P@10km    0.024    0.071    0.128    0.439    0.716

Table 2: P@X - precision at X km; err@1 - median error at 1 km.

Run   P@0.1    P@1     P@10    P@100    P@1000    err@1
#1    0.074    0.26    0.43    0.5      0.63      98.8
#2    0.067    0.38    0.58    0.67     0.79      3.45
#3    0.133    0.43    0.62    0.71     0.81      2.07
#4    0.132    0.43    0.63    0.72     0.83      2.08
The geographicity score of a tag (geo) is defined between 0 (non-discriminant) and 1 (perfectly discriminant). For instance, a photo tagged with cat is a priori harder to pin on the map than a photo tagged with Cambridge, which is in its turn harder to localize than a photo tagged with Torre Agbar. We selected 214214 tagged photos from the training set and used the rest of it to create the location models. The results presented in Table 1 indicate that there is a correlation between geographicity scores and localization precision. Photos whose maximum geographicity score is small (geo <= 0.2) are very hard to localize and, in this case, user cues could be used instead of location models.

Although the obtained scores often make sense, we noticed two pitfalls that are probably due to the incompleteness and noisy character of Flickr annotations. First, there are some very rare tags whose geographicity score is 1 although they are not geographically discriminant. Second, the proposed approach is not fitted to large entities such as regions or countries, since their surface is much larger than the radius chosen to model geographicity. One interesting finding is that around 35% of the test set only contains tags with geo <= 0.6. In such cases, accurate geolocation is difficult regardless of the location models used, since there is no precise spatial information associated to the images.
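A minimal sketch of how the cell-level tag model of Section 3 and the geographicity-based fallback could fit together. All data structures, names, and the 0.2 threshold wiring are our own illustration, not code from the paper:

```python
from collections import defaultdict

def predict_cell(tags, tag_cell_probs):
    """Sum per-tag cell probabilities (Section 3) and return the
    most probable cell, or None if no tag is known to the model.
    tag_cell_probs maps tag -> {cell_id: probability}."""
    scores = defaultdict(float)
    for tag in tags:
        for cell, p in tag_cell_probs.get(tag, {}).items():
            scores[cell] += p
    if not scores:
        return None
    return max(scores, key=scores.get)

def locate(tags, tag_cell_probs, tag_geo, user_top_cell, geo_threshold=0.2):
    """When no tag is spatially discriminant (max geographicity below
    the threshold), fall back to the user's top cell (Sections 5-6)."""
    max_geo = max((tag_geo.get(t, 0.0) for t in tags), default=0.0)
    if max_geo <= geo_threshold:
        return user_top_cell
    cell = predict_cell(tags, tag_cell_probs)
    return cell if cell is not None else user_top_cell
```

For example, a photo tagged only with cat would be routed to the user's top cell, while one tagged with torreagbar would be placed in the cell its language model points to.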
6. USER MODELING
Last year [3], we proposed a simple user modeling that extracted the user's top cell (i.e. the cell including the highest number of the user's photos). Since the test and training sets provided by the organizers don't share users, the modeling was realized with external resources. We downloaded metadata for each user and, to avoid overfitting, we removed all items whose timestamp is less than 24 hours away from that of any test item. Unlike last year, when nearly 25% of the data could be placed in the top cell, this year only 4% of the user annotations are in the top cell. However, this percentage is much higher than that of photos with geo <= 0.3 and, in such cases, user models replace language models. In addition, geographicity is also used in conjunction with temporal metadata. If two images shared by the same user have timestamps within a 24-hour interval and their geographicity score difference is at least 0.2, we transfer the coordinates from the item with the larger score to the other.

7. RESULTS AND DISCUSSION
We have submitted the following runs, using a cascade of techniques in the order presented below: RUN1 - machine tag detection and location models based exclusively on internal training data; RUN2 - machine tag detection and location models based on all training data; RUN3 - machine tag detection, location models based on all training data, geographicity and user modeling; RUN4 - RUN3 and the use of temporal cues.

We present the performances of the different runs in Table 2. The exploitation of decoded geotags, introduced in [5], is debatable, since one could claim that they can be assimilated to training information. They account for around 4% of the dataset for internal models and 5% for external location models. Without their use, geolocation scores would be reduced by less than 4% and 5% respectively, and performances remain interesting with respect to scores reported in past campaigns.

The comparison of RUN1 with the other runs indicates that adding external data to the language models has a positive effect on performances. In particular, RUN2 is similar to RUN1 but exploits a much larger training set. The use of supplementary data results in more robust language models, and we hypothesize that adding even more training data would further improve results. The superior performances of RUN3 indicate that adding user modeling is beneficial, since precision is improved at all scales. RUN3 and RUN4 have nearly equal performances up to 10 km precision, and the introduction of temporal cues is only useful at larger scales. This result is probably explained by the fact that it is usually improbable for users to move within regions larger than tens of kilometers.

We didn't have time to submit visual runs, but we plan to implement a two-stage approach in which a global feature is used to retrieve a number of similar images and then a geometric check is performed to find images that depict the same object. Depending on the performances of visual processing, we will decide about its integration in the geolocation cascade. Currently, rare tags all have high geographicity scores, while only a part of them are actually useful. We will study ways to separate useful rare tags from the others in order to improve geolocation precision. Finally, we will build language models that don't include any contributions from test users, to evaluate the effect of removing any prior knowledge about the test set.

8. REFERENCES
[1] Z. Cheng et al. You are where you tweet: a content-based approach to geo-locating Twitter users. In Proc. of CIKM 2010, 2010.
[2] C. Hauff, B. Thomee, and M. Trevisiol. Working Notes for the Placing Task at MediaEval 2013, 2013.
[3] A. Popescu and N. Ballas. CEA LIST's participation at MediaEval 2012 Placing Task. In MediaEval, 2012.
[4] P. Serdyukov et al. Placing Flickr photos on a map. In Proc. of SIGIR 2009, 2009.
[5] M. Trevisiol et al. Retrieving geo-location of videos with a divide & conquer hierarchical multimodal approach. In ICMR, pages 1-8, 2013.