Exploration of Feature Combination in Geo-visual Ranking
      for Visual Content-based Location Prediction

                        Xinchao Li1 , Michael Riegler12 , Martha Larson1 , Alan Hanjalic1
                           1
                               Multimedia Information Retrieval Lab, Delft University of Technology
                                                     2
                                                       Klagenfurt University
                {x.li-3,m.a.larson,a.hanjalic}@tudelft.nl, michael.riegler@edu.uni-klu.ac.at


ABSTRACT
In this paper, we present a visual-content-based approach
that predicts where in the world a social image was taken.
We employ a ranking method that assigns a query photo
the geo-location of its most likely geo-visual neighbor in
the social image collection. The experiments carried out on
the MediaEval Placing Task 2013 data set support the
conclusion that the exploration of candidate photos' geo-visual
neighbors and the combination of local and global image
features can improve the prediction accuracy of a
visual-content-based geo-location estimation system.

Figure 1: Geo-visual ranking system overview

1. INTRODUCTION
   The research question of the Placing Task is how to estimate
the geo-location of an image, given its image attributes
and all available metadata [1].
   A variety of information sources have been exploited for
predicting geo-location. User-contributed text annotations
have been used as the basis of a wide range of successful
geo-coordinate prediction algorithms [3]. This work exploits
the natural link between text annotation and location (e.g.,
tags often include place names and other location-specific
vocabulary) in order to predict at which location around
the globe a photo was taken. The drawback of textual
annotations (i.e., metadata) is that they need to be manually
created by the user, a time-consuming task. As a result,
a large percentage of images are not associated with any
tags and cannot be geo-located with text-based approaches
(13.4% of test photos of the 2013 data set do not contain any
tags). As an appealing alternative to text-based approaches,
in this paper, we present a visual-content-based approach
for geo-coordinate prediction.

2. SYSTEM DESCRIPTION
   Our approach consists of four steps, as depicted in the
system overview, Fig. 1. In the first step, Local Feature-based
Image Retrieval, we create a set of candidate photos for a
given query q by retrieving all visually similar images from
the database, based on local features, up to a visual similarity
threshold, k. In the second step, Global Feature-based Image
Selection, we rank all the candidate photos from step one
by their visual similarity with the query based on global
features, and select the top t ranked photos as the final
selected candidate set Evis@k&t. In the third step, Geo-Visual
Ranking, as in [2], we perform geo-visual expansion of
each candidate photo, a, to create a geo-visual expansion set
Ea based on local features. The candidate photos are then
ranked by P(q|a), which reflects the closeness of their
similarity to the query photo q. Formally, P(q|a) is expressed as

                 P(q|a) ∝ Σ_{e∈Ea} Sim_vis(e, q)                 (1)

where Ea is the set of geographically nearby photos of photo
a with high visual similarity to the query q. Then, in the
final step, Location Propagation, the geo-location of the
top-ranked photo is propagated to the query photo.

3. EXPERIMENTAL FRAMEWORK

3.1 Dataset
   The proposed system is evaluated on a set of 8,801,050
geo-tagged Flickr (http://www.flickr.com/) photos released
by the MediaEval 2013 Placing Task [1]. Since the release
includes only the metadata and not the images themselves,
we re-crawled the images using the links in the metadata.
Because some photos were removed after the dataset was
collected, the final re-crawled collection contains 8,799,260
photos: 8,537,368 for training and 261,892 for test.

3.2 Calculating visual similarity
   Our approach to geo-location prediction exploits visual
similarity between photos. To calculate visual similarity
based on local features, we choose SURF and the
bag-of-visual-words scheme to build the search engine. To
calculate visual similarity based on global features, we use the
Joint Composite Descriptor (JCD), which encodes the color,
edge directivity, and texture histograms of the image.

Copyright is held by the author/owner(s).
MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain
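To make the scoring of Eq. (1) concrete, the following is a minimal sketch, not the authors' implementation: Sim_vis is approximated by cosine similarity over bag-of-visual-words histograms, and the expansion set Ea is taken to be all photos within a fixed radius of candidate a. All function and field names here are our own illustrative assumptions.

```python
import math
from collections import Counter

def cosine(h1, h2):
    # Cosine similarity between two bag-of-visual-words histograms
    # (stand-in for Sim_vis; the paper does not fix a specific measure).
    dot = sum(h1[w] * h2[w] for w in h1.keys() & h2.keys())
    n1 = math.sqrt(sum(v * v for v in h1.values()))
    n2 = math.sqrt(sum(v * v for v in h2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def haversine_km(p1, p2):
    # Great-circle distance between two (lat, lon) pairs in kilometres.
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def geo_visual_score(query_hist, candidate, photos, radius_km=1.0):
    # Eq. (1): P(q|a) is proportional to the summed visual similarity
    # between q and the candidate's geo-visual expansion set E_a,
    # i.e. photos taken near candidate a.
    expansion = [p for p in photos
                 if haversine_km(p["loc"], candidate["loc"]) <= radius_km]
    return sum(cosine(query_hist, p["hist"]) for p in expansion)

def predict_location(query_hist, candidates, photos):
    # Location Propagation: adopt the geo-location of the
    # top-ranked candidate under the geo-visual score.
    best = max(candidates, key=lambda a: geo_visual_score(query_hist, a, photos))
    return best["loc"]
```

A candidate backed by several nearby, visually similar photos thus outranks an isolated match, which is the intuition behind geo-visual ranking.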
4. EXPERIMENTAL RESULTS
4.1 General performance evaluation
RUN 1: Baseline: the system applies only Local Feature-
       based Image Retrieval and uses the geo-location of
       the most visually similar photo, based on local
       features, as the estimated location.
RUN 2: Baseline + Geo-Visual Ranking
RUN 3: Baseline + Global Feature-based Image Selec-
       tion
RUN 4: Baseline + Global Feature-based Image Selec-
       tion + Geo-Visual Ranking
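The four runs can be read as toggling two optional stages on top of the Step 1 baseline. A runnable sketch, assuming precomputed per-candidate scores (all field and function names hypothetical):

```python
def predict(candidates, use_global_selection=False, use_gvr=False, t=2):
    # Step 1 (all runs): order candidates by local-feature similarity.
    ranked = sorted(candidates, key=lambda c: c["local_sim"], reverse=True)
    # Global Feature-based Image Selection (Runs 3, 4): keep the top-t
    # candidates by global-feature (e.g. JCD) similarity.
    if use_global_selection:
        ranked = sorted(ranked, key=lambda c: c["global_sim"], reverse=True)[:t]
    # Geo-Visual Ranking (Runs 2, 4): re-rank by the Eq. (1) score.
    if use_gvr:
        ranked = sorted(ranked, key=lambda c: c["gv_score"], reverse=True)
    # Location Propagation: adopt the top candidate's geo-location.
    return ranked[0]["loc"]

RUNS = {
    "Run1": dict(),                                            # baseline
    "Run2": dict(use_gvr=True),
    "Run3": dict(use_global_selection=True),
    "Run4": dict(use_global_selection=True, use_gvr=True),
}
```

Run4 therefore first narrows the candidate pool with global features and only then applies geo-visual re-ranking, which is consistent with the two stages compounding in the results below.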

  The run results are presented in Table 1. Within an
evaluation radius of 1 km, compared with the baseline
method, Run1, Run2 achieves about an 18% improvement and
Run3 about a 17.7% improvement. The best-performing run,
Run4, achieves about a 37.4% improvement, which is more
than the sum of the previous two.

Figure 2: Distribution of queries over different levels
of geo-visual redundancy.

Table 1: Run results (261,892 photos): percentage of
test photos located within {1, 10, 100, 500, 1000} km of
the ground truth.
        <1km   <10km   <100km   <500km   <1000km
 Run1   2.0%   2.6%    3.5%     7.9%     14.6%
 Run2   2.4%   3.1%    4.2%     8.5%     14.8%
 Run3   2.4%   3.1%    4.0%     8.4%     15.3%
 Run4   2.8%   3.7%    4.7%     9.2%     15.9%
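Geo-visual redundancy, as defined in Section 4.2 below, counts the top-ranked retrieval results that fall within 1 km of the query's location. A minimal sketch of that computation (function names are ours):

```python
import math

def haversine_km(p1, p2):
    # Great-circle distance between two (lat, lon) pairs in kilometres.
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def geo_visual_redundancy(query_loc, ranked_locs, radius_km=1.0, top_k=10000):
    # Count photos that are both in the top-k of the local feature-based
    # rank list and taken within radius_km of the query photo.
    return sum(1 for loc in ranked_locs[:top_k]
               if haversine_km(query_loc, loc) <= radius_km)
```

A redundancy of zero means no photo of the same scene exists nearby in the collection, which bounds what any purely visual method can achieve for that query.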


4.2 Experimental analysis
   As the query photo is from a social image collection, there
are certain properties of the collection that can affect the
prediction accuracy of visual-content-based approaches. For
example, queries about a popular landmark and queries
about an individual user's car may get different prediction
performance. For the purpose of our investigation, we define
geo-visual redundancy for a given query photo as the number
of photos that are taken within a 1 km radius of the query
photo and also ranked in the top 10,000 of the rank list
of the query from the local feature-based image retrieval
system.
   The distribution of the query photos over the geo-visual
redundancy ranges is illustrated in Fig. 2. Over half of the
queries do not have visually similar photos within their
geo-neighborhood, which suggests that in these cases there is no
other photo in the dataset that depicts the same scene or
object at the query location. This observation demonstrates
how challenging it is to predict the geo-location of a social
image purely from its visual content.
   Fig. 3 breaks down the geo-location prediction performance
over different levels of geo-visual redundancy. Comparing
Run3 with Run1, we see that the Global Feature-based
Image Selection step can improve the local feature-based
system for queries across all geo-visual redundancy levels.
Comparing Run2 with Run1 and Run4 with Run3, we see
that the Geo-Visual Ranking step can boost the performance
for queries with high geo-visual redundancy. Comparing
Run4 with Run2, we see that Global Feature-based Image
Selection can also benefit the system with Geo-Visual
Ranking, especially for queries with a medium geo-visual
redundancy level.

Figure 3: Prediction accuracy within 1km for queries
with different levels of geo-visual redundancy.

5. CONCLUSION
   We have presented a ranking approach addressing the
challenging task of predicting geo-location using only the
visual content of images. The main observation is that
local and global image features can compensate for each
other, and together with geo-visual ranking, they improve
the prediction accuracy of a visual-content-based geo-location
estimation system. Future work will include an investigation
of the optimal role of local and global features within the
geo-visual ranking scheme for visual-content-based
geo-location estimation.

6. REFERENCES
[1] C. Hauff, B. Thomee, and M. Trevisiol. Working Notes
    for the Placing Task at MediaEval 2013, 2013.
[2] X. Li, M. Larson, and A. Hanjalic. Geo-visual ranking
    for location prediction of social images. In Proc. ICMR
    '13, 2013.
[3] P. Serdyukov, V. Murdock, and R. van Zwol. Placing
    Flickr photos on a map. In Proc. SIGIR '09, 2009.