Exploration of Feature Combination in Geo-visual Ranking for Visual Content-based Location Prediction

Xinchao Li1, Michael Riegler1,2, Martha Larson1, Alan Hanjalic1
1 Multimedia Information Retrieval Lab, Delft University of Technology
2 Klagenfurt University
{x.li-3,m.a.larson,a.hanjalic}@tudelft.nl, michael.riegler@edu.uni-klu.ac.at

ABSTRACT
In this paper, we present a visual-content-based approach that predicts where in the world a social image was taken. We employ a ranking method that assigns a query photo the geo-location of its most likely geo-visual neighbor in the social image collection. The experiments carried out on the MediaEval Placing Task 2013 data set support the conclusion that exploring a candidate photo's geo-visual neighbors and combining local and global image features can improve the prediction accuracy of a visual-content-based geo-location estimation system.

1. INTRODUCTION
The research question of the Placing Task is how to estimate the geo-location of an image, given its image attributes and all available metadata [1].

A variety of information sources have been exploited for predicting geo-location. User-contributed text annotations have been used as the basis of a large range of successful geo-coordinate prediction algorithms [3]. Such approaches exploit the natural link between text annotations and location (e.g., tags often include place names and other location-specific vocabulary) in order to predict at which location around the globe a photo was taken. The drawback of textual annotations (i.e., metadata) is that they need to be manually created by the user, a time-consuming task. As a result, a large percentage of images are not associated with any tags and cannot be geo-located with text-based approaches (13.4% of the test photos of the 2013 data set do not contain any tags). As an appealing alternative to text-based approaches, in this paper we present a visual-content-based approach for geo-coordinate prediction.

2. SYSTEM DESCRIPTION
Our approach consists of four steps, as depicted in the system overview, Fig. 1.

Figure 1: Geo-visual ranking system overview

In the first step, Local Feature-based Image Retrieval, we create a set of candidate photos for a given query q by retrieving from the database all visually similar images based on local features, up to a visual similarity threshold k. In the second step, Global Feature-based Image Selection, we rank all the candidate photos from step one by their visual similarity to the query based on global features, and select the top t ranked photos as the final candidate set E_vis@k&t. In the third step, Geo-Visual Ranking, as in [2], we perform geo-visual expansion of each candidate photo a to create a geo-visual expansion set E_a based on local features. The candidate photos are then ranked by P(q|a), which reflects the closeness of their similarity to the query photo q. Formally, P(q|a) is expressed as

    P(q|a) ∝ Σ_{e ∈ E_a} Sim_vis(e, q)    (1)

where E_a is the set of photos geographically near photo a that have high visual similarity to query q. In the final step, Location Propagation, the geo-location of the top-ranked photo is propagated to the query photo.
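For illustration, the scoring in Eq. (1) and the Location Propagation step can be sketched as follows. This is a simplified sketch, not the implementation used in our experiments; the similarity function sim_vis, the photo-coordinate map, and the 1km expansion radius are placeholder assumptions.

    # Minimal sketch (not the system's actual implementation) of the
    # geo-visual ranking score in Eq. (1) and of Location Propagation.
    # sim_vis(a, b) is assumed to return a precomputed local-feature
    # similarity between two photo ids; photos maps ids to (lat, lon).
    from math import radians, sin, cos, asin, sqrt

    def geo_distance_km(p1, p2):
        # Great-circle (haversine) distance in km between (lat, lon) pairs.
        lat1, lon1, lat2, lon2 = map(radians, (*p1, *p2))
        h = sin((lat2 - lat1) / 2) ** 2 + \
            cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(h))

    def geo_visual_score(query, candidate, photos, sim_vis, radius_km=1.0):
        # Sum the query's visual similarity over the candidate's geo-visual
        # expansion set E_a (photos taken near candidate a).
        expansion = [p for p in photos
                     if geo_distance_km(photos[p], photos[candidate]) <= radius_km]
        return sum(sim_vis(e, query) for e in expansion)

    def propagate_location(query, candidates, photos, sim_vis):
        # Location Propagation: the query inherits the geo-location of the
        # top-ranked candidate photo.
        best = max(candidates,
                   key=lambda a: geo_visual_score(query, a, photos, sim_vis))
        return photos[best]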
3. EXPERIMENTAL FRAMEWORK

3.1 Dataset
The proposed system is evaluated on a set of 8,801,050 geo-tagged Flickr (http://www.flickr.com/) photos released by the MediaEval 2013 Placing Task [1]. Since the release includes only the metadata and not the images themselves, we re-crawled the images using the links in the metadata. Because some photos were removed after the dataset was collected, the final re-crawled collection contains 8,799,260 photos: 8,537,368 for training and 261,892 for test.

3.2 Calculating visual similarity
Our approach to geo-location prediction exploits visual similarity between photos. To calculate visual similarity based on local features, we choose SURF and the bag-of-visual-words scheme to build the search engine. To calculate visual similarity based on global features, we use the Joint Composite Descriptor (JCD), which encodes the color, edge directivity, and texture histogram of the image.

4. EXPERIMENTAL RESULTS

4.1 General performance evaluation
RUN 1: Baseline; the system applies only Local Feature-based Image Retrieval and uses the geo-location of the visually most similar photo (by local features) as the estimated location.
RUN 2: Baseline + Geo-Visual Ranking
RUN 3: Baseline + Global Feature-based Image Selection
RUN 4: Baseline + Global Feature-based Image Selection + Geo-Visual Ranking

The run results are presented in Table 1. At the 1km evaluation radius, compared with the baseline method, Run1, Run2 achieves about 18% improvement and Run3 achieves about 17.7% improvement. The best performing run, Run4, achieves about 37.4% improvement, which is more than the sum of the previous two.

Table 1: Run results (261,892 test photos): percentage of test photos located within {1, 10, 100, 500, 1000}km of the ground truth.

           <1km    <10km   <100km   <500km   <1000km
    Run1   2.0%    2.6%    3.5%     7.9%     14.6%
    Run2   2.4%    3.1%    4.2%     8.5%     14.8%
    Run3   2.4%    3.1%    4.0%     8.4%     15.3%
    Run4   2.8%    3.7%    4.7%     9.2%     15.9%

4.2 Experimental analysis
Because the query photos come from a social image collection, certain properties of the collection can affect the prediction accuracy of visual-content-based approaches. For example, queries depicting a popular landmark and queries depicting an individual user's car may yield different prediction performance. For the purpose of our investigation, we define the geo-visual redundancy of a given query photo as the number of photos that are taken within a 1km radius of the query photo and are also ranked in the top 10,000 of the query's rank list from the local feature-based image retrieval system.

Figure 2: Distribution of queries over different levels of geo-visual redundancy.

The distribution of the query photos over the geo-visual redundancy ranges is illustrated in Fig. 2. Over half of the queries do not have visually similar photos within their geo-neighborhood, which suggests that in these cases there is no other photo in the dataset that depicts the same scene or object at the query location. This observation demonstrates how challenging it is to predict the geo-location of a social image purely from its visual content.
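For concreteness, the geo-visual redundancy defined above could be computed along the following lines. This is an illustrative sketch with placeholder names, not the code used in our experiments; it reuses the geo_distance_km helper from the earlier sketch, and retrieve(query, k) is assumed to return the top-k photo ids from the local feature-based retrieval system.

    # Illustrative sketch of the geo-visual redundancy measure: count photos
    # that are both in the query's top-10,000 rank list and taken within 1km
    # of the query's ground-truth location.
    def geo_visual_redundancy(query, query_location, retrieve, photos,
                              top_k=10000, radius_km=1.0):
        ranked = retrieve(query, top_k)
        return sum(1 for p in ranked
                   if geo_distance_km(photos[p], query_location) <= radius_km)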
Figure 3: Prediction accuracy within 1km for queries with different levels of geo-visual redundancy.

Fig. 3 breaks down the geo-location prediction performance over different levels of geo-visual redundancy. Comparing Run3 with Run1, we see that the Global Feature-based Image Selection step can improve the local feature-based system for queries at all geo-visual redundancy levels. Comparing Run2 with Run1 and Run4 with Run3, we see that the Geo-Visual Ranking step can boost the performance for queries with high geo-visual redundancy. Comparing Run4 with Run2, we see that Global Feature-based Image Selection can also benefit the system with Geo-Visual Ranking, especially for queries with a medium geo-visual redundancy level.

5. CONCLUSION
We have presented a ranking approach addressing the challenging task of predicting geo-location using only the visual content of images. The main observation is that local and global image features can compensate for each other and, together with geo-visual ranking, improve the prediction accuracy of a visual-content-based geo-location estimation system. Future work will include an investigation of the optimal role of local and global features within the geo-visual ranking scheme for visual-content-based geo-location estimation.

6. REFERENCES
[1] C. Hauff, B. Thomee, and M. Trevisiol. Working Notes for the Placing Task at MediaEval 2013, 2013.
[2] X. Li, M. Larson, and A. Hanjalic. Geo-visual ranking for location prediction of social images. In Proc. ICMR '13, 2013.
[3] P. Serdyukov, V. Murdock, and R. van Zwol. Placing Flickr photos on a map. In Proc. SIGIR '09, 2009.