UQ-DKE's Participation at MediaEval 2014 Placing Task

Jiewei Cao, Zi Huang, Yang Yang, Heng Tao Shen
School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, QLD, Australia
j.cao3@uq.edu.au, {huang, yang.yang, shenht}@itee.uq.edu.au

ABSTRACT
In this paper, we describe our approach as part of the MediaEval 2014 Placing Task evaluation. We first identify the tags that are most indicative of geographic location by calculating a spatial-aware weighting for all tags in the training set. These weightings are applied within a language model-based retrieval framework. To address the geo-tagging problem, we find the most similar training item and propagate its location to the test item. Based on last year's experience, we further improve the accuracy by utilizing the geo-location correlation of images/videos uploaded by the same user.

1. INTRODUCTION
The MediaEval 2014 Placing Task requires participants to assign geographical coordinates (latitude and longitude) to Flickr images or videos (we denote them by Flickr items for simplicity); we refer to [3] for a detailed description. Firstly, we identify spatial-aware tags in the training set using a tag selection method based on Ripley's K statistic [6]. To address the geo-tagging problem, we apply a language model-based document retrieval model to find the most similar training item and propagate its location to the test item. Here, we consider each Flickr item's tags (title and description are excluded) as a document. Usually, a document contains 5 to 10 tags, and the order of the tags is disregarded. Given a test item, a query is constructed from its tags, and the most relevant document is retrieved from the training set. The spatial-aware tag weighting is applied to give a different weight to each tag in the query. Experiments show that the spatial-aware weighting effectively improves the accuracy.

Based on last year's experience [1], we further improve the accuracy by exploiting the geo-correlation between test items within the same user collection (see http://www.flickr.com/help/collections/).

2. METHODOLOGY

2.1 Data Pre-processing
A total of 5,025,000 geo-referenced Flickr items are provided as training data. For the language model-based approach, we treat each Flickr item's tags as a document. Other surrounding text, such as the title or description, is not used in our approach. We carried out two preliminary filtering steps on this training set. First, items without tags were removed. Second, we converted all tags to lowercase and removed special characters. This resulted in a pre-processed training set with 4,148,564 items. Unless otherwise specified, this pre-processed training set is used in the following experiments.

2.2 Spatial-aware Tag Weighting
We use a tag selection method based on Ripley's K statistic [4] to select the most spatial-aware tags by analyzing the spatial distribution of tags. Specifically, equation (1) is applied to calculate the weighting for each tag t. Given a set Q_t containing the locations of the images/videos to which tag t has been assigned, with N_t = |Q_t| the total number of occurrences of tag t, we have:

    s(t) = \log N_t \cdot \frac{\sum_{p \in Q_t} \left| \{ q \in Q_t \mid q \neq p,\, d(p, q) \leq \lambda \} \right|^{w}}{N_t^{2}},    (1)

where d(·) is the distance function. The weighting s(t) is similar to "tf-idf": the first part, log(N_t), prefers tags with large frequency; the second part downgrades s(t) if tag t is spread all over the world, and vice versa. Specifically, when w = 1, if all the images with tag t cluster in a small region (controlled by λ), the second part will be near 1; otherwise, it will be near 0. In practice, Q_t does not need to contain all the items with tag t. For example, if more than 1 million Flickr items carry t, we can sample only around 5000 of them, which is sufficient to calculate the weighting. For each tag in the training set, we calculate its spatial-aware weighting by equation (1).
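To make equation (1) concrete, the following Python sketch computes the weighting for a single tag from a sampled set of its occurrence locations. It is only an illustration, not the authors' implementation: the function names (compute_tag_weight, haversine_km) are ours, the haversine distance is an assumed choice of d(·), and the defaults λ = 40 km and w = 1 mirror the settings reported in Section 3.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def compute_tag_weight(locations, lam_km=40.0, w=1.0):
    """Spatial-aware weight s(t) of equation (1).

    locations -- Q_t, a (possibly sampled) list of (lat, lon) points of the
                 training items that carry tag t.
    """
    n = len(locations)          # N_t = |Q_t|
    if n < 2:
        return 0.0              # log(1) = 0 anyway; avoid degenerate cases
    total = 0.0
    for i, p in enumerate(locations):
        # |{q in Q_t : q != p, d(p, q) <= lambda}| for this occurrence p
        close = sum(1 for j, q in enumerate(locations)
                    if j != i and haversine_km(p, q) <= lam_km)
        total += close ** w
    # log(N_t) rewards frequent tags; the ratio rewards spatially clustered ones.
    return math.log(n) * total / (n ** 2)
```

For a tag whose N_t occurrences all fall within λ of one another, the ratio approaches 1 and s(t) ≈ log(N_t); a tag scattered across the globe receives a weight close to 0, matching the behaviour described above.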
2.3 Retrieval Model
We use the framework proposed in [5], which combines the language model and inference network approaches, as our retrieval model. This model provides a set of structured query operators [7] to express complex concepts, each of which can be considered a query node in an inference network. Bayesian smoothing with Dirichlet priors [8] is applied to avoid a zero probability when a query contains a tag that does not occur in the training documents. Given a test item, we use the calculated spatial-aware weighting to assign a different weight to each tag in the query, and then retrieve the most relevant training item and propagate its location to the test item.

2.4 Collection Geo-correlation
To address the data sparsity issue of the training data, [2] jointly estimated the geo-locations of all of the test items, where each test item was treated as "virtual" training data, and consequently boosted the performance of the algorithm. On the other hand, [1] proposed a method that utilizes the geo-correlation between test items within the same user collection.

Flickr users can organize their images and videos by assigning them to different collections (or albums). Intuitively, items within the same collection are highly geo-correlated. Take Figure 1 as an example: a user shared the images taken during a trip to Brisbane and organized them into a collection named Brisbane Trip 2014. As we can see, not every image in this collection is well tagged, because the user only tagged the images he loved or was interested in, leaving the others un-tagged or poorly tagged. Moreover, it is difficult to predict their locations from the images themselves, because none of them contain a distinctive landmark or landscape. Images/videos with completely different tags or visual content can be considered as taken in the same location or nearby if they belong to the same user collection.

[Figure 1: Images and their tags in a collection named Brisbane Trip 2014 created by the user.]

For tag-based geo-tagging approaches, a poorly tagged query item will result in a bad estimation. However, if this item belongs to a user collection that contains one or more images/videos with well estimated locations (usually well tagged or containing a landmark), then we can use the centroid location of this collection as the estimation for the poorly estimated one. In this paper, we adopt a similar strategy to last year's to find test items within the same collection; please refer to [1] for details. Given a test item with no tags, we use the most frequent location of the well estimated test items within the same collection as the final estimation.
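As a rough sketch of this collection-based fallback (under our own assumptions about data layout, not the authors' exact code), the snippet below assigns to each untagged or unestimated item in a user collection the most frequent location among the collection's well estimated items. The coordinate rounding used to group nearby estimates is a hypothetical choice, since the paper does not specify how the "most frequent location" is determined.

```python
from collections import Counter

def fill_from_collection(items, decimals=2):
    """items: the Flickr items of one user collection, each a dict with
    'tags' (list of str) and 'location' ((lat, lon), or None if not yet
    estimated).  Untagged or unestimated items receive the most frequent
    location of the remaining items in the collection."""
    trusted = [it['location'] for it in items
               if it['tags'] and it['location'] is not None]
    if not trusted:
        return items  # nothing reliable to propagate from

    # Group near-identical estimates by rounding, then take the most frequent.
    buckets = Counter((round(lat, decimals), round(lon, decimals))
                      for lat, lon in trusted)
    fallback = buckets.most_common(1)[0][0]

    for it in items:
        if not it['tags'] or it['location'] is None:
            it['location'] = fallback
    return items
```

In run 3 this kind of fallback replaces the fixed default location that the baseline and run 1 assign to test items without tags.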
3. RESULTS AND DISCUSSION
There are five different test sets; we chose test set 5, which contains 510,000 items. Following [4], we set w = 1 and λ = 40 km in equation (1) to favor tags whose occurrences are centered around a small number of locations. We set µ = 5 for Dirichlet smoothing because the average document length is around 5 in our case, i.e., each document contains 5 tags on average. We submitted two runs (run 1 and run 3); the results of our experiments are shown in Table 1. Run 2, in which only visual and audio cues may be used, is omitted. The baseline approach used the same retrieval model as run 1, but the spatial-aware tag weighting was not applied. Both the baseline approach and run 1 assigned a default location, e.g., New York City (40.7127, −74.0059) in our case, to test items without tags, whereas run 3 utilized the collection geo-correlation discussed in Section 2.4. As we can see, both the spatial-aware tag weighting and the collection geo-correlation help improve the geo-tagging accuracy. In this paper, we have set fixed values for w, λ and µ and avoided tailoring these values to the problem. However, we believe there is potential for improvement in the results through the optimal selection of these parameters for the particular data.

Table 1: Percentage (%) of correctly detected locations within each distance threshold and median error distance (in km) of each run.

Within     10m    100m   1km    10km   100km  1000km  Median Error (km)
Baseline   1.09   4.87   17.06  34.22  43.00  56.59   380.38
Run 1      1.07   4.98   19.57  41.71  52.46  63.61   51.07
Run 3      1.08   5.05   20.23  43.68  56.03  69.08   27.32

4. REFERENCES
[1] J. Cao. Photo set refinement and tag segmentation in georeferencing Flickr photos. In MediaEval, volume 1043 of CEUR Workshop Proceedings. CEUR-WS.org, 2013.
[2] J. Choi, G. Friedland, V. Ekambaram, and K. Ramchandran. Multimodal location estimation of consumer media: Dealing with sparse training data. In Multimedia and Expo (ICME), 2012 IEEE International Conference on, pages 43–48. IEEE, 2012.
[3] J. Choi, B. Thomee, G. Friedland, L. Cao, K. Ni, D. Borth, B. Elizalde, L. Gottlieb, C. Carrano, R. Pearce, and D. Poland. The placing task: A large-scale geo-estimation challenge for social-media videos and images. In Proceedings of the 3rd ACM International Workshop on Geotagging and Its Applications in Multimedia, 2014.
[4] O. Van Laere, J. A. Quinn, S. Schockaert, and B. Dhoedt. Spatially aware term selection for geotagging. IEEE Trans. Knowl. Data Eng., 26(1):221–234, 2014.
[5] D. Metzler and W. B. Croft. Combining the language model and inference network approaches to retrieval. Inf. Process. Manage., 40(5):735–750, 2004.
[6] B. D. Ripley. Spatial Statistics, volume 575. John Wiley & Sons, 2005.
[7] T. Strohman, D. Metzler, H. Turtle, and W. B. Croft. Indri: A language model-based search engine for complex queries. In Proceedings of the International Conference on Intelligent Analysis, volume 2, pages 2–6. Citeseer, 2005.
[8] C. Zhai and J. D. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In SIGIR, pages 334–342, 2001.