UQ-DKE's Participation at MediaEval 2014 Placing Task

Jiewei Cao, Zi Huang, Yang Yang, Heng Tao Shen
School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, QLD, Australia
j.cao3@uq.edu.au, {huang, yang.yang, shenht}@itee.uq.edu.au

ABSTRACT
In this paper, we describe our approach as part of the MediaEval 2014 Placing Task evaluation. We first identify the tags that are most indicative of geographic location by calculating a spatial-aware weighting for all tags in the training set. These weightings are applied within a language model-based retrieval framework. To address the geo-tagging problem, we find the most similar training item and propagate its location to the test item. Based on last year's experience, we further improve the accuracy by utilizing the geo-location correlation of images/videos uploaded by the same user.

1. INTRODUCTION
The MediaEval 2014 Placing Task requires participants to assign geographical coordinates (latitude and longitude) to Flickr images or videos (we denote them by Flickr items for simplicity); we refer to [3] for a detailed description. Firstly, we identify spatial-aware tags in the training set using a tag selection method based on Ripley's K statistic [6]. To address the geo-tagging problem, we apply a language model-based document retrieval model to find the most similar training item and propagate its location to the test item. Here, we consider each Flickr item's tags (title and description are excluded) as a document. Usually, a document contains 5 to 10 tags, and the order of the tags is disregarded. Given a test item, a query is constructed from its tags, and the most relevant document is retrieved from the training set. The spatial-aware tag weighting is applied to give a different weight to each tag in the query. Experiments show that the spatial-aware weighting effectively improves the accuracy.

Based on last year's experience [1], we further improve the accuracy by exploiting the geo-correlation between test items within the same user collection (see http://www.flickr.com/help/collections/).

2. METHODOLOGY

2.1 Data Pre-processing
A total of 5,025,000 geo-referenced Flickr items are provided as training data. For the language model-based approach, we treat each Flickr item's tags as a document. Other surrounding text, such as the title or description, is not used in our approach. We carried out two preliminary filtering steps on this training set. First, items without tags were removed. Second, we converted all tags to lowercase and removed special characters. This resulted in a pre-processed training set with 4,148,564 items. Unless otherwise specified, this pre-processed training set is used in the following experiments.

2.2 Spatial-aware Tag Weighting
We use a tag selection method based on Ripley's K statistic [4] to select the most spatial-aware tags by analyzing the spatial distribution of tags. Specifically, equation (1) is applied to calculate the weighting for each tag t. Given a set Q_t containing the locations of the images/videos to which tag t has been assigned, with N_t = |Q_t| the total number of occurrences of tag t, we have:

    s(t) = \log N_t \cdot \frac{\sum_{p \in Q_t} \left| \{ q \in Q_t \mid q \neq p,\, d(p, q) \leq \lambda \} \right|^{w}}{N_t^{2}},    (1)

where d(·) is the distance function. The weighting s(t) is similar to "tf-idf": the first part, log(N_t), prefers tags with large frequency; the second part downgrades s(t) if tag t is spread all over the world, and vice versa. Specifically, when w = 1, if all the images with tag t cluster in a small region (controlled by λ), the second part will be near 1; otherwise, it will be near 0. In practice, Q_t does not need to contain all the items with tag t. For example, if more than 1 million Flickr items carry t, we can sample only around 5000 of them, which is sufficient to calculate the weighting. For each tag in the training set, we calculate its spatial-aware weighting by equation (1).
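To make equation (1) concrete, the following Python sketch computes the weighting for a single tag from a sampled set of its occurrence locations. It is only an illustration, not the authors' implementation: the function names (compute_tag_weight, haversine_km) are ours, the haversine distance is an assumed choice of d(·), and the defaults λ = 40 km and w = 1 mirror the settings reported in Section 3.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def compute_tag_weight(locations, lam_km=40.0, w=1.0):
    """Spatial-aware weight s(t) of equation (1).

    locations -- Q_t, a (possibly sampled) list of (lat, lon) points of the
                 training items that carry tag t.
    """
    n = len(locations)          # N_t = |Q_t|
    if n < 2:
        return 0.0              # log(1) = 0 anyway; avoid degenerate cases
    total = 0.0
    for i, p in enumerate(locations):
        # |{q in Q_t : q != p, d(p, q) <= lambda}| for this occurrence p
        close = sum(1 for j, q in enumerate(locations)
                    if j != i and haversine_km(p, q) <= lam_km)
        total += close ** w
    # log(N_t) rewards frequent tags; the ratio rewards spatially clustered ones.
    return math.log(n) * total / (n ** 2)
```

For a tag whose N_t occurrences all fall within λ of one another, the ratio approaches 1 and s(t) ≈ log(N_t); a tag scattered across the globe receives a weight close to 0, matching the behaviour described above.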
2.3 Retrieval Model
We use the framework proposed in [5], which combines the language model and inference network approaches, as our retrieval model. This model provides a set of structured query operators [7] to express complex concepts, each of which can be considered a query node in an inference network. Bayesian smoothing with Dirichlet priors [8] is applied to avoid a zero probability when a query contains a tag that does not occur in the training documents. Given a test item, we use the calculated spatial-aware weighting to assign a different weight to each tag in the query, and then retrieve the most relevant training item and propagate its location to the test item.

2.4 Collection Geo-correlation
To address the data sparsity issue of the training data, [2] jointly estimated the geo-locations of all of the test items, where each test item was treated as "virtual" training data, and consequently boosted the performance of the algorithm. On the other hand, [1] proposed a method that utilizes the geo-correlation between test items within the same user collection.

Flickr users can organize their images and videos by assigning them to different collections (or albums). Intuitively, items within the same collection are highly geo-correlated. Take Figure 1 as an example: a user shared the images taken during a trip to Brisbane and organized them into a collection named Brisbane Trip 2014. As we can see, not every image in this collection is well tagged, because the user only tagged the images he loved or was interested in, leaving the others un-tagged or poorly tagged. Moreover, it is difficult to predict their locations from the images themselves, because none of them contain a distinctive landmark or landscape. Images/videos with completely different tags or visual content can be considered as taken in the same location or nearby if they belong to the same user collection.

[Figure 1: Images and their tags in a collection named Brisbane Trip 2014 created by the user.]

For tag-based geo-tagging approaches, a poorly tagged query item will result in a bad estimation. However, if this item belongs to a user collection that contains one or more images/videos with well estimated locations (usually well tagged or containing a landmark), then we can use the centroid location of this collection as the estimation for the poorly estimated one. In this paper, we adopt a similar strategy to last year's to find test items within the same collection; please refer to [1] for details. Given a test item with no tags, we use the most frequent location of the well estimated test items within the same collection as the final estimation.
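As a rough sketch of this collection-based fallback (under our own assumptions about data layout, not the authors' exact code), the snippet below assigns to each untagged or unestimated item in a user collection the most frequent location among the collection's well estimated items. The coordinate rounding used to group nearby estimates is a hypothetical choice, since the paper does not specify how the "most frequent location" is determined.

```python
from collections import Counter

def fill_from_collection(items, decimals=2):
    """items: the Flickr items of one user collection, each a dict with
    'tags' (list of str) and 'location' ((lat, lon), or None if not yet
    estimated).  Untagged or unestimated items receive the most frequent
    location of the remaining items in the collection."""
    trusted = [it['location'] for it in items
               if it['tags'] and it['location'] is not None]
    if not trusted:
        return items  # nothing reliable to propagate from

    # Group near-identical estimates by rounding, then take the most frequent.
    buckets = Counter((round(lat, decimals), round(lon, decimals))
                      for lat, lon in trusted)
    fallback = buckets.most_common(1)[0][0]

    for it in items:
        if not it['tags'] or it['location'] is None:
            it['location'] = fallback
    return items
```

In run 3 this kind of fallback replaces the fixed default location that the baseline and run 1 assign to test items without tags.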
3. RESULTS AND DISCUSSION
There are five different test sets; we chose test set 5, which contains 510,000 items. Following [4], we set w = 1 and λ = 40 km in equation (1) to favor tags whose occurrences are centered around a small number of locations. We set µ = 5 for Dirichlet smoothing because the average document length is around 5 in our case, i.e., each document contains 5 tags on average. We submitted two runs (run 1 and run 3); the results of our experiments are shown in Table 1. Run 2, in which only visual and audio cues may be used, is omitted. The baseline approach used the same retrieval model as run 1, but the spatial-aware tag weighting was not applied. Both the baseline approach and run 1 assigned a default location, e.g., New York City (40.7127, −74.0059) in our case, to test items without tags, whereas run 3 utilized the collection geo-correlation discussed in Section 2.4. As we can see, both the spatial-aware tag weighting and the collection geo-correlation help improve the geo-tagging accuracy. In this paper, we have set fixed values for w, λ and µ and avoided tailoring these values to the problem. However, we believe there is potential for improvement in the results through the optimal selection of these parameters for the particular data.

Table 1: Percentage (%) of correctly detected locations within each distance threshold and median error distance (in km) of each run.

Within     10m    100m   1km    10km   100km  1000km  Median Error (km)
Baseline   1.09   4.87   17.06  34.22  43.00  56.59   380.38
Run 1      1.07   4.98   19.57  41.71  52.46  63.61   51.07
Run 3      1.08   5.05   20.23  43.68  56.03  69.08   27.32

4. REFERENCES
[1] J. Cao. Photo set refinement and tag segmentation in georeferencing Flickr photos. In MediaEval, volume 1043 of CEUR Workshop Proceedings. CEUR-WS.org, 2013.
[2] J. Choi, G. Friedland, V. Ekambaram, and K. Ramchandran. Multimodal location estimation of consumer media: Dealing with sparse training data. In Multimedia and Expo (ICME), 2012 IEEE International Conference on, pages 43–48. IEEE, 2012.
[3] J. Choi, B. Thomee, G. Friedland, L. Cao, K. Ni, D. Borth, B. Elizalde, L. Gottlieb, C. Carrano, R. Pearce, and D. Poland. The placing task: A large-scale geo-estimation challenge for social-media videos and images. In Proceedings of the 3rd ACM International Workshop on Geotagging and Its Applications in Multimedia, 2014.
[4] O. Van Laere, J. A. Quinn, S. Schockaert, and B. Dhoedt. Spatially aware term selection for geotagging. IEEE Trans. Knowl. Data Eng., 26(1):221–234, 2014.
[5] D. Metzler and W. B. Croft. Combining the language model and inference network approaches to retrieval. Inf. Process. Manage., 40(5):735–750, 2004.
[6] B. D. Ripley. Spatial Statistics, volume 575. John Wiley & Sons, 2005.
[7] T. Strohman, D. Metzler, H. Turtle, and W. B. Croft. Indri: A language model-based search engine for complex queries. In Proceedings of the International Conference on Intelligent Analysis, volume 2, pages 2–6. Citeseer, 2005.
[8] C. Zhai and J. D. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In SIGIR, pages 334–342, 2001.