Geo_ML @ MediaEval Placing Task 2015 Nghia Duong-Trung, Martin Wistuba, Lucas Rego Drumond, Lars Schmidt-Thieme Information Systems and Machine Learning Lab University of Hildesheim, Germany {duongn,wistuba,ldrumond,schmidt-thieme}@ismll.uni-hildesheim.de ABSTRACT Moreover, we discuss what has been learned in comparison We participated in the MediaEval Benchmarking whose goal with the baseline in section 4. is to concentrate on the multimodal geo-location prediction on the Yahoo! Flickr Creative Commons 100M dataset - the 2. TASK DESCRIPTION placing task. It challenges participants to develop models We have m, v geotagged multimedia items in the train- and/or techniques to estimate the geographic locations of ing and test data respectively, and n features describing the Flickr resources based on textual metadata, e.g. titles, each item. These features are drawn from textual meta- descriptions and tags. We aim to find a procedure that is data. Each item is annotated with a geo-location y ∈ R2 , conceptual to understand, simple to implement and flexible y = (y lat , y lon ) where y lat ∈ R is the latitude and y lon ∈ R to integrate different techniques. In this paper, we present is the longitude. Given some training data X train ∈ Rm×n , a three-step approach to tackle the locale-based sub-task. and the respective labels Y train ∈ Rm×2 , we aim to find a model f : Rn →P R2 such that for some test data X test ∈ v 1. INTRODUCTION Rv×n , the error test i=1 d(f (Xi ), Yitest ) is minimal, where test v×2 Y ∈ R is the true geo-location matrix and d is the The placing task is the challenge offered by the MediaEval Karney distance [5]. Multimedia Benchmarking [1] Initiative that proposes mo- tivations for working with geotagged applications and solu- tions [2]. The task focuses on the development of models 3. PROPOSED APPROACH AND RESULTS to predict the geo-location, i.e. the latitude and longitude, In this section, we discuss the data preprocessing tech- of multimedia items based on their metadata and/or visual niques we employed. Then, we present our proposed three- features. Estimating the geo-location accurately will enable step approach. us to provide a wide range of applications such as geo-aware recommendations and targeted advertisements. 3.1 Data preprocessing The Yahoo Flickr Creative Commons 100 Million Dataset1 Before feeding the dataset to our three-step approach, we (YFCC100M) which is the largest public multimedia col- pre-processed the data as follows. All given metadata de- lection contains a total of 100 million photos and videos scription was converted into a bag of words representation, captured over 10 years [8]. Under the umbrella of geo- consisting of all words/unigrams that mutually appear in location prediction, we focus on the locale-based placing task both training and test set. Then, term frequency - inverse which aims to estimate the geographic coordinates of a given document frequency features were computed to reflect how photo/video. This year’s task dataset is based on a subset important a word is to a description in a collection. The of the YFCC100M. The training data consists of 4,695,149 features with low-variance were discarded. The number of items, while the test set contains 949,889 items. The chal- features after data preprocessing is 20,000. lenge baseline is described in [6]. In this paper we exploit the availability and plurality of 3.2 Proposed approach textual metadata, especially the titles, users tags, machine The following part is the paper’s main contribution. We tags and descriptions to develop our three-step approach: simultaneously explain the theoretical purposes and describe (i) K-means clustering of multimedia items by their latitude how our three-step approach works for the aim of finding the and longitude coordinates; (ii) learning a linear support vec- f model mentioned in section 2. We devised a three-step tor machine on textual contents to predict cluster member- procedure. ship; and (iii) exploiting a K-nearest neighbor regression to find the closest item in the same predicted cluster and return 1. K-means clustering. The target geo-location y con- its geo-location as prediction. The theoretical purposes why sists of two labels y lat and y lon . The basic idea in the we split our system into 3 steps are discussed in section 3.2. first step is to transform a multi-target prediction task into a multi-class classification task. The idea of an 1 http://bit.ly/yfcc100md equally squared grid is not applicable since geographic coordinates of items are spread all over the world. In order to find regions of interest we cluster the items Copyright is held by the author/owner(s). on the training set using K-means [3]. At the end of MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany this step, we have a cluster assignment vector c ∈ C m , where the i-th element ci contains the cluster assigned distance # items percentage to the i-th instance based on its geo-location yi . 0.001 km 504 0.05 % 0.01 km 1051 0.11 % 2. Linear support vector machine. Now that we 1 km 11849 1.25 % have identified clusters, we need to learn a model on 10 km 287807 30.03 % X train and c in order to map the test instances to 100 km 418831 44.09 % those clusters. For that reason, we use a classifier 1000 km 566791 59.67 % which has c as the target and X train as the train- 10000 km 911364 95.94 % ing domain. From now on, the task of geo-location 40000 km 949889 100.00 % prediction can be treated as a multi-class classification problem. The dataset associated with corresponding Table 1: Details of our challenge submission. With clusters c is trained by the linear SVM g : Rn → c the median error of 352.47 km, we are at the 4th with L2 regularization [4]. position over all participants in the leaderboard. 3. K-nearest neighbor regression. Once we have esti- mated to which cluster ci a test instance Xitest should of conceptual steps. This architecture enables improvement belong to, its predicted geo-location ŷtest i is that of the in future experiments. We can easily replace and integrate nearest neighbor in the same cluster g(Xitest ). The co- new techniques in the workflow without redesigning the com- ordinates of Xitest are predicted using 1-NN regression plete system. For example, we can replace K-means cluster- [7] on all the training instances belonging to g(Xitest ). ing by K-medoids clustering or mean-shift clustering. In The evaluation metric is the median Karney distance d : addition, we can also apply feature selection or dimension R2 ×R2 → R+ between the actual yi and predicted locations reduction on X train before feeding it into Step 2. ŷi . We apply grid search to find the best value combina- tion Pv of all hyperparameters test that minimize the distance error 6. ACKNOWLEDGMENTS i=1 d(f (Xi ), Yitest ). At the end of the evaluation, we We would like to thank the MediaEval organizers for their have the number of clusters k = 1000, and the cost s = 0.01 baseline code and instructions2 . Nghia Duong-Trung is spon- for the linear SVM. Those aforementioned steps yield the sored by a grant from Ministry of Education and Training pseudocode below. (MOET) of Vietnam under the national project no. 911. Algorithm 1 Three-step approach 7. REFERENCES INPUT: X train , X test , Y train , cost s, number of clusters k [1] Jaeyoung Choi, Claudia Hauff, Olivier Van Laere, and Bart Thomee, The placing task at mediaeval 2015, 1: # Step 1: k-means clustering (2015). 2: c ← Kmeans(Y train , k) [2] Jaeyoung Choi, Bart Thomee, Gerald Friedland, 3: # Step 2: Linear SVM Liangliang Cao, Karl Ni, Damian Borth, Benjamin 4: g ← LinearSV M (X train , s, c) Elizalde, Luke Gottlieb, Carmen Carrano, Roger 5: # Step 3: k-nearest neighbor Pearce, et al., The placing task: A large-scale 6: for i = 1 . . . v do geo-estimation challenge for social-media videos and 7: ci ← g(Xitest ) images, Proceedings of the 3rd ACM Multimedia 8: X, Y ← rows of X train , Y train belonging to cluster ci Workshop on Geotagging and Its Applications in 9: ŷi ← 1N N Regression(X, Y ) Multimedia, ACM, 2014, pp. 27–31. 10: end for [3] John A Hartigan and Manchek A Wong, Algorithm as 11: return ŷi 136: A k-means clustering algorithm, Applied statistics (1979), 100–108. [4] Thorsten Joachims, Text categorization with support 4. EXPERIMENTAL RESULTS vector machines: Learning with many relevant features, Springer, 1998. Our implementation achieves a median error of 352.47 km [5] Charles FF Karney, Algorithms for geodesics, Journal to the test set. The baseline median error is 71.45 km. In of Geodesy 87 (2013), no. 1, 43–55. table 1, we present our evaluation results in more details. To compare what has been done with the baseline, we only [6] Olivier Van Laere, Steven Schockaert, and Bart apply K-means on Y train without any textual knowledge Dhoedt, Georeferencing flickr resources based on textual or language models. We also do not apply feature ranking. meta-data, Information Sciences 238 (2013), 52 – 74. Those issues will lead to further improvement and we would [7] Leif E Peterson, K-nearest neighbor, Scholarpedia 4 like to discuss it in section 5. (2009), no. 2, 1883. [8] Bart Thomee, David A Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian 5. CONCLUSION AND OUTLOOK Borth, and Li-Jia Li, The new data and new challenges We have presented our three-step approach to the geo- in multimedia research, arXiv preprint location prediction problem based on only textual metadata arXiv:1503.01817 (2015). without exploiting any language models and topic discovery to investigate how reliable and robust this approach actually 2 https://github.com/ovlaere/placing- is. We have split the geo-location prediction into a sequence text/tree/mediaeval2015