Geo_ML @ MediaEval Placing Task 2015

          Nghia Duong-Trung, Martin Wistuba, Lucas Rego Drumond, Lars Schmidt-Thieme
                                     Information Systems and Machine Learning Lab
                                           University of Hildesheim, Germany
                   {duongn,wistuba,ldrumond,schmidt-thieme}@ismll.uni-hildesheim.de


ABSTRACT                                                         Moreover, we discuss what has been learned in comparison
We participated in the MediaEval Benchmarking whose goal         with the baseline in section 4.
is to concentrate on the multimodal geo-location prediction
on the Yahoo! Flickr Creative Commons 100M dataset - the         2.     TASK DESCRIPTION
placing task. It challenges participants to develop models          We have m, v geotagged multimedia items in the train-
and/or techniques to estimate the geographic locations of        ing and test data respectively, and n features describing
the Flickr resources based on textual metadata, e.g. titles,     each item. These features are drawn from textual meta-
descriptions and tags. We aim to find a procedure that is        data. Each item is annotated with a geo-location y ∈ R2 ,
conceptual to understand, simple to implement and flexible       y = (y lat , y lon ) where y lat ∈ R is the latitude and y lon ∈ R
to integrate different techniques. In this paper, we present     is the longitude. Given some training data X train ∈ Rm×n ,
a three-step approach to tackle the locale-based sub-task.       and the respective labels Y train ∈ Rm×2 , we aim to find a
                                                                 model f : Rn →P         R2 such that for some test data X test ∈
                                                                                          v
1.     INTRODUCTION                                              Rv×n , the error                   test
                                                                                          i=1 d(f (Xi    ), Yitest ) is minimal, where
                                                                   test       v×2
                                                                 Y      ∈ R          is the true geo-location matrix and d is the
   The placing task is the challenge offered by the MediaEval
                                                                 Karney distance [5].
Multimedia Benchmarking [1] Initiative that proposes mo-
tivations for working with geotagged applications and solu-
tions [2]. The task focuses on the development of models         3.     PROPOSED APPROACH AND RESULTS
to predict the geo-location, i.e. the latitude and longitude,      In this section, we discuss the data preprocessing tech-
of multimedia items based on their metadata and/or visual        niques we employed. Then, we present our proposed three-
features. Estimating the geo-location accurately will enable     step approach.
us to provide a wide range of applications such as geo-aware
recommendations and targeted advertisements.                     3.1      Data preprocessing
   The Yahoo Flickr Creative Commons 100 Million Dataset1          Before feeding the dataset to our three-step approach, we
(YFCC100M) which is the largest public multimedia col-           pre-processed the data as follows. All given metadata de-
lection contains a total of 100 million photos and videos        scription was converted into a bag of words representation,
captured over 10 years [8]. Under the umbrella of geo-           consisting of all words/unigrams that mutually appear in
location prediction, we focus on the locale-based placing task   both training and test set. Then, term frequency - inverse
which aims to estimate the geographic coordinates of a given     document frequency features were computed to reflect how
photo/video. This year’s task dataset is based on a subset       important a word is to a description in a collection. The
of the YFCC100M. The training data consists of 4,695,149         features with low-variance were discarded. The number of
items, while the test set contains 949,889 items. The chal-      features after data preprocessing is 20,000.
lenge baseline is described in [6].
   In this paper we exploit the availability and plurality of    3.2      Proposed approach
textual metadata, especially the titles, users tags, machine       The following part is the paper’s main contribution. We
tags and descriptions to develop our three-step approach:        simultaneously explain the theoretical purposes and describe
(i) K-means clustering of multimedia items by their latitude     how our three-step approach works for the aim of finding the
and longitude coordinates; (ii) learning a linear support vec-   f model mentioned in section 2. We devised a three-step
tor machine on textual contents to predict cluster member-       procedure.
ship; and (iii) exploiting a K-nearest neighbor regression to
find the closest item in the same predicted cluster and return        1. K-means clustering. The target geo-location y con-
its geo-location as prediction. The theoretical purposes why             sists of two labels y lat and y lon . The basic idea in the
we split our system into 3 steps are discussed in section 3.2.           first step is to transform a multi-target prediction task
                                                                         into a multi-class classification task. The idea of an
1
    http://bit.ly/yfcc100md                                              equally squared grid is not applicable since geographic
                                                                         coordinates of items are spread all over the world. In
                                                                         order to find regions of interest we cluster the items
Copyright is held by the author/owner(s).                                on the training set using K-means [3]. At the end of
MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany              this step, we have a cluster assignment vector c ∈ C m ,
        where the i-th element ci contains the cluster assigned                    distance    # items    percentage
        to the i-th instance based on its geo-location yi .                       0.001 km        504       0.05 %
                                                                                   0.01 km       1051       0.11 %
     2. Linear support vector machine. Now that we                                   1 km       11849       1.25 %
        have identified clusters, we need to learn a model on                       10 km      287807       30.03 %
        X train and c in order to map the test instances to                         100 km     418831       44.09 %
        those clusters. For that reason, we use a classifier                       1000 km     566791       59.67 %
        which has c as the target and X train as the train-                       10000 km     911364       95.94 %
        ing domain. From now on, the task of geo-location
                                                                                  40000 km     949889      100.00 %
        prediction can be treated as a multi-class classification
        problem. The dataset associated with corresponding            Table 1: Details of our challenge submission. With
        clusters c is trained by the linear SVM g : Rn → c            the median error of 352.47 km, we are at the 4th
        with L2 regularization [4].                                   position over all participants in the leaderboard.
     3. K-nearest neighbor regression. Once we have esti-
        mated to which cluster ci a test instance Xitest should
                                                                      of conceptual steps. This architecture enables improvement
        belong to, its predicted geo-location ŷtest
                                                i    is that of the
                                                                      in future experiments. We can easily replace and integrate
        nearest neighbor in the same cluster g(Xitest ). The co-
                                                                      new techniques in the workflow without redesigning the com-
        ordinates of Xitest are predicted using 1-NN regression
                                                                      plete system. For example, we can replace K-means cluster-
        [7] on all the training instances belonging to g(Xitest ).
                                                                      ing by K-medoids clustering or mean-shift clustering. In
    The evaluation metric is the median Karney distance d :           addition, we can also apply feature selection or dimension
R2 ×R2 → R+ between the actual yi and predicted locations             reduction on X train before feeding it into Step 2.
ŷi . We apply grid search to find the best value combina-
tion
Pv of all hyperparameters
              test
                                 that minimize the distance error     6.   ACKNOWLEDGMENTS
    i=1 d(f (Xi    ), Yitest ). At the end of the evaluation, we        We would like to thank the MediaEval organizers for their
have the number of clusters k = 1000, and the cost s = 0.01           baseline code and instructions2 . Nghia Duong-Trung is spon-
for the linear SVM. Those aforementioned steps yield the              sored by a grant from Ministry of Education and Training
pseudocode below.                                                     (MOET) of Vietnam under the national project no. 911.

Algorithm 1 Three-step approach
                                                                      7.   REFERENCES
INPUT: X train , X test , Y train , cost s, number of clusters k
                                                                      [1] Jaeyoung Choi, Claudia Hauff, Olivier Van Laere, and
                                                                          Bart Thomee, The placing task at mediaeval 2015,
 1: # Step 1: k-means clustering                                          (2015).
 2: c ← Kmeans(Y train , k)                                           [2] Jaeyoung Choi, Bart Thomee, Gerald Friedland,
 3: # Step 2: Linear SVM                                                  Liangliang Cao, Karl Ni, Damian Borth, Benjamin
 4: g ← LinearSV M (X train , s, c)                                       Elizalde, Luke Gottlieb, Carmen Carrano, Roger
 5: # Step 3: k-nearest neighbor                                          Pearce, et al., The placing task: A large-scale
 6: for i = 1 . . . v do                                                  geo-estimation challenge for social-media videos and
 7:   ci ← g(Xitest )                                                     images, Proceedings of the 3rd ACM Multimedia
 8:   X, Y ← rows of X train , Y train belonging to cluster ci            Workshop on Geotagging and Its Applications in
 9:   ŷi ← 1N N Regression(X, Y )                                        Multimedia, ACM, 2014, pp. 27–31.
10: end for
                                                                      [3] John A Hartigan and Manchek A Wong, Algorithm as
11: return ŷi
                                                                          136: A k-means clustering algorithm, Applied statistics
                                                                          (1979), 100–108.
                                                                      [4] Thorsten Joachims, Text categorization with support
4.     EXPERIMENTAL RESULTS                                               vector machines: Learning with many relevant features,
                                                                          Springer, 1998.
   Our implementation achieves a median error of 352.47 km
                                                                      [5] Charles FF Karney, Algorithms for geodesics, Journal
to the test set. The baseline median error is 71.45 km. In
                                                                          of Geodesy 87 (2013), no. 1, 43–55.
table 1, we present our evaluation results in more details.
To compare what has been done with the baseline, we only              [6] Olivier Van Laere, Steven Schockaert, and Bart
apply K-means on Y train without any textual knowledge                    Dhoedt, Georeferencing flickr resources based on textual
or language models. We also do not apply feature ranking.                 meta-data, Information Sciences 238 (2013), 52 – 74.
Those issues will lead to further improvement and we would            [7] Leif E Peterson, K-nearest neighbor, Scholarpedia 4
like to discuss it in section 5.                                          (2009), no. 2, 1883.
                                                                      [8] Bart Thomee, David A Shamma, Gerald Friedland,
                                                                          Benjamin Elizalde, Karl Ni, Douglas Poland, Damian
5.     CONCLUSION AND OUTLOOK                                             Borth, and Li-Jia Li, The new data and new challenges
   We have presented our three-step approach to the geo-                  in multimedia research, arXiv preprint
location prediction problem based on only textual metadata                arXiv:1503.01817 (2015).
without exploiting any language models and topic discovery
to investigate how reliable and robust this approach actually         2
                                                                        https://github.com/ovlaere/placing-
is. We have split the geo-location prediction into a sequence         text/tree/mediaeval2015