<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
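        <contrib contrib-type="author">
          <string-name>Nghia Duong-Trung</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Wistuba</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucas Rego Drumond</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lars Schmidt-Thieme</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Systems and Machine Learning Lab, University of Hildesheim</institution>,
          <country country="DE">Germany</country>
        </aff>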
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>We participated in the MediaEval 2015 Placing Task, a benchmark for multimodal geo-location prediction on the Yahoo! Flickr Creative Commons 100M dataset. The task challenges participants to develop models and techniques that estimate the geographic locations of Flickr resources from their textual metadata, e.g., titles, descriptions, and tags. We aim to find a procedure that is conceptually easy to understand, simple to implement, and flexible enough to integrate different techniques. In this paper, we present a three-step approach to the locale-based sub-task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The Placing Task is a challenge offered by the MediaEval
Multimedia Benchmarking Initiative [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which motivates research on geotagged applications and
solutions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The task focuses on developing models
that predict the geo-location, i.e., the latitude and longitude,
of multimedia items from their metadata and/or visual
features. Accurate geo-location estimates enable a wide range of
applications, such as geo-aware recommendations and targeted advertisements.
      </p>
      <p>
        The Yahoo Flickr Creative Commons 100 Million Dataset
(YFCC100M), the largest public multimedia
collection, contains a total of 100 million photos and videos
captured over 10 years [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Within geo-location prediction, we focus on the locale-based
placing sub-task, which aims to estimate the geographic coordinates of a given
photo or video. This year's task dataset is a subset
of YFCC100M: the training data consist of 4,695,149
items, and the test set contains 949,889 items. The
challenge baseline is described in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>In this paper, we exploit the availability and plurality of
textual metadata, especially titles, user tags, machine
tags, and descriptions, to develop a three-step approach:
(i) K-means clustering of multimedia items by their latitude
and longitude coordinates; (ii) training a linear support
vector machine on the textual content to predict cluster
membership; and (iii) K-nearest neighbor regression to
find the closest training item in the predicted cluster and return
its geo-location as the prediction. The rationale for splitting
the system into three steps is discussed in section 3.2, and
section 4 compares what we learned with the baseline.</p>
    </sec>
    <sec id="sec-2">
      <title>2. TASK DESCRIPTION</title>
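      <p>
        We have <inline-formula><tex-math>m</tex-math></inline-formula> and <inline-formula><tex-math>v</tex-math></inline-formula> geotagged multimedia items in the
training and test data, respectively, and <inline-formula><tex-math>n</tex-math></inline-formula> features describing
each item. These features are drawn from textual metadata.
Each item is annotated with its geo-location
<inline-formula><tex-math>y = (y_{\mathrm{lat}}, y_{\mathrm{lon}}) \in \mathbb{R}^2</tex-math></inline-formula>, where
<inline-formula><tex-math>y_{\mathrm{lat}} \in \mathbb{R}</tex-math></inline-formula> is the latitude and
<inline-formula><tex-math>y_{\mathrm{lon}} \in \mathbb{R}</tex-math></inline-formula> is the longitude.
Given training data <inline-formula><tex-math>X^{\mathrm{train}} \in \mathbb{R}^{m \times n}</tex-math></inline-formula>
and the respective labels <inline-formula><tex-math>Y^{\mathrm{train}} \in \mathbb{R}^{m \times 2}</tex-math></inline-formula>, we aim to find a
model <inline-formula><tex-math>f: \mathbb{R}^n \to \mathbb{R}^2</tex-math></inline-formula> such that, for test data
<inline-formula><tex-math>X^{\mathrm{test}} \in \mathbb{R}^{v \times n}</tex-math></inline-formula>, the error
        <disp-formula><tex-math>\sum_{i=1}^{v} d\left(f(X^{\mathrm{test}}_i), Y^{\mathrm{test}}_i\right)</tex-math></disp-formula>
is minimal, where <inline-formula><tex-math>Y^{\mathrm{test}} \in \mathbb{R}^{v \times 2}</tex-math></inline-formula> is the true geo-location matrix and
<inline-formula><tex-math>d</tex-math></inline-formula> is the Karney distance [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. PROPOSED APPROACH AND RESULTS</title>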
      <p>In this section, we first describe the data preprocessing
techniques we employed and then present our proposed
three-step approach.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1 Data preprocessing</title>
      <p>Before feeding the dataset to our three-step approach, we
preprocessed it as follows. The textual metadata of each item were
converted into a bag-of-words representation consisting of all
unigrams that appear in both the training and the test set. Then,
term frequency-inverse document frequency (TF-IDF) features were
computed to reflect how important a word is to an item's metadata
within the collection. Low-variance features were discarded, leaving
20,000 features after preprocessing.</p>
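      <p>The preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the paper does not specify the tokenizer, the IDF formula, or how "low variance" was decided, so the whitespace tokenizer, the smoothed IDF log((1 + N) / (1 + df)) + 1 computed on the training set and reused for test items, and keeping the n_keep highest-variance columns are our choices. All function names are hypothetical.</p>
      <preformat>
```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, whitespace-split tokenizer (an assumed choice)."""
    return text.lower().split()


def shared_vocabulary(train_docs, test_docs):
    """Sorted unigrams that occur in both the training and the test metadata."""
    train_vocab = {w for d in train_docs for w in tokenize(d)}
    test_vocab = {w for d in test_docs for w in tokenize(d)}
    return sorted(train_vocab.intersection(test_vocab))


def tfidf(docs, vocab, idf=None):
    """Term counts times smoothed IDF; IDF is computed on docs unless given."""
    index = {w: j for j, w in enumerate(vocab)}
    if idf is None:
        n = len(docs)
        df = Counter(w for d in docs for w in set(tokenize(d)) if w in index)
        idf = [math.log((1 + n) / (1 + df[w])) + 1 for w in vocab]
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for w, count in Counter(tokenize(d)).items():
            if w in index:  # words outside the shared vocabulary are ignored
                row[index[w]] = count * idf[index[w]]
        rows.append(row)
    return rows, idf


def top_variance_columns(rows, n_keep):
    """Indices of the n_keep highest-variance columns, in original order."""
    m = len(rows)
    variances = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / m
        variances.append(sum((v - mean) ** 2 for v in col) / m)
    ranked = sorted(range(len(variances)), key=lambda j: -variances[j])
    return sorted(ranked[:n_keep])


train = ["Brandenburg Gate Berlin", "Berlin wall graffiti", "Times Square NYC"]
test = ["Berlin cathedral", "NYC subway"]
vocab = shared_vocabulary(train, test)
X_train, idf = tfidf(train, vocab)
X_test, _ = tfidf(test, vocab, idf)  # reuse the training IDF for test items
keep = top_variance_columns(X_train, n_keep=1)  # the paper keeps 20,000
X_train = [[r[j] for j in keep] for r in X_train]
X_test = [[r[j] for j in keep] for r in X_test]
print(vocab, keep, X_test)
```
</preformat>
      <p>With the paper's setting, n_keep would be 20,000; at the scale of the YFCC100M training set, sparse matrices would replace the dense Python lists used here.</p>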
    </sec>
    <sec id="sec-5">
      <title>3.2 Proposed approach</title>
      <p>This section presents the main contribution of the paper. We
explain the rationale behind each step and describe how the
three-step procedure finds the model <inline-formula><tex-math>f</tex-math></inline-formula> defined in section 2.</p>
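      <list list-type="order">
        <list-item>
          <label>1.</label>
          <p>K-means clustering. The target geo-location <inline-formula><tex-math>y</tex-math></inline-formula>
consists of two labels, <inline-formula><tex-math>y_{\mathrm{lat}}</tex-math></inline-formula> and <inline-formula><tex-math>y_{\mathrm{lon}}</tex-math></inline-formula>. The basic idea of the
first step is to transform a multi-target prediction task
into a multi-class classification task. An equally spaced square grid
is not applicable, since the geographic coordinates of the items are
spread all over the world. To find regions of interest, we cluster the
training items by their geo-locations using K-means [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. At the end of this step, we have a cluster assignment vector
<inline-formula><tex-math>c \in C^m</tex-math></inline-formula>, where <inline-formula><tex-math>C = \{1, \dots, k\}</tex-math></inline-formula> is the set of clusters and the i-th element
<inline-formula><tex-math>c_i</tex-math></inline-formula> is the cluster assigned to the i-th instance based on its geo-location <inline-formula><tex-math>y_i</tex-math></inline-formula>.</p>
        </list-item>
        <list-item>
          <label>2.</label>
          <p>Linear support vector machine. Having identified the clusters, we
need to learn a model on <inline-formula><tex-math>X^{\mathrm{train}}</tex-math></inline-formula> and <inline-formula><tex-math>c</tex-math></inline-formula> that maps the test instances to
those clusters, i.e., a classifier with <inline-formula><tex-math>c</tex-math></inline-formula> as the target and
<inline-formula><tex-math>X^{\mathrm{train}}</tex-math></inline-formula> as the input. From this point on, geo-location
prediction is treated as a multi-class classification
problem. We train a linear SVM <inline-formula><tex-math>g: \mathbb{R}^n \to C</tex-math></inline-formula>
with L2 regularization [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] on the training data and their cluster labels <inline-formula><tex-math>c</tex-math></inline-formula>.</p>
        </list-item>
        <list-item>
          <label>3.</label>
          <p>K-nearest neighbor regression. Once we have
estimated the cluster <inline-formula><tex-math>g(X^{\mathrm{test}}_i)</tex-math></inline-formula> to which a test instance
<inline-formula><tex-math>X^{\mathrm{test}}_i</tex-math></inline-formula> belongs, its predicted geo-location
<inline-formula><tex-math>\hat{y}^{\mathrm{test}}_i</tex-math></inline-formula> is that of its nearest neighbor in the same cluster:
the coordinates of <inline-formula><tex-math>X^{\mathrm{test}}_i</tex-math></inline-formula> are predicted using 1-NN regression [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] on all the training instances belonging to <inline-formula><tex-math>g(X^{\mathrm{test}}_i)</tex-math></inline-formula>.</p>
        </list-item>
      </list>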
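      <p>The three steps can be summarized compactly as follows. This is an illustrative formalization under assumptions rather than the authors' exact formulation: the text fixes K-means on the coordinates, an L2-regularized linear SVM with cost <inline-formula><tex-math>s</tex-math></inline-formula>, and 1-NN regression, whereas the one-vs-rest decomposition with per-cluster parameters <inline-formula><tex-math>(w_j, b_j)</tex-math></inline-formula>, the hinge loss with an unregularized bias, and the Euclidean feature-space distance in the nearest-neighbor step are our assumptions.</p>
      <disp-formula>
        <tex-math>\begin{gathered}
c_i = \arg\min_{j \in C} \lVert y_i - \mu_j \rVert_2^2, \qquad \mu_j = \frac{1}{\lvert \{ l : c_l = j \} \rvert} \sum_{l : c_l = j} y_l \\
(w_j, b_j) = \arg\min_{w, b} \; \tfrac{1}{2} \lVert w \rVert_2^2 + s \sum_{l=1}^{m} \max\left(0,\, 1 - t_{lj} \left(w^\top X^{\mathrm{train}}_l + b\right)\right), \qquad t_{lj} = \begin{cases} +1 \text{ if } c_l = j \\ -1 \text{ otherwise} \end{cases} \\
g(x) = \arg\max_{j \in C} \left( w_j^\top x + b_j \right) \\
\hat{y}^{\mathrm{test}}_i = Y^{\mathrm{train}}_{l^\ast}, \qquad l^\ast = \arg\min_{l : c_l = g(X^{\mathrm{test}}_i)} \lVert X^{\mathrm{test}}_i - X^{\mathrm{train}}_l \rVert_2
\end{gathered}</tex-math>
      </disp-formula>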
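      <p>The evaluation metric is the median Karney distance
<inline-formula><tex-math>d: \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}^{+}</tex-math></inline-formula> between the actual locations
<inline-formula><tex-math>y_i</tex-math></inline-formula> and the predicted locations <inline-formula><tex-math>\hat{y}_i</tex-math></inline-formula>. We apply grid search to find the
combination of hyperparameter values that minimizes the distance error
<inline-formula><tex-math>\sum_{i=1}^{v} d\left(f(X^{\mathrm{test}}_i), Y^{\mathrm{test}}_i\right)</tex-math></inline-formula>.
The grid search yields <inline-formula><tex-math>k = 1000</tex-math></inline-formula> clusters and a cost of
<inline-formula><tex-math>s = 0.01</tex-math></inline-formula> for the linear SVM. These steps are summarized in Algorithm 1.</p>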
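      <p>The median-error metric can be sketched as follows. As a plainly labeled substitution, this sketch uses the spherical haversine formula with a mean Earth radius of 6371.0088 km instead of Karney's ellipsoidal geodesic algorithm [<xref ref-type="bibr" rid="ref5">5</xref>], so its distances deviate slightly (on the order of tenths of a percent) from the official metric. The function names and example coordinates are hypothetical.</p>
      <preformat>
```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against rounding pushing a slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def median_error_km(y_true, y_pred):
    """Median distance between true and predicted locations."""
    return statistics.median(haversine_km(t, p) for t, p in zip(y_true, y_pred))


y_true = [(52.52, 13.40), (40.71, -74.00), (48.86, 2.35)]
y_pred = [(52.50, 13.45), (40.71, -74.00), (51.51, -0.13)]
print(f"median error: {median_error_km(y_true, y_pred):.1f} km")
```
</preformat>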
      <p>Algorithm 1: Three-step approach</p>
      <preformat>INPUT: X^train, X^test, Y^train, cost s, number of clusters k
# Step 1: K-means clustering
c ← Kmeans(Y^train, k)
# Step 2: linear SVM
g ← LinearSVM(X^train, s, c)
# Step 3: 1-nearest-neighbor regression
for i = 1, ..., v do
    ĉ_i ← g(X^test_i)
    X, Y ← rows of X^train, Y^train belonging to cluster ĉ_i
    ŷ_i ← 1NNRegression(X, Y, X^test_i)
end for
return ŷ = (ŷ_1, ..., ŷ_v)</preformat>
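      <p>Algorithm 1 can be turned into a small runnable sketch in plain Python for checking the data flow on toy inputs. It is not the authors' implementation and makes several assumptions: K-means uses Lloyd's iterations with random initialization (the paper cites the Hartigan-Wong algorithm); the linear SVM is a swapped-in one-vs-rest model trained with Pegasos-style stochastic subgradient steps on the L2-regularized hinge loss, with the cost s mapped to the regularization weight 1/(s·m); the constant feature used as a bias is also regularized; and the 1-NN step uses Euclidean distance in feature space. All function names and toy data are hypothetical.</p>
      <preformat>
```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Step 1: Lloyd's K-means on (lat, lon) pairs; returns cluster labels c."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def train_linear_svm(X, labels, cost, epochs=100, seed=0):
    """Step 2: one-vs-rest linear SVM, L2-regularized hinge loss, fitted with
    Pegasos-style stochastic subgradient steps (an assumed solver)."""
    rng = random.Random(seed)
    m = len(X)
    lam = 1.0 / (cost * m)  # 0.5*||w||^2 + cost*sum(hinge), divided by cost*m
    Xb = [list(x) + [1.0] for x in X]  # constant feature acts as a bias
    weights = {}
    for cls in sorted(set(labels)):
        w = [0.0] * len(Xb[0])
        t = 0
        for _ in range(epochs):
            for i in rng.sample(range(m), m):  # one shuffled pass over the data
                t += 1
                eta = 1.0 / (lam * t)
                target = 1.0 if labels[i] == cls else -1.0
                margin = target * sum(wj * xj for wj, xj in zip(w, Xb[i]))
                w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (L2 term)
                if 1.0 > margin:  # hinge loss active: step towards the target
                    w = [wj + eta * target * xj for wj, xj in zip(w, Xb[i])]
        weights[cls] = w

    def g(x):
        xb = list(x) + [1.0]
        return max(weights,
                   key=lambda c: sum(wj * xj for wj, xj in zip(weights[c], xb)))

    return g


def one_nn_regression(X, Y, x):
    """Step 3: return the location of the training row closest to x."""
    nearest = min(range(len(X)), key=lambda i: sq_dist(x, X[i]))
    return tuple(Y[nearest])


def three_step(X_train, Y_train, X_test, cost, k):
    """Algorithm 1: cluster locations, classify text into clusters, then 1-NN."""
    c = kmeans(Y_train, k)
    g = train_linear_svm(X_train, c, cost)
    preds = []
    for x in X_test:
        c_hat = g(x)
        rows = [i for i, lab in enumerate(c) if lab == c_hat]
        X = [X_train[i] for i in rows]
        Y = [Y_train[i] for i in rows]
        preds.append(one_nn_regression(X, Y, x))
    return preds


# Toy data: feature 0 ~ "berlin", feature 1 ~ "nyc", feature 2 is noise.
X_train = [[1, 0, 0], [0.9, 0, 0.1], [1, 0, 0.2],
           [0, 1, 0], [0, 0.8, 0.1], [0, 1, 0.3]]
Y_train = [[52.52, 13.40], [52.50, 13.45], [52.53, 13.38],
           [40.71, -74.00], [40.73, -73.99], [40.70, -74.02]]
X_test = [[1, 0, 0.05], [0, 1, 0.25]]
preds = three_step(X_train, Y_train, X_test, cost=0.01, k=2)
print(preds)
```
</preformat>
      <p>On this toy data, the two test items are routed to the Berlin-like and New-York-like clusters and receive the coordinates of their nearest training neighbors. For real TF-IDF matrices with 20,000 columns, sparse vectors and an optimized linear SVM solver would be needed; the pure-Python loops here only illustrate the three steps.</p>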
    </sec>
    <sec id="sec-6">
      <title>4. EXPERIMENTAL RESULTS</title>
      <p>Our implementation achieves a median error of 352.47 km
on the test set, compared with a baseline median error of 71.45 km.
Table 1 presents our evaluation results in more detail.
Compared with the baseline, we apply K-means to <inline-formula><tex-math>Y^{\mathrm{train}}</tex-math></inline-formula> alone,
without any textual knowledge or language models, and we do not apply feature ranking.
We expect that addressing these points will lead to further improvements,
as discussed in section 5.</p>
    </sec>
    <sec id="sec-7">
      <title>5. CONCLUSION AND OUTLOOK</title>
      <p>We have presented a three-step approach to geo-location
prediction that relies only on textual metadata, without exploiting
language models or topic discovery, in order to investigate how reliable
and robust such an approach is. We split geo-location prediction into a sequence
of conceptual steps. This modular architecture makes it easy to improve the
system in future experiments: new techniques can be swapped into the workflow
without redesigning the complete system. For example, K-means clustering could
be replaced by K-medoids or mean-shift clustering. In addition, feature selection
or dimensionality reduction could be applied to <inline-formula><tex-math>X^{\mathrm{train}}</tex-math></inline-formula> before it is fed into Step 2.</p>
    </sec>
    <sec id="sec-8">
      <title>6. ACKNOWLEDGMENTS</title>
      <p>We would like to thank the MediaEval organizers for their
baseline code and instructions (<ext-link ext-link-type="uri" xlink:href="https://github.com/ovlaere/placingtext/tree/mediaeval2015">https://github.com/ovlaere/placingtext/tree/mediaeval2015</ext-link>).
Nghia Duong-Trung is sponsored by a grant from the Ministry of Education and Training
(MOET) of Vietnam under national project no. 911.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>7. REFERENCES</title>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>Jaeyoung</given-names> <surname>Choi</surname></string-name>,
          <string-name><given-names>Claudia</given-names> <surname>Hauff</surname></string-name>,
          <string-name><given-names>Olivier</given-names> <surname>Van Laere</surname></string-name>, and
          <string-name><given-names>Bart</given-names> <surname>Thomee</surname></string-name>,
          <article-title>The placing task at MediaEval 2015</article-title>
          (<year>2015</year>).
        </mixed-citation>
      </ref>
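      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>Jaeyoung</given-names> <surname>Choi</surname></string-name>,
          <string-name><given-names>Bart</given-names> <surname>Thomee</surname></string-name>,
          <string-name><given-names>Gerald</given-names> <surname>Friedland</surname></string-name>,
          <string-name><given-names>Liangliang</given-names> <surname>Cao</surname></string-name>,
          <string-name><given-names>Karl</given-names> <surname>Ni</surname></string-name>,
          <string-name><given-names>Damian</given-names> <surname>Borth</surname></string-name>,
          <string-name><given-names>Benjamin</given-names> <surname>Elizalde</surname></string-name>,
          <string-name><given-names>Luke</given-names> <surname>Gottlieb</surname></string-name>,
          <string-name><given-names>Carmen</given-names> <surname>Carrano</surname></string-name>,
          <string-name><given-names>Roger</given-names> <surname>Pearce</surname></string-name>, et al.,
          <article-title>The placing task: A large-scale geo-estimation challenge for social-media videos and images</article-title>,
          <source>Proceedings of the 3rd ACM Multimedia Workshop on Geotagging and Its Applications in Multimedia</source>, ACM,
          <year>2014</year>, pp. <fpage>27</fpage>–<lpage>31</lpage>.
        </mixed-citation>
      </ref>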
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>John A.</given-names> <surname>Hartigan</surname></string-name> and
          <string-name><given-names>Manchek A.</given-names> <surname>Wong</surname></string-name>,
          <article-title>Algorithm AS 136: A k-means clustering algorithm</article-title>,
          <source>Applied Statistics</source> (<year>1979</year>), <fpage>100</fpage>–<lpage>108</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>Thorsten</given-names> <surname>Joachims</surname></string-name>,
          <article-title>Text categorization with support vector machines: Learning with many relevant features</article-title>,
          Springer, <year>1998</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>Charles F. F.</given-names> <surname>Karney</surname></string-name>,
          <article-title>Algorithms for geodesics</article-title>,
          <source>Journal of Geodesy</source> <volume>87</volume> (<year>2013</year>), no. <issue>1</issue>, <fpage>43</fpage>–<lpage>55</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>Olivier</given-names> <surname>Van Laere</surname></string-name>,
          <string-name><given-names>Steven</given-names> <surname>Schockaert</surname></string-name>, and
          <string-name><given-names>Bart</given-names> <surname>Dhoedt</surname></string-name>,
          <article-title>Georeferencing Flickr resources based on textual meta-data</article-title>,
          <source>Information Sciences</source> <volume>238</volume> (<year>2013</year>), <fpage>52</fpage>–<lpage>74</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>Leif E.</given-names> <surname>Peterson</surname></string-name>,
          <article-title>K-nearest neighbor</article-title>,
          <source>Scholarpedia</source> <volume>4</volume> (<year>2009</year>), no. <issue>2</issue>, <elocation-id>1883</elocation-id>.
        </mixed-citation>
      </ref>
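      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>Bart</given-names> <surname>Thomee</surname></string-name>,
          <string-name><given-names>David A.</given-names> <surname>Shamma</surname></string-name>,
          <string-name><given-names>Gerald</given-names> <surname>Friedland</surname></string-name>,
          <string-name><given-names>Benjamin</given-names> <surname>Elizalde</surname></string-name>,
          <string-name><given-names>Karl</given-names> <surname>Ni</surname></string-name>,
          <string-name><given-names>Douglas</given-names> <surname>Poland</surname></string-name>,
          <string-name><given-names>Damian</given-names> <surname>Borth</surname></string-name>, and
          <string-name><given-names>Li-Jia</given-names> <surname>Li</surname></string-name>,
          <article-title>The new data and new challenges in multimedia research</article-title>,
          <source>arXiv preprint arXiv:1503.01817</source> (<year>2015</year>).
        </mixed-citation>
      </ref>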
    </ref-list>
  </back>
</article>