<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Associating Relevant Photos to Georeferenced Textual Documents through Rank Aggregation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rui Candeias</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bruno Martins</string-name>
          <email>bruno.g.martins@ist.utl.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Instituto Superior Técnico, INESC-ID, Av. Professor Cavaco Silva</institution>
          ,
          <addr-line>2744-016 Porto Salvo</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The automatic association of illustrative photos to paragraphs of text is a challenging cross-media retrieval problem with many practical applications. In this paper, we propose novel methods for associating photos to textual documents. The proposed methods are based on the recognition and disambiguation of location names in the texts, which are then used to query Flickr for candidate photos. The best photos are selected based on their popularity, their geographic proximity, their temporal cohesion, and the similarity between the photos' textual descriptions and the text of the document. We specifically tested different rank aggregation approaches for selecting the most relevant photos. A method that uses the CombMNZ algorithm to combine textual similarity, geographic proximity and temporal cohesion obtained the best results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The automatic association of illustrative photos to paragraphs of text is a
challenging cross-media retrieval problem with many practical applications. For
instance, the Zemanta1 blog enrichment extension is a commercial application
capable of suggesting photos from Flickr for blog posts. Another example concerns
textual documents describing travel experiences, usually called travelogues,
which can provide useful information when planning a trip.
Today, there are several websites where these documents are shared, and the use of
web information for travel planning has also increased. However, travelogues
by themselves are of limited use. It is our conviction that visualizing
photos associated with specific parts of a travelogue, such as common
scenarios and points of interest, may lead to a better usage of travelogues.</p>
      <p>Despite the huge number of high-quality photos on websites like Flickr2, these
photos are currently not being properly explored in cross-media retrieval
applications. In this paper, we propose methods to automatically associate photos,
published on Flickr, to textual documents. These methods are based on
mining geographic information from textual documents, using a free web service to
recognize and disambiguate location names and points of interest mentioned in
the documents. The places recognized in the documents are then used to query
Flickr for related photos. Finally, the best photos are selected based on
their popularity and on the similarity between their information (e.g., textual,
geographical and temporal metadata) and the information from the document
(e.g., textual contents, recognized places and temporal metadata).
This work was partially supported by the Fundação para a Ciência e a Tecnologia
(FCT), through project grant PTDC/EIA-EIA/109840/2009 (SInteliGIS).
1 http://www.zemanta.com/
2 http://www.flickr.com</p>
      <p>The rest of this paper is organized as follows: Section 2 presents the main
concepts and related work. Section 3 describes the proposed methods, detailing
the mining of geographic information contained in texts and the selection of the
best photos, based on their popularity and similarity. Section 4 describes how
a prototype system implementing the proposed methods was built, and presents
the results of an initial evaluation experiment. Finally, Section 5 presents our
conclusions and outlines directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Problems related to the handling of geographic references in textual
documents have been widely studied in Geographic Information Retrieval [
        <xref ref-type="bibr" rid="ref1 ref11 ref15 ref16">1,11,15,16</xref>
        ].
Using this information requires the recognition of place names in the texts (i.e.,
delimiting the text tokens referencing locations) and the disambiguation of those
place names in order to determine their actual location on the surface of the Earth
(i.e., assigning unique identifiers, typically geospatial coordinates, to the location
names that were found). The main challenges in both tasks stem from the
ambiguity of natural language. Amitay et al. characterized these ambiguity problems
according to two types, namely geo/non-geo and geo/geo [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Geo/non-geo
ambiguity occurs when a location name also has a non-geographic meaning (e.g.,
Turkey, the country or the bird). Geo/geo ambiguity refers to distinct locations
with the same name (e.g., London in England and London in Ontario).
      </p>
      <p>
        Leidner studied different approaches for the recognition and disambiguation
of geographic references in documents [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Most of the studied methods resolve
place references by matching expressions from the texts against dictionaries of
location names, and use disambiguation heuristics such as default senses (e.g., the
most important referenced location is chosen, as estimated by population size)
or spatial minimality (e.g., the disambiguation must minimize the polygon
that covers all the geographic references contained in the document). Recently,
Martins et al. studied the usage of machine learning approaches for the recognition
and disambiguation of geographic references, using Hidden Markov Models for
the recognition task, and regression models, with features corresponding to the
heuristics surveyed by Leidner, for the disambiguation task [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Other recent
works focused on recognition and disambiguation problems that are particularly
complex, involving the processing of texts where geographic references are highly
ambiguous and of low granularity (e.g., mountaineering texts mention tracks
and specific regions in mountains), and where it is important to distinguish
between the location names pertinent to route descriptions and those that are
pertinent to the description of panoramas [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>Currently, there are many commercial products for recognizing and
disambiguating place references in text. An example is the Yahoo! Placemaker3 web
service, which was used in this work and is described in more detail in Section 3.1.</p>
      <p>
        Previous works have also studied the usage of Flickr as an information
source for Geographic Information Retrieval [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The information stored in this service
has proven useful for many applications, due to the direct links between
geospatial coordinates (i.e., the coordinates of the places where the photos were
taken, either provided by cameras with GPS capabilities or by the authors), dates
(i.e., the moments when the photos were taken) and semantically rich texts
(i.e., the descriptions and tags associated with the photos).
      </p>
      <p>
        In particular, Lu et al. addressed the automatic association of photos,
published on Flickr, to Chinese travelogues [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], based on a probabilistic topic
model detailed in a previous work [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which is an extension of the Probabilistic
Latent Semantic Indexing (pLSA) method [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The main idea in the work by Lu
et al. is similar to the basis of our work, as the authors tested different methods
for selecting photos, obtained by querying Flickr's search engine with the
location names recognized in the texts. The probabilistic topic model is used
by the authors to bridge the gap between the vocabulary used in the documents
and the textual descriptions of the photos, modeling photos and/or documents
as probability distributions over words. The authors tested four different
approaches for the selection of relevant photos, namely (i) a baseline approach
based on simple word-to-word matching between the words from the travelogue
texts and the tags that represent the photos, (ii) a mechanism based on a
probabilistic model created from the travelogue texts, (iii) a mechanism based on
a probabilistic model created from the tags that represent the photos, and (iv) a
mechanism based on a probabilistic model using both the texts and the tags, which
obtained the best results. In our work, we approached the problem in a slightly
different way, by querying Flickr with the geospatial information associated with
the places recognized in the documents.
      </p>
      <p>
        In terms of previous works in the area of cross-media retrieval,
Deschacht and Moens presented an approach that tries to find the best picture
of a person or an object, stored in a database of photos, using the captions
associated with each picture [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The authors built appearance models (i.e.,
language models that represent the text captions of images) to capture the persons
or objects featured in an image. Two types of entity-based appearance
models were tested, namely an appearance model based on visualness (i.e.,
the degree to which an entity is perceived visually), and another
based on salience (i.e., the importance of an entity in a text). As
baselines, the authors built two simpler appearance models, namely
(i) a bag-of-words (BOW) model based on the words of the image captions, and
(ii) a bag-of-nouns (BON) model based on the nouns and proper nouns
contained in the image captions. From a dataset composed of several image-caption
pairs, the authors created two different sets of images annotated with
entities, namely (i) an easy dataset composed of images with one entity, and (ii)
      </p>
      <sec id="sec-2-1">
        <title>3 http://developer.yahoo.com/geo/placemaker/</title>
        <p>a di cult dataset composed of images with three or more entities. The results
showed that when the dataset was queried with only one entity, the method
using the appearance model based on the visualness achieved the best results.
On the other hand, when the query was composed of two entities, the method
using the bag-of-words had better results.
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Automatic Association of Photos to Texts</title>
      <p>The proposed method for the automatic association of photos to textual
documents is essentially based on a pipeline of three stages, which involves (i)
recognizing and disambiguating location names and points of interest referenced
in documents, (ii) collecting candidate photos through Flickr's API4, and (iii)
selecting the best photos based on their importance and on their similarity
(e.g., textual, geographical and temporal) towards the document. In this section
we describe the three steps in detail.</p>
      <sec id="sec-3-1">
        <title>Mining Geographic Information in Documents</title>
        <p>In this work, we used the Yahoo! Placemaker web service to extract
locations and specific points of interest from texts. Placemaker can identify and
disambiguate places mentioned in textual documents. The service takes as input
a textual document with the information to be processed, and returns an XML
document that lists the referenced locations. For each location found in the
input document, the service also returns its position in the text, the complete
expression that was recognized as the location, the type of location (e.g., country,
city, suburb, point of interest, etc.), a unique identifier in the locations database
used by the service (i.e., the Where On Earth Identifier, or WOEID, used by
Yahoo! GeoPlanet5), and the coordinates of the centroid associated with
the location (i.e., the center of gravity of the minimum rectangle that covers its
geographic area). Also, for each input document, the service returns
the bounding box corresponding to the document (i.e., the minimum rectangle
that covers all its geographic locations).</p>
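        <p>As an illustration of how such a response can be consumed, the following Python sketch parses a Placemaker-style XML document and extracts, for each recognized place, its WOEID, type, name and centroid coordinates. The element names in the sample are illustrative assumptions, not necessarily the service's exact schema.</p>

```python
# Hypothetical sketch: parsing a Placemaker-style XML response.
# The element names below are illustrative, not the service's exact schema.
import xml.etree.ElementTree as ET

SAMPLE = """
<contentlocation>
  <document>
    <extents><boundingBox>
      <southWest><latitude>38.70</latitude><longitude>-9.25</longitude></southWest>
      <northEast><latitude>38.75</latitude><longitude>-9.10</longitude></northEast>
    </boundingBox></extents>
    <placeDetails>
      <place>
        <woeId>742676</woeId>
        <type>Town</type>
        <name>Lisbon, Portugal</name>
        <centroid><latitude>38.7167</latitude><longitude>-9.1333</longitude></centroid>
      </place>
    </placeDetails>
  </document>
</contentlocation>
"""

def extract_places(xml_text):
    """Return a (woeid, type, name, (lat, lon)) tuple for each recognized place."""
    root = ET.fromstring(xml_text)
    places = []
    for place in root.iter("place"):
        woeid = place.findtext("woeId")
        ptype = place.findtext("type")
        name = place.findtext("name")
        lat = float(place.find("centroid/latitude").text)
        lon = float(place.find("centroid/longitude").text)
        places.append((woeid, ptype, name, (lat, lon)))
    return places

print(extract_places(SAMPLE))
```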
      </sec>
      <sec id="sec-3-2">
        <title>Collecting and Selecting Relevant Photos</title>
        <p>The main challenge in collecting and selecting photos relevant to a segment of
text is the semantic gap between the photo metadata and the text, as
well as the noise present in the documents and in the descriptions of the photos.
For instance, in the case of travelogues, and despite the fact that these documents
have a fairly uniform structure, their authors frequently mention information related
to transportation and accommodation, and not only descriptions of the most
interesting locations. For example, if the text of a travelogue mentions an airport
or the city where the trip ends, while describing the arrival, one can select
photos related to these locations, which are not important for illustrating the
most interesting contents of the document. Travelogues thus frequently
mention locations that are only slightly relevant, and so it is very important to
distinguish between relevant and irrelevant locations.
4 http://www.flickr.com/services/api/
5 http://developer.yahoo.com/geo/geoplanet/</p>
        <p>Other challenges in collecting and selecting relevant photos are related to
the fact that photos published on Flickr are frequently associated with tags or
textual descriptions that are irrelevant to their visual contents (e.g., tags are usually
identical among different photos uploaded by the same person at the same
time), and also to the fact that the vocabulary used on Flickr can be very different
from the vocabulary used in textual documents.</p>
        <p>Having these limitations in mind, we tested different approaches for the
selection of relevant photos, combining different sources of evidence for estimating
the relevance of the photos. These approaches are as follows:
T1: Selection based on textual similarity: We compute the textual similarity
between the tags plus the title of the photos, and the text of the document.
Specifically, we compute the cosine measure between the textual descriptions
of the photos (i.e., joining tags and title) and the textual document, using
the Term Frequency Inverse Document Frequency (TF-IDF) method to
weight terms in the feature vectors. The idea behind this method is that, if a
photo has textual descriptions more similar to the text of a document, then
it can be considered a good photo to be associated to the document.
T2: Selection based on textual similarity and geographical proximity:
We combined the textual similarity from T1 with the similarity, based on
the geospatial coordinates, between the locations recognized in the document
and the locations where photos were taken. The geographical similarity is
computed according to the formula 1/(1+d), where d is the great-circle distance
between the two locations. Because multiple locations can be recognized
in the document, we computed the maximum and the average similarity
towards each photo. The idea behind this method is that a photo that was
taken near a location recognized in the document can be considered a
good photo to be associated to the document.</p>
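        <p>The geographic component of method T2 can be sketched as follows, under the assumption that d is the great-circle (haversine) distance in kilometers; the function names are ours:</p>

```python
# Sketch of the geographic similarity from method T2 (names are ours).
# The similarity between a photo and a recognized location is 1 / (1 + d),
# with d the great-circle (haversine) distance in kilometers.
from math import radians, sin, cos, asin, sqrt

def great_circle_km(p1, p2):
    """Haversine distance between two (lat, lon) points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (*p1, *p2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def geo_similarity(photo_coord, document_coords):
    """Max and average of 1/(1+d) over all locations recognized in the document."""
    sims = [1.0 / (1.0 + great_circle_km(photo_coord, c)) for c in document_coords]
    return max(sims), sum(sims) / len(sims)

# A photo taken exactly at one of the recognized locations scores 1.0 on max:
print(geo_similarity((38.7167, -9.1333), [(38.7167, -9.1333), (48.8566, 2.3522)]))
```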
        <p>T3: Selection based on textual similarity, geographical proximity and
temporal cohesion: We combine the method from T2 with the temporal
distance, in semesters, between the publication date of the document and the
moment when a photo was taken. Similarly to what is done in method T2,
the temporal similarity is computed according to the formula 1/(1+t), where
t is the number of semesters separating the photo from the document. The
idea behind this method is that a photo taken at a moment close to the date
when the document was written can often be considered a good photo to
be associated to the document.</p>
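        <p>The temporal cohesion score can be sketched as follows, assuming a semester is counted as a whole six-month span between the two dates (the exact rounding is not detailed in the text):</p>

```python
# Sketch of the temporal cohesion score from method T3 (helper names are ours).
# t is the number of semesters (six-month periods) separating the photo
# from the document; the similarity decays as 1 / (1 + t).
from datetime import date

def semesters_between(d1, d2):
    """Number of whole six-month periods between two dates."""
    months = abs((d2.year - d1.year) * 12 + (d2.month - d1.month))
    return months // 6

def temporal_similarity(photo_date, document_date):
    return 1.0 / (1.0 + semesters_between(photo_date, document_date))

print(temporal_similarity(date(2009, 6, 1), date(2010, 6, 1)))  # two semesters apart
```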
        <p>T4: Selection based on textual similarity, geographical proximity,
temporal cohesion and photo interestingness: We combine method T3
with other information related to the interestingness of the photos (e.g., the
number of comments and the number of times other users marked the
photo as a favorite). In this case, if a photo was taken at a location inside
the bounding box of the document (i.e., the bounding box that contains all
its locations), then the number of comments and the number of times the photo
was marked as favorite are used as features; otherwise, these
features assume the value of minus one. The idea behind this method is that
a photo that was taken near the locations recognized in the document, and
that is considered interesting due to the number of comments and
the number of times users marked it as a favorite, can be considered a good
photo to be associated to the document.</p>
        <p>
          The above combination approaches were based on the usage of rank
aggregation schemes to combine the multiple features. Specifically, two approaches
were considered, namely the CombSUM and the CombMNZ methods originally
proposed by Fox and Shaw [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Both CombSUM and CombMNZ use normalized
sums when combining the different features. To perform the normalization, we
applied the min-max normalization procedure to the scores of the individual
features, which is given by Equation 1, where min_j and max_j are the minimum
and maximum scores observed for the j-th feature:
score'_j(p, D) = (score_j(p, D) - min_j) / (max_j - min_j)   (1)
        </p>
        <p>The CombSUM score of a photo p, for a given document D, is the sum of
the normalized scores received by the photo in each of the k individual rankings,
and is given by Equation 2:
CombSUM(p, D) = Σ_{j=1}^{k} score_j(p, D)   (2)</p>
        <p>Similarly, the CombMNZ score of a photo p for a given document D is defined
by Equation 3, where re is the number of features with non-zero similarity scores
for the photo:
CombMNZ(p, D) = CombSUM(p, D) × re   (3)</p>
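        <p>Equations 1 to 3 can be sketched in a few lines of Python (variable names are ours; each inner list holds one feature's scores over the same ordered set of candidate photos):</p>

```python
# Sketch of min-max normalization (Eq. 1) followed by CombSUM (Eq. 2)
# and CombMNZ (Eq. 3); variable names are ours.

def min_max(scores):
    """Normalize one ranking's scores to [0, 1]; a constant ranking maps to 0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def comb_sum(rankings):
    """rankings: list of k score lists, one per feature, aligned by photo."""
    normalized = [min_max(r) for r in rankings]
    return [sum(col) for col in zip(*normalized)]

def comb_mnz(rankings):
    """CombMNZ multiplies the CombSUM score by re, the number of non-zero scores."""
    normalized = [min_max(r) for r in rankings]
    sums = [sum(col) for col in zip(*normalized)]
    nonzero = [sum(1 for s in col if s > 0) for col in zip(*normalized)]
    return [s * n for s, n in zip(sums, nonzero)]

text_sim = [0.9, 0.1, 0.5]   # three candidate photos, three features
geo_sim  = [0.2, 0.8, 0.2]
time_sim = [0.0, 0.5, 1.0]
print(comb_sum([text_sim, geo_sim, time_sim]))
print(comb_mnz([text_sim, geo_sim, time_sim]))
```

With these toy scores, the first photo obtains the highest single normalized score, but the second and third photos win under both schemes because they accumulate evidence from more features.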
        <p>For measuring the similarity between the textual descriptions of the photos
and the text of the document, in all the above methods, stopwords were first
removed. When computing the cosine measure between the photos' textual descriptions
and the document, using the Term Frequency Inverse Document Frequency
(TF-IDF) method, we considered the tags to be more important than the title
for describing a photo. Thus, we applied different weights to the different
types of textual descriptions, weighting the tags as twice as important.</p>
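        <p>This weighting scheme can be sketched as follows; it is a hedged Python reimplementation (the actual system used the XQuery Full Text extension), where tags are simply repeated so that they weigh twice as much as title words in the TF-IDF vector:</p>

```python
# Hedged sketch of the weighted textual similarity: TF-IDF vectors over a
# small corpus, with tags counted twice as heavily as title words.
import math
from collections import Counter

def tf_idf_vector(tokens, idf):
    tf = Counter(tokens)
    return {t: tf[t] * idf.get(t, 0.0) for t in tf}

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def photo_similarity(tags, title, document, corpus):
    """Tags are repeated so they weigh twice as much as title words."""
    idf = {}
    for term in {t for doc in corpus for t in doc}:
        df = sum(1 for doc in corpus if term in doc)
        idf[term] = math.log(len(corpus) / df)
    photo_tokens = tags * 2 + title     # tags weighted as twice as important
    return cosine(tf_idf_vector(photo_tokens, idf), tf_idf_vector(document, idf))

corpus = [["louvre", "pyramid", "paris"],
          ["fountain", "kuala", "lumpur"],
          ["paris", "museum", "tour"]]
document = ["louvre", "museum", "paris"]
print(photo_similarity(["louvre", "pyramid"], ["paris"], document, corpus))
```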
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Validation Experiments</title>
      <p>We implemented a prototype system based on the techniques described in the
previous section, using the Qizx6 XQuery engine as an execution environment.</p>
      <sec id="sec-4-1">
        <title>6 http://www.xmlmind.com/qizx/</title>
        <p>This XQuery engine supports the latest version of the standard, together with the
XQuery Full Text extension, which is used to perform full-text search with the cosine
measure and TF-IDF vectors over collections of XML documents.</p>
        <p>In order to validate the proposed methods, we created a corpus of 450 photos
downloaded from Flickr, each with geographical information and a sufficiently large
textual description (i.e., more than 100 words and containing location names
or points of interest). We used expressions frequently found in travelogues, such
as monument, vacation, trip or castle, to filter the photos collected from Flickr.
The collected photos were taken at a point contained in the bounding box
corresponding to the geospatial footprint of one of the world's most visited cities7.
Also, the considered photos were taken between 2000-01-01 and 2010-05-01.
For each photo, the number of comments and the number of times it was
marked as a favorite by other users were also collected.</p>
        <p>In order to conduct the experiments, we needed a collection of documents
with relevance judgments for photos, i.e., with a correct relevant photo associated to
each document. Such a collection was not already available, and creating a collection
of travelogue documents, illustrated with Flickr photos manually selected by
human experts, would be extremely time consuming, also implying
some knowledge about the locations described in the documents.</p>
        <p>The photo descriptions from Flickr, with the above characteristics, are fairly
good examples of documents with relevance judgments, because the owner
considered the photo a relevant example to be associated with the large textual
description. So, for the purpose of our experiments, we considered the textual
descriptions as representations of textual documents having the same
characteristics as travelogues, and the photos from which the textual descriptions were
taken as the relevant photos that should be automatically associated.</p>
        <p>The prototype system, implementing the different configurations of the proposed
method, was then used to process the documents, associating them with relevant
photos. The configurations used are described in Section 3.2.</p>
        <p>With the results for each document, and considering all four possible
configurations with the two voting schemes, we used the trec_eval evaluation tool
to evaluate the matchings between photos and documents. Figure 1 presents
the results obtained in terms of Precision at position 1 (Precision@1), and in
terms of the Reciprocal Rank, for all the considered cities. The horizontal
lines represent the mean value of the Reciprocal Rank, in red, and the mean value
of Precision@1, in blue, for all the considered cities when using the best
configuration. In all the charts, the full-colored red bar represents the value
of the Reciprocal Rank, and the shaded blue bar represents the
value of the Precision@1 metric.</p>
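        <p>For reference, the two reported metrics can be computed directly (trec_eval was used for the actual evaluation; the helper names here are ours). In our setting, each document has exactly one relevant photo:</p>

```python
# Illustrative re-derivation of the two reported metrics (trec_eval was
# used in the actual evaluation; these helper names are ours).

def precision_at_1(ranked_ids, relevant_id):
    """1 if the top-ranked photo is the relevant one, else 0."""
    return 1.0 if ranked_ids and ranked_ids[0] == relevant_id else 0.0

def reciprocal_rank(ranked_ids, relevant_id):
    """1/rank of the relevant photo, or 0 if it was not retrieved."""
    for i, pid in enumerate(ranked_ids, start=1):
        if pid == relevant_id:
            return 1.0 / i
    return 0.0

ranking = ["p2", "p7", "p1"]        # the relevant photo "p7" is at rank 2
print(precision_at_1(ranking, "p7"), reciprocal_rank(ranking, "p7"))
```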
        <p>The graphics show that method T3 using the CombMNZ approach (i.e.,
T3-MNZ) outperforms method T1 in all the cities. These results suggest that the
usage of multiple features (e.g., geographical proximity and temporal cohesion)
combined with the textual similarity is better than the usage of the textual
similarity alone. Also, methods using CombMNZ as the rank aggregation approach
have results similar to the methods using CombSUM.
7 http://en.wikipedia.org/wiki/Tourism</p>
        <p>It is also interesting to notice that the values in the cities of Paris, London
and New York are higher, although the dataset contained an equal number of
photos for each city (i.e., 50 photos). In these cities, all the combination methods
using CombMNZ outperform method T1. These results suggest a higher
precision of Placemaker in the recognition and disambiguation of the location names
mentioned in the descriptions for those cities, although it should be noticed that
textual similarity alone also presents good results in these cities.</p>
        <p>Figure 2 illustrates the obtained results for two example textual descriptions,
presenting the top-3 most relevant photos as returned by the best performing
method, together with their tags in Flickr.</p>
        <p>Figure 3 presents the number of documents in the collection for each range
of document lengths, in words, and the number of documents mentioning different
numbers of places. In the collection, most documents have between 100 and
200 words. Also, the number of recognized places is frequently low, with
most of the documents containing 1 to 5 places.</p>
        <p>[Figure 2 content: two example textual descriptions, one about the Louvre
Pyramid in the main courtyard of the Louvre Palace in Paris, and one about a
fountain near the Selangor Club field in Kuala Lumpur, each shown with the top-3
retrieved photos and their Flickr tags.]</p>
        <p>Figure 4 illustrates the relationships between the values of
Precision@1 and Reciprocal Rank and the number of words and the number of
places, when considering the combination method that had the best results, i.e.,
T3 using CombMNZ. These results suggest that a higher number of words does
not improve the results, neither in terms of Precision@1 nor of Reciprocal Rank.
The higher values of Reciprocal Rank and Precision@1 for documents with 1200
to 1300 words can be explained by the correspondingly small number of such
documents (i.e., only 2 documents). It is also interesting to notice that the values for
Precision@1 and for the Reciprocal Rank seem to improve when more than one
place is referenced in the document.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>In this paper, we have described novel methods for the automatic association of
photos to textual documents. The described methods are based on a pipeline of
three steps, in which geographic references are first extracted from documents,
then photos matching the geographic references are collected, using Flickr's API,
and finally the best photos are selected based on their similarity and
relevance. Different methods for selecting relevant photos were compared, and a method
based on the combination of textual similarity, geographic proximity and
temporal cohesion, using the CombMNZ rank aggregation method for performing
the combination, obtained the best results.</p>
      <p>
        Despite the good results from our initial experiments, there are also many
challenges for future work. From our point of view, the major challenge lies in
improving the evaluation protocol. The validation of the proposed methods should
be made with a collection of static photos, with relevance judgments clearly
established by humans. The Content-based Photo Image Retrieval (CoPhIR)
collection, described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and built from 106 million photos from Flickr, could be
a starting point for building such a test collection. Another idea is to experiment
with the proposed methods on a collection not related to the domain of travelogues.
For instance, the dataset of news texts from the BBC described by Feng
and Lapata [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], containing approximately 3400 entries, where each entry is
composed of a news document illustrated with a captioned image, could also be
used as a starting point to build a better test collection for evaluating our method.
Also, besides the usage of the cosine similarity to measure the
textual similarity between photos and documents, it would be interesting to use
different methods, for instance based on probabilistic topic models such as the
Latent Dirichlet Allocation (LDA) model [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        It would also be interesting to experiment with supervised learning
methods for combining the different relevance estimators. Several supervised
learning-to-rank methods [
        <xref ref-type="bibr" rid="ref12 ref13">13,12</xref>
        ], recently proposed in the information retrieval
community to address the problem of ranking search engine results, could be used to
develop models that sort photos by their relevance, considering different
sources of evidence (i.e., several similarity and importance metrics). Recent
works in the area of information retrieval have also described several advanced
unsupervised learning-to-rank methods, capable of outperforming the
CombSUM and CombMNZ approaches. This is currently a very active topic of research
and, in future work, we would for instance like to experiment with the ULARA
algorithm, recently proposed by Klementiev et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>E.</given-names>
            <surname>Amitay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Har'El</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sivan</surname>
          </string-name>
          ,
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Soffer</surname>
          </string-name>
          .
          <article-title>Web-a-where: geotagging web content</article-title>
          .
          <source>In Proceedings of the 27th Annual international ACM SIGIR Conference on Research and Development in information Retrieval</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          <article-title>Latent Dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>3</volume>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>P.</given-names>
            <surname>Bolettieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Esuli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Falchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lucchese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Perego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Piccioli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Rabitti</surname>
          </string-name>
          .
          <article-title>CoPhIR: a test collection for content-based image retrieval</article-title>
          .
          <source>Technical report, Institute of Information Science and Technologies</source>
          , National Research Council, Pisa, Italy,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Crandall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Backstrom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Huttenlocher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          .
          <article-title>Mapping the world's photos</article-title>
          .
          <source>In Proceedings of the 18th international conference on World Wide Web</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>K.</given-names>
            <surname>Deschacht</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Moens</surname>
          </string-name>
          .
          <article-title>Finding the best picture: Cross-media retrieval of content</article-title>
          .
          <source>In Proceedings of the 30th European Conference on Information Retrieval</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lapata</surname>
          </string-name>
          .
          <article-title>Automatic image annotation using auxiliary text information</article-title>
          .
          <source>In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Fox</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Shaw</surname>
          </string-name>
          .
          <article-title>Combination of multiple searches</article-title>
          .
          <source>In Proceedings of the 2nd Text Retrieval Conference</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Q.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Generating location overviews with images and tags by mining user-generated travelogues</article-title>
          .
          <source>In Proceedings of the 17th ACM international Conference on Multimedia</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>T.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          .
          <article-title>Probabilistic latent semantic indexing</article-title>
          .
          <source>In Proceedings of the 22nd Annual International ACM SIGIR conference on Research and development in information retrieval</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>A.</given-names>
            <surname>Klementiev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Small</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Titov</surname>
          </string-name>
          .
          <article-title>Unsupervised rank aggregation with domain-specific expertise</article-title>
          .
          <source>In Proceedings of the 21st International Joint Conference on Artificial Intelligence</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>J.</given-names>
            <surname>Leidner</surname>
          </string-name>
          .
          <article-title>Toponym Resolution in Text: Annotation, Evaluation and Applications of Spatial Grounding of Place Names</article-title>
          .
          <source>PhD thesis</source>
          ,
          <source>Institute for Communicating and Collaborative Systems</source>
          , School of Informatics, University of Edinburgh,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Learning to Rank for Information Retrieval and Natural Language Processing</article-title>
          . Morgan &amp; Claypool Publishers,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Learning to rank for information retrieval</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          ,
          <volume>3</volume>
          (
          <issue>3</issue>
          ),
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>X.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Hao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Visualizing textual travelogue with location-relevant images</article-title>
          .
          <source>In Proceedings of the 2009 international Workshop on Location Based Social Networks</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>B.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Anastacio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Calado</surname>
          </string-name>
          .
          <article-title>A machine learning approach for resolving place references in text</article-title>
          .
          <source>In Proceedings of the 13th AGILE International Conference on Geographic Information Science</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>M.</given-names>
            <surname>Piotrowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liubli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Volk</surname>
          </string-name>
          .
          <article-title>Towards mapping of alpine route descriptions</article-title>
          .
          <source>In Proceedings of the 6th ACM Workshop on Geographic information Retrieval</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>