<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantically Tagging Images of Landmarks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Heather S. Packer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jonathon S. Hare</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sina Samangooei</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul H. Lewis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Electronics and Computer Science, University of Southampton</institution>
          ,
          <addr-line>Southampton SO17 1BJ</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Semantic tagging allows images to be linked with URIs from web resources. Disambiguating URIs which correspond with an image's visual content is challenging. Previous work has largely failed to effectively contextualise the knowledge provided by the Semantic Web and the user-provided keyword tags in images. We propose an algorithm which uses geographical coordinates and keywords of similar images to recommend semantic tags that describe an image's visual content. Our algorithm uses the semantic stores YAGO2, DBPedia and Geonames. These stores allow us to handle multi-lingual keyword tags and disambiguate between alternative names for landmarks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>Image tagging systems predominantly provide no explicit semantic links describing the visual content of images. The meaning of user-assigned keyword tags can be ambiguous because they typically use only a few words. For example, a user may tag an image with "cathedral". This keyword tag does not specify which cathedral. In order to improve the semantics of tags, "semantic tags" (also known as "machine tags") have been used to add explicit context in the form of links and additional information such as geographic coordinates. Specifically, a semantic tag is a link defined by a Uniform Resource Identifier (URI) referencing an entity defined on the Semantic Web. Semantic tags enrich images by linking them to resources that provide more information about the entities contained within the image. A key challenge in semantic tagging is assigning the correct URIs.</p>
      <p>
        This paper describes an approach that attempts to semantically enrich a query image, which has no information other than its pixel content, with the URIs of the entities depicted in the image. Firstly, we identify visually similar images by comparing the visual features of the query image against a large corpus of partially annotated images. This corpus contains a mixture of images that have been geotagged, images with keyword tags, and images with both types of annotation. Using the geolocation information from the similar images, we estimate the most likely geographic coordinates of the query. We then compare the raw keyword tags of the similar images to entities contained in the YAGO2 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] knowledge-base and validate whether any geographic relations of the entities are close to the query image's estimated location using the DBPedia [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and Geonames [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] knowledge-bases. Finally, we use these selected entities to construct a list of URIs which are relevant to the content of the query image (a demonstration of our approach can be found at http://gtb.imageterrier.org).
      </p>
      <p>
        The broader motivation of this work is to bridge the gap between unstructured data attainable in real time, such as a GPS location or an image of a landmark, and rich, detailed, structured information about that landmark, thereby facilitating more informed search and retrieval. For example, users can engage in semantic searching, targeting a particular church, architect, period in history or architectural style. This kind of structured search can also support event detection [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or lifelogging [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] by helping collate documents which refer to a particular entity more accurately.
      </p>
      <p>This work provides two contributions beyond the state of the art. Specifically, we verify semantic tags using a viewing distance derived from the height of the entities identified in an image, and we recommend URIs using a multi-language tag index derived from multiple Semantic Web knowledge sources. The multi-lingual index allows us to make recommendations from keywords in foreign languages by identifying alternative names to utilise in our approach.</p>
      <p>The structure of the paper is as follows: firstly, we discuss related work on tagging systems. Secondly, we present details of our tagging approach. Finally, we discuss the use of our system with some examples and provide our conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Automatic image annotation is widely studied, but techniques that integrate multiple contexts using Semantic Web resources are relatively rare. The following review looks at works that recommend tags using both geographical coordinates and visual image similarity. SpiritTagger [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] recommends keywords that reflect the spirit of a location. It finds visually similar images using colour, texture, edge and SIFT features [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and clusters tags within a geographical radius. These selected tags are ranked based on frequency and then importance. This enables their algorithm to recommend tags that are specific to a geographic region. The focus of our work is to recommend semantic tags (URIs) that describe a place of interest, not tags relating to a larger area.
      </p>
      <p>
        Moxley et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] use a corpus of images and their tags, and organise them into sets of places, events and visual components. These are clustered based on the co-occurrence of words and the distance between named geographical entities. If an image matches, the Wikipedia title is used as a recommendation for the name of the landmark in the image. This approach uses limited information from Wikipedia to identify and recommend tags. In our approach, we aim to validate semantic tags using additional information from semantic data sources.
      </p>
      <p>
        Similar to Moxley et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Kennedy and Naaman [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]'s approach also considers the importance of tags relevant to a specific area or event. Their approach generates a representative image set for landmarks using image tags and geographic coordinates. Their technique identifies tags that occur frequently within one geographic area, but infrequently elsewhere, in order to identify tags that are uniquely local. They also filter tags that only occur during specific time ranges; this enables them to identify events such as the "New York Marathon" and determine whether these are relevant to a query image by analysing the date it was taken. Their focus was not on recommending semantic tags.
      </p>
      <p>
        There are a number of datasets that contain Flickr images and related URIs. For instance, the Ookaboo dataset (http://ookaboo.com/o/pictures/) was manually created by 170,000 contributors who submitted images and classified them against a topic from Wikipedia. In contrast, our approach recommends URIs automatically for an untagged image, by using tags from images that share visual features. The Flickr wrapper API allows users to search with a URI of an entity on Wikipedia and retrieve images that depict that entity. In particular, one can search for images within a user-specified distance of the geographical location of the searched entity (if the entity has a geographical location). This is the opposite of our problem: our query is an image depicting an unknown landmark, whereas their query is an entity on Wikipedia. The work by [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] identifies entities using natural language processing, stemming words to find their root or base by removing any inflections, using WordNet. They then identify relationships between these entities using the hypernym, holonym, meronym, and toponym relationships described in WordNet to create triples describing the entities mentioned in Flickr tags. Our approach complements [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]'s by generating the URIs describing entities depicted in an image when it has no tags, so that their approach could generate triples.
      </p>
      <p>
        A number of other approaches simply use location and visual features to annotate images (e.g. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]). There has also been work on recommending tags based on existing annotations (e.g. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]), and on recommending semantic entities based on existing tags (e.g. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]).
      </p>
    </sec>
    <sec id="sec-3">
      <title>Approach</title>
      <p>Our semantic tag recommendation approach has five steps. Firstly, we search a large index of visual features extracted from partially annotated images with the features extracted from a query image in order to find images similar to the query. The index contains images that have either geographic tags or keyword tags (or a mixture of the two). From the set of similar images with geographic locations we calculate a robust average of the latitude and longitude which estimates the geographic location of the query. Secondly, we use the keyword tags associated with the similar images to identify entities close to the estimated coordinates from YAGO2. Thirdly, we classify the types of entities that are possible recommendations using the type hierarchies of YAGO2, DBPedia and Geonames. In the fourth step, we restrict our recommendations based on their height and distance. In the final step, we expand our set of URIs with those from the closest city in order to try and identify additional relevant semantic entities.</p>
      <p>The approach has been developed using an index of images crawled from Flickr representing the Trentino region in Italy. More details of the dataset can be found in Section 4. In the remainder of this section, we walk through each of the five steps in detail.</p>
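      <p>The five steps can be summarised as pseudocode; the helper names below are purely illustrative and do not correspond to any published API:</p>

```
recommend_uris(query_image):
    similar   = visual_search(query_image)           # step 1 (Section 3.1)
    location  = robust_centroid(geotags(similar))    # step 1
    entities  = match_tags_to_yago2(tags(similar))   # step 2 (Section 3.2)
    typed     = categorise(entities)                 # step 3 (Section 3.3)
    verified  = filter_by_distance(typed, location)  # step 4 (Section 3.4)
    expanded  = expand_with_closest_place(verified)  # step 5 (Section 3.5)
    return uris(verified + expanded)
```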
      <sec id="sec-3-1">
        <title>Visual Image Similarity</title>
        <p>
          Firstly, we compare the visual content of an untagged query with the visual content of each image in a dataset of tagged images. This is achieved by comparing a BoVW (Bag of Visual Words) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] representation of both the query image and the dataset images, extracted using the OpenIMAJ Toolkit (http://openimaj.org) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. For efficient retrieval, the BoVW representations of the dataset images are held in a compressed inverted index, constructed using ImageTerrier (http://imageterrier.org) [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Once constructed, the index can be used to retrieve the dataset images which are most visually similar to a query image; the tags and geographic locations of the closest dataset images are passed on to the next steps of our process. Specifically, the retrieval engine is tuned to only retrieve images that match with very high confidence and thus only match the specific landmark/object in the query image; the aim is not to classify images into landmark classes, but rather to identify a specific instance.
        </p>
        <p>
          The BoVW image representations are constructed by extracting difference-of-Gaussian SIFT features [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] from an image and quantising them to a discrete vocabulary. A vocabulary of 1 million features was learnt using approximate k-means clustering [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] with SIFT features from the MIRFlickr25000 dataset [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Once the content of each image is represented as a set of visual terms, we construct an inverted index which encodes each term in an image. The inverted index is augmented with the orientation information of the SIFT feature corresponding to each term; this extra geometric information allows us to improve retrieval precision using an orientation consistency check [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] at query time. For every query image, the SIFT features are extracted and quantised. The set of visual terms forms a query against the inverted index which is evaluated using the Inverse-Document-Frequency weighted L1 distance metric [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. Using this strategy we select the top ten images in the dataset. These images provide us with potential keyword tags and geographic coordinates.
        </p>
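        <p>As an illustration of the retrieval scoring, the following Python sketch computes an Inverse-Document-Frequency weighted L1 distance between two bag-of-visual-words representations. It is a simplified stand-in for the ImageTerrier implementation: the histogram representation and the exact IDF formula are our assumptions, and the real engine also applies the orientation consistency check.</p>

```python
import math
from collections import Counter

def idf_weighted_l1(query_terms, doc_terms, doc_freq, n_docs):
    """IDF-weighted L1 distance between two bags of visual-term ids.
    Rare terms (low document frequency) contribute more to the distance."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dist = 0.0
    for term in set(q) | set(d):
        # Smoothed IDF; a term appearing in few documents gets a high weight.
        idf = math.log(n_docs / (1 + doc_freq.get(term, 0)))
        dist += idf * abs(q[term] - d[term])
    return dist
```

        <p>Ranking the dataset images by ascending distance and keeping the top ten yields the candidate set used in the following steps.</p>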
        <p>An iterative approach is used to robustly estimate the geographic location of the query from the set of geotagged result images. The technique needs to be robust as there is a high probability of outliers. Starting with all the matching geotagged images, a geographic centroid is found and the image which is geographically furthest from the centroid is removed. The centroid is then updated with the remaining images. This process continues iteratively until the distance between the current centroid and the furthest point from the centroid is less than a predefined threshold. Through initial tests on our dataset, we found that a threshold of 0.8 returned between 70% and 100% of images that were visually similar. An example of the visual similarity search and geographic localisation for a query image of Trento Cathedral is illustrated in Figure 1.</p>
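        <p>The iterative estimate can be sketched in a few lines of Python. The haversine distance and the assumption that the 0.8 threshold is expressed in kilometres are ours; the paper does not state the units.</p>

```python
import math

def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def robust_centroid(points, threshold_km=0.8):
    """Iteratively drop the point furthest from the centroid until the
    furthest remaining point lies within the threshold."""
    pts = list(points)
    while True:
        centroid = (sum(p[0] for p in pts) / len(pts),
                    sum(p[1] for p in pts) / len(pts))
        furthest = max(pts, key=lambda p: haversine_km(centroid, p))
        if haversine_km(centroid, furthest) > threshold_km and len(pts) > 1:
            pts.remove(furthest)  # discard the outlier and recompute
        else:
            return centroid
```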
      </sec>
      <sec id="sec-3-2">
        <title>Keyword-Entity Matching</title>
        <p>The second step aims to find URIs representing entities in the query image by attempting to match keyword tags to the names of entities. This can be problematic because it is common for keyword tags to contain more than one word without white space; searching for an entity in YAGO2 that matches a tag representing more than one word will therefore yield no matches. For example, the keyword tag 'trentocathedral' will not match the YAGO2 entity 'Trento Cathedral'. In order to enable us to search for flattened tags, we performed a pre-processing step to create additional triples relating flattened tags to entities within YAGO2. We also flattened the names related to an entity through the "isCalled" property, because it contains alternative terms used to refer to an instance (including foreign language names). For example, the YAGO2 entity for "Trento Cathedral" can also be called "Cattedrale di San Vigilio" and "Katedralo de Sankta Vigilio". Thus, we also use the flattened entity names "cattedraledisanvigilio" and "katedralodesanktavigilio" to represent "Trento Cathedral". These additional triples and YAGO2 are used to look up all the tags using exact string matching. If there are matching entities then we check that they are in the same city (using the geographical coordinates from YAGO2 and the estimated coordinates from step one). In our Trento example, we retrieve the URIs shown in Table 1 from the image's tags.</p>
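        <p>A minimal Python sketch of the flattening pre-processing, assuming entities are given as a mapping from URI to a list of names (the canonical name plus any "isCalled" alternatives); the actual system materialises these as additional triples in the store.</p>

```python
def flatten(name):
    """Flatten a name the way multi-word Flickr tags are often written:
    lower-case, with spaces and punctuation removed."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def build_flat_index(entities):
    """Map each flattened surface form back to its entity URI."""
    index = {}
    for uri, names in entities.items():
        for name in names:
            index[flatten(name)] = uri
    return index

entities = {
    "yago:Trento_Cathedral": [
        "Trento Cathedral",
        "Cattedrale di San Vigilio",  # alternative name via isCalled
    ],
}
index = build_flat_index(entities)
assert index["trentocathedral"] == "yago:Trento_Cathedral"
assert index["cattedraledisanvigilio"] == "yago:Trento_Cathedral"
```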
      </sec>
      <sec id="sec-3-3">
        <title>Cross-Source Category Matches</title>
        <p>The aim of the third step is to determine whether the entities identified in step 2 and the keyword tags of the similar images are of a specific type. We organised these types into general categories, including town/city, region, country, date, weather, season, mountain, building, activity, transport and unknown. This list was derived from a sample set of tags from our corpus of Flickr images from the Trentino region, and can be used to categorise 78% of all tags in the corpus. This categorisation allows us to search for entities that have a geographical location, because we can filter out entities of type date, weather and activity, which are not specific to one geographical location.</p>
        <p>In order to categorise the matches we found in YAGO2 (in the previous step), we look up the identified entities in DBPedia and Geonames. This is possible because YAGO2 incorporates the property "hasWikipediaUrl", which DBPedia and Geonames both reference. In order to identify the matches' categories we recurse through the type hierarchies of DBPedia and compare the hierarchies lexically. We also map the Geonames feature codes, which categorise towns, regions, countries, mountains and other landscape features, to our categories. We add any entities that we cannot categorise to the 'unknown' set.</p>
        <p>In our Trento Cathedral example, we categorise the entities identified from YAGO2 (see the YAGO2 matches in Table 1) and the tags from similar images (see the tag cloud shown in Figure 1). Table 2 shows the selected categories and corresponding properties that were used to infer these categories.</p>
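        <p>The Geonames side of the category mapping can be illustrated with a small lookup table. The feature codes shown are real Geonames codes, but the selection and the mapping to our categories are illustrative; the full system also recurses through the DBPedia type hierarchy.</p>

```python
# A few Geonames feature codes mapped to the general categories of step 3.
FEATURE_CODE_CATEGORY = {
    "PPL": "town/city",   # populated place
    "PPLA": "town/city",  # seat of a first-order administrative division
    "ADM1": "region",     # first-order administrative division
    "PCLI": "country",    # independent political entity
    "MT": "mountain",
}

def categorise(feature_code):
    # Entities we cannot categorise fall into the 'unknown' set.
    return FEATURE_CODE_CATEGORY.get(feature_code, "unknown")
```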
      </sec>
      <sec id="sec-3-4">
        <title>Geographic Distance Constraints</title>
        <p>Using the categories assigned to the tags in step 3, we aim to verify whether the entities (identified in step 2) which have a geographical location are located within a certain distance of the predicted location (from step 1). We predefined acceptable maximum distances for entities of type city, region and mountain, and found through preliminary testing that these were suitable values (see Table 3). It is possible to evaluate the height of buildings using the "heightStories" and "floorCount" properties in DBPedia. We approximate a viewing distance for buildings using these properties: based on an empirical evaluation, we estimate that with every floor a building can be seen from a further 5 metres away. This is, however, an approximation, because it will differ with landscape features such as elevation, the average height of buildings around the building in the query image, and the height of the floors.</p>
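        <p>The height-based constraint amounts to a one-line heuristic. The sketch below assumes no base viewing distance beyond the 5 metres per floor, which the paper leaves unspecified:</p>

```python
def max_viewing_distance_m(floor_count, metres_per_floor=5.0):
    """Approximate distance from which a building is visible,
    adding 5 metres of range per floor."""
    return floor_count * metres_per_floor

def within_range(floor_count, distance_m):
    """True if the entity's distance from the estimated location
    does not exceed its approximate viewing distance."""
    return max_viewing_distance_m(floor_count) >= distance_m
```

        <p>With the paper's figures, the 5-floor Duomo at 10 metres passes the check, while the 6-floor Perugia Cathedral at 343.5 kilometres fails it.</p>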
        <p>Our approach cannot guarantee that recommended entities are contained within the query image: an entity might be within range of the estimated location but not within sight of the camera, because other objects may block the view or the entity might be located in a different direction. However, we make the recommendation because the images matched in step 1 contain references to these entities. Therefore, we hypothesise that there is a high likelihood that these recommended entities are depicted in the query image.</p>
        <p>In our Trento Cathedral example, the entity Duomo of category building has 5 floors and is 10 metres from the estimated geolocation. Using our approach we validate that the Duomo is within our specified range. In the tags related to the similar images, we identify that the building "Perugia Cathedral" has 6 floors and is 343.5 kilometres away from the estimated location. Therefore, we do not recommend the URI of this building because it is not within range.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Recommendation Expansion</title>
        <p>In the final step of our approach, we aim to derive further matches by expanding our search terms. Specifically, we expand all non-place entities (excluding entities of the type city, town, region and country) with the closest place name, using the pattern [place name][non-place entity]. This allows us to disambiguate entities that are common to many places, such as town halls, police stations and libraries. We then check whether the matches are located close to the estimated coordinates. In our Trento Cathedral example, the tag piazza is expanded to "Trento piazza", which is linked by the "isCalled" property in YAGO2 to the entity "Trento-Piazza del Duomo". We then validate that "Trento-Piazza del Duomo" is categorised as a place and is within the geographic distance range of 0.5km from the estimated geographical location. In Table 4 we detail the extended tags which we attempt to match to YAGO2. Table 5 details the recommended URIs for our example.</p>
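        <p>The expansion step reduces to a simple string pattern. In this sketch, tags are (keyword, category) pairs from step 3 and the excluded place categories follow the paper; the function name and data shapes are ours:</p>

```python
PLACE_CATEGORIES = {"town/city", "region", "country"}

def expand_tags(tags, closest_place):
    """Expand every non-place tag with the closest place name,
    following the pattern [place name][non-place entity]."""
    return [f"{closest_place} {keyword}"
            for keyword, category in tags
            if category not in PLACE_CATEGORIES]
```

        <p>Each expanded string is then looked up against YAGO2 (including its flattened and "isCalled" forms) and validated against the estimated coordinates as before.</p>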
        <p>Table 4 lists the extended tags, each of the form [place name][non-place entity], including "Trento Cathedral" and "Trento Piazza".</p>
        <p>In this section, we discuss the recommended URIs for four example query images. The dataset of partially annotated images used as the basis for testing the approach was crawled from Flickr. We first downloaded approximately 150,000 geotagged images that were within the bounds of the province of Trentino, an area of 2,397 square miles in the north of Italy. This set was then enriched with an additional 400,000 images with keyword tags relating to the Trentino region. Because the two sets intersect, our image set consists of 472,565 images in total (our image set can be downloaded at http://degas.ecs.soton.ac.uk/~hp07r/fullcollection.csv). We randomly sampled 3,250 images from this set and manually identified the theme of each image (see Table 6).</p>
        <p>
          We considered using standard image sets such as the European Cities 50K [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] dataset and MIRFlickr [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. However, we wanted to include images from the surrounding area because they are often the most visually similar; areas typically have a particular style due to the tradition and age of the area. The European Cities set contains images from different cities but does not cover whole regions. Similarly, we chose not to use the MIRFlickr image set because our approach requires both tags, to identify landmarks, and geotags, to disambiguate between landmarks, and 88.948% of the images in MIRFlickr do not contain geotags; our image set contains over double the proportion of geotagged images, at 23% compared to 11%. To the best of our knowledge there were no suitable datasets which covered a complete area of geotagged images, or that contained a ground truth of URIs associated with each image.
        </p>
      </sec>
      <sec id="sec-3-6">
        <title>Example 1</title>
        <p>The first query image depicts Trento Cathedral, the Neptune Fountain and a plaza, and the visual feature search returned images that depict the cathedral (see Figure 2). The seventh similar image is a photograph of Trento Cathedral, but it is not geographically close to the landmark's actual location. While estimating the geographical location of the image, the seventh image's geotags are removed by the threshold described in Section 3.1, and therefore do not affect the recommended URIs. Our approach correctly identifies that the query image contains Trento Cathedral and recommends the Italian and English Wikipedia URIs for the cathedral, because the tag cloud contains 'trentocathedral', 'trento' and 'cathedral'. It also recommends URIs relating to the region, such as the town, region, and country, and URIs relating to ecclesiastical buildings; notably, it recommended URIs about the cathedral's Pope (see the following list).
1. http://en.wikipedia.org/wiki/Trento_Cathedral
2. http://it.wikipedia.org/wiki/Cattedrale_di_San_Vigilio
3. http://it.wikipedia.org/wiki/Provincia_autonoma_di_Trento
4. http://www.mpii.de/yago/resource/Italy
5. http://en.wikipedia.org/wiki/Italy
6. http://en.wikipedia.org/wiki/Trento
7. http://it.wikipedia.org/wiki/Church
8. http://en.wikipedia.org/wiki/Alps
9. http://en.wikipedia.org/wiki/Cathedral
10. http://it.wikipedia.org/wiki/Trento
11. http://it.wikipedia.org/wiki/Cattedrale
12. http://it.wikipedia.org/wiki/The_Church
13. http://en.wikipedia.org/wiki/Province_of_Trento
14. http://en.wikipedia.org/wiki/Church
15. http://www.mpii.de/yago/resource/Province_of_Trento
16. http://en.wikipedia.org/wiki/Pope_Vigilius
17. http://it.wikipedia.org/wiki/Papa_Vigilio
18. http://it.wikipedia.org/wiki/Cathedral
19. http://en.wikipedia.org/wiki/Trentino
20. http://www.mpii.de/yago/resource/Trento
21. http://en.wikipedia.org/wiki/Mountain
22. http://en.wikipedia.org/wiki/Alto</p>
      </sec>
      <sec id="sec-3-7">
        <title>Example 2</title>
        <p>The second query image depicts Trento Cathedral, and the visual feature search correctly matched seven images of the cathedral (see Figure 3). From the tag cloud we can see that one or more of the similar images has been incorrectly tagged with 'Buonconsiglio' and 'Castle'. These tags refer to Buonconsiglio Castle, which is approximately 700 metres from Trento Cathedral. In step four of our approach, we disambiguate between places of interest when their distance is greater than 0.5km. However, in this case, our approach was unable to disambiguate between the two places of interest because all the geotagged images were within 0.5km of Trento Cathedral (as defined in Geonames) and contained tags relating to both the cathedral and the castle. If the image tagged with 'Buonconsiglio' had been geographically located at the castle, then our approach would have only recommended URIs relating to the cathedral. Our approach recommended the URIs in Example 1 and those in the following list, recommending URIs that relate to both Buonconsiglio Castle and Trento Cathedral.
1. http://it.wikipedia.org/wiki/Castello_del_Buonconsiglio
2. http://en.wikipedia.org/wiki/Castello_del_Buonconsiglio
3. http://it.wikipedia.org/wiki/Castello
4. http://www.mpii.de/yago/resource/Trento_Cathedral</p>
      </sec>
      <sec id="sec-3-8">
        <title>Example 3</title>
        <p>The third query image also depicts Trento Cathedral (see Figure 4). The visual feature search matched three images, but only one of them depicted Trento Cathedral. None of these images were tagged, therefore our approach could not find or expand any tags to look up entities in YAGO2, DBPedia or Geonames.</p>
        <p>The fourth query image depicts Buonconsiglio Castle. The visual feature search returned over 20 images; Figure 5 shows the first eight, which are the most similar to the query image and all depict the castle. The visual feature search also returned images of Trento Cathedral, hence the tag cloud contains tags about the cathedral: cathedral, catedral, cathdrale, cattedrale, and vigilio. Unlike in our second example, our approach was able to disambiguate between the castle and the cathedral because the similar images were correctly geotagged within 0.5km of the photographed landmark. Our approach expanded the tag Buonconsiglio with castle (see Section 3.5), because it determined that a castle is a type of building, and was thus able to identify the Wikipedia URI http://en.wikipedia.org/wiki/Buonconsiglio_Castle. The following list contains our approach's recommended URIs.
1. http://en.wikipedia.org/wiki/Buonconsiglio_Castle
2. http://it.wikipedia.org/wiki/Castello_del_Buonconsiglio
3. http://en.wikipedia.org/wiki/Castello_del_Buonconsiglio
4. http://it.wikipedia.org/wiki/Provincia_autonoma_di_Trento
5. http://en.wikipedia.org/wiki/Trento</p>
        <p>Our approach can be hindered by the quality of the information in the semantic knowledge stores, and by the quality of the available tags and coordinates. The examples discussed in this section show that without approximately correct coordinates or tags, our algorithm will not be able to identify and recommend accurate semantic tags. Nor will our approach be able to validate the coordinates of places of interest if the knowledge base does not contain them.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, we present an algorithm to recommend URIs that represent the visual content of an image, focusing on identifying places of interest using geographical information from YAGO2, DBPedia, and Geonames. In order to use these knowledge sources, we use large-scale image matching techniques to find similar images that are then used to estimate geo-coordinates and potential tags.</p>
      <p>The four examples show that the quality of our results depends heavily on the quality of the image matching techniques and the reference corpus. Our approach performs best when there are accurate tags and geotags, which is not always the case with collections of images. For future work, we plan to develop approaches that better handle incorrect keyword tags and geotags.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreements no. 270239 (ARCOMEM), 287863 (TrendMiner) and 231126 (LivingKnowledge), together with the LiveMemories project, graciously funded by the Autonomous Province of Trento (Italy).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. J. Hoffart,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Berberich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lewis-Kelham</surname>
          </string-name>
          , G. de Melo, and G. Weikum, "
          <article-title>YAGO2: Exploring and Querying World Knowledge in Time, Space, Context, and Many Languages,"</article-title>
          <source>in Proceedings of the 20th International Conference Companion on World Wide Web. ACM</source>
          ,
          <year>2011</year>
          , pp.
          <volume>229</volume>
          -
          <fpage>232</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          , G. Kobilarov,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ives</surname>
          </string-name>
          , "
          <article-title>Dbpedia: A Nucleus for a Web of Open Data," The Semantic Web</article-title>
          , pp.
          <volume>722</volume>
          {
          <issue>735</issue>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>B.</given-names>
            <surname>Vatant</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wick</surname>
          </string-name>
          , "
          <article-title>Geonames Ontology,"</article-title>
          <source>GeoNames, Accessed</source>
          , vol.
          <volume>6</volume>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>H.</given-names>
            <surname>Packer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samangooei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gibbins</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , "
          <article-title>Event Detection using Twitter and Structured Semantic Query Expansion,"</article-title>
          <source>in CIKM2012 - The First International Workshop on Multimodal Crowd Sensing</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>H.</given-names>
            <surname>Packer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smith</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , "
          <article-title>MemoryBook: Generating Narrative from Lifelogs,"</article-title>
          <source>in Hypertext2012 - The Second International Workshop on Narrative and Hypertext Systems</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>E.</given-names>
            <surname>Moxley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kleban</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Manjunath</surname>
          </string-name>
          , "
          <article-title>SpiritTagger: a Geo-Aware Tag Suggestion Tool Mined from Flickr,"</article-title>
          <source>in Proceeding of the 1st ACM international Conference on Multimedia Information Retrieval</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>24</fpage>
          –
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. D. G. Lowe, "
          <article-title>Object Recognition from Local Scale-Invariant Features,"</article-title>
          <source>IEEE International Conference on Computer Vision</source>
          , vol.
          <volume>2</volume>
          , p.
          <fpage>1150</fpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>E.</given-names>
            <surname>Moxley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kleban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Manjunath</surname>
          </string-name>
          , "
          <article-title>Not all Tags are Created Equal: Learning Flickr Tag Semantics for Global Annotation,"</article-title>
          <source>in IEEE International Conference on Multimedia and Expo</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>1452</fpage>
          –
          <lpage>1455</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>L.</given-names>
            <surname>Kennedy</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Naaman</surname>
          </string-name>
          , "
          <article-title>Generating Diverse and Representative Image Search Results for Landmarks,"</article-title>
          <source>in Proceeding of the 17th International Conference on World Wide Web</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>297</fpage>
          –
          <lpage>306</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>M.</given-names>
            <surname>Maala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Delteil</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Azough</surname>
          </string-name>
          , "
          <article-title>A conversion process from Flickr tags to RDF descriptions,"</article-title>
          <source>in BIS 2007 Workshops</source>
          ,
          <year>2008</year>
          , p.
          <fpage>53</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>H.</given-names>
            <surname>Kawakubo</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Yanai</surname>
          </string-name>
          , "
          <article-title>GeoVisualRank: a Ranking Method of Geotagged Images Considering Visual Similarity and Geo-Location Proximity,"</article-title>
          <source>in Proceedings of the 20th International Conference Companion on World Wide Web</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. B. Sigurbjornsson and R. van Zwol,
          <source>\Flickr Tag Recommendation Based on Collective Knowledge," in Proceeding of the 17th International Conference on World Wide Web</source>
          ,
          <year>2008</year>
          , pp.
          <volume>327</volume>
          {
          <fpage>336</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.-J.</given-names>
            <surname>Qi</surname>
          </string-name>
          , and T.-S. Chua, "
          <article-title>Inferring Semantic Concepts from Community-Contributed Images and Noisy Tags,"</article-title>
          <source>in Proceedings of the 17th ACM International Conference on Multimedia</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>223</fpage>
          –
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>J.</given-names>
            <surname>Sivic</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          , "
          <article-title>Video Google: A Text Retrieval Approach to Object Matching in Videos,"</article-title>
          <source>in ICCV</source>
          ,
          <year>October 2003</year>
          , pp.
          <fpage>1470</fpage>
          –
          <lpage>1477</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Hare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samangooei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Dupplaw</surname>
          </string-name>
          , "
          <article-title>OpenIMAJ and ImageTerrier: Java libraries and tools for scalable multimedia analysis and indexing of images,"</article-title>
          <source>in Proceedings of ACM Multimedia 2011, ser. MM '11. ACM</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>691</fpage>
          –
          <lpage>694</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>J.</given-names>
            <surname>Hare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samangooei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dupplaw</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , "
          <article-title>ImageTerrier: An extensible platform for scalable high-performance image retrieval,"</article-title>
          <source>in The ACM International Conference on Multimedia Retrieval (ICMR 2012)</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. D. Lowe, "
          <article-title>Distinctive image features from scale-invariant keypoints,"</article-title>
          <source>IJCV</source>
          , vol.
          <volume>60</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>91</fpage>
          –
          <lpage>110</lpage>
          ,
          <year>January 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>J.</given-names>
            <surname>Philbin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Chum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Isard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sivic</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          , "
          <article-title>Object Retrieval with Large Vocabularies and Fast Spatial Matching,"</article-title>
          <source>in CVPR</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Huiskes</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Lew</surname>
          </string-name>
          , "
          <article-title>The MIR Flickr retrieval evaluation,"</article-title>
          <source>in Proceeding of the 1st ACM international conference on Multimedia information retrieval</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>H.</given-names>
            <surname>Jegou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Douze</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Schmid</surname>
          </string-name>
          , "
          <article-title>Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search,"</article-title>
          <source>in Proceedings of the 10th European Conference on Computer Vision</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>304</fpage>
          –
          <lpage>317</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>D.</given-names>
            <surname>Nister</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Stewenius</surname>
          </string-name>
          , "
          <article-title>Scalable recognition with a vocabulary tree,"</article-title>
          <source>in CVPR</source>
          ,
          <year>2006</year>
          , pp.
          <fpage>2161</fpage>
          –
          <lpage>2168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Avrithis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Tolias</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kalantidis</surname>
          </string-name>
          , "
          <article-title>Feature map hashing: Sub-linear indexing of appearance and global geometry,"</article-title>
          <source>in Proceedings of ACM Multimedia</source>
          , Firenze, Italy,
          <year>October 2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>