<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the ImageCLEFphoto 2007 photographic retrieval task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael Grubinger</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Clough</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Allan Hanbury</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Henning Müller</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Sheffield University</institution>
          ,
          <addr-line>Sheffield</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University and Hospitals of Geneva</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Victoria University</institution>
          ,
          <addr-line>Melbourne</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Vienna University of Technology</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>ImageCLEFphoto 2007 is the general photographic ad-hoc retrieval task of the ImageCLEF 2007 evaluation campaign and provides both the resources and the framework necessary to perform comparative laboratory-style evaluation of visual information retrieval from generic photographic collections. In 2007, the evaluation objective concentrated on retrieval of lightly annotated images, a new challenge that attracted a large number of submissions: a total of 20 participating groups submitting a record number of 616 system runs. This paper summarises the components used in the benchmark, including the document collection, the search tasks, an analysis of the submissions from participating groups, and results. The participants were provided with a subset of the IAPR TC-12 Benchmark: 20,000 colour photographs and four sets of semi-structured annotations in (1) English, (2) German, (3) Spanish and (4) one set whereby the annotation language had randomly been selected for each of the images. Unlike in 2006, the participants were not allowed to use the semantic description field in their retrieval approaches. The topics and relevance assessments from 2006 were reused (and updated) to facilitate the comparison of retrieval from fully and lightly annotated images. Some of the findings for multilingual visual information retrieval from generic collections of lightly annotated photographs include: bilingual retrieval performs as well as monolingual retrieval; the choice of the query language is almost negligible as many of the short captions contain proper nouns; combining concept and content-based retrieval methods as well as using relevance feedback and/or query expansion techniques can significantly improve retrieval performance; and the retrieval results are similar to those in 2006, despite the limited image annotations in 2007.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>ImageCLEFphoto 2007 provides a system-centered evaluation for multilingual visual information
retrieval from generic photographic collections (i.e. containing everyday real-world photographs
akin to those that can frequently be found in private photographic collections).</p>
      <sec id="sec-1-1">
        <title>Evaluation Scenario</title>
        <p>The evaluation scenario is similar to the classic TREC<sup>1</sup> ad-hoc retrieval task: simulation of the
situation in which a system knows the set of documents to be searched, but cannot anticipate the
particular topic that will be investigated (i.e. topics are not known to the system in advance) [21].
The goal of the simulation is: given an alphanumeric statement (and/or sample images) describing
a user information need, find as many relevant images as possible from the given collection (with
the query language either being identical to or different from that used to describe the images).</p>
      </sec>
      <sec id="sec-1-2">
        <title>Evaluation Objective 2007</title>
        <p>The objective of ImageCLEFphoto 2007 comprised the evaluation of multilingual visual
information retrieval from a generic collection of lightly annotated photographs (i.e. containing only short
captions such as the title, location, date or additional notes, but without a semantic description
of that particular photograph). This new challenge allows for the investigation of the following
research questions:
          <list list-type="bullet">
            <list-item><p>Are traditional text retrieval methods still applicable for such short captions?</p></list-item>
            <list-item><p>How significant is the choice of the retrieval language?</p></list-item>
            <list-item><p>How does the retrieval performance compare to retrieval from collections containing fully
annotated images (ImageCLEFphoto 2006)?</p></list-item>
            <list-item><p>Has the general retrieval performance improved in comparison with retrieval from lightly
annotated images (ImageCLEFphoto 2006)?</p></list-item>
          </list>
        </p>
        <p>One major goal of ImageCLEFphoto 2007 was to attract more content-based retrieval
approaches, as most of the retrieval approaches in previous years had predominantly been
concept-based. The reduced alphanumeric semantic information provided with the image collection should
support this goal, as content-based retrieval techniques become more significant as
image captions become shorter.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Evaluation Architecture</title>
      <p>Similar to ImageCLEFphoto 2006 [4], we generated a subset of the IAPR TC-12 Benchmark
to provide the evaluation resources for ImageCLEFphoto 2007. This section provides more
information on these individual components: the document collection, the query topics, relevance
judgments and performance indicators. More information on the design and implementation of
the IAPR TC-12 Benchmark itself, created under Technical Committee 12 (TC-12) of the
International Association for Pattern Recognition (IAPR<sup>2</sup>), can be found in [10].</p>
      <sec id="sec-2-1">
        <title>Document Collection</title>
        <p>The document collection of the IAPR TC-12 Benchmark contains 20,000 colour photos taken at
locations around the world and comprises a varying cross-section of still natural images. Figure 2.1
illustrates a number of sample images from a selection of categories.</p>
        <p><sup>1</sup> http://trec.nist.gov/
<sup>2</sup> http://www.iapr.org/</p>
        <p>[Figure: sample images from the IAPR TC-12 collection. Panels: Sports, Landscapes, People, Animals.]</p>
        <p>The majority of images have been provided by viventura, an independent travel company that
organises adventure and language trips to South America. Travel guides accompany the tourists
and maintain a daily online diary including photographs of trips made and general pictures of each
location including accommodation, facilities and ongoing social projects. The remainder of the
images have been collected by the first author over the past few years from personal experiences
(e.g. holidays). The collection is publicly available for research purposes and, unlike many existing
photographic collections used to evaluate image retrieval systems, this collection is very general
in content with many different images of similar visual content, but varying illumination, viewing
angle and background. This makes it a challenge for the successful application of techniques
involving visual analysis.</p>
        <p>
          Each image in the collection has a corresponding semi-structured caption consisting of the
following seven fields: (1) a unique identifier, (2) a title, (3) a free-text description of the semantic
and visual contents of the image, (4) notes for additional information, (5) the provider of the photo
and fields describing (6) where and (7) when the photo was taken. Figure 2.1 shows a sample
image with its corresponding English annotation.
        </p>
        <p>
          These annotations are stored in a database, allowing the creation of collection subsets with
respect to a variety of particular parameters (e.g. which caption fields to use). Based on the feedback
from participants of previous evaluation tasks, the following was provided for ImageCLEFphoto
2007:
          <list list-type="bullet">
            <list-item><p>Annotation language: four sets of annotations in (1) English, (2) German, (3) Spanish
and (4) one set whereby the annotation language was randomly selected for each of the
images.</p></list-item>
            <list-item><p>Caption fields: only the fields for the title, location, date and additional notes were
provided. Unlike in 2006, the description field was not made available for retrieval, in order to provide a
more realistic evaluation scenario and to attract more visually oriented retrieval approaches.</p></list-item>
            <list-item><p>Annotation completeness: each image caption exhibited the same level of annotation
completeness; there were no images without annotations, as there had been in 2006.</p></list-item>
          </list>
        </p>
        <p>The participants were granted access to the data set on 16 April 2007 and had about three
weeks to familiarise themselves with the new subset so that they could, for instance, adapt their
existing retrieval scripts to the reduced multilingual annotations or extract the visual and
textual features of the images and their annotations in order to index the entire collection.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Query Topics</title>
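        <p>On 6 May 2007, the participants were given 60 query topics (see Table 1) representing typical
search requests for the generic photographic collection of the IAPR TC-12 Benchmark.
The topics from 2006 were reused to facilitate the comparison of retrieval from fully annotated (2006) and lightly
annotated (2007) photographs. The creation of these topics had been based on several factors (see
[9] for detailed information), including:
          <list list-type="bullet">
            <list-item><p>the analysis of a log file from online access to the image collection;</p></list-item>
            <list-item><p>knowledge of the contents of the image collection;</p></list-item>
            <list-item><p>various types of linguistic and pictorial attributes;</p></list-item>
            <list-item><p>the use of geographical constraints;</p></list-item>
            <list-item><p>the estimated difficulty of the topic.</p></list-item>
          </list>
        </p>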
        <p>Similar to TREC, the query topics were provided as structured statements of user needs which
consist of a title (a short sentence or phrase describing the search request in a few words) and
three sample images that are relevant to that search request. These images were removed from
the test collection and did not form part of the ground-truth in 2007.</p>
        <p>The topic titles were offered in 16 languages including English, German, Spanish, Italian,
French, Portuguese, Chinese, Japanese, Russian, Polish, Swedish, Finnish, Norwegian, Danish,
and Dutch, whereby all translations had been provided by at least one native speaker and verified
by at least one other native speaker. The participants received only the topic titles, not the
narrative descriptions, because the narratives had been misinterpreted by participants
in the past (they only serve to unambiguously define what constitutes a relevant image or not).</p>
      </sec>
      <sec id="sec-2-3">
        <title>Relevance Assessments</title>
        <p>Relevance assessments were carried out by the two topic creators<sup>4</sup> using a custom-built online tool.
The top 40 results from all submitted runs were used to create image pools giving an average of
2,299 images (max: 3,237; min: 1,513) to judge per topic.</p>
        <p>
          The topic creators judged all images in the topic pools and also used interactive search and
judge (ISJ) to supplement the pools with further relevant images. The assessments were based on
a ternary classification scheme: (1) relevant, (2) partially relevant, and (3) not relevant. Based on
these judgments, only those images judged relevant by both assessors were considered for the sets
of relevant images (qrels).
        </p>
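        <p>The pooling and agreement steps described above can be sketched as follows. This is a minimal illustration, not the organisers' tool: the function names, the dictionary-based run format and the label strings are assumptions made for the example; only the pool depth of 40 and the rule that both assessors must judge an image relevant come from the text.</p>
        <preformat>
```python
from collections import defaultdict

POOL_DEPTH = 40  # pool depth stated in the text


def build_pool(runs, depth=POOL_DEPTH):
    """Union of the top-`depth` images of every run, per topic.

    `runs` is a list of dicts mapping a topic id to a ranked list of
    image ids (a hypothetical in-memory run format).
    """
    pool = defaultdict(set)
    for run in runs:
        for topic, ranking in run.items():
            pool[topic].update(ranking[:depth])
    return dict(pool)


def build_qrels(judgments_a, judgments_b):
    """Keep only the images that both assessors labelled 'relevant'.

    Each argument maps a topic id to a dict of image id to label, where
    the label is 'relevant', 'partially relevant' or 'not relevant'.
    """
    qrels = {}
    for topic, labels_a in judgments_a.items():
        labels_b = judgments_b.get(topic, {})
        rel_a = {img for img, label in labels_a.items() if label == "relevant"}
        rel_b = {img for img, label in labels_b.items() if label == "relevant"}
        qrels[topic] = rel_a.intersection(rel_b)
    return qrels
```
        </preformat>
        <p>Under this rule, an image that one assessor marks as relevant and the other as partially relevant does not enter the qrels.</p>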
        <p>Finally, these qrels were complemented with the relevant images found at ImageCLEFphoto
2006 in order to avoid missing out on relevant images not found this year due to the reduced
captions.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Result Generation</title>
        <p>Once the relevance judgments were completed, we were able to evaluate the performance of the
individual systems and approaches (the deadline for this result generation process was 15 July
2007). The results for submitted runs were computed using the latest version of trec_eval<sup>5</sup>.</p>
        <p>The submissions were evaluated using uninterpolated (arithmetic) mean average precision
(MAP) and precision at rank 20 (P20), the latter because most online image retrieval engines such as Google,
Yahoo! and AltaVista display 20 images by default. Further measures considered include geometric
mean average precision (GMAP) to test system robustness, and the binary preference (bpref)
measure, which is a good indicator of the completeness of relevance judgments.</p>
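        <p>For readers unfamiliar with these measures, the sketch below shows how uninterpolated average precision, P20, MAP and GMAP can be computed from a ranked list and a set of relevant images. It is an illustrative re-implementation, not the trec_eval code used for the official results; the function names are hypothetical, and the small floor applied before taking logarithms in GMAP is an assumption made so that a topic with zero average precision does not break the geometric mean. The bpref measure is omitted.</p>
        <preformat>
```python
import math


def average_precision(ranking, relevant):
    """Uninterpolated average precision of one ranked list."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, image in enumerate(ranking, start=1):
        if image in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / len(relevant)


def precision_at(ranking, relevant, k=20):
    """Fraction of the top-k results that are relevant (P20 for k=20)."""
    return sum(1 for image in ranking[:k] if image in relevant) / k


def mean_average_precision(ap_values):
    """Arithmetic mean of the per-topic average precision values (MAP)."""
    return sum(ap_values) / len(ap_values)


def geometric_map(ap_values, floor=1e-5):
    """Geometric mean of per-topic average precision (GMAP).

    `floor` is an assumed lower bound that keeps log() defined for
    topics with an average precision of zero.
    """
    logs = [math.log(max(ap, floor)) for ap in ap_values]
    return math.exp(sum(logs) / len(logs))
```
        </preformat>
        <p>Because GMAP multiplies rather than adds the per-topic scores, a system that fails badly on a few topics is penalised more strongly than under MAP, which is why it serves as a robustness indicator.</p>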
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Participation and Submission Overview</title>
      <p>ImageCLEFphoto 2007 saw the registration of 32 groups (4 fewer than in 2006), with 20 of them
eventually submitting a record number of 616 runs (all of which were evaluated). This is a drastic
increase in comparison to previous years (12 groups submitting 157 runs in 2006, and 11 groups
submitting 349 runs in 2005).</p>
      <p>Table 2 provides an overview of the participating groups, the corresponding number of
submitted runs and the references of the working papers in which the participants describe their retrieval
approaches. The 20 groups are from 20 different institutions in 16 countries, with one institution
(Concordia University) sending two separate groups (CINDI, CLAC), while DCU and UTA joined
forces and submitted as one participating group. New participants submitting in 2007 include
Budapest, CLAC, UTA, NTU (Hong Kong), ImpColl, INAOE, RUG, SIG and XRCE.</p>
      <p>The increasing participation at ImageCLEFphoto might be an indicator of the growing need
for evaluation of visual information retrieval from generic photographic collections and of the
interest of researchers worldwide in participating in evaluation events such as ImageCLEFphoto.</p>
      <p><sup>4</sup> One of the topic generators is a member of the viventura travel company.
<sup>5</sup> http://trec.nist.gov/trec_eval/trec_eval.7.3.tar.gz</p>
      <table-wrap>
        <label>Table 2</label>
        <table>
          <thead>
            <tr><th>Group ID</th><th>Institution</th></tr>
          </thead>
          <tbody>
            <tr><td>Alicante</td><td>University of Alicante, Spain</td></tr>
            <tr><td>Berkeley</td><td>University of California, Berkeley, USA</td></tr>
            <tr><td>Budapest</td><td>Hungarian Academy of Sciences, Budapest, Hungary</td></tr>
            <tr><td>CINDI</td><td>Concordia University, Montreal, Canada</td></tr>
            <tr><td>CLAC</td><td>Concordia University, Montreal, Canada</td></tr>
            <tr><td>CUT</td><td>Technical University Chemnitz, Germany</td></tr>
            <tr><td>DCU-UTA</td><td>Dublin City University, Dublin, Ireland &amp; University of Tampere, Finland</td></tr>
            <tr><td>GE</td><td>University and Hospitals of Geneva, Switzerland</td></tr>
            <tr><td>HongKong</td><td>Nanyang Technological University, Hong Kong</td></tr>
            <tr><td>ImpColl</td><td>Imperial College, London, UK</td></tr>
            <tr><td>INAOE</td><td>INAOE, Puebla, Mexico</td></tr>
            <tr><td>IPAL</td><td>IPAL, Singapore</td></tr>
            <tr><td>Miracle</td><td>Daedalus University, Madrid, Spain</td></tr>
            <tr><td>NII</td><td>National Institute of Informatics, Tokyo, Japan</td></tr>
            <tr><td>RUG</td><td>University of Groningen, The Netherlands</td></tr>
            <tr><td>RWTH</td><td>RWTH Aachen University, Germany</td></tr>
            <tr><td>SIG</td><td>Université Paul Sabatier, Toulouse, France</td></tr>
            <tr><td>SINAI</td><td>University of Jaén, Jaén, Spain</td></tr>
            <tr><td>Taiwan</td><td>National Taiwan University, Taipei, Taiwan</td></tr>
            <tr><td>XRCE</td><td>Cross-Content Analytics, Meylan, France</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Further, the number of runs per participating group has risen dramatically as well, with
participants submitting an average of 30.8 runs in 2007 (13.1 runs in 2006). However, this may rather
be attributed to the fact that four sets of annotations were offered (compared to two in 2006) and
that the participants were allowed to submit as many system runs as they desired.</p>
      <sec id="sec-3-1">
        <title>Submission Overview by Retrieval Dimensions</title>
        <p>Overall, 616 runs were submitted and categorised with respect to the following dimensions: query
and annotation language, run type (automatic or manual), use of relevance feedback or automatic
query expansion, and modality (text only, image only or combined).</p>
        <table-wrap>
          <label>Table 3</label>
          <table>
            <thead>
              <tr><th>Dimension</th><th>Type</th><th>Data 2007</th><th>Data 2006</th><th>Total</th></tr>
            </thead>
            <tbody>
              <tr><td>Query Mode</td><td>bilingual</td><td>234 (8)</td><td>78 (2)</td><td>312 (8)</td></tr>
              <tr><td></td><td>monolingual</td><td>187 (17)</td><td>64 (2)</td><td>251 (18)</td></tr>
              <tr><td></td><td>visual</td><td>53 (12)</td><td></td><td>53 (12)</td></tr>
              <tr><td>Annotation Language</td><td>English</td><td>271 (17)</td><td>137 (2)</td><td>408 (18)</td></tr>
              <tr><td></td><td>German</td><td>83 (7)</td><td>5 (1)</td><td>88 (8)</td></tr>
              <tr><td></td><td>Spanish</td><td>33 (7)</td><td></td><td>33 (7)</td></tr>
              <tr><td></td><td>Random</td><td>32 (2)</td><td></td><td>32 (2)</td></tr>
              <tr><td></td><td>none</td><td>52 (12)</td><td></td><td>52 (12)</td></tr>
              <tr><td>Modality</td><td>Text Only</td><td>167 (15)</td><td>121 (2)</td><td>288 (15)</td></tr>
              <tr><td></td><td>Mixed (Text &amp; Image)</td><td>255 (13)</td><td>21 (1)</td><td>276 (13)</td></tr>
              <tr><td></td><td>Image Only</td><td>52 (12)</td><td></td><td>52 (12)</td></tr>
              <tr><td>Query Manipulation</td><td>none</td><td>148 (19)</td><td>131 (1)</td><td>279 (19)</td></tr>
              <tr><td></td><td>Relevance Feedback</td><td>204 (9)</td><td></td><td>204 (9)</td></tr>
              <tr><td></td><td>Query Expansion</td><td>76 (4)</td><td></td><td>76 (4)</td></tr>
              <tr><td></td><td>Feedback and Expansion</td><td>46 (5)</td><td>11 (1)</td><td>57 (6)</td></tr>
              <tr><td>Run Type</td><td>Manual</td><td>19 (3)</td><td></td><td>19 (3)</td></tr>
              <tr><td></td><td>Automatic</td><td>455 (19)</td><td>142 (2)</td><td>597 (19)</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Table 3 provides an overview of all submitted runs according to these dimensions (with the
number of groups in parentheses). Most submissions (91.6%) used the image annotations, with 8
groups submitting a total of 312 bilingual runs and 18 groups a total of 251 monolingual runs;
15 groups experimented with purely concept-based (textual) approaches (288 runs), 13 groups
investigated the combination of content-based (visual) and concept-based features (276 runs), while
a total of 12 groups submitted 52 purely content-based runs, a dramatic increase in comparison
with previous events (in 2006, only 3 groups had submitted a total of 12 visual runs). Furthermore,
53.4% of all retrieval approaches involved the use of image retrieval (31% in 2006).</p>
        <p>Based on all submitted runs, 50.6% were bilingual (59% in 2006), 54.7% of runs used query
expansion and pseudo-relevance feedback techniques (or both) to further improve retrieval results
(46% in 2006), and most runs were automatic (i.e. involving no human intervention); only 3.1%
of the runs submitted were manual.</p>
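        <p>The percentages quoted above follow directly from the run counts in the Total column of Table 3. The short sketch below recomputes three of them as a sanity check; the dictionary keys and the helper name are illustrative choices, and the counts are copied from the table.</p>
        <preformat>
```python
TOTAL_RUNS = 616  # all submitted runs (Table 3, Total column)

# Run counts taken from the Total column of Table 3.
counts = {
    "bilingual": 312,
    "feedback and/or expansion": 204 + 76 + 57,
    "manual": 19,
}


def share(count, total=TOTAL_RUNS):
    """Percentage of all runs, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


shares = {name: share(count) for name, count in counts.items()}
```
        </preformat>
        <p>With these counts, the computed shares are 50.6% for bilingual runs, 54.7% for runs using relevance feedback and/or query expansion, and 3.1% for manual runs, matching the figures in the text.</p>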
        <p>Two participating groups made use of additional data (i.e. the description field and the qrels)
from ImageCLEFphoto 2006. Although all these runs were evaluated (indicated by "Data 2006"),
they were not considered for the system performance analysis and retrieval evaluation described
in Section 4.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Submission Overview by Languages</title>
        <p>The expanded multilingual character of the evaluation environment also yielded an increased
number of bilingual retrieval experiments: while only four query languages (French, Italian,
Japanese, Chinese) had been used in 10 or more bilingual runs in 2006, a total of 13 languages
were used to start retrieval approaches in 10 or more runs in 2007. The most popular languages
this year were German (43 runs), French (43 runs) and English (35 runs). Surprisingly, 26.5% of
the bilingual experiments used a Scandinavian language to start the retrieval approach: Swedish
(32 runs), Norwegian (18 runs) and Danish (12 runs); none of these languages had been used
in 2006. It is also interesting to note that Asian languages (18.6% of bilingual runs) were almost
exclusively used for retrieval from English annotations (only one run experimented with the
German annotations), which might indicate a lack of translation resources from Asian to European
languages other than English.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>This section provides an overview of the system results with respect to query and annotation
languages as well as other submission dimensions such as query mode, retrieval modality and the
involvement of relevance feedback or query expansion techniques.</p>
      <p>Although the description fields were not provided with the image annotations, the absolute
retrieval results achieved by the systems were not much lower than those in 2006, when
the entire annotation was used. We attribute this to the fact that more than 50% of the groups
had participated at ImageCLEF before, to improved retrieval algorithms (not only those of returning
participants), and to the increased use of content-based retrieval approaches.</p>
      <sec id="sec-4-1">
        <title>Results by Language</title>
        <p>Submissions from CUT, DCU, NTU (Taiwan) and INAOE dominate the results (see participants'
workshop papers for further information about their runs). As in previous years, the highest
English monolingual run slightly outperforms the highest German and Spanish monolingual runs
(MAPs are 22.9% and 12.1% lower).</p>
        <p>The highest bilingual to English run (German–English) performed with a MAP of 91.3% of
the highest monolingual run MAP, with the highest bilingual run in most other query languages
such as Portuguese, Spanish, Russian, Italian, Chinese, French and Japanese all exhibiting at least
80% of that highest monolingual English run. Hence, there is no longer much difference between
monolingual and bilingual retrieval, indicating significant progress in the translation and retrieval
methods using these languages. Moreover, the highest bilingual to Spanish run (English–Spanish)
had a MAP of 99.2% of the highest monolingual Spanish run, while the highest bilingual to German
run (English–German) even outperformed the highest German monolingual run MAP by 13.3%.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Results by Query Mode</title>
        <p>This trend is not only true for the highest runs per language pair, but also for all submissions and
across several performance indicators. Table 6 illustrates the average scores across all system runs
(and the standard deviations in parentheses) with respect to monolingual, bilingual and purely
visual retrieval.</p>
        <p>[Table 6. Query mode: Monolingual, Bilingual, Visual.]</p>
        <p>
          Again, monolingual and bilingual retrieval are almost identical, and so are the average results
for monolingual Spanish, English and German retrieval (see Table 7): Spanish shows the highest
average MAP and BPREF values, while German exhibits the highest average for P(20) and English
for GMAP.
        </p>
        <p>[Table 7. Annotation: Spanish, English, German.]</p>
        <p>Across all submissions, the average values for bilingual retrieval from English and German
annotations are even slightly higher than those for monolingual retrieval (see Table 8), while
bilingual retrieval from Spanish annotations and from annotations with a randomly selected language
does not lag far behind.</p>
        <p>[Table 8. Annotation: English, German, Spanish, Random, None.]</p>
        <p>
          These results indicate that the query language is not a major factor in visual
information retrieval from lightly annotated images. We attribute this (1) to the high quality of the
state-of-the-art translation techniques, (2) to the fact that such translations implicitly expand the
query terms (similar to query expansion using a thesaurus) and (3) to the short image captions
used (as many of them are proper nouns which are often not even translated).
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>Results by Retrieval Modality</title>
        <p>In 2006, the system results had shown that combining visual features from the image and semantic
knowledge derived from the captions offered optimum performance for retrieval from a generic
photographic collection with fully annotated images.</p>
        <p>As indicated in Table 9, the results of ImageCLEFphoto 2007 show that this also applies
for retrieval from generic photographic collections with lightly annotated images: on average,
combining visual features from the image and semantic information from the annotations gave a
24% improvement of the MAP over retrieval based solely on text.</p>
        <p>Purely content-based approaches still lag behind, but the average MAP for retrieval solely
based on image features shows an improvement of 65.8% compared to the average MAP in 2006.</p>
        <p>[Table 9, MAP column: 0.1487 (0.0655); 0.1199 (0.0404); 0.0681 (0.0385).]</p>
        <p>While the use of query expansion does not necessarily seem to dramatically improve retrieval
results for retrieval from lightly annotated images (average MAP only 2.1% higher), relevance
feedback (typically in the form of query expansion based on pseudo relevance feedback) appeared
to work well on short captions (average MAP 19.9% higher), with a combination of query expansion
and relevance feedback techniques yielding results almost twice as good as without any of these
techniques (average MAP 99.5% higher).</p>
        <p>This paper reported on ImageCLEFphoto 2007, the general photographic ad-hoc retrieval task
of the ImageCLEF 2007 evaluation campaign. Its evaluation objective concentrated on visual
information retrieval from generic collections of lightly annotated images, a new challenge that
attracted a large number of submissions: 20 participating groups submitted a total of 616 system
runs.</p>
        <p>
          The participants were provided with a subset of the IAPR TC-12 Benchmark: 20,000 colour
photographs and four sets of semi-structured annotations in (1) English, (2) German, (3) Spanish
and (4) one set whereby the annotation language was randomly selected for each of the images.
Unlike in 2006, the participants were not allowed to use the semantic description field in their
retrieval approaches. The topics and relevance assessments from 2006 were reused (and updated)
to facilitate the comparison of retrieval from fully and lightly annotated images.
        </p>
        <p>The nature of the task also attracted a larger number of participants experimenting with
content-based retrieval techniques, and hence the retrieval results were similar to those in 2006,
despite the limited image annotations in 2007. Other findings for multilingual visual information
retrieval from generic collections of lightly annotated photographs include:
          <list list-type="bullet">
            <list-item><p>bilingual retrieval performs as well as monolingual retrieval;</p></list-item>
            <list-item><p>the choice of the query language is almost negligible as many of the short captions contain
proper nouns;</p></list-item>
            <list-item><p>combining concept and content-based retrieval methods as well as using relevance feedback
and/or query expansion techniques can significantly improve retrieval performance.</p></list-item>
          </list>
        </p>
        <p>ImageCLEFphoto will continue to provide resources to the retrieval and computational vision
communities to facilitate standardised laboratory-style testing of image retrieval systems. While
these resources have predominantly been used by systems applying a concept-based retrieval
approach thus far, the rapid increase of participants using content-based retrieval techniques at
ImageCLEFphoto calls for a more suitable evaluation environment for visual approaches (e.g. the
preparation of training data). For ImageCLEFphoto 2008, we are planning to create new topics
and will therefore be able to provide this year's topics and qrels as training data for next year.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] András Benczúr, István Biró, Mátyás Brendel, Károly Csalogány, Daróczy Bálint, and Dávid Siklósi.
          <article-title>Cross-modal retrieval by text and image feature biclustering</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Yih-Chen</given-names>
            <surname>Chang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hsin-Hsi</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Experiment for Using Web Information to do Query and Document Expansion</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Stephane</given-names>
            <surname>Clinchant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jean-Michel</given-names>
            <surname>Renders</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gabriela</given-names>
            <surname>Csurka</surname>
          </string-name>
          .
          <article-title>XRCE's Participation to ImageCLEFphoto 2007</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Paul David</given-names>
            <surname>Clough</surname>
          </string-name>
          , Michael Grubinger, Thomas Deselaers, Allan Hanbury, and Henning Müller.
          <article-title>Overview of the ImageCLEF 2006 photographic retrieval and object annotation tasks</article-title>
          .
          <source>In Evaluation of Multilingual and Multi-modal Information Retrieval: Seventh Workshop of the Cross-Language Evaluation Forum (CLEF 2006), Lecture Notes in Computer Science (LNCS)</source>
          , Alicante, Spain, September 19–21
          <year>2006</year>
          . Springer. (in press).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Deselaers</surname>
          </string-name>
          , Tobias Gass, Tobias Weyand, and Hermann Ney.
          <article-title>FIRE in ImageCLEF 2007</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          M.C. Díaz-Galiano, M.A. García-Cumbreras, M.T. Martín-Valdivia, A. Montejo-Raez, and L.A. Ureña López.
          <article-title>SINAI at ImageCLEF 2007</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Osama</given-names>
            <surname>El Demerdash</surname>
          </string-name>
          , Leila Kosseim, and
          <string-name>
            <given-names>Sabine</given-names>
            <surname>Bergler</surname>
          </string-name>
          .
          <article-title>Experiments with Clustering the Collection at ImageCLEF 2007</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Sheng</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jean-Pierre</given-names>
            <surname>Chevallet</surname>
          </string-name>
          , Thi Hoang Diem Le, Trong Ton Pham, and Joo Hwee Lim.
          <article-title>IPAL at ImageCLEF 2007 Mixing Features, Models and Knowledge</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Grubinger</surname>
          </string-name>
          .
          <article-title>On the Creation of Query Topics for ImageCLEFphoto</article-title>
          . In Third MUSCLE / ImageCLEF Workshop on Image and Video Retrieval Evaluation, Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Grubinger</surname>
          </string-name>
          , Paul David Clough, Henning Müller, and Thomas Deselaers.
          <article-title>The IAPR-TC12 Benchmark: A New Evaluation Resource for Visual Information Systems</article-title>
          .
          <source>In International Workshop OntoImage'2006 Language Resources for Content-Based Image Retrieval, held in conjunction with LREC'06</source>
          , pages
          <fpage>13</fpage>
          –
          <lpage>23</lpage>
          , Genoa, Italy, May 22
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Steven C. H.</given-names>
            <surname>Hoi</surname>
          </string-name>
          .
          <article-title>Cross-Language and Cross-Media Image Retrieval: An Empirical Study at ImageCLEF 2007</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Hugo Jair</given-names>
            <surname>Escalante</surname>
          </string-name>
          , Carlos A. Hernández, Aurelio López, Heidy M. Marín, Manuel Montes y Gómez, Eduardo Morales,
          <string-name>
            <given-names>Luis E.</given-names>
            <surname>Sucar</surname>
          </string-name>
          , and Luis Villaseñor.
          <article-title>TIA-INAOE's Participation at ImageCLEF 2007</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          A. Järvelin, P. Wilkins, T. Adamek, E. Airio, G. Jones, E. Sormunen, and A.F. Smeaton.
          <article-title>DCU and UTA at Photographic ImageCLEF 2007</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Ray R.</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Linked Relevance Feedback for the ImageCLEF Photo Task</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          João Magalhães, Simon Overell, and Stefan Rüger.
          <article-title>Exploring Image, Text and Geographic Evidences in ImageCLEF 2007</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Sergio</given-names>
            <surname>Navarro</surname>
          </string-name>
          , Fernando Llopis, Rafael Muñoz Guillena, and
          <string-name>
            <given-names>Elisa</given-names>
            <surname>Noguera</surname>
          </string-name>
          .
          <article-title>Information retrieval of visual descriptions with IR-n system based on passages</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          M.M. Rahman, Bipin C. Desai, and Prabir Bhattacharya.
          <article-title>Multi-Modal Interactive Approach to ImageCLEF 2007 Photographic and Medical Retrieval by CINDI</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Mouna</given-names>
            <surname>Torjmen</surname>
          </string-name>
          , Karen Pinel-Sauvagnat, and
          <string-name>
            <given-names>Mohand</given-names>
            <surname>Boughanem</surname>
          </string-name>
          .
          <article-title>Using pseudo-relevance feedback to improve image retrieval results</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Florence</given-names>
            <surname>Tushabe</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael H. F.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          .
          <article-title>Content-Based Image Retrieval Using ShapeSize Pattern Spectra</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          Julio Villena-Román, Sara Lana-Serrano, José-Luis Martínez-Fernández, and José Carlos González-Cristóbal.
          <article-title>MIRACLE at ImageCLEFphoto 2007: Evaluation of Merging Strategies for Multilingual and Multimedia Information Retrieval</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Ellen M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          and
          <string-name>
            <given-names>Donna</given-names>
            <surname>Harman</surname>
          </string-name>
          .
          <article-title>Overview of the Seventh Text REtrieval Conference (TREC-7)</article-title>
          .
          <source>In The Seventh Text Retrieval Conference</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>23</lpage>
          , Gaithersburg, MD, USA,
          <year>November 1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Wilhelm</surname>
          </string-name>
          , Jens Kürsten, and Maximilian Eibl.
          <article-title>Experiments for the ImageCLEF 2007 Photographic Retrieval Task</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Xin</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Julien Gobeill, Patrick Ruch, and Henning Müller.
          <article-title>University and Hospitals of Geneva at ImageCLEF 2007</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>