<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The CLEF 2005 Cross-Language Image Retrieval Track</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paul Clough</string-name>
          <email>p.d.clough@sheffield.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Henning Mueller</string-name>
          <email>henning.mueller@sim.hcuge.ch</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Deselaers</string-name>
          <email>deselaers@cs.rwth-aachen.de</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Grubinger</string-name>
          <email>michael.grubinger@research.vu.edu.au</email>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Lehmann</string-name>
          <email>lehmann@computer.org</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeffery Jensen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>William Hersh</string-name>
          <email>hersh@ohsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Biomedical Informatics, Oregon Health and Science University</institution>
          ,
          <addr-line>Portland, Oregon</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Information Studies, Sheffield University</institution>
          ,
          <addr-line>Sheffield</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Medical Informatics, Medical Faculty, Aachen University of Technology (RWTH)</institution>
          ,
          <addr-line>Pauwelsstr. 30, Aachen D-52057</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen University</institution>
          ,
          <addr-line>D-52056 Aachen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Medical Informatics Service, Geneva University and Hospitals</institution>
          ,
          <addr-line>Geneva</addr-line>
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>School of Computer Science and Mathematics, Victoria University</institution>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>University of Amsterdam, Informatics department</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The purpose of this paper is to outline efforts from the 2005 CLEF cross-language image retrieval campaign (ImageCLEF). The aim of this CLEF track is to explore the use of both text- and content-based retrieval methods for cross-language image retrieval. Four tasks were offered in the ImageCLEF track: ad-hoc retrieval from an historic photographic collection, ad-hoc retrieval from a medical collection, an automatic image annotation task, and a user-centered (interactive) evaluation task that is explained in the iCLEF summary. 24 research groups from a variety of backgrounds and nationalities (14 countries) participated in ImageCLEF. In this paper we describe the ImageCLEF tasks, submissions from participating groups and summarise the main findings.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        ImageCLEF7 conducts evaluation of cross-language image retrieval and is run as part of the
Cross Language Evaluation Forum (CLEF) campaign. The ImageCLEF retrieval benchmark was
established in 2003 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and run again in 2004 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] with the aim of evaluating image retrieval from
multilingual document collections. Images by their very nature are language independent, but
they are often accompanied by texts semantically related to the image (e.g. textual captions or
metadata). Images can then be retrieved using primitive features based on the pixels which form the
contents of an image (e.g. using a visual exemplar), abstracted features expressed through text,
or a combination of both. The language used to express the associated texts or textual queries
should not affect retrieval, i.e. an image with a caption written in English should be searchable in
languages other than English.
      </p>
      <p>ImageCLEF provides tasks for both system-centered and user-centered retrieval evaluation
within two main areas: retrieval of images from photographic collections and retrieval of images
from medical collections. These domains offer realistic scenarios in which to test the performance
of image retrieval systems, presenting different challenges and problems to participating research
groups. A major goal of ImageCLEF is to investigate the effectiveness of combining text and
image for retrieval and to promote the exchange of ideas which may help improve the performance
of future image retrieval systems.</p>
      <p>ImageCLEF has already seen participation from both academic and commercial research groups
worldwide from communities including: Cross-Language Information Retrieval (CLIR), Content-Based
Image Retrieval (CBIR), medical information retrieval and user interaction. We provide
participants with the following: image collections, representative search requests (expressed by
both image and text) and relevance judgements indicating which images are relevant to each search
request. Campaigns such as CLEF and TREC have proven invaluable in providing standardised
resources for comparative evaluation for a range of retrieval tasks, and ImageCLEF aims to provide
the research community with similar resources for image retrieval. In the following sections of this
paper we describe each search task separately: section 2 describes ad-hoc retrieval from historic
photographs, section 3 ad-hoc retrieval from medical images, and section 4 the automatic
annotation of medical images. For each we briefly describe the test collections, the search tasks,
participating research groups, results and a summary of the main findings.
7 See http://ir.shef.ac.uk/imageclef/</p>
    </sec>
    <sec id="sec-2">
      <title>Ad-hoc Retrieval from Historic Photographs</title>
      <sec id="sec-2-1">
        <title>Aims and Objectives</title>
        <p>
          This is a bilingual ad-hoc retrieval task in which a system is expected to match a user's one-time
query against a more or less static collection (i.e. the set of documents to be searched is known
prior to retrieval, but the search requests are not). Similar to the task run in previous years (see,
e.g. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]), the goal of this task is, given multilingual text queries, to retrieve as many relevant images as
possible from the provided image collection (the St. Andrews collection of historic photographs).
Queries for images based on abstract concepts rather than visual features are predominant in this
task. This limits the effectiveness of using visual retrieval methods alone, as either these concepts
cannot be extracted using visual features and require extra external semantic knowledge (e.g. the
name of the photographer), or images with different visual properties may be relevant to a search
request (e.g. different views of Rome). However, based on feedback from participants in 2004, the
search tasks for 2005 were designed to reflect more visually-based queries.
        </p>
        <p>Fig. 1. An example caption from the St. Andrews collection. Short title: Rev William Swan. Long title: Rev William Swan. Location: Fife, Scotland. Description: Seated, 3/4 face studio portrait of a man. Date: ca.1850. Photographer: Thomas Rodger. Categories: [ ministers ][ identified male ][ dress - clerical ]. Notes: ALB6-85-2 jf/pc BIOG: Rev William Swan ( ) ADD: Former owners of album: A Govan then J J? Lowson. Individuals and other subjects indicative of St Andrews provenance. By T. R. as identified by Karen A. Johnstone "Thomas Rodger 1832-1883. A biography and catalogue of selected works".</p>
        <p>The St. Andrews collection consists of 28,133 images, all of which have associated textual captions
written in British English (the target language). The captions consist of 8 fields including title,
photographer, location, date and one or more pre-defined categories (all manually assigned by
domain experts). For example, see Fig. 1. Further examples can be found in [?] and the St.
Andrews University Library8. We provided participants with 28 topics (titles shown in Table 11
and an example image shown in Fig. 5), the main themes based on analysis of log files from
a web server at St. Andrews university, knowledge of the image collection and discussions with
maintainers of the image collection. After identifying these main themes, we modified queries
to test various aspects of cross-language and visual search and used a custom-built IR system
to identify suitable topics (in particular those topics with an estimated 20 or more relevant</p>
        <sec id="sec-2-1-1">
          <p>8 http://www-library.st-andrews.ac.uk/</p>
          <p>
            images). A complexity score was developed by the authors to categorise topics with respect to
linguistic complexity [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ].
          </p>
          <p>Each topic consists of a title (a short sentence or phrase describing the search request in a few
words) and a narrative (a description of what constitutes a relevant or non-relevant image for that
search request). In addition to the text description for each topic, we also provided two example
images which we envisage could be used for relevance feedback (both manual and automatic)
and query-by-example searches9. Both topic titles and narratives have been translated into the
following languages: German, French, Italian, Spanish (European), Spanish (Latin American),
Chinese (Simplified), Chinese (Traditional) and Japanese. Translations have also been produced
for the titles only and these are available in 25 languages including: Russian, Croatian, Bulgarian,
Hebrew and Norwegian. All translations have been provided by native speakers and verified by at
least one other native speaker.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Creating Relevance Assessments</title>
        <p>Relevance assessments were performed by staff at the University of Sheffield (the majority
unfamiliar with the St. Andrews collection but given training and access to the collection through
our IR system). The top 50 results from all submitted runs (349) were used to create image pools,
giving an average of 1,376 (max: 2,193 and min: 760) images to judge per topic. The authors
judged all topics to create a "gold standard" and at least two further assessments were obtained
for each topic. Assessors used a custom-built tool to make judgements accessible on-line, enabling
them to log in when and where convenient. We asked assessors to judge every image in the topic
pool, but also to use interactive search and judge: searching the collection using their own queries
to supplement the image pools with further relevant images.</p>
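        <p>The pooling step described above can be sketched as follows. This is an illustrative sketch only, assuming in-memory ranked lists per run; the actual assessment tooling and run-file format are not shown in the paper.</p>

```python
# Sketch of top-k pooling for one topic (data structures are illustrative):
# the pool is the union of the top k images over all submitted runs,
# so duplicates across runs are judged only once.

def build_pool(runs, k=50):
    """runs: list of ranked image-id lists (one list per submitted run)."""
    pool = set()
    for ranking in runs:
        pool.update(ranking[:k])
    return pool

runs = [["img1", "img2", "img3"], ["img2", "img4"]]
print(sorted(build_pool(runs, k=2)))  # ['img1', 'img2', 'img4']
```

        <p>With 349 runs and k = 50, the %max figure reported later is simply the pool size divided by the theoretical maximum of 349 x 50 unique images.</p>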
        <p>The assessment of images in this ImageCLEF task is based on a ternary classification
scheme: (1) relevant, (2) partially relevant and (3) not relevant. The aim of the ternary scheme is
to help assessors make their relevance judgements more accurate (e.g. an image is definitely
relevant in some way, but perhaps the query object is not directly in the foreground: it is therefore
considered partially relevant). Relevance assessments for the more general topics are based entirely
on the visual content of images (e.g. "aircraft on the ground"). However, certain topics also require
the use of the caption to make a confident decision (e.g. "pictures of North Street St Andrews").
What constitutes a relevant image is a subjective decision, but typically a relevant image will have
the subject of the topic in the foreground, the image will not be too dark in contrast, and perhaps
the caption confirms the judge's decision.</p>
        <p>Based on these judgements, various combinations are used to create the set of relevant images
and, as in previous years, we used the pisec-total set: those images judged as relevant or partially
relevant by the topic creator and at least one other assessor. These are then used to evaluate system
performance and compare submissions. The size of pools and number of relevant images is shown
in Table 11 (the %max indicating the pool size compared to the maximum possible pool size, i.e.
if all top 50 images from each submission were unique).</p>
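        <p>The pisec-total combination rule can be made concrete with a small sketch (the function name and judgement encoding are our own illustration, not the actual tooling): an image enters the relevant set if the topic creator judged it relevant or partially relevant and at least one other assessor agreed.</p>

```python
# Hypothetical sketch of the "pisec-total" relevant set. Judgement codes
# follow the ternary scheme: 1 = relevant, 2 = partially relevant, 3 = not relevant.

RELEVANTISH = {1, 2}

def pisec_total(creator_judgements, other_judgements):
    """creator_judgements: dict mapping image id to the topic creator's code;
    other_judgements: list of such dicts, one per additional assessor."""
    result = set()
    for image, code in creator_judgements.items():
        if code in RELEVANTISH:
            # Require agreement from at least one other assessor.
            for judgements in other_judgements:
                if judgements.get(image, 3) in RELEVANTISH:
                    result.add(image)
                    break
    return result

creator = {"a": 1, "b": 2, "c": 3, "d": 1}
others = [{"a": 3, "b": 1, "d": 3}, {"d": 2}]
print(sorted(pisec_total(creator, others)))  # ['b', 'd']
```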
      </sec>
      <sec id="sec-2-3">
        <title>Participating Groups</title>
        <p>In total, 19 groups registered for this task and 11 ended up submitting (including 5 new groups
compared to last year) a total of 349 runs (all of which were evaluated). Participants were given
queries and relevance judgements from 2004 as training data and access to a default CBIR system
(GIFT/Viper). Submissions from participants are briefly described in the following.
CEA: CEA from France, submitted 9 runs. Experimented with 4 languages, title and title+narrative,
and merging between modalities (text and image). The merging is simply based on normalised scores obtained by
each search and is conservative (results obtained using visual topics and the CBIR system are used only to
reorder results obtained using textual topics).</p>
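        <p>A conservative score merge of the kind CEA describes might look like the following sketch. The function names, the max-based normalisation and the weight alpha are our assumptions for illustration; the key property from the description is that visual scores only reorder images already retrieved by the text search.</p>

```python
# Illustrative conservative text/visual merge: text scores decide which
# images appear in the result list, normalised visual scores only reorder them.

def normalise(scores):
    """Scale scores into [0, 1] by the maximum (assumed normalisation)."""
    top = max(scores.values())
    return {doc: s / top for doc, s in scores.items()} if top > 0 else scores

def conservative_merge(text_scores, visual_scores, alpha=0.8):
    text_n = normalise(text_scores)
    visual_n = normalise(visual_scores)
    # Only images retrieved by the text search are kept; visual-only hits
    # (e.g. "z" below) never enter the ranking.
    merged = {doc: alpha * s + (1 - alpha) * visual_n.get(doc, 0.0)
              for doc, s in text_n.items()}
    return sorted(merged, key=merged.get, reverse=True)

text = {"a": 10.0, "b": 8.0, "c": 1.0}
visual = {"b": 5.0, "z": 9.0}
print(conservative_merge(text, visual))  # ['a', 'b', 'c']
```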
        <sec id="sec-2-3-1">
          <p>9 See http://ir.shef.ac.uk/imageclef2005/adhoc.htm for an example</p>
          <p>NII: National Institute of Informatics from Japan, submitted 16 runs with 3 languages. These experiments
aimed to see whether the inclusion of a learned word-association model - a model which represents how
words are related - can help find relevant images in an ad-hoc CLIR setting. To do this, basic unigram
language models were combined with differently estimated word-association models that perform soft
word expansion. In addition, simple keyword-matching language models were combined with the above soft
word-expansion language models at the model-output level. All runs were text only.</p>
          <p>Alicante: University of Alicante (Computer Science) from Spain, submitted 62 runs (including 10 joint
runs with UNED and Jaen). They experimented with 13 languages using title, automatic query expansion
and text only. Their system combines probabilistic information with ontological information and a feedback
technique. Several information streams are created using different sources: stems, words and stem bigrams,
with the final result obtained by combining them. An ontology was created automatically from the St.
Andrews collection to relate a query with several image categories. Four experiments were carried out
to analyse how different features contribute to retrieval results. Moreover, a voting-based strategy was
developed joining three different systems of participating universities: University of Alicante, University
of Jaen and UNED.</p>
          <p>CUHK: Chinese University of Hong Kong, submitted 36 runs for English and Chinese (simplified). CUHK
experimented with title, title+narrative and using visual methods to rerank search results (visual features
are composed of two parts: DCT coefficients and colour moments, with a dimension of 9). Various IR
models were used for retrieval (trained on 2004 data), together with query expansion. The LDC Chinese segmentor
is used to extract words from Chinese queries, which are then translated into English using a dictionary.
DCU: Dublin City University (Computer Science) from Ireland, submitted 33 runs for 11 languages.
All runs were automatic using title only. Standard OKAPI was used, incorporating stop-word removal, suffix
stripping and query expansion using pseudo-relevance feedback. Their main focus of participation was to
explore an alternative approach to combining text and image retrieval in an attempt to make use of the
information provided by the query image. Separate ranked lists, returned using text retrieval without feedback
and image retrieval based on standard low-level colour, edge and texture features, were examined to
find documents returned by both methods. These documents were then assumed to be relevant and used
for text-based pseudo-relevance feedback and retrieval as in their standard method.</p>
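          <p>The intersection idea DCU describes can be sketched briefly (the function name and the pool depth are our assumptions): documents appearing near the top of both the text run and the visual run are taken as the assumed-relevant set that seeds pseudo-relevance feedback.</p>

```python
# Hedged sketch: documents returned by both the text and visual rankings
# (within some depth) are assumed relevant for feedback purposes.

def assumed_relevant(text_ranking, visual_ranking, depth=100):
    text_top = set(text_ranking[:depth])
    visual_top = set(visual_ranking[:depth])
    return text_top.intersection(visual_top)

text_run = ["d1", "d2", "d3", "d4"]
visual_run = ["d3", "d9", "d2"]
print(sorted(assumed_relevant(text_run, visual_run)))  # ['d2', 'd3']
```

          <p>The resulting set would then feed the usual query-expansion step in place of the top-ranked text results.</p>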
          <p>Geneva: University Hospitals Geneva from Switzerland, submitted 2 runs based on visual retrieval only
(automatic and no feedback).</p>
          <p>Indonesia: University of Indonesia (Computer Science), submitted 9 runs using Indonesian queries only.
They experimented with using title and title+narrative, with and without query expansion and combining
text and image retrieval (all runs automatic).</p>
          <p>MIRACLE: Daedalus and Madrid University from Spain, submitted 106 runs for 23 languages. All runs
were automatic, using title only, no feedback and text-based only.</p>
          <p>NTU: National Taiwan University from Taiwan, submitted 7 runs for Chinese (traditional) and
English (also included a visual-only run). All runs are automatic and NTU experimented with using query
expansion, using title and title+narrative and combining visual and text retrieval.</p>
          <p>Jaen: University of Jaen (Intelligent Systems) from Spain, submitted 64 runs in 9 languages (all
automatic). Jaen experimented with title and title+narrative, with and without feedback, and with
combining text and visual retrieval, as well as with term weighting and the use of pseudo-relevance
feedback.</p>
          <p>UNED: UNED from Spain, submitted 5 runs for Spanish (both Latin American and European) and
English. All runs were automatic, title, text only and with feedback. UNED experimented with three different
approaches: i) a naive baseline using a word-by-word translation of the title topics; ii) a strong baseline
based on Pirkola's work; and iii) a structured query using the named entities with field search operators
and Pirkola's approach.</p>
          <p>Participants were asked to categorise their submissions by the following dimensions: query
language, type (automatic or manual), use of feedback (typically relevance feedback is used for
automatic query expansion), modality (text only, image only or combined) and the initial query
(visual only, title only, narrative only or a combination). A summary of submissions by these
dimensions is shown in Table 1. No manual runs were submitted this year, and a large
proportion are text only using just the title. Together with 41% of submissions using query expansion,
this coincides with the large number of query languages offered this year and the focus on query
translation by participating groups (although 6 groups submitted runs involving CBIR). An
interesting submission this year was the combined effort of Jaen, UNED and Alicante to create
an approach based on voting for images. Table 2 provides a summary of submissions by query
language. At least one group submitted for each language, the most popular (non-English) being
French, German and Spanish (European).
Results for submitted runs were computed using the latest version of trec_eval10 from NIST
(v7.3). From the scores output, the four chosen to evaluate submissions are Mean Average Precision
(MAP), precision at result 10 (P10), precision at result 100 (P100) and the number of relevant
images retrieved (RelRet), from which we compute recall (the proportion of relevant images retrieved).
Table 3 summarises the top performing systems in the ad-hoc task based on MAP. Whether MAP
is the best score to rank image retrieval systems is debatable, hence our inclusion of the P10 and
P100 scores. The highest English (monolingual) retrieval score is 0.4135, with a P10 of 0.5500 and
P100 of 0.3197. On average recall is high (0.8434), but MAP and P10 are low, indicating that relevant
images are likely retrieved at lower rank positions. The highest monolingual score is obtained using
combined visual and text retrieval and relevance feedback.</p>
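          <p>For readers unfamiliar with the reported measures, the per-topic versions can be sketched as follows (trec_eval computes these and averages them over all topics; the data here is illustrative).</p>

```python
# Minimal sketch of the evaluation measures for a single topic, given a
# ranked result list and the set of relevant images from the qrels.

def average_precision(ranking, relevant):
    """Precision at each relevant hit, averaged over all relevant images
    (unretrieved relevant images contribute zero)."""
    hits, precision_sum = 0, 0.0
    for i, image in enumerate(ranking, start=1):
        if image in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / len(relevant) if relevant else 0.0

def precision_at(ranking, relevant, k):
    """Fraction of the top k results that are relevant (P10, P100, ...)."""
    return sum(1 for image in ranking[:k] if image in relevant) / k

def recall(ranking, relevant):
    """Proportion of relevant images retrieved (RelRet / total relevant)."""
    return len(relevant.intersection(ranking)) / len(relevant)

ranking = ["a", "x", "b", "y", "c"]
relevant = {"a", "b", "c", "z"}
print(average_precision(ranking, relevant))  # (1/1 + 2/3 + 3/5) / 4
```

          <p>MAP is then the mean of the per-topic average precision values over the topic set.</p>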
          <p>The highest cross-language MAP is for Chinese (traditional) in the NTU submission, which is
97% of the highest monolingual score. Retrieval performance is variable across languages, with some
performing poorly, e.g. Romanian, Bulgarian, Czech, Croatian, Finnish and Hungarian. Although
these languages did not have translated narratives available for retrieval, it is more likely that low
performance results from the limited availability of translation and language processing resources and
difficult language structure (e.g. results from CLEF 2004 showed Finnish to be a very challenging
language due to its complex morphology). Hungarian performs the worst at 23% of monolingual.
However, it is encouraging to see participation at CLEF for these languages. On average, MAP
for English is 0.2084 (0.3933 P10; 0.6454 recall) and across all languages is 0.2009 (0.2985 P10;
0.5737 recall) - see Table 4.
The variety of submissions in the ad-hoc task this year has been pleasing, with a number of
groups experimenting with both visual and text-based retrieval methods and combining the two
(although the number of runs submitted as combined is much lower than in 2004). As in 2004, the
combination of text and visual retrieval appears to give the highest retrieval effectiveness (based on
MAP), indicating this is still an area for research. We aimed to offer a wider range of languages,
of which 13 have submissions from at least two groups (compared to 10 in 2004). It would seem
that the focus for many groups in 2005 has been translation, with more use made of both title
and narrative than in 2004. However, it is interesting to see languages such as Chinese (traditional)
and Spanish (Latin American) perform above European languages such as French, German and
Spanish (European), which performed best in 2004.
10 http://trec.nist.gov/trec_eval/trec_eval.7.3.tar.gz</p>
          <p>Although topics were designed to be more suited to visual retrieval methods (based on
comments from participants in 2004), the topics are still dominated by semantics and background
knowledge; pure visual similarity still plays a less significant role. The current ad-hoc task is not
well suited to purely visual retrieval because colour information, which typically plays an important
role in CBIR, is ineffective due to the nature of the St. Andrews collection (historic photographs).
Also, unlike typical CBIR benchmarks, the images in the St. Andrews collection are very complex,
containing objects in both the foreground and background which prove indistinguishable to CBIR
methods. Finally, the relevant image set is visually diverse for some queries (e.g. different views of
a city), making visual retrieval methods ineffective. This highlights the importance of using
text-based IR methods, either on associated metadata alone or combined with visual features. Relevance
feedback (in the form of automatic query expansion) still plays an important role in retrieval, as
also demonstrated by submissions in 2004: a 17% increase in 2005 and 48% in 2004.</p>
          <p>We are aware that research in the ad-hoc task using the St. Andrews collection has probably
reached a plateau. There are obvious limitations with the existing collection: mainly black and
white images, domain-specific vocabulary used in associated captions, a restricted retrieval scenario
(i.e. searches for historic photographs), and the fact that only experiments with a single target language
(English) are possible (i.e. further bilingual pairs cannot be tested). To address these and widen the image
collections available to ImageCLEF participants, we have been provided with access to a new
collection of images from a personal photographic collection with associated textual descriptions
in German and Spanish (as well as English). This is planned for use in the ImageCLEF 2006
ad-hoc task.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Ad-hoc Retrieval from Medical Image Collections</title>
      <sec id="sec-3-1">
        <title>Goals and objectives</title>
        <p>
          Domain-specific information retrieval is becoming increasingly important, and this holds especially
true for the medical field, where patients as well as clinicians and researchers have their particular
information needs [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Whereas information needs and retrieval methods for textual documents
have been well researched, only a small amount of information is available on the need to
search for images [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], and even less so for the use of images in the medical domain. ImageCLEFmed
is creating resources to evaluate information retrieval tasks on medical image collections. This
process includes the creation of image collections and query tasks, and the definition of correct
retrieval results for these tasks for system evaluation. Some of the tasks have been based on surveys
of medical professionals and how they use images [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>Much of the basic structure is similar to the non-medical ad-hoc task, such as the general
outline, the evaluation procedure and the relevance assessment tool used. These similarities will
not be described in any detail in this section.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Data sets used and query topics</title>
        <p>
          In 2004, only the Casimage11 dataset was made available to participants [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], containing almost
9,000 images from 2,000 cases [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], and 26 query topics with relevance judgements from three medical
experts. It is also part of the 2005 collection. Images present in the dataset mostly come from radiology
modalities, but photographs, PowerPoint slides and illustrations are also included. Cases are mainly in French,
with around 20% in English. We were also allowed to use the PEIR12 (Pathology Education
Instructional Resource) database with annotations from the HEAL13 project (Health Education
Assets Library, mainly pathology images [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]). This dataset contains over 33,000 images with
English annotation, the annotation being in XML per image rather than per case as in Casimage.
The nuclear medicine database of MIR, the Mallinckrodt Institute of Radiology14 [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], was also
made available to us for ImageCLEF. This dataset contains over 2,000 images, mainly from nuclear
medicine, with annotations per case and in English.
11 http://www.casimage.com/
12 http://peir.path.uab.edu/
13 http://www.healcentral.com/
14 http://gamma.wustl.edu/home.html
15 http://alf3.urz.unibas.ch/pathopic/intro.htm
Finally, the PathoPic15 collection (Pathology
images [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]) was included in our dataset. It contains 9,000 images with extensive
annotations per image in German. Part of the German annotation has been translated into English, but the
translation is still incomplete. This means that a total of more than 50,000 images was made available, with
annotations in three different languages. Two collections have case-based annotations, whereas the other
two have image-based annotations. Only because the copyright holders granted access to the data
were we able to distribute these images to the participating research groups.
        </p>
        <p>The image topics were based on a small survey at OHSU. Based on this survey, the topics were
developed along the following main axes:
- Anatomic region shown in the image;
- Image modality (x-ray, CT, MRI, gross pathology, ...);
- Pathology or disease shown in the image;
- Abnormal visual observation (e.g. enlarged heart).
As the goal was clearly to accommodate both visual and textual research groups, we developed
a set of 25 topics containing three different groups of queries: queries that are expected to be
solvable with a visual retrieval system (topics 1-12), topics where both text and visual features are
expected to perform well (topics 13-23), and semantic topics, where visual features are not expected
to improve results. All query topics were of a higher semantic level than the 2004 topics because
the automatic annotation task provides a testbed for purely visual retrieval/classification. All 25
topics contain one to three images, and one query also includes an image as negative feedback. The query text
was given out with the images in the three languages present in the collections: English, German,
and French. An example of a visual query of the first category can be seen in Figure 2.</p>
        <p>Fig. 2. Show me chest CT images with emphysema. / Zeige mir Lungen CTs mit einem Emphysem. / Montre-moi des CTs pulmonaires avec un emphysème.</p>
        <p>A query topic that will require more than purely visual features can be seen in Figure 3.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Relevance judgements</title>
        <p>The relevance assessments were performed at OHSU in Portland, Oregon. A simple interface,
reused from previous ImageCLEF relevance assessments, was employed. Nine judges, mainly medical doctors and one
image-processing specialist, performed the relevance judgements. Due to a lack of resources, only
some of the topics could be judged by more than one person.</p>
        <p>Fig. 3. Show me all x-ray images showing fractures. / Zeige mir Röntgenbilder mit Brüchen. / Montre-moi des radiographies avec des fractures.</p>
        <p>To create the image pools for the judgements, the first 40 images of each submitted run were
taken into account, creating pools with an average size of 892 images. The largest pool size was
1167 and the smallest 470. It took the judges an average of roughly three hours to judge
the images for a single topic. Compared to the purely visual topics from 2004 (around one hour of
judgement per topic, containing an average of 950 images), the judgement process took much longer
per image, as the semantic queries required verifying the text and often an enlarged version of the
images. The longer time might also be due to the fact that in 2004 all images were pre-marked as
irrelevant, and only relevant images required a change, whereas this year nothing was
pre-marked. Still, this process is significantly faster than most text-based relevance judgements, as a large
number of irrelevant images could be sorted out very quickly.</p>
        <p>We used a ternary judgement scheme comprising relevant, partially relevant, and non-relevant. For the official qrels, we only used images marked as relevant. We also had several topics judged by two persons, but still took only the first judgements for the evaluations. Further analysis will follow in the final conference proceedings, when more is known about the techniques used.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Participants</title>
        <p>The number of registered ImageCLEF participants has multiplied over the last three years: ImageCLEF started with 4 participants in 2003, 18 groups participated in 2004, and in 2005 there were 36 registered groups. The medical retrieval task had 12 participants in 2004, when it was purely visual, and 13 in 2005 as a mixture of visual and non-visual retrieval. A surprisingly small number of groups (13 of 28 registered) finally submitted results, which may be due to the short time span between delivery of the images and the deadline for result submission. Another factor was that several groups registered very late, as they had not heard about ImageCLEF beforehand but were still interested in the datasets for future participation; since registration for the task is free, they could simply register to get this access.</p>
        <p>The following groups registered but were finally not able to submit results for a variety of reasons:
- UNED, LSI, Valencia, Spain
- Central University, Caracas, Venezuela
- Temple University, Computer Science, USA
- Imperial College, Computing Lab, UK
- Dublin City University, Computer Science, Ireland
- CLIPS Grenoble, France
- University of Sheffield, UK
- Chinese University of Hong Kong, China</p>
        <p>Finally, 13 groups (two of them from the same laboratory in Singapore, but different groups) submitted results for the medical retrieval task, a total of 134 runs. Only 6 manual runs were submitted. A short list of the participants, with a short description of the submitted runs:
- National Chiao Tung University, Taiwan: submitted 16 runs in total, all automatic; 6 runs were visual only and 10 were mixed. They use simple visual features (colour histogram, coherence matrix, layout features) as well as text retrieval using a vector-space model with word expansion based on WordNet.
- State University of New York (SUNY), Buffalo, USA: submitted a total of 6 runs, one visual and five mixed. GIFT was used as the visual retrieval system and SMART as the textual retrieval system, mapping the text to UMLS.
- University and Hospitals of Geneva, Switzerland: submitted a total of 19 runs, all automatic, comprising two textual and two visual runs plus 15 mixed runs. The retrieval relied mainly on the GIFT and easyIR retrieval systems.
- RWTH Aachen, Computer Science, Germany: submitted 10 runs: two manual mixed retrieval, two automatic textual retrieval, three automatic visual retrieval, and three automatic mixed retrieval. The FIRE retrieval engine was used with varied visual features, together with a text search engine using English and mixed-language retrieval.
- Daedalus and Madrid University, Spain: submitted 14 runs, all automatic; 4 runs were visual only and 10 were mixed. They mainly used semantic word expansion with EuroWordNet.
- Oregon Health and Science University, Portland, OR, USA: submitted three runs in total: two manual runs, one visual and one textual, and one automatic textual run. GIFT and Lucene were used as retrieval engines.
- University of Jaen, Spain: submitted a total of 42 runs, all automatic; 6 runs were textual only and 36 were mixed. GIFT is used as the visual query system, and the LEMUR system is used for text in a variety of configurations to achieve multilingual retrieval.
- Institute for Infocomm Research, Singapore: submitted 7 runs, all automatic visual runs. For their runs they first manually selected visually similar images to train the features, so these should rather be classified as manual runs; they then use a two-step approach for visual retrieval.
- Institute for Infocomm Research (second group), Singapore: submitted a total of 3 visual runs, one automatic and two manual. The main technique applied is the connection of medical terms and concepts to visual appearances.
- RWTH Aachen, Medical Informatics, Germany: submitted two visual-only runs using several visual features and classification methods of the IRMA project.
- CEA, France: submitted five runs, all automatic, two visual only and three mixed. The techniques used include the PIRIA visual retrieval system and a simple frequency-based text retrieval system.
- IPAL CNRS/I2R, France/Singapore: submitted a total of 6 automatic runs, two text only and the others a combination of textual and visual features. For textual retrieval they map the text onto single axes of the MeSH ontology. They also use negative-weight query expansion and mix visual and textual results for optimal results.
- University of Concordia, Canada: submitted one visual run querying only with the first image of every topic, using only visual features. The technique applied is an association model between low-level visual features and high-level concepts, relying mainly on texture, edge, and shape features.</p>
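<p>Several of the textual runs above are based on the vector-space model. The following toy sketch illustrates tf-idf weighting with cosine similarity; the corpus, tokenisation, and weighting details are illustrative assumptions, not any group's actual system.</p>

```python
import math
from collections import Counter

# Minimal tf-idf vector-space retrieval: weight terms by term frequency
# times inverse document frequency, rank documents by cosine similarity.

def tfidf_vectors(docs):
    tfs = [Counter(doc.lower().split()) for doc in docs]
    df = Counter()                      # document frequency per term
    for tf in tfs:
        df.update(tf.keys())
    n = len(docs)
    vecs = [{t: tf[t] * math.log(n / df[t]) for t in tf} for tf in tfs]
    return vecs, df, n

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["chest ct emphysema", "hand x ray fracture", "chest x ray"]
vecs, df, n = tfidf_vectors(docs)
query = Counter("ct of the chest".split())
qvec = {t: query[t] * math.log(n / df[t]) for t in query if t in df}
best = max(range(len(docs)), key=lambda i: cosine(qvec, vecs[i]))
print(docs[best])  # chest ct emphysema
```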
        <p>Table 5 gives an overview of the submitted runs, including the query dimensions. This section gives an overview of the best results in the various categories and also provides some more in-depth analysis on a per-topic basis. More will follow once the participants have submitted their papers.</p>
        <p>Table 6 shows all the manual runs that were submitted, classified by the technique used for the retrieval.</p>
        <p>Table 7 shows the best five results for textual retrieval only and the best ten results for visual and for mixed retrieval.</p>
        <p>Looking at single topics, it becomes clear that system performance varies extremely across topics. If we calculated the average over the best system for each query, we would be much closer to 0.5 than to what the best system actually achieved, 0.2821. So far, none of the systems optimised the feature selection based on the query input.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Discussion</title>
        <p>The results show a few clear trends. Very few groups made manual submissions using relevance judgements, most likely due to the resources needed for such evaluations. Still, relevance feedback has been shown to be extremely useful in many retrieval tasks, and its evaluation seems very necessary as well. Surprisingly, in the submitted results relevance feedback does not appear to perform much better than the automatic runs, whereas in the 2004 tasks the relevance feedback runs were often significantly better than those without feedback.</p>
        <p>It also becomes clear that the topics developed were geared much more towards textual than visual retrieval. The best results for textual retrieval are much higher than for visual retrieval only, and a few of the poor textual runs seem simply to have had indexing problems. When analysing the topics in more detail, a clear division emerges between the visual and textual topics, but some of the topics marked as visual actually produced better results with a textual system. Some systems perform extremely well on a few topics but extremely badly on others; no system is the best system for more than two of the topics.</p>
        <p>The best results were clearly obtained when combining textual and visual features, most likely because there were queries for which either one of the feature sets would work well.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Automatic Annotation Task</title>
      <sec id="sec-4-1">
        <title>Introduction, Idea, and Objectives</title>
        <p>
          Automatic image annotation is a classification task in which an image is assigned to its corresponding class from a given set of pre-defined classes. As such, it is an important step for content-based image retrieval (CBIR) and data mining [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. The aim of the Automatic Annotation Task in ImageCLEFmed 2005 was to compare state-of-the-art approaches to automatic image annotation and to quantify their improvements for image retrieval. In particular, the task aims at finding out how well current techniques for image content analysis can identify the medical image modality, body orientation, body region, and the biological system examined. Such an automatic classification can be used for multilingual image annotation as well as for annotation verification, e.g., to detect false information held in the header streams according to the Digital Imaging and Communications in Medicine (DICOM) standard [16].
        </p>
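<p>Annotation verification of the kind described above can be sketched as a comparison between a content-based prediction and the header entry. The classifier stub and the field names below are hypothetical; real DICOM parsing is omitted.</p>

```python
# Sketch of header verification: compare the modality predicted from the
# image content with the modality recorded in a (possibly wrong) header.
# `predict_modality` stands in for a trained image classifier.

def predict_modality(image):
    # hypothetical content-based classifier; here a trivial threshold stub
    return "CR" if image["mean_intensity"] > 100 else "CT"

def verify_header(image, header):
    predicted = predict_modality(image)
    recorded = header.get("Modality")
    return predicted == recorded, predicted, recorded

image = {"mean_intensity": 130}
ok, pred, rec = verify_header(image, {"Modality": "CT"})
print(ok, pred, rec)  # flags a mismatch: predicted CR vs recorded CT
```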
        <p>The database consisted of 9,000 fully classified radiographs taken randomly from medical routine at the Aachen University Hospital. 1,000 additional radiographs, for which classification labels were unavailable to the participants, had to be classified into one of the 57 classes the 9,000 database images come from. Although only 57 simple class numbers were provided for ImageCLEFmed 2005, the images are annotated with the complete IRMA code, a multi-axial code for image annotation. The code is currently available in English and German. It is planned to use the results of such automatic image annotation tasks for further, textual image retrieval tasks in the future.</p>
        <p>Example images together with their class numbers are given in Figure 4. Table 8 gives the English textual description of each class.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Participating Groups</title>
        <p>In total, 26 groups registered for participation in the automatic annotation task. All groups downloaded the data, but only 12 groups submitted runs. Each group had at least two different submissions; the maximum number of submissions per group was 7. In total, 41 runs were submitted, which are briefly described in the following.</p>
        <p>CEA: CEA from France submitted three runs. In each run, different feature vectors were used and classified using a k-nearest-neighbour classifier (k was either 3 or 9). In the run labelled cea/pj-3.txt, the images were projected along the horizontal and vertical axes to obtain a feature histogram. For cea/tlep-9.txt, histograms of local edge patterns and colour features were created, and for cea/cime-9.txt, quantised colours were used.</p>
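<p>The projection features of the cea/pj-* runs can be sketched as follows: pixel values are summed along the horizontal and vertical axes, and the resulting vector is classified with k-nearest neighbours. The toy images and labels below are assumptions for illustration only, not IRMA data.</p>

```python
# Projection features: sum pixel values along rows and columns, then
# classify the concatenated vector with a k-nearest-neighbour vote.

def projections(img):
    rows = [sum(r) for r in img]           # horizontal projection
    cols = [sum(c) for c in zip(*img)]     # vertical projection
    return rows + cols

def knn(train, query_feat, k=3):
    def dist(f):
        return sum((a - b) ** 2 for a, b in zip(f, query_feat))
    nearest = sorted(train, key=lambda ex: dist(ex[0]))[:k]
    labels = [lab for _, lab in nearest]
    return max(set(labels), key=labels.count)   # majority vote

horiz = [[1, 1], [0, 0]]   # bright top row
vert = [[1, 0], [1, 0]]    # bright left column
train = [(projections(horiz), "top"), (projections([[2, 2], [0, 0]]), "top"),
         (projections(vert), "left"), (projections([[2, 0], [2, 0]]), "left")]
print(knn(train, projections([[1, 1], [0, 1]]), k=3))  # top
```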
        <p>CINDI: The CINDI group from Concordia University in Montreal, Canada used multi-class SVMs (one-vs-one) and a 170-dimensional feature vector consisting of colour moments, colour histograms, co-occurrence texture features, shape moments, and edge histograms.</p>
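<p>The one-vs-one multi-class scheme mentioned above trains one binary classifier per pair of classes and classifies by majority vote over the pairwise decisions. In this sketch the pairwise classifiers are trivial nearest-class-mean stubs rather than SVMs; only the voting mechanics follow the description.</p>

```python
from itertools import combinations

# One-vs-one multi-class decomposition: one binary decision per class
# pair, final label by majority vote. The pairwise "classifiers" here
# are nearest-class-mean stubs on a 1-D feature, not trained SVMs.

def train_class_means(examples):
    by_class = {}
    for feat, label in examples:
        by_class.setdefault(label, []).append(feat)
    return {c: sum(fs) / len(fs) for c, fs in by_class.items()}

def predict(means, x):
    votes = {}
    for a, b in combinations(sorted(means), 2):
        winner = a if abs(x - means[a]) < abs(x - means[b]) else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

examples = [(1.0, "A"), (1.2, "A"), (5.0, "B"), (5.2, "B"), (9.0, "C")]
means = train_class_means(examples)
print(predict(means, 5.4))  # B
```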
        <p>Geneva: The medGIFT group from Geneva, Switzerland used various different settings for grey levels and Gabor filters in their medGIFT image retrieval system.</p>
        <p>Infocomm: The group from the Infocomm Institute, Singapore used three kinds of 16x16 low-resolution map features: initial grey values, anisotropy, and contrast. To avoid over-fitting, a separate training set was selected for each of the 57 classes, and about 6,800 training images were chosen out of the given 9,000 images. Support vector machines with RBF (radial basis function) kernels were applied to train the classifiers, which were then employed to classify the test images.
Miracle: The Miracle group from UPM Madrid, Spain uses GIFT and a decision-table majority classifier to calculate the relevance of each individual result in miracle/mira20relp57.txt. In mira20relp58IB8.txt, a k-nearest-neighbour classifier with k = 8 and attribute normalisation is additionally used.</p>
        <p>Montreal: The group from the University of Montreal, Canada submitted 7 runs, which differ in the features used. They estimated which classes are best represented by which features and combined the appropriate features.
mtholyoke: For the submission from Mount Holyoke College, MA, USA, Gabor energy features were extracted from the images, and two different cross-media relevance models were used to classify the data.
nctu-dblab: The NCTU-DBLAB group from National Chiao Tung University, Taiwan used a support vector machine (SVM) to learn image feature characteristics. Based on the SVM model, several image features were used to predict the class of the test images.
ntu: The group from National Taiwan University used mean grey values of blocks as features and different classifiers for their submissions.
rwth-i6: The Human Language Technology and Pattern Recognition group from RWTH Aachen University, Germany had two submissions. One used a simple zero-order image distortion model taking local context into account. The other used a maximum entropy classifier and histograms of patches as features.
rwth-mi: The IRMA group from Aachen, Germany used features proposed by Tamura et al. to capture global texture properties, together with two distance measures for down-scaled representations, which preserve spatial information and are robust with respect to global transformations such as translation, intensity variations, and local deformations. The weighting parameters for combining the single classifiers were guessed for the first submission and trained on a random 8,000/1,000 partitioning of the training set for the second submission.
ulg.ac.be: The ULg method is based on random sub-windows and decision trees. During the training phase, a large number of multi-size sub-windows are randomly extracted from the training images. Then a decision tree model is automatically built (using Extra-Trees and/or tree boosting), based on size-normalised versions of the sub-windows and operating directly on their pixel values. Classification of a new image similarly entails the random extraction of sub-windows, the application of the model to these, and the aggregation of the sub-window predictions.
The error rates range between 12.6 % and 73.3 % (Table 9). Based on the training data, a system guessing the most frequent class for all 1,000 test images would achieve a 70.3 % error rate, since 297 radiographs of the test set were from class 12 (Table 10). A more realistic baseline of 36.8 % error rate is computed from a 1-nearest-neighbour classifier comparing down-scaled 32x32 versions of the images using the Euclidean distance.</p>
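<p>The two baselines above can be reproduced in a few lines: always guessing the most frequent class (297 of the 1,000 test images are from class 12, hence a 70.3 % error rate), and 1-nearest-neighbour matching with the Euclidean distance on down-scaled images. The tiny two-pixel "images" below are stand-ins for the 32x32 thumbnails.</p>

```python
import math

# Baseline 1: always guess the most frequent class.
def most_frequent_baseline(test_labels, guess):
    errors = sum(1 for lab in test_labels if lab != guess)
    return errors / len(test_labels)

# Baseline 2: 1-nearest-neighbour on flattened pixel vectors.
def one_nn(train, query):
    return min(train, key=lambda ex: math.dist(ex[0], query))[1]

# 297 of 1,000 test images are from class 12 -> 70.3 % error when
# always guessing class 12.
labels = [12] * 297 + [0] * 703
print(round(most_frequent_baseline(labels, 12), 3))  # 0.703

train = [([0.0, 0.0], "hand"), ([1.0, 1.0], "chest")]
print(one_nn(train, [0.9, 0.8]))  # chest
```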
        <p>For each class, Table 10 gives a more detailed analysis, including the number of training and test images as well as, with respect to all 41 submitted runs, the average classification accuracy, the class most frequently confused with it, and the average percentage, over all submitted runs, of images assigned to that class. Obviously, the difficulty of the 57 classes varies. The average classification accuracy ranges from 6.3 % to 90.7 %, and there is a tendency for classes with fewer training images to be more difficult. For instance, for class 32, 78 images were contained in the training data but only one image in the test data. In 23 runs, this test image was misclassified (43.9 %); five times it was labelled as class 25 (12.2 %). It can also be seen that many images of classes 7 and 8 were classified as class 6.
Similar experiments have been described in the literature, although previous experiments have been restricted to a small number of categories. For instance, several algorithms have been proposed for orientation detection of chest radiographs, where lateral and frontal orientation are distinguished by means of image content analysis [18,19]. For this two-class experiment, the error rates are below 1 % [20]. In a recent investigation, Pinhas and Greenspan report error rates below 1 % for automatic categorisation of 851 medical images into 8 classes [21]. In previous investigations of the IRMA group, error rates between 5.3 % and 15 % were reported for experiments with 1,617 images in 6 classes [22] and 6,231 images in 81 classes [23], respectively. Hence, error rates around 12 % for 10,000 images in 57 classes are plausible.</p>
        <p>As mentioned before, classes 6, 7, and 8 were frequently confused. All show parts of the arms and thus look extremely similar (Fig. 4). A reason for the common misclassification in favour of class 6 might be that there are five times more training images from class 6 than from classes 7 and 8 together.</p>
        <p>Given the confidence files from all runs, classifier combination was tested using the sum and the product rule, first combining the two best confidence files, then the three best, and so forth. Unfortunately, the best result was 12.9 %; thus, no improvement over the best single submission was possible using simple classifier combination techniques.</p>
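<p>The sum- and product-rule combination described above can be sketched as follows; the per-class confidence values are toy numbers, not the submitted confidence files.</p>

```python
# Combine per-class confidences from several classifiers with the sum
# rule (add confidences) or the product rule (multiply them), then pick
# the class with the highest fused score.

def combine(conf_lists, rule="sum"):
    classes = conf_lists[0].keys()
    if rule == "sum":
        fused = {c: sum(conf[c] for conf in conf_lists) for c in classes}
    else:  # product rule
        fused = {}
        for c in classes:
            p = 1.0
            for conf in conf_lists:
                p *= conf[c]
            fused[c] = p
    return max(fused, key=fused.get)

run1 = {"6": 0.5, "7": 0.3, "8": 0.2}
run2 = {"6": 0.1, "7": 0.6, "8": 0.3}
print(combine([run1, run2], "sum"), combine([run1, run2], "product"))  # 7 7
```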
        <p>With some results close to a 10 % error rate, classification and annotation of images might open interesting vistas for CBIR systems. Although the task considered here is more restricted than the medical retrieval task and can thus be considered easier, the techniques applied here will most probably also be usable in future CBIR applications. It is therefore planned to use the results of such automatic image annotation tasks for further, textual image retrieval tasks.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>ImageCLEF has continued to attract researchers from a variety of global communities interested in image retrieval using both low-level image features and associated texts. This year we have improved the ad-hoc medical retrieval task by enlarging the image collection and creating more semantic
queries based on the realistic information needs of medical professionals. The ad-hoc task has continued to attract interest, and this year has seen an increase in the number of translated topics and topics with translated narratives. The addition of the IRMA annotation task has provided a further challenge on the medical side of ImageCLEF and proven a popular task for participants, covering mainly the visual retrieval community. The user-centered retrieval task, however, still has low participation, mainly due to the high level of resources required to run an interactive task. We will continue to improve the tasks for ImageCLEF 2006, mainly based on feedback from participants.</p>
      <p>A large number of participants registered but finally did not submit results. This shows that the resources are very valuable, and that access to them is already a reason to register. Still, only if participants submit results with different techniques is there really a possibility to compare retrieval systems and develop better retrieval for the future. For 2006 we therefore hope to receive plenty of feedback on the tasks, and many people who register, submit results, and participate in the CLEF workshop to discuss the presented techniques.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This work has been funded in part by the EU Sixth Framework Programme (FP6) within the Bricks
project (IST contract number 507457) as well as the SemanticMining project (IST NoE 507505).
The establishment of the IRMA database was funded by the German Research Foundation (DFG) under grant Le 1108/4. We also acknowledge the generous support of National Science Foundation (NSF) grant ITR-0325160.</p>
      <p>categorisation of medical images for content-based retrieval and data mining. Computerized Medical Imaging and Graphics 2005; 29(2): 143-155.
16. Guld MO, Kohnen M, Keysers D, Schubert H, Wein B, Bredno J, Lehmann TM. Quality of DICOM header information for image categorization. Procs SPIE 2002; 4685: 280-287.
17. Lehmann TM, Schubert H, Keysers D, Kohnen M, Wein BB. The IRMA code for unique classification of medical images. Procs SPIE 2003; 5033: 440-451.
18. 185-189.
19. Boone JM, Seshagiri S, Steiner RM. Recognition of chest radiograph orientation for picture archiving and communications systems display using neural networks. Journal of Digital Imaging 1992; 5(3): 190-193.
20. Lehmann TM, Guld MO, Keysers D, Schubert H, Kohnen M, Wein BB. Determining the view position of chest radiographs. Journal of Digital Imaging 2003; 16(3): 280-291.
21. Pinhas A, Greenspan H. A continuous and probabilistic framework for medical image representation and categorization. Procs SPIE 2003; 5371: 230-238.
22. Keysers D, Gollan C, Ney H. Classification of medical images using non-linear distortion models. Procs Bildverarbeitung für die Medizin 2004: 366-370.
23. Guld MO, Keysers D, Leisten M, Schubert H, Lehmann TM. Comparison of global features for categorization of medical images. Procs SPIE 2004; 5371: 211-222.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Clough</surname>
          </string-name>
          , P.D. and
          <string-name>
            <surname>Sanderson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2003</year>
          ),
          <article-title>The CLEF 2003 cross language image retrieval track</article-title>
          ,
          <source>In Proceedings of Cross Language Evaluation Forum (CLEF) 2003 Workshop</source>
          , Trondheim, Norway.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Clough</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Muller, H. and
          <string-name>
            <surname>Sanderson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>The CLEF 2004 Cross Language Image Retrieval Track, In Multilingual Information Access for Text, Speech and Images: Results of the Fifth CLEF Evaluation Campaign</article-title>
          , Eds (Peters,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Kluck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            and
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          ),
          <source>Lecture Notes in Computer Science (LNCS)</source>
          , Springer, Heidelberg, Germany,
          <year>2005</year>
          , Volume
          <volume>3491</volume>
          /
          <year>2005</year>
          ,
          <fpage>597</fpage>
          -
          <lpage>613</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. I. J.
          <string-name>
            <surname>Cox</surname>
            ,
            <given-names>M. L.</given-names>
          </string-name>
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          <string-name>
            <surname>Omohundro</surname>
            , and
            <given-names>P. N.</given-names>
          </string-name>
          <string-name>
            <surname>Yianilos</surname>
          </string-name>
          . Pichunter:
          <article-title>Bayesian relevance feedback for image retrieval</article-title>
          .
          <source>Proceedings of the 13th International Conference on Pattern Recognition</source>
          ,
          <volume>3</volume>
          :
          <fpage>361</fpage>
          -
          <fpage>369</fpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Grubinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leung</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Clough</surname>
          </string-name>
          , P.D.
          <article-title>Towards a Topic Complexity Measure for Cross-Language Image Retrieval</article-title>
          ,
          <source>In Proceedings of Cross Language Evaluation Forum (CLEF) 2005 Workshop</source>
          , Vienna, Austria
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Petrelli</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Clough</surname>
          </string-name>
          , P.D.
          <article-title>Concept Hierarchy across Languages in Text-Based Image Retrieval: A User Evaluation</article-title>
          ,
          <source>In Proceedings of Cross Language Evaluation Forum (CLEF) 2005 Workshop</source>
          , Vienna, Austria
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Villena-Roman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crespo-García</surname>
          </string-name>
          , R.M., and
          <string-name>
            <surname>Gonzalez-Cristobal</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          <article-title>Boolean Operators in Interactive Search</article-title>
          ,
          <source>In Proceedings of Cross Language Evaluation Forum (CLEF) 2005 Workshop</source>
          , Vienna, Austria
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Candler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Uijtdehaage</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Dennis</surname>
          </string-name>
          . Introducing HEAL
          :
          <article-title>The health education assets library</article-title>
          .
          <source>Academic Medicine</source>
          ,
          <volume>78</volume>
          (
          <issue>3</issue>
          ):
          <volume>249</volume>
          -
          <fpage>253</fpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>K.</given-names>
            <surname>Glatz-Krieger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Glatz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gysel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dittler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Mihatsch</surname>
          </string-name>
          .
          <article-title>Webbasierte Lernwerkzeuge für die Pathologie (web-based learning tools for pathology)</article-title>
          .
          <source>Pathologe</source>
          ,
          <volume>24</volume>
          :
          <fpage>394</fpage>
          -
          <fpage>399</fpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>W.</given-names>
            <surname>Hersh</surname>
          </string-name>
          , H. Muller, P. Gorman, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Jensen</surname>
          </string-name>
          .
          <article-title>Task analysis for evaluating image retrieval systems in the ImageCLEF biomedical image retrieval task</article-title>
          .
          <source>In Slice of Life conference on Multimedia in Medical Education (SOL</source>
          <year>2005</year>
          ), Portland,
          <string-name>
            <surname>OR</surname>
          </string-name>
          , USA,
          <year>June 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>W. R.</given-names>
            <surname>Hersh</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. H.</given-names>
            <surname>Hickam</surname>
          </string-name>
          .
          <article-title>How well do physicians use electronic information retrieval systems?</article-title>
          <source>Journal of the American Medical Association</source>
          ,
          <volume>280</volume>
          (
          <issue>15</issue>
          ):
          <volume>1347</volume>
          -
          <fpage>1352</fpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>M.</given-names>
            <surname>Markkula</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Sormunen</surname>
          </string-name>
          .
          <article-title>Searching for photos – journalists' practices in pictorial IR</article-title>
          . In J. P. Eakins,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Harper</surname>
          </string-name>
          , and J. Jose, editors,
          <source>The Challenge of Image Retrieval, A Workshop and Symposium on Image Retrieval</source>
          , Electronic Workshops in Computing,
          <source>Newcastle upon Tyne</source>
          ,
          5–6
          <year>February 1998</year>
          .
          The British Computer Society
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. H. Muller,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Vallée</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Terrier</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Geissbuhler</surname>
          </string-name>
          .
          <article-title>A reference data set for the evaluation of medical image retrieval systems</article-title>
          .
          <source>Computerized Medical Imaging and Graphics</source>
          ,
          <volume>28</volume>
          :
          <fpage>295</fpage>
          –
          <lpage>305</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosset</surname>
          </string-name>
          , H. Muller, M. Martins,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dfouni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Vallée</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Ratib</surname>
          </string-name>
          .
          <article-title>Casimage project – a digital teaching files authoring environment</article-title>
          .
          <source>Journal of Thoracic Imaging</source>
          ,
          <volume>19</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>6</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Wallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Miller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Vreeland</surname>
          </string-name>
          .
          <article-title>An internet-based nuclear medicine teaching file</article-title>
          .
          <source>Journal of Nuclear Medicine</source>
          ,
          <volume>36</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1520</fpage>
          –
          <lpage>1527</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Lehmann</surname>
            <given-names>TM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Güld</surname>
            <given-names>MO</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deselaers</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keysers</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schubert</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spitzer</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ney</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wein</surname>
            <given-names>BB</given-names>
          </string-name>
          .
          <article-title>Automatic …</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Pietka</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            <given-names>HK</given-names>
          </string-name>
          .
          <article-title>Orientation correction for chest images</article-title>
          .
          <source>Journal of Digital Imaging</source>
          <year>1992</year>
          ;
          <volume>5</volume>
          (
          <issue>3</issue>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
    <sec>
      <title>Fig. 5</title>
      <p>Example images given to participants for the ad-hoc retrieval task (1 of 2 images).</p>
    </sec>
  </back>
</article>