<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the ImageCLEFmed 2008 medical image retrieval task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Henning Muller</string-name>
          <email>henning.mueller@sim.hcuge.ch</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jayashree Kalpathy-Cramer</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Charles E. Kahn Jr.</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>William Hatt</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steven Bedrick</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>William Hersh</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Radiology, Medical College of Wisconsin</institution>
          ,
          <addr-line>Milwaukee, WI</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Measurement</institution>
          ,
          <addr-line>Performance, Experimentation</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Medical Informatics, University Hospitals and University of Geneva</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Oregon Health and Science University (OHSU)</institution>
          ,
          <addr-line>Portland, OR</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Applied Sciences Western Switzerland</institution>
          ,
          <addr-line>Sierre</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; H.2.3 [Database Management]: Languages - Query Languages</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Abstract</title>
      <p>2008 was the fifth year for the medical image retrieval task of ImageCLEF, one of
the most popular tracks within CLEF. Participation continued to increase in 2008. A
total of 15 groups submitted 111 valid runs. Several requests for data access were also
received after the registration deadline.</p>
      <p>The most significant change in 2008 was the use of a new database containing images
from the medical literature. These images, part of the Goldminer collection, were
from the RSNA journals Radiology and Radiographics. Besides the images, the figure
captions and the part of the caption referring to a particular subfigure were supplied
to the participants. Access to the full text articles in HTML was also provided, as
was each article's Medline PMID (PubMed Identifier). An article's PMID could be
used to obtain the officially assigned MeSH (Medical Subject Headings) terms. Unlike
previous years, this year's collection was entirely in English, as it was obtained from
English-language medical literature. However, the topics were, as in previous years,
supplied in German, French, and English. The topics used in 2008 were a subset of the
85 topics used in 2005-2007. Thirty topics were made available, ten in each of three
categories: visual, mixed, and semantic.</p>
      <p>As in previous years, most groups concentrated on fully automatic retrieval.
However, three groups submitted a total of seven manual or interactive runs; these runs
did not show a substantial increase in performance over the automatic approaches. In
previous years, multi-modal combinations were the most frequent submissions.
However, in 2008 only half as many mixed runs as purely textual runs were submitted.
Very few fully visual runs were submitted, and the ones submitted performed poorly.
This may be explained in part by the heavily semantic nature of the 2008 topics.</p>
      <p>The best MAP scores were very similar for textual and multi-modal approaches,
whereas early precision performance was clearly better for the multi-modal approaches.</p>
    </sec>
    <sec id="sec-2">
      <title>Categories and Subject Descriptors</title>
    </sec>
    <sec id="sec-3">
      <title>General Terms</title>
      <p>Measurement, Performance, Experimentation</p>
      <sec id="sec-3-1">
        <title>Introduction</title>
        <p>
          ImageCLEF<sup>1</sup> [
          <xref ref-type="bibr" rid="ref1 ref2 ref5">1, 2, 5</xref>
          ] started within CLEF<sup>2</sup> (Cross Language Evaluation Forum, [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]) in 2003. A
medical image retrieval task was added in 2004 to explore domain-specific multilingual visual
information retrieval as well as multi-modal retrieval that combines visual and textual features.
A medical retrieval task and a medical image annotation task have been part of
ImageCLEFmed since 2005 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          This paper reports on the medical retrieval task whereas additional papers describe the four
other tasks of ImageCLEF. More detailed information can also be found on the task web pages
for ImageCLEFmed. A detailed analysis of a previous medical image retrieval task is available in
[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>The medical retrieval task in 2008</title>
        <p>The main change in the medical retrieval task in 2008 was the use of a new database. The search
tasks remained essentially the same as in the previous years. The collection distributed to the
participants included the images and the captions, as published in the medical journals. URLs to
access the full text of the journal article were also made available to the participants.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Registration and participation</title>
      <p>As in previous years, registration for the medical retrieval task increased in 2008, albeit slowly.
Several of the groups registered solely to obtain the test collection in order to use it as training
data for their algorithms, rather than actually participating in the competition. In the end, 15
research groups submitted a total of 130 runs. Groups were asked not to submit more than ten
runs in 2008 (different from previous years) so as not to bias the pools too much towards any
single group.</p>
      <p>There were significant problems with many of the 130 initial runs: some were submitted in
incorrect formats; several runs were duplicated; and there were runs that provided search results
for only a subset of the thirty topics. These problems were corrected in collaboration with the
authors as much as was possible, resulting in 111 valid runs that were used to generate the pools
that were finally judged for relevance. The following groups submitted valid runs:</p>
      <list list-type="bullet">
        <list-item><p>Hungarian Academy of Sciences, Budapest, Hungary;</p></list-item>
        <list-item><p>National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA;</p></list-item>
        <list-item><p>Banja Luka University, Bosnia-Herzegovina;</p></list-item>
        <list-item><p>MedGIFT group, University of Geneva, Switzerland;</p></list-item>
        <list-item><p>Natural Language Processing group, University Hospitals of Geneva, Switzerland;</p></list-item>
        <list-item><p>GPLSI group, University of Alicante, Spain;</p></list-item>
        <list-item><p>Multimedia Modelling Group, LIG, Grenoble, France;</p></list-item>
        <list-item><p>Natural Language Processing at UNED, Madrid, Spain;</p></list-item>
        <list-item><p>Miracle group, Spain;</p></list-item>
        <list-item><p>Oregon Health and Science University (OHSU), Portland, OR, USA;</p></list-item>
        <list-item><p>IRIT Toulouse, France;</p></list-item>
        <list-item><p>University of Jaen, Spain;</p></list-item>
        <list-item><p>Tel Aviv University, Israel;</p></list-item>
        <list-item><p>National University of Bogota, Colombia;</p></list-item>
        <list-item><p>TextMess group, University of Alicante, Spain.</p></list-item>
      </list>
      <p>Thus, a total of 15 groups from eight countries and four continents submitted results that are presented in the following sections.</p>
      <p><sup>1</sup> http://www.imageclef.org/</p>
      <p><sup>2</sup> http://www.clef-campaign.org/</p>
    </sec>
    <sec id="sec-5">
      <title>Database</title>
      <p>The database used for the task in 2008 was made available by the Radiological Society of North
America (RSNA). The database contains in total slightly more than 66,000 images taken from
the radiological journals Radiology and Radiographics. The images are original figures used in
published articles. The collection is a subset of a larger database that is available via the
Goldminer image search engine. For each image, the text of the figure caption was supplied as free
text. However, this caption was sometimes associated with a multi-part image. In over 90% of
the images the part of the caption actually referring to this sub-image was also provided.
Additionally, links to HTML versions of the full-text articles were provided along with the relevant
PubMed accession ID numbers. Both the full-size images and thumbnails were available to
the participants. All text was in English.</p>
      <p>The contents of this database represent a broad and significant body of medical knowledge,
which makes this year's competition a potentially realistic scenario for how clinicians might use
image retrieval systems in the future.</p>
    </sec>
    <sec id="sec-6">
      <title>Query topics</title>
      <p>
        The query topics in 2008 were a selection of 30 topics from the previous three years of
ImageCLEFmed [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Training data in the form of the 2005-2007 database with images, annotations,
topics, sample query images and qrel files was made available to participants. All topics were
supposed to cover at least two of the following axes:
      </p>
      <list list-type="bullet">
        <list-item><p>Anatomic region shown in the image;</p></list-item>
        <list-item><p>Image modality (x-ray, CT, MRI, gross pathology, ...);</p></list-item>
        <list-item><p>View (frontal, sagittal, ...);</p></list-item>
        <list-item><p>Pathology or disease shown in the image;</p></list-item>
        <list-item><p>Abnormal visual observation (e.g. enlarged heart).</p></list-item>
      </list>
      <p>From the 85 possible topics of past years, similar topics were removed so that the selection covered a wide range of
different modalities and anatomic regions. A visual and textual check was then performed to make
sure that at least a few relevant images existed in the dataset. Since the databases of 2008 and 2007
were very different, we wanted to ensure that each topic had more than one relevant image.</p>
      <p>Each query topic consists of the information need in three languages (English, French, German)
and at least two example images. Groups could decide which language and media to use for the
query processing and also which part of the text to use.</p>
    </sec>
    <sec id="sec-7">
      <title>Relevance judgments</title>
      <p>
        A new system for relevance judgments was introduced in 2008, built on the Ruby on Rails
framework and allowing for simple judgments via a web interface for all judges. The first 35
images of every run were combined into "pools" with an average size of around 900 images. Such
pooling is necessary to reduce the amount of data to judge, and the bias can be regarded as very
limited [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Medical doctors who are also students of biomedical informatics at OHSU were hired
for the judgment process and paid by the hour for the judgments.
      </p>
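      <p>The pooling step described above can be sketched as follows. This is a minimal illustration, not the organizers' actual judging system: the function name build_pools, the in-memory layout of the runs, and the toy image identifiers are assumptions, and only the pool depth of 35 comes from the text.</p>
      <preformat>
```python
from collections import defaultdict

POOL_DEPTH = 35  # the first 35 images of every run, as stated in the text


def build_pools(runs, depth=POOL_DEPTH):
    """Merge the top `depth` images of every run into one pool per topic.

    `runs` maps a run name to {topic_id: ranked list of image ids}.
    This layout is a hypothetical convenience, not the task's file format.
    """
    pools = defaultdict(set)
    for ranked_by_topic in runs.values():
        for topic_id, ranked_images in ranked_by_topic.items():
            pools[topic_id].update(ranked_images[:depth])
    return dict(pools)


# Toy example with two hypothetical runs and a pool depth of 2.
toy_runs = {
    "run_a": {1: ["img1", "img2", "img3"]},
    "run_b": {1: ["img2", "img4", "img5"]},
}
toy_pools = build_pools(toy_runs, depth=2)
print(sorted(toy_pools[1]))
```
      </preformat>
      <p>Because runs overlap, a pool is much smaller than its upper bound: with 111 valid runs and a depth of 35, at most 3,885 distinct images per topic could enter a pool, whereas the reported average was around 900.</p>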
      <p>A ternary judgment scheme was used, wherein each image in each pool was judged to be
"relevant", "partly relevant", or "non-relevant". Images clearly corresponding to all criteria were
judged as "relevant", images whose relevance could not be safely confirmed but could still be
possible were marked as "partly relevant", and images for which one or more criteria of the topic
were not met were marked as "non-relevant". Judges were instructed in these criteria and results
were manually controlled during the judgment process.</p>
      <p>During the judging, the new system exhibited a minor problem that resulted in certain images
losing their judgments. This resulted in a short delay in the judging process, after which the
affected images were re-judged by the same persons.</p>
      <sec id="sec-7-1">
        <title>Submissions and results</title>
        <p>This section details the submissions for the tasks and a first brief evaluation. A more detailed
evaluation of the techniques will follow in the final proceedings when more details on the techniques
used for the submissions will be known. Unfortunately, information on the techniques used in the
submissions is not always made available by the participants well ahead of time and in great detail.</p>
        <p>The trec_eval tool was used for the evaluation process with most of its performance measures.</p>
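        <p>For readers unfamiliar with the measures reported below, the following sketch computes precision at a cutoff and average precision for one topic, and MAP over topics. It follows the standard textbook definitions rather than the trec_eval source code; the function names and toy data are assumptions. Passing in a single set of relevant images is also an assumption, since the text does not state how "partly relevant" images were scored.</p>
        <preformat>
```python
def precision_at(ranked, relevant, k):
    """Fraction of the first k retrieved images that are relevant."""
    hits = sum(1 for image in ranked[:k] if image in relevant)
    return hits / k


def average_precision(ranked, relevant):
    """Sum of the precision values at each rank where a relevant image
    appears, divided by the total number of relevant images."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, image in enumerate(ranked, start=1):
        if image in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(run, qrels):
    """Average the per-topic average precision over all topics in qrels."""
    scores = [average_precision(run.get(t, []), rel) for t, rel in qrels.items()]
    return sum(scores) / len(scores)


# Toy data: one topic, three relevant images, two of them retrieved.
toy_qrels = {1: {"a", "c", "z"}}
toy_run = {1: ["a", "b", "c", "d"]}
print(precision_at(toy_run[1], toy_qrels[1], 2))
print(mean_average_precision(toy_run, toy_qrels))
```
        </preformat>
        <p>Relevant images that a run never retrieves still count in the denominator of average precision, which is why a run can have reasonable early precision and a very low MAP at the same time.</p>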
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Submissions</title>
      <p>A total of 130 runs were submitted via the electronic submission system. Scripts to check the
validity of the runs were made available to participants ahead of the submission phase, but even
so, almost half of the submitted runs contained errors in either content or format and required
changes. Common mistakes included a wrong trec_eval format, use of only a subset of the topics
and incorrect image identifiers. In collaboration with the authors a large number of runs were
repaired, resulting in 111 valid runs taken into account for the pools.</p>
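      <p>A validity check of this kind might look like the sketch below. It assumes the usual six-column trec_eval run layout (topic, the literal Q0, image identifier, rank, score, run tag); the function name, the error messages, and the toy identifiers are hypothetical, and the organizers' actual scripts are not reproduced here.</p>
      <preformat>
```python
def check_run(lines, topic_ids, image_ids):
    """Return a list of human-readable problems found in a run file.

    `lines` are the text lines of a run, `topic_ids` the expected topics,
    and `image_ids` the identifiers that exist in the collection.
    """
    problems = []
    seen_topics = set()
    seen_pairs = set()
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != 6:
            problems.append(f"line {number}: expected 6 fields, got {len(fields)}")
            continue
        topic, _q0, image, rank, score, _tag = fields
        if topic not in topic_ids:
            problems.append(f"line {number}: unknown topic {topic}")
        if image not in image_ids:
            problems.append(f"line {number}: unknown image {image}")
        if not rank.isdigit():
            problems.append(f"line {number}: rank is not an integer")
        try:
            float(score)
        except ValueError:
            problems.append(f"line {number}: score is not a number")
        if (topic, image) in seen_pairs:
            problems.append(f"line {number}: duplicate image for topic {topic}")
        seen_pairs.add((topic, image))
        seen_topics.add(topic)
    # Topics without any result line: the "subset of the topics" mistake.
    missing = sorted(set(topic_ids) - seen_topics)
    if missing:
        problems.append("no results for topics: " + ", ".join(missing))
    return problems


toy_lines = [
    "1 Q0 img1 1 0.9 demo",
    "1 Q0 imgX 2 0.8 demo",
    "1 Q0 img1 3",
]
for problem in check_run(toy_lines, {"1", "2"}, {"img1", "img2"}):
    print(problem)
```
      </preformat>
      <p>The three checks correspond to the common mistakes named above: a wrong format, results for only a subset of the topics, and incorrect image identifiers.</p>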
      <p>In total, only seven runs were "manual" or "interactive". There were also fewer "visual-only"
runs than in all previous years, with only 8 such runs being submitted. The large majority were
text-only runs, with 65 submissions. Mixed automatic runs had 31 submissions.</p>
      <p>Groups subsequently had the chance to evaluate additional runs themselves as the qrels were
made available to participants two weeks ahead of the submission deadline for these working notes.
      </p>
    </sec>
    <sec id="sec-9">
      <title>Visual retrieval</title>
      <p>The number of visual runs in 2008 was much lower than in previous years, and visual
techniques appear to be evolving more slowly than textual retrieval techniques. Five groups submitted a total of eight runs in
2008. Performance as measured by MAP is very low for all these runs, reaching a maximum of 0.04
for the best run. Early precision averaged over all topics reaches around 0.2, which is
acceptable. When taking into account only the visual topics these results are much better, whereas
the purely semantic topics obtained extremely poor results.</p>
      <p>Table 1 shows the results and particularly the large differences between the runs. Some runs
managed to retrieve a larger part of the relevant images (809) but with a fairly low MAP, whereas
some runs with a higher MAP only found a very small number of relevant images in the first 1000
results. A higher bpref in this context can mean that a larger number of images from these runs
were not judged for relevance. This might also be due to the fact that only very few visual runs
were submitted and thus only few visually retrieved documents were finally judged.</p>
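      <p>The effect of unjudged images on bpref can be illustrated with the sketch below. It uses the commonly cited definition, in which each retrieved relevant image is penalized by the number of judged non-relevant images ranked above it and unjudged images are simply skipped. The handling of edge cases in trec_eval may differ, and the function name and toy data are assumptions.</p>
      <preformat>
```python
def bpref(ranked, relevant, nonrelevant):
    """Binary preference for one topic; unjudged images are skipped."""
    num_rel = len(relevant)
    if num_rel == 0:
        return 0.0
    # The penalty per relevant image is normalized by min(R, N).
    bound = min(num_rel, len(nonrelevant))
    nonrel_above = 0
    total = 0.0
    for image in ranked:
        if image in nonrelevant:
            nonrel_above += 1
        elif image in relevant:
            if bound == 0:
                total += 1.0
            else:
                total += 1.0 - min(nonrel_above, bound) / bound
    return total / num_rel


relevant = {"a", "b"}
nonrelevant = {"x", "y"}
judged_run = ["x", "a", "y", "b"]      # every retrieved image was judged
unjudged_run = ["u1", "a", "u2", "b"]  # u1 and u2 were never judged
print(bpref(judged_run, relevant, nonrelevant))
print(bpref(unjudged_run, relevant, nonrelevant))
```
      </preformat>
      <p>Both toy runs place the relevant images at ranks 2 and 4 and therefore have the same average precision, but the run whose other images were never judged obtains the higher bpref, which is the pattern described in the paragraph above.</p>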
      <p>Run
TAU MIPLAB-TAU norm
UNAL-W+QE+JS
GE GIFT8
MIPLAB-TAU orig
etfbl-max11111
etfbl-sum11111
GE GIFT16
LSI UNED
CEB Image</p>
      <p>Results of GIFT were available to all the participants for combinations of visual and textual
runs.</p>
    </sec>
    <sec id="sec-10">
      <title>Textual retrieval</title>
      <p>Purely automatic textual retrieval had by far the largest number of runs in 2008 with 65, more
than half of all submitted runs. Table 2 shows the results for all submitted automatic text runs,
ordered by MAP. Most performance measures, such as bpref and early precision, produce a similar
ordering of the runs. Only early precision sometimes differs significantly from the MAP ranking.</p>
      <p>Runs from the University of Alicante (Textmess), University of Jaen (SINAI), and LIG
Grenoble teams obtained the best results, mainly by using ontologies such as MeSH (Medical Subject
Headings) to code the documents. A MAP of 0.29 could be obtained and several systems have
a high score very close to this. A more detailed analysis is required with the exact techniques
applied for each of the runs.
      </p>
      <sec>
        <title>Using various languages for the retrieval</title>
        <p>Unfortunately, very little information was available on which languages the groups used for the
retrieval. It can be assumed that most groups used English as this promises the best results. It was
also possible to use all three query languages together, for example, for extracting MeSH terms.
While this multi-lingual approach is not necessarily a realistic scenario, it can lead to interesting
results.</p>
        <p>The HUG group used the same techniques with several languages and showed that English
obtained by far the best results, better than either French or German. The technique they applied
was to map MeSH terms from the text and queries in various languages. Through the PMIDs,
the officially (manually) assigned MeSH terms of the articles were also available. The MeSH terms
extracted from the article and query text performed worse for retrieval than the officially assigned
terms.</p>
      </sec>
      <sec>
        <title>Additional resources used for the retrieval</title>
        <p>Groups could also state which additional resources were used for retrieval. The goal of this was to
assemble a collection of available resources that could potentially be shared among participants to
improve performance in future challenges. A large variety of resources were used, in large part for
the combination of visual and textual runs, but also for purely textual runs. Many of the best runs
used the ImageCLEFmed 2005-2007 data for training. Official MeSH terms manually assigned by
the National Library of Medicine could be used through the PMIDs of the articles.</p>
        <p>The most commonly used resources were the training data sets of ImageCLEF 2005-2007.
There were numerous challenges with this approach, as the database used from 2005-2007 differed
greatly from the 2008 database. The annotations in the 2005-2007 database were of much poorer
quality than in the 2008 database, and the two databases were made up of very different types
of images. Nonetheless, the 2008 topics were a subset of those from previous years' competitions,
and so the scenario was somewhat realistic with respect to the training data.</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>Mixed retrieval</title>
      <p>The promotion of mixed-media retrieval has always been one of the main goals of ImageCLEF.
In past years, mixed-media retrieval had the highest submission rate. In 2008, however, only half
as many mixed runs were submitted as purely textual runs.</p>
      <p>Table 3 shows the results for all submitted runs. It is clear that, for a large number of the runs,
the MAP results for the mixed retrieval submissions were very similar to those from the purely
textual retrieval systems. An interesting observation is that the mixed-media submissions often
have higher early precision than the purely textual retrieval submissions. This confirms what has
been previously observed.</p>
      <p>The text-only runs exhibited relatively high correlation between MAP and bpref. This was
not the case among the mixed-media runs. One possible explanation for this difference could be
that the mixed-media runs used a wider variety of techniques than the text-only runs. Another
possible explanation is that more of the mixed-media runs were submitted after the deadline for
pool inclusion. If the mixed-media runs retrieved a higher proportion of non-judged images than
the text-only runs, the result would be a larger MAP/bpref variance.</p>
      <p>When comparing these mixed-media results with those from the text-only runs, it becomes
clear that mixed retrieval can obtain very low results. From examining mixed-media runs which
had corresponding text-only runs, it is particularly clear that combining good textual retrieval
techniques with questionable visual retrieval techniques can negatively affect system performance.
This demonstrates the difficulty of usefully integrating both textual and visual information, and
the fragility that such combinations can introduce into retrieval systems. As seen in Figure 1, the
distribution of MAP for the textual runs was higher than that for the mixed runs. A significant
mode exists around a MAP of 0.05 for the mixed runs, while the modes for the textual runs are
at 0.15 and 0.28.</p>
      <p>[Figure 1: distribution of MAP for the textual and mixed runs.]</p>
      <p>This year, as in previous years, interactive retrieval was only used by a very small number of
participants. Interactive retrieval is extremely important, and it is a pity so few groups chose to
attempt anything other than purely automated systems.</p>
      <p>Run
ohsu int 2
ohsu sdb full interactive
ohsu sdb lsa
CEB ITD ALL
CEB IBaseM
CEB TD ALL
CEB TD3</p>
      <p>Table 4 shows the results of all manual and interactive runs submitted. Two runs from OHSU
had fairly good results; the other runs were competitive in neither the MAP nor the early precision
categories when compared to the fully automatic runs. In general, MAP and early precision were
well-correlated (R<sup>2</sup> = 0.82 for textual runs, 0.68 for mixed runs); these two runs, however, had
higher early precision than their MAP would predict.</p>
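      <p>The R<sup>2</sup> values quoted above are presumably squared Pearson correlation coefficients between the MAP and the early precision of the runs; this is an assumption, as the text does not say how they were computed. Under that assumption the calculation can be sketched as follows, with made-up scores rather than the task results.</p>
      <preformat>
```python
from math import sqrt


def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    r = cov / sqrt(var_x * var_y)
    return r * r


# Made-up per-run scores, for illustration only.
toy_map = [0.05, 0.15, 0.20, 0.28]
toy_early_precision = [0.10, 0.30, 0.35, 0.50]
print(round(r_squared(toy_map, toy_early_precision), 3))
```
      </preformat>
      <p>A run whose early precision lies well above the fitted line, as described for the two OHSU runs, lowers R<sup>2</sup> without necessarily changing the overall trend.</p>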
    </sec>
    <sec id="sec-12">
      <title>Topic Analysis</title>
      <p>Overall, most groups performed significantly better on the semantic topics than on the mixed or
visual topics, as can be seen in the table below. Topics 6 and 11-18 were quite difficult for many
participants. Table 5 gives an overview of the best and average performance per topic. Some topics
with a small number of relevant images have a particularly low performance.</p>
      <p>The fact that many of the visual topics obtained poorer performance than the semantic topics
also shows that groups have much more experience working on semantic topics, and that visual
retrieval currently has much more difficulty obtaining good results. That said, visual retrieval
can have an important positive influence, and it seems necessary to promote it further,
potentially by including a larger number of visual topics to push groups towards using visual techniques.</p>
      <p>Four topics were each judged by two judges. We performed tests of inter-rater agreement using
kappa statistics, as seen in Table 6. In three of the four cases, the inter-rater agreement was quite
good. In the last case, one judge interpreted the query more strictly than the other.</p>
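      <p>Cohen's kappa for two judges can be computed as in the sketch below. The text does not say which kappa variant was used or how partly relevant judgments were treated, so the unweighted two-rater form over the three labels is an assumption, as are the function name and the toy judgments.</p>
      <preformat>
```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two raters judging the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must judge the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    # Chance agreement from each rater's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy judgments for six images, using the three labels from the text.
judge_1 = ["relevant", "relevant", "partly", "non", "non", "non"]
judge_2 = ["relevant", "partly", "partly", "non", "non", "relevant"]
print(round(cohen_kappa(judge_1, judge_2), 3))
```
      </preformat>
      <p>In the toy data the judges agree on four of six images (0.667) while chance agreement is 0.333, giving a kappa of 0.5. A judge who interprets a query more strictly shifts the label frequencies and lowers the observed agreement, as in the fourth case mentioned above.</p>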
      <sec id="sec-12-1">
        <title>Conclusions</title>
        <p>The focus of many participants in ImageCLEF 2008 was text-based retrieval.
The increasingly semantic topics combined with a database containing high-quality annotations
in 2008 may have resulted in less impact of using visual techniques as compared to previous
years. This tendency is also seen when looking at the performance by topic, where visual topics
had significantly lower results than the semantic topics. Our goal in the upcoming ImageCLEF
medical retrieval task is to increase the number of visual runs submitted. We hope to modify the
task to favor more integrated approaches. Another important aspect is that interactive retrieval
has always had poor participation and definitely deserves more attention. Relevance
feedback and query modifications have the potential to significantly improve results, but of course
research favors laboratory-style evaluations.</p>
        <p>Visual runs were rare and had no single run with a very convincing performance as was the
case in 2007, where the best visual runs had an extremely good performance. Mixed-media runs
were very similar in performance to textual runs when looking at MAP. The only difference was
that mixed-media runs obtained better early precision in general. Several mixed-media runs were
also broken, resulting in a very poor performance. This highlights that the combination is still
not very stable.</p>
        <p>A per-topic analysis shows that visual topics obtained lower average results than semantic
topics. The analysis also shows that several topics with very few relevant images have a very low
average performance, whereas topics with a larger number seem to perform better.</p>
      </sec>
      <sec id="sec-12-2">
        <title>Acknowledgements</title>
        <p>We would like to thank the CLEF campaign for supporting the ImageCLEF initiative. The
images for the 2008 ImageCLEFmed challenge were contributed by the Radiological Society of
North America (RSNA). This work was partially funded by the Swiss National Science Foundation
(FNS) under contract 205321-109304/1, the American National Science Foundation (NSF) with
grant ITR-0325160, and by the University of Applied Sciences Western Switzerland (HES-SO) in
the context of the BeMeVIS project.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Paul</given-names>
            <surname>Clough</surname>
          </string-name>
          , Michael Grubinger, Thomas Deselaers, Allan Hanbury, and Henning Müller.
          <article-title>Overview of the ImageCLEF 2006 photo retrieval and object annotation tasks</article-title>
          .
          <source>In CLEF 2006 Proceedings, volume 4730 of Springer Lecture Notes in Computer Science</source>
          , pages
          <fpage>579</fpage>
          –
          <lpage>594</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Paul</given-names>
            <surname>Clough</surname>
          </string-name>
          , Henning Muller, and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Sanderson</surname>
          </string-name>
          .
          <article-title>The CLEF cross{language image retrieval track (ImageCLEF) 2004</article-title>
          . In Carol Peters, Paul Clough, Julio Gonzalo, Michael Jones,
          <string-name>
            <given-names>Gareth J. F.</given-names>
            and
            <surname>Kluck</surname>
          </string-name>
          , and Bernardo Magnini, editors,
          <source>Multilingual Information Access for Text</source>
          ,
          <article-title>Speech and Images: Result of the fth CLEF evaluation campaign</article-title>
          , volume
          <volume>3491</volume>
          of Lecture Notes in Computer Science (LNCS), pages
          <fpage>597</fpage>
          {
          <fpage>613</fpage>
          ,
          <string-name>
            <surname>Bath</surname>
          </string-name>
          , UK,
          <year>2005</year>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>William</given-names>
            <surname>Hersh</surname>
          </string-name>
          , Henning Muller, Je ery Jensen, Jianji Yang, Paul Gorman, and
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Ruch</surname>
          </string-name>
          .
          <article-title>Advancing biomedical image retrieval: Development and analysis of a test collection</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          , September/October:
          <volume>488</volume>
          {
          <fpage>496</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>William</given-names>
            <surname>Hersh</surname>
          </string-name>
          , Henning Muller, and
          <string-name>
            <surname>Jayashree</surname>
          </string-name>
          Kalpathy-Cramer.
          <article-title>The imageclefmed medical image retrieval task test collection</article-title>
          .
          <source>Journal of Digital Imaging</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Henning</given-names>
            <surname>Müller</surname>
          </string-name>
          , Thomas Deselaers, Eugene Kim, Jayashree Kalpathy-Cramer,
          <string-name>
            <given-names>Thomas M.</given-names>
            <surname>Deserno</surname>
          </string-name>
          , Paul Clough, and
          <string-name>
            <given-names>William</given-names>
            <surname>Hersh</surname>
          </string-name>
          .
          <article-title>Overview of the ImageCLEFmed 2007 medical retrieval and annotation tasks</article-title>
          .
          <source>In CLEF 2007 Proceedings, volume 5152 of Lecture Notes in Computer Science (LNCS)</source>
          , Budapest, Hungary,
          <year>2008</year>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Jacques</given-names>
            <surname>Savoy</surname>
          </string-name>
          .
          <article-title>Report on CLEF-2001 experiments</article-title>
          . In
          <source>Report on the CLEF Conference 2001 (Cross Language Evaluation Forum)</source>
          , pages
          <fpage>27</fpage>
          –
          <lpage>43</lpage>
          , Darmstadt, Germany,
          <year>2002</year>
          . Springer LNCS 2406.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Justin</given-names>
            <surname>Zobel</surname>
          </string-name>
          .
          <article-title>How reliable are the results of large-scale information retrieval experiments?</article-title>
          In W. Bruce Croft, Alistair Moffat, C. J. van Rijsbergen, Ross Wilkinson, and Justin Zobel, editors,
          <source>Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>307</fpage>
          –
          <lpage>314</lpage>
          , Melbourne, Australia, August
          <year>1998</year>
          . ACM Press, New York.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>