<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Report on the CLEF Experiment: Combining Image and Multi-lingual Search for Medical Image Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Henning Muller</string-name>
          <email>henning.mueller@sim.hcuge.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antoine Geissbuhler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrick Ruch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IN-Ecublens</institution>
          ,
          <addr-line>CH-1015 Lausanne</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Rue Micheli-du-Crest</institution>
          ,
          <addr-line>CH-1211 Geneva 4</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Swiss Federal Institute of Technology</institution>
          ,
          <addr-line>LITH</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article describes the technologies used for the various runs submitted by the University of Geneva in the context of the 2004 imageCLEF competition. As our expertise is mainly in the field of medical image retrieval, we will concentrate most of our effort on the medical image retrieval task. Described are the runs that were submitted by our group, including technical details for each of the single runs and a short explanation of the obtained results, also compared with the results of submissions from other research groups. We also describe the problems encountered with respect to optimising the system and especially with respect to finding a balance between weighting the textual and visual features for retrieval. A much better balance seems possible when using some training data for optimisation and with the relevance judgements being available for a control of the respective retrieval quality. The results show that relevance feedback is extremely important for optimal results. Query expansion with visual features only gives minimal changes in result quality. If textual features are added in the automatic query expansion, the results improve significantly. Visual and textual results combined deliver the best results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The goals of imageCLEF are mainly in the field of cross-language information retrieval. From our point of view, this is of extremely high importance for a country such as Switzerland with four official languages, and equally within the European Union with an even larger variety. CLEF has been held since 2000 as an independent workshop, always following the European conference on digital libraries (ECDL). 2003 saw the first imageCLEF conference [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and all of the systems that were used to submit runs took into account the textual but not the visual data of the images supplied. The goal of the 2004 conference was clearly to create an image retrieval task with a realistic outline description that would need a visual component in addition to the textual multi-lingual part. The medical image retrieval task is such a realistic task, where a medical doctor has produced one or several image(s) and would like to get evidence for or against a certain diagnosis. Ground truthing can, for now, not be done on a diagnosis basis as the image dataset is not specialised enough for this. Still, a task was born with a visual query as a starting point, as also described in the following document [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Relevant documents were in this case images that show the same anatomic region, were taken with the same modality, from the same viewing direction and with the same radiologic protocol if applicable (for example, contrast agent or not, T1 vs. T2 weighting when using MRI).
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref2">2</xref>
          ], the main ideas for the 2004 task are described. The data for the task were taken from a medical case database called casImage (http://www.casimage.com/) [
        <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This database contains almost 9000 images from more than 2000 medical cases. Images contain annotation in XML format, but these annotations are very rudimentary and are not at all controlled with respect to quality or fields that have to be filled in. About 10% of the records do not contain any annotation. A majority of the documents are in French (70%), with around 20% being in English.
      </p>
      <p>Figure 1 shows a few example images from the database. Among others, these images form the query topics for the performance evaluation.</p>
      <p>In total, 26 images were chosen by a radiologist as query topics. These images were chosen to represent the image set well with respect to the modalities used, the anatomic regions represented and the radiologic protocols used. A large number of images in the database was expected not to be represented by these 26 topics.</p>
      <p>Relevance judgements were done by a total of three judges per query. Images could be judged as relevant, partially relevant or non-relevant. This results in 9 sets of relevance judgements, showing the overlap of all relevant judgements, the intersection of relevant judgements, and the overlap and intersection of relevant and partially relevant judgements. Due to a significant difference in the relevance judgements, the principal evaluation was performed with a relevance set that was obtained by including all images that were judged as relevant by at least two of the judges. All data is available from the imageCLEF web sites.</p>
      <p>In this paper we will mainly discuss the un-interpolated mean average precision of every run that we submitted, as this measure was used for the official ranking of systems. Other measures might change the ranking of systems and might be more appropriate for different query tasks.</p>
    </sec>
    <sec id="sec-2">
      <title>Basic technologies used</title>
      <p>For our first participation at imageCLEF, we aim at combining content-based retrieval of images with cross-language retrieval applied to case reports. Considering that benchmarks were not available at the time of submission, investigating such a combination is challenging in itself, so that our study is clearly at a preliminary stage. Once training data is available, systems can be optimised for certain query tasks.</p>
      <sec id="sec-2-1">
        <title>Image Retrieval</title>
        <p>
          The technology used for the content-based retrieval of medical images is mainly taken from the Viper project (http://viper.unige.ch) of the University of Geneva. Much information about this system is available [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
Outcome of the Viper project is the GNU Image Finding Tool, GIFT (http://www.gnu.org/software/gift/). This software tool is open source and can consequently also be used by other participants of imageCLEF. A ranked list of visual similarity for every query task was made available to participants and will serve as a baseline for the quality of submissions. Demonstration versions of a GIFT server were made available for participants to query visually as well, since not everybody can be expected to install an entire Linux tool for such a benchmark. The feature sets that are used by GIFT are:
local colour features at different scales, obtained by partitioning the images successively into four equally sized regions (four times) and taking the mode colour of each region as a multi-scale descriptor;
global colour features in the form of a colour histogram, compared by a simple histogram intersection;
local texture features, obtained by partitioning the image and applying Gabor filters in various scales and directions; Gabor responses are quantised into 10 strengths;
global texture features, represented as a simple histogram of the responses of the local Gabor filters in various directions, scales and strengths.
        </p>
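        <p>As a rough illustration of the first feature type, the following sketch (our own Python/NumPy example, not the GIFT implementation) recursively splits a quantised colour image into four equal blocks and keeps the mode colour of each block at every scale:</p>
        <preformat>
# Minimal sketch (not the GIFT code): split the image into four equal blocks,
# repeat this on every block for a few levels, and keep the most frequent
# (mode) quantised colour of each block as a multi-scale local descriptor.
import numpy as np

def mode_colour(block):
    # block: 2-D array of quantised colour indices; return the most frequent one
    values, counts = np.unique(block, return_counts=True)
    return int(values[np.argmax(counts)])

def multi_scale_colour_features(image, levels=4):
    # image: 2-D array of quantised colour indices (e.g. HSV bin numbers)
    features, blocks = [], [image]
    for _ in range(levels):
        next_blocks = []
        for b in blocks:
            h, w = b.shape
            quads = [b[:h // 2, :w // 2], b[:h // 2, w // 2:],
                     b[h // 2:, :w // 2], b[h // 2:, w // 2:]]
            features.extend(mode_colour(q) for q in quads)
            next_blocks.extend(quads)
        blocks = next_blocks
    return features
        </preformat>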
        <p>A particularity of GIFT is that it uses many techniques from text retrieval. Visual features are quantised and thus open a feature space that is very similar to the one opened by words in texts. The distribution of feature frequencies corresponds more or less to a Zipf distribution. A simple tf/idf weighting is used and the query weights are normalised by the results of the query itself. The histogram features are compared based on a simple histogram intersection.</p>
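        <p>The following sketch illustrates these two scoring ideas; it is our own simplified example and not the GIFT source code:</p>
        <preformat>
# Minimal sketch (our illustration): quantised visual features are treated like
# words and scored with a tf/idf-style weighting; global colour and texture
# histograms are compared with a simple histogram intersection.
import math
from collections import Counter

def tfidf_score(query_features, image_features, doc_frequency, n_images):
    # query_features / image_features: lists of quantised feature identifiers
    # doc_frequency: dict feature id -> number of images containing that feature
    q, d = Counter(query_features), Counter(image_features)
    score = 0.0
    for feature, q_tf in q.items():
        if feature in d:
            idf = math.log(n_images / (1.0 + doc_frequency.get(feature, 0)))
            score += q_tf * d[feature] * idf * idf
    return score

def histogram_intersection(h1, h2):
    # h1, h2: normalised histograms of equal length; larger means more similar
    return sum(min(a, b) for a, b in zip(h1, h2))
        </preformat>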
        <p>
          The medical version of GIFT is called medGIFT (http://www.sim.hcuge.ch/medgift/) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. It is also accessible as open source, and the adaptations concern mainly the features used and the interface, which shows the diagnosis on screen and is linked with a radiologic teaching file so the MD can not only browse images but also get the textual data and other images of the same case. Grey levels play a more important role for medical images and their number is raised, especially for relevance feedback queries. The number and kind of the Gabor filter responses also have an impact on the performance, and these are changed with respect to the number of directions and scales.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Textual case report search</title>
        <p>
          The basic granularity of the casimage collection is the case. A case usually gathers a textual report, which describes the case, with appropriate bibliographic references, and a set of images. Because the original queries are images only, textual case-based retrieval is used for relevance feedback. Experiments were conducted with the easyIR engine (http://lithwww.epfl.ch/~ruch/softs/softs.html). As a single report can contain both French and English written parts, we decided to index the casimage collection using two different indexes: 1) using an English stemmer, 2) using a French stemmer. We used the Porter stemmer for English and a modified version of Savoy's tool for French. For each index a list of stop words was used: 544 items for English, 792 for French. We also used a biomedical thesaurus, which has proven some effectiveness in the context of the TREC Genomics track [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. For English, 120'000 string variants were extracted from the UMLS, while the French thesaurus contains about 6000 entries. Although we tested different translations [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and index combination strategies, our submitted runs were produced using the English index without specific translation. Finally, textual relevance feedback was also disabled.
        </p>
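        <p>A minimal sketch of the dual-index idea follows; it is our own illustration (using NLTK stemmers as stand-ins), not the easyIR implementation:</p>
        <preformat>
# Minimal sketch (our illustration, not easyIR): every case report is indexed
# twice, once with an English and once with a French stemmer, each with its
# own stop word list, giving two separate inverted indexes.
from collections import defaultdict
from nltk.stem import PorterStemmer
from nltk.stem.snowball import FrenchStemmer

english, french = PorterStemmer(), FrenchStemmer()

def build_indexes(reports, stop_en, stop_fr):
    # reports: dict case id -> free text of the case report
    index_en, index_fr = defaultdict(set), defaultdict(set)
    for case_id, text in reports.items():
        for token in text.lower().split():
            if token not in stop_en:
                index_en[english.stem(token)].add(case_id)
            if token not in stop_fr:
                index_fr[french.stem(token)].add(case_id)
    return index_en, index_fr
        </preformat>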
      </sec>
      <sec id="sec-2-3">
        <title>Combining the two</title>
        <p>As the query is an image only, we had to use some automatic mechanism to expand the query to text. In our case we used automatic query expansion based on the first and the first three retrieved images. These images were analysed and the text of their case report was taken as free text for the query. XML tags of the casimage files were removed, and unnecessary fields such as the MD name or date of the entry were removed as well.</p>
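        <p>A minimal sketch of this textual query expansion is given below; the field names and the loader function are hypothetical and only stand in for the actual casimage schema:</p>
        <preformat>
# Minimal sketch (field names and loader are hypothetical): take the case
# notes of the top one or three visually retrieved images, drop administrative
# fields, and concatenate the remaining free text into a single query.
import xml.etree.ElementTree as ET

DROP_FIELDS = {"physician", "date"}   # hypothetical administrative fields

def expand_query(top_images, load_case_note_xml, k=3):
    # top_images: visually ranked image ids; load_case_note_xml: id -> XML string
    parts = []
    for image_id in top_images[:k]:
        root = ET.fromstring(load_case_note_xml(image_id))
        for element in root.iter():
            if element.tag not in DROP_FIELDS and element.text:
                parts.append(element.text.strip())
    return " ".join(parts)
        </preformat>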
        <p>The free-text queries deliver a list of result cases and their similarity scores. These similarity scores were normalised by the highest similarity score available, as is already the case for the visual queries. Afterwards, the similarity score is transferred from the case to all the images that are part of this case. This includes a high number of visually very dissimilar images that just appear in the same case. Afterwards, visual and textual results are merged in part. This list might not contain all the images, but at least images that have some similarity in the visual and in the textual part will be ranked highly. The problem is to find a balance between the visual and the textual component. In our experience, the visual part needs to be weighted higher than the textual part, but the textual part does improve the final results significantly. Relevance feedback is another tool that improves the results very strongly.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Runs submitted for evaluation</title>
      <p>This section gives a basic introduction to the techniques and variations used for our runs submitted to imageCLEF. It also contains the IDs of the runs that were finally submitted and that are evaluated in this paper.</p>
      <sec id="sec-3-1">
        <title>Only visual retrieval with one query image</title>
        <p>For the visual queries, the medGIFT system was used. This system allows a fairly easy change of a few system parameters such as the configuration of the Gabor filters and the grey level and colour quantisations. Input for these queries were only the query images. No feedback or automatic query expansion was used. The following system parameters were submitted:
18 hues, 3 saturations, 3 values, 4 grey levels, 4 directions and 3 scales of the Gabor filters, the GIFT base configuration made available to all participants of imageCLEF; (GE 4g 4d vis)
9 hues, 2 saturations, 2 values, 16 grey levels, 8 directions and 5 scales of the Gabor filters; (GE 16g 8d vis)
9 hues, 2 saturations, 2 values, 16 grey levels, 4 directions and 5 scales of the Gabor filters. (GE 16g 4d vis)</p>
        <p>
          It is very hard to analyse visually, and without ground truth, which of the runs performed best. The three runs were submitted as a trial and because previous results suggest that a small number of grey levels performs better, especially within the first few images retrieved. Studies show that a larger number of grey levels might be better for feedback queries with a larger number of input images [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>The imageCLEF results finally show a slightly different picture: the best of the visual runs is the GIFT base system that uses only 4 grey levels, 3 scales and four directions of the Gabor filters (mean average precision (MAP) 0.3157). Much worse is the system when using a smaller number of colours but 16 grey levels and five scales (0.2565). It will have to be tested whether the five scales have an influence on these results. When using five scales, 16 grey levels and 8 directions instead of four, the results get better again (0.2649).</p>
      </sec>
      <sec id="sec-3-2">
        <title>Visual retrieval with automatic query expansion</title>
        <p>This section uses very simple query expansion, feeding back the query image and the one or three best images retrieved in a first query step, as listed below. Some manual observations showed that the first few images seem to be very similar in most cases. Only a few queries did not turn up visually similar images as the first response. Thus, we attempted to feed back the first retrieved image together with the initial query image. In a second try we automatically fed back the first three retrieved images, which might contain more information but also has a higher risk of error. When wrong images are used in the query expansion, the results risk becoming much worse. The runs that we submitted are a mixture of these, containing one quantisation with one and three images fed back and another quantisation with only one image fed back. A fourth test run was submitted as well. The runs submitted were not analysed for their performance, so the selection of the submitted runs was based more on personal intuition.</p>
        <p>
8 directions, 16 grey levels, one image fed back (GE 8d 16g qe1.txt);
4 directions, 16 grey levels, one image fed back (GE 4d 16g qe1.txt);
4 directions, 16 grey levels, three images fed back (GE 4d 16g qe3.txt);
normal GIFT system with one image fed back (GE 4d 4g qe1.txt).</p>
        <p>The results show that with automatic query expansion the best results are again obtained with the standard GIFT system (MAP 0.3100). This is actually not as good as the results without query expansion. When using 16 grey levels, the results do very slightly improve over the first query step when feeding back one (MAP 0.2593) and three (MAP 0.2586) query images. The results are almost unchanged between expansion with one and three images. The system with 8 directions and 16 grey levels improves the results more strongly than with only four directions (0.2704). This seems to underline the idea that a small number of grey levels is much better in the first query step, but with expansion it is better to have more information on the images in the form of grey levels and Gabor filter responses.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Visual retrieval with manual feedback</title>
        <p>This part was performed manually with the same three quantisations as were used in the one-shot queries. The only difference is that a user retrieved the first 20 images for every query and performed manual relevance feedback for one step only. We would have liked to have an evolution over several steps to show how much relevance feedback can do and when saturation would be expected, but finally this was not attempted due to a lack of resources to perform the manual feedback. The person performing the relevance feedback does not have a medical education, and some errors with respect to the feedback might be due to this, as wrong images might be fed back.</p>
        <p>(GE 4d 4g rf);
(GE 4d 16g rf);
(GE 8d 16g rf).</p>
        <p>The result images from the first query step were taken to query the system and observe the first 20 results for the run. Positive and negative images were marked for feedback to optimise the system response. A few images were marked as neutral when they were regarded as irrelevant but visually similar to the correct images, or when the feedback person was not sure about the relevance of the image. It was feared that this could make a relevance feedback query worse than the initial query.</p>
        <p>The results show that the performance difference between a small number of grey levels and a larger number is reduced when using relevance feedback. Still, the GIFT base system stays the best one in the test (0.3791). The worst relevance feedback system is the one with 16 grey levels and four directions (0.3259). The most improved system is the one with 16 grey levels but 8 directions (0.3380). Relevance feedback shows its enormous potential and importance for visual information retrieval, as the results improve significantly with the use of feedback. Taking a larger number of feedback images, an expert feedback person and also several steps of feedback surely has the potential to further improve results.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Visual and textual multi-lingual retrieval, automatic run</title>
        <p>This combination run uses the same automatic query expansions that are based on the images retrieved with the medGIFT system. The first one or three images that were added for query expansion were used for the textual query as well. The text from these images was cleaned of the XML tags of the casimage case notes, and unnecessary fields such as dates and the treating MD were also deleted. ACR codes are equally deleted as they are currently not translated into their correct textual description, which could be an important help for the textual indexing and retrieval.</p>
        <p>The remaining text was submitted to the easyIR system. We tried out both a French and an English version, but finally only used the English version as the results did not seem to be significantly different. Making a selection between English and French case notes and thus having two indices might make a difference with respect to the results. The results list from easyIR contains the most similar case notes with respect to the text and a weighting. This weighting was normalised based on the highest weighting in the list to get values between 0 and 1. Afterwards, all images in these case notes received the value of the entire case, thus containing visually similar and very dissimilar images. A total of 200 cases was retrieved, which results in a list of 800-1000 result images containing a similarity value.</p>
        <p>The merging of the visual and textual results was done in various ways. As the unit for retrieval and similarity assessment is the image, the visual similarity plays an important role. Textual similarity might be better with respect to the semantics of the case, but a case contains relevant and also many irrelevant images that are in the same case but of a different modality. Thus, visual similarity had to be weighted higher than textual similarity, so that visually non-similar images were not weighted higher than visually similar but textually dissimilar ones. We were also not really sure to have the correct case in the first N = 1..3 images, so some care is important to not expand the query in a completely wrong direction.</p>
        <p>Three runs were submitted using 75% visual and 25% textual retrieval:
4 directions, 16 grey levels, visual/textual with query expansion from one image; (GE 4d 16g vt1)
4 directions, 16 grey levels, visual/textual with query expansion from three images; (GE 4d 16g vt3)
4 directions, 4 grey levels, visual/textual with query expansion from one image; (GE 4d 4g vt1)
Another run was submitted with a ratio of 80% for the visual and 20% for the textual features:
(GE 4d 4g vt2).</p>
        <p>Another idea was based on the fact that most visually important images should be within the upper part of the visually similar images being retrieved. This means that the goal should be to augment the value of those images in the list of the visually similar that also appear in the list of the textually similar. For this run we simply multiplied by a factor of 1.5 all those images that were within the first 200 cases retrieved textually and within the first 1000 images retrieved visually. The resulting series has the tag (GE 4d 16g vtx).</p>
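        <p>A minimal sketch of this boosting run follows; the function and parameter names are our own illustration:</p>
        <preformat>
# Minimal sketch of the GE 4d 16g vtx run: an image that is among the first
# 1000 visual results and whose case is among the first 200 textual results
# gets its visual score multiplied by 1.5 before re-ranking.
def boost_overlap(visual_ranked, textual_case_ids, image_to_case,
                  factor=1.5, n_visual=1000, n_text=200):
    # visual_ranked: list of (image id, score) sorted by visual similarity
    top_text_cases = set(textual_case_ids[:n_text])
    boosted = []
    for rank, (image_id, score) in enumerate(visual_ranked):
        beyond_top = rank + 1 > n_visual
        if (not beyond_top) and image_to_case(image_id) in top_text_cases:
            boosted.append((image_id, score * factor))
        else:
            boosted.append((image_id, score))
    return sorted(boosted, key=lambda item: item[1], reverse=True)
        </preformat>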
        <p>Evaluation results show that the use of textual information significantly improves the retrieval, even when only using the text of a single image, as in the case of 16 grey levels and four directions (MAP 0.2935). This is an improvement of 0.035 and thus more than 10% over the visual query expansion with one image. When executing query expansion with three images, the results improve even more (MAP 0.3370) and are among the best automatic runs that were submitted for the competition. This is surprising, as the visual query expansion with three images was actually worse than with one image and it also only improved results slightly.</p>
        <p>Better results were again obtained when using 4 grey levels. When feeding back one image, the MAP is 0.3611 and thus better than all other submitted automatic runs. The best results in our test were obtained when changing the weighting of the textual features from 25% to 20%, which delivered a MAP of 0.3749. The selective weighting change for images that were visually similar and that appeared in the top retrievals by text also delivered very good results (MAP 0.3612).</p>
        <p>When analysing these results, we think that by feeding back more images with text and using a 20% weighting we could get even better results than what we have obtained so far. When comparing with the change between one and three query-expansion images with 16 grey levels, we think that the improvement could be in the range of a MAP of 0.40.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Visual and textual multi-lingual retrieval, manual feedback</title>
        <p>As we currently do not have an integrated interface for our visual and textual search engines, these results are based on the manual relevance feedback queries with the visual retrieval results only. Based on the documents marked relevant after a first visual query step, a query was constructed. For the textual query, only positive documents were taken, whereas for the visual part positive and negative images were taken into account. The text was generated in the same way as before by adding the case notes without names, dates, XML tags and ACR codes into one large file. If there were several images of the same case, the text was copied several times. These texts were submitted to the easyIR system. Again, the resulting list of case results and scores was normalised to 1 and then expanded from cases to images. The system we used for these runs was the one with 16 grey levels and 4 directions, and thus the worst system in the first visual results as well as in visual feedback. Still, the textual component alone improves the results significantly.</p>
        <p>For the visual query, positive and negative feedback were taken into account. The results were equally normalised to a range between 0 and 1. For merging the results we used three different ratios between visual and textual characteristics:
25% textual, 75% visual; GE rfvistex1
20% textual, 80% visual; GE rfvistex20
10% textual, 90% visual; GE rfvistex10</p>
        <p>At this point we were at least sure that the text did contain relevant information and not automatically expanded case texts. Still, it is important not to give the textual features too strong an influence, as they are on a case and not an image basis, whereas the gold standard is generated on an image basis. The gradient of similarity within the textual result list is much higher than within the visual result list, which explains part of the risk of weighting the textual features too strongly.</p>
        <p>The results show that the relevance feedback results are by far the best results in the entire competition. The best results are obtained when combining the results with 20% textual and 80% visual weight (0.4214). When weighting the textual features higher (25%), the results drop significantly (0.3824). When weighting the textual features lower, the results drop in performance, but only slightly (0.4189). This suggests that the optimal weighting in our case lies between 10% and 20%. Using the GIFT base system for this run would also be an interesting option, as its query results seem to be much better in a first query step.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Further ideas that are currently not explored</title>
      <p>The ACR codes should be translated into text for better indexing and retrieval. They contain
very valuable information and are in several case notes. We currently do not use the ACR codes
that are attached to some of the images at all.</p>
      <p>Image normalisation should be applied so that images which lie in a different grey spectrum are still properly retrieved. Currently, this problem can occur quite often as there is no control of the level/window settings when a medical doctor inserts images. Images are in JPEG, so information from the original DICOM images might have been lost.</p>
      <p>Using a gradient of the similarity scores to define how many of the first N images might be relevant and could be fed back as automatic query expansion is another promising idea, sketched below. This would allow a more reasonable way to choose images for automatic query expansion. Currently, the values that we use are fairly conservative, as a wrong query expansion can destroy the quality of retrieval completely.</p>
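      <p>A minimal sketch of this idea (our own illustration, not an implemented feature) could look as follows:</p>
      <preformat>
# Pick the number of images to feed back by looking at the drop between
# consecutive similarity scores: expand only up to the largest drop, with a
# small upper bound, so that a wrong expansion is unlikely.
def images_to_expand(ranked_scores, max_images=3):
    # ranked_scores: similarity scores of the first results, best first
    if len(ranked_scores) > 1:
        drops = [ranked_scores[i] - ranked_scores[i + 1]
                 for i in range(min(max_images, len(ranked_scores) - 1))]
        return drops.index(max(drops)) + 1
    return len(ranked_scores)
      </preformat>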
      <p>More work will also need to be done with respect to the quantisations of the feature space. Currently, a surprisingly small number leads to the best results, but it will have to be analysed which queries were responsible for this and which other factors such as directions, scales and quantisations of the Gabor filters might play a vital role.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>We had a lack of manpower to do a proper adaptation and evaluation of the parameters that we could use. Thus we could not use the software tools to their full potential. Especially the use of relevance feedback over several steps is expected to lead to a much better performance. The use of some ground truth data to optimise the system will also certainly lead to much better results. For future imageCLEF competitions, training data is expected to be accessible before the conference and a different database during the conference. There was also a lack of experience with combining textual and visual features for retrieval. Many ideas can still be explored for this combination to optimise retrieval results.</p>
      <p>The most important conclusions are surely:
a surprisingly small number of grey levels led to the best results in a first query step;
query expansion for visual retrieval does not change the performance much;
a larger number of grey levels is better for relevance feedback;
textual features improve performance with automatic query expansion as well as with manual relevance feedback;
relevance feedback improves results enormously and remains a powerful tool for information retrieval;
relevance feedback and visual/textual combinations led to the best overall results in the competition;
there is still a lot to be tried out!</p>
      <p>This leaves us with several important outcomes and many ideas to explore now that the ground truth is available.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Clough</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanderson</surname>
          </string-name>
          .
          <article-title>The CLEF 2003 cross language image retrieval task</article-title>
          .
          <source>In Proceedings of the Cross Language Evaluation Forum (CLEF</source>
          <year>2004</year>
          ),
          <year>2004</year>
          (submitted).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanderson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Muller</surname>
          </string-name>
          .
          <article-title>A proposal for the CLEF cross language image retrieval track (ImageCLEF) 2004</article-title>
          .
          <source>In The Challenge of Image and Video Retrieval (CIVR</source>
          <year>2004</year>
          ), Dublin, Ireland,
          <year>July 2004</year>
          . Springer LNCS.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Muller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Geissbuhler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Terrier</surname>
          </string-name>
          .
          <article-title>A reference data set for the evaluation of medical image retrieval systems</article-title>
          .
          <source>Computerized Medical Imaging and Graphics</source>
          ,
          <year>2004</year>
          (to appear).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Muller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Vallee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Geissbuhler</surname>
          </string-name>
          .
          <article-title>Integrating content{based visual access methods into a medical case database</article-title>
          .
          <source>In Proceedings of the Medical Informatics Europe Conference (MIE</source>
          <year>2003</year>
          ), St. Malo, France, May
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Muller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Vallee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Geissbuhler</surname>
          </string-name>
          .
          <article-title>Comparing feature sets for content{ based medical information retrieval</article-title>
          .
          <source>In Proceedings of the SPIE International Conference on Medical Imaging, SPIE</source>
          Vol.
          <volume>5371</volume>
          , San Diego, CA, USA,
          <year>February 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosset</surname>
          </string-name>
          , H. Muller, M. Martins,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dfouni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Vallee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Ratib</surname>
          </string-name>
          .
          <article-title>Casimage project - a digital teaching files authoring environment</article-title>
          .
          <source>Journal of Thoracic Imaging</source>
          ,
          <volume>19</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ruch</surname>
          </string-name>
          .
          <article-title>Query translation by text categorization</article-title>
          .
          <source>In Proceedings of the conference on Computational Linguistics (COLING</source>
          <year>2004</year>
          ), Geneva, Switzerland,
          <year>August 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ruch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chichester</surname>
          </string-name>
          , G. Cohen,
          <string-name>
            <given-names>G.</given-names>
            <surname>Coray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ehrler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ghorbel</surname>
          </string-name>
          , H. Muller, and
          <string-name>
            <given-names>V.</given-names>
            <surname>Pallotta</surname>
          </string-name>
          .
          <article-title>Report on the trec 2003 experiment: Genomic track</article-title>
          .
          <source>In Proceedings of the 2003 Text REtrieval Conference (TREC)</source>
          , Gaithersburg, MD, USA,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Squire</surname>
          </string-name>
          , W. Muller, H. Muller, and
          <string-name>
            <given-names>T.</given-names>
            <surname>Pun</surname>
          </string-name>
          .
          <article-title>Content{based query of image databases: inspirations from text retrieval</article-title>
          .
          <source>Pattern Recognition Letters (Selected Papers from The 11th Scandinavian Conference on Image Analysis SCIA '99)</source>
          ,
          <volume>21</volume>
          (
          <issue>13-14</issue>
          ):
          <fpage>1193</fpage>
          -
          <lpage>1198</lpage>
          ,
          <year>2000</year>
          .
          <string-name>
            <given-names>B.K.</given-names>
            <surname>Ersboll</surname>
          </string-name>
          , P. Johansen, Eds.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>