<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>University and Hospitals of Geneva at ImageCLEF 2007</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Xin Zhou, Julien Gobeill, Patrick Ruch, Henning Müller University and Hospitals of Geneva</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article describes the participation of the University and Hospitals of Geneva in three tasks of the 2007 ImageCLEF image retrieval benchmark. Two of these tasks were medical tasks and one was a photographic retrieval task. The visual retrieval techniques relied mainly on the GNU Image Finding Tool (GIFT), whereas multilingual text retrieval was performed by mapping the full-text documents and the queries in a variety of languages onto MeSH (Medical Subject Headings) terms, using the EasyIR text retrieval engine for retrieval. For the visual tasks it becomes clear that the baseline GIFT runs do not reach the performance of more sophisticated modern techniques. GIFT can be seen as a baseline for visual retrieval as it has been used for the past four years in ImageCLEF. Whereas in 2004 GIFT was among the best-performing systems, it is now towards the lower end of the spectrum, showing the clear improvement in retrieval quality of participants over the years. Due to time constraints no optimisations could be performed and no relevance feedback was used, although the latter is usually one of the strong points of GIFT. The text retrieval runs perform fairly well, showing the effectiveness of the approach of mapping terms onto an ontology. Mixed runs perform slightly below the best text results alone, meaning that combining runs requires more care than a simple linear combination. English is by far the language with the best results; even a mixed run of the three languages was lower in performance. This can partly be explained by the relevance judges, who are all native English speakers. Thus, a bias towards relevance for English documents is unfortunately possible.</p>
      </abstract>
      <kwd-group>
        <kwd>Image retrieval</kwd>
        <kwd>text categorization</kwd>
        <kwd>multimodal retrieval</kwd>
        <kwd>automatic annotation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        ImageCLEF<sup>1</sup> [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] started within CLEF<sup>2</sup> (Cross Language Evaluation Forum [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]) in 2003
with the goal of benchmarking image retrieval in multilingual document collections. A medical image
retrieval task<sup>3</sup> was added in 2004 to explore domain-specific multilingual information retrieval and
also multimodal retrieval by combining visual and textual features. Since 2005, a
medical retrieval task and a medical image annotation task have both been part of ImageCLEF
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        More details on the ImageCLEF 2007 tasks, topics, and results can be found in [
        <xref ref-type="bibr" rid="ref3 ref6 ref7">3, 6, 7</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Retrieval Strategies</title>
      <p>This section describes the basic technologies that are used for retrieval. More details on
task-specific optimisations are given in the results section.</p>
      <sec id="sec-2-1">
        <title>Text retrieval approach</title>
        <p>
          The text retrieval approach used in 2007 is similar to the techniques already applied in 2006 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
The full text of the documents in the collection and of the queries was mapped to a fixed number
of MeSH terms, and retrieval was then performed in the MeSH-term space. Based on the results
of 2006, when 3, 5, and 8 terms were extracted, we increased the number of terms further. It was
shown in 2006 that a larger number of terms leads to better results: although several of the terms
might be incorrect, these incorrect terms do less damage than the few additional correct
terms add in quality. Thus 15 terms were generated for each document in 2007 and 3 terms for
every query, separately for each language. Term generation is based on the MeSH categorizer [
          <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
          ]
developed in Geneva. As MeSH exists in English, German, and French, multilingual treatment of
the entire collection is thus possible. For ease of computation an English stemmer was used on
the collection and all XML tags in the documents were removed, basically removing all structure
of the documents. The entire text collection was indexed with the easyIR toolkit [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] using a
pivoted-normalization weighting schema. The schema was not tuned due to lack of time.
        </p>
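        <p>As an illustration of the weighting scheme named above, the following minimal sketch scores documents represented as bags of MeSH terms with one commonly cited form of pivoted-normalization weighting. The function name, the slope value of 0.2, and the toy MeSH terms are assumptions made for this example; the actual weighting schema and parameters of easyIR are not specified here and may differ.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter

def pivoted_scores(query_terms, docs, slope=0.2):
    """Score documents (lists of MeSH terms) against a query.

    Uses one common form of pivoted length normalization; the slope
    of 0.2 is an assumed default, not a value from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs.values()) / n_docs
    # Document frequency of each term over the collection.
    df = Counter(t for d in docs.values() for t in set(d))
    q_tf = Counter(query_terms)
    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        # Length normalization pivoted around the average length.
        norm = (1.0 - slope) + slope * len(terms) / avg_len
        score = 0.0
        for term, qf in q_tf.items():
            if tf[term] == 0:
                continue
            tf_part = 1.0 + math.log(1.0 + math.log(tf[term]))
            idf = math.log((n_docs + 1) / df[term])
            score += tf_part / norm * qf * idf
        scores[doc_id] = score
    return scores

# Hypothetical toy collection: each document is a list of MeSH terms.
docs = {
    "doc1": ["Radiography", "Thorax", "Lung"],
    "doc2": ["Tomography", "Brain"],
    "doc3": ["Radiography", "Femur", "Fractures", "Bone"],
}
ranking = pivoted_scores(["Radiography", "Lung"], docs)
best = max(ranking, key=ranking.get)
```
]]></preformat>
        <p>In this toy collection the document sharing both query terms is ranked first, and the longer document sharing only one term is penalised slightly by the length normalization.</p>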
        <p>Queries were executed in each of the three languages separately, and one run combined the
results of the three languages.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Visual retrieval techniques</title>
        <p>
          The technology used for the visual retrieval of images is mainly taken from the Viper<sup>4</sup> project [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
The outcome of the Viper project is the GNU Image Finding Tool (GIFT)<sup>5</sup>. This tool is open source
and can be used by other participants of ImageCLEF. A ranked list of visually similar images
for every query topic was made available to participants and serves as a baseline to measure the
quality of submissions. The feature sets used by GIFT are:
        </p>
        <list list-type="bullet">
          <list-item><p>local color features at different scales, obtained by partitioning the images successively into four equally sized regions (four times) and taking the mode color of each region as a descriptor;</p></list-item>
          <list-item><p>global color features in the form of a color histogram, compared by a simple histogram intersection;</p></list-item>
          <list-item><p>local texture features, obtained by partitioning the image and applying Gabor filters in various scales and directions, quantised into 10 strengths;</p></list-item>
          <list-item><p>global texture features, represented as a simple histogram of responses of the local Gabor filters in various directions and scales.</p></list-item>
        </list>
        <p><sup>1</sup> http://ir.shef.ac.uk/imageclef/</p>
        <p><sup>2</sup> http://www.clef-campaign.org/</p>
        <p><sup>3</sup> http://ir.ohsu.edu/image/</p>
        <p><sup>4</sup> http://viper.unige.ch/</p>
        <p><sup>5</sup> http://www.gnu.org/software/gift/</p>
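        <p>The local color feature listed above can be illustrated with a minimal sketch. It assumes that the image is already given as a 2D list of quantised color indices whose side lengths are divisible by 16; the function names and the toy image are hypothetical, and this is not the GIFT implementation.</p>
        <preformat><![CDATA[
```python
from collections import Counter

def mode_color(image, top, left, height, width):
    """Most frequent color index inside one rectangular block."""
    counts = Counter(
        image[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )
    return counts.most_common(1)[0][0]

def block_mode_features(image, levels=4):
    """Split the image into four equal regions, `levels` times in
    succession, and record the mode color of every block."""
    features = []

    def split(top, left, height, width, level):
        if level > 0:
            color = mode_color(image, top, left, height, width)
            features.append((level, top, left, color))
        if level == levels:
            return
        h2, w2 = height // 2, width // 2
        for dt in (0, h2):
            for dl in (0, w2):
                split(top + dt, left + dl, h2, w2, level + 1)

    split(0, 0, len(image), len(image[0]), 0)
    return features

# Toy 16x16 image: left half is color 3, right half is color 7.
image = [[3] * 8 + [7] * 8 for _ in range(16)]
features = block_mode_features(image)
```
]]></preformat>
        <p>Each feature is a tuple of partition level, block position, and mode color, so that coarse and fine blocks can be told apart when the features are later treated like terms of a text.</p>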
        <p>
          A particularity of GIFT is that it uses many techniques well known from text retrieval. Visual
features are quantised, and the resulting feature space is similar to the distribution of words in texts. A
simple tf/idf weighting is used and the query weights are normalised by the results of the query
itself. The histogram features are compared based on a histogram intersection [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
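        <p>The histogram intersection used for the histogram features can be written compactly. The sketch below follows the normalised intersection of Swain and Ballard [<xref ref-type="bibr" rid="ref14">14</xref>], dividing by the total mass of the second histogram; the function name and the toy histograms are assumptions for illustration, and the implementation in GIFT may differ in detail.</p>
        <preformat><![CDATA[
```python
def histogram_intersection(query_hist, model_hist):
    """Normalised histogram intersection: the sum of bin-wise minima
    divided by the total mass of the model histogram."""
    if len(query_hist) != len(model_hist):
        raise ValueError("histograms must have the same number of bins")
    overlap = sum(min(q, m) for q, m in zip(query_hist, model_hist))
    total = sum(model_hist)
    return overlap / total if total else 0.0

query = [4, 0, 2, 2]
same = histogram_intersection(query, query)            # identical histograms
partial = histogram_intersection(query, [2, 2, 2, 2])  # partial overlap
disjoint = histogram_intersection([8, 0, 0, 0], [0, 8, 0, 0])
```
]]></preformat>
        <p>Identical histograms give a similarity of 1, histograms without any common bin give 0, and all other cases fall in between.</p>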
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>This section details the results obtained for the various tasks. Our results are always compared to the
best results in the competition to underline the fact that our results are a baseline for comparison
of techniques.</p>
      <sec id="sec-3-1">
        <title>Photographic Image Retrieval</title>
        <p>The two runs submitted for the photographic retrieval task do not contain any optimisations and
are a simple baseline using the GIFT system to compare performance of other techniques and
their improvement over the years. Only visual retrieval was attempted and no text was used. The
two runs are fully automatic.</p>
        <p>Table 1 shows the results of the two submitted GIFT runs compared to the best overall visual
run submitted. The MAP of our runs is much lower than that of the best run, by almost a factor of ten, whereas early
precision is lower by about a factor of five. The better of our two runs uses the standard GIFT system, whereas
the second run uses a smaller number of colors (9 hues instead of 18) and a smaller number of
saturations as well. The results with these changes are slightly lower, but the number of relevant
images found is significantly higher, meaning that more fuzziness in the feature space helps to
find relevant images but reduces early precision.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Medical Image Retrieval</title>
        <p>This section describes the three categories of runs that were submitted for the medical retrieval
task. All runs were automatic, so the results are classified by media.</p>
        <sec id="sec-3-2-1">
          <title>Visual Retrieval</title>
          <p>The purely visual retrieval was performed with the standard GIFT system using 4 grey levels and
with a modified GIFT using 8 grey levels. A third run was created by a linear combination of the two
previous runs.</p>
          <p>Figure 2 shows the results of the best overall visual run and all of our runs. It is
interesting to see that all but three visual runs have very low performance in 2007. These three
runs used training data from almost the same collection from 2005 and 2006 to select and
weight features, leading to an extreme increase in retrieval performance. Our runs are at the lower
end of the spectrum concerning MAP. Early precision becomes much better when the two
grey-level quantisations are combined.</p>
          <p>Textual retrieval was performed using each of the query languages separately and in one combined
run.</p>
          <p>Results can be seen in Table 3. They show clearly that English obtains the best
performance among the three languages. This can be explained by the fact that the majority of the documents are in
English and the majority of relevance judges are also native English speakers. For most of the best-performing
runs it is not clear whether they use a single language or a mix of languages, which is
not really a realistic scenario for multilingual retrieval. Both German and French retrieval have a
lower performance than English, and the run linearly combining the three languages also performs
below English alone.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Mixed-media retrieval</title>
          <p>There were two different sorts of mixed-media runs in 2007 from the University and Hospitals
of Geneva. One was a combination of our own visual and textual runs, and the other was a
combination of the GIFT results with results from the FIRE (Flexible Image Retrieval Engine)
system and a system from OHSU (Oregon Health and Science University).</p>
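          <p>A linear combination of a visual and a textual run, as used for the mixed runs of this section with visual proportions of 10%, 50%, and 90%, can be sketched as follows. The min-max normalisation of scores, the function names, and the toy scores are assumptions for illustration; the score normalisation used for the submitted runs is not described here.</p>
          <preformat><![CDATA[
```python
def min_max(scores):
    """Rescale scores to [0, 1]; assumed normalisation, not from the paper."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def combine_runs(visual, textual, visual_weight):
    """Linear combination: w * visual + (1 - w) * textual.
    Documents missing from one run get a score of 0 for that run."""
    vis, txt = min_max(visual), min_max(textual)
    fused = {
        doc: visual_weight * vis.get(doc, 0.0)
        + (1.0 - visual_weight) * txt.get(doc, 0.0)
        for doc in set(vis) | set(txt)
    }
    # Highest fused score first; ties are broken by document ID.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))

# Hypothetical toy runs mapping image IDs to retrieval scores.
visual_run = {"img1": 0.9, "img2": 0.4, "img3": 0.1}
textual_run = {"img2": 12.0, "img3": 7.0, "img4": 2.0}
# The three proportions of visual influence mentioned in the text.
rankings = {w: combine_runs(visual_run, textual_run, w) for w in (0.1, 0.5, 0.9)}
```
]]></preformat>
          <p>With a visual weight of 0.1 the ranking follows the textual run closely, whereas with 0.9 the top visual result moves to the first position.</p>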
          <p>
            The combinations of our visual runs with our own English text retrieval run are all better in quality
than the combinations with the FIRE and OHSU runs. All combinations are simple linear
combinations in which the visual runs contribute 10%, 50%, or 90%. The smallest
proportion of visual influence delivers the best results, although these are not as high as those of the purely textual
run alone. Differences between the two grey-level quantisations (8 and 4) are extremely small.
None of the combinations with the systems of OHSU and FIRE worked well;
all had very low performance.
          </p>
          <p>Run IDs listed in the mixed-media results table: best mixed run; GE VT1 4; GE VT1 8; GE VT5 4; GE VT5 8; GE VT10 4; GE VT10 8; 3gift-3fire-4ohsu; 5gift-5ohsu; 7gift-3ohsu.</p>
          <p>
            For medical image classification the basic GIFT system was used as a baseline.
As already shown in [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ], the features are not well suited for image classification as they
do not include any invariance and are on a very low level. Performance as shown in Table 5 is low
compared to the best systems.
          </p>
          <p>The strategy was to perform the classification in an image retrieval way. No training phase was
carried out. Visually similar images with known classes are used to classify images from the test
set. In practice, the first 10 retrieved images for every image of the test set were taken into account,
and the scores of these images were used to choose the IRMA code on all hierarchy levels. When
the sum of the scores for a certain code reaches a fixed threshold, an agreement can be assumed
for this level. This allows the classification to be performed up to this level. Otherwise, this level
and all further levels were not classified and left empty.</p>
          <p>Thresholds and score distribution strategies varied slightly. Three score distribution strategies
were used:</p>
          <list list-type="bullet">
            <list-item><p>Every retrieved image votes equally. A code at a certain level is chosen only if more than half of the results are in agreement.</p></list-item>
            <list-item><p>Retrieved images vote with decreasing importance values (from 10 to 1) according to their rank. A code at a certain level is chosen if more than 66% of the maximum is reached for one code.</p></list-item>
            <list-item><p>The retrieved images vote with their absolute similarity value. A code at a certain level is chosen if the average of the similarity scores for this code is higher than 0.15.</p></list-item>
          </list>
          <p>The performance varies slightly depending on the chosen strategy. Results in Table 5 show that
the simplest method gives the best result. It can be concluded that a high similarity score is not a
significant parameter for classifying images.</p>
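          <p>The three voting strategies can be illustrated with a minimal sketch for a single hierarchy level. The function name, the toy neighbour lists, and the treatment of codes as plain labels are assumptions for this example; the hierarchical handling of the IRMA code is left out, and the thresholds are one possible reading of the description above.</p>
          <preformat><![CDATA[
```python
from collections import defaultdict

def vote(neighbours, strategy):
    """Pick a code for one hierarchy level from ranked neighbours.

    neighbours: list of (code, similarity), best match first.
    Returns the chosen code, or None if no agreement is reached.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    n = len(neighbours)
    for rank, (code, similarity) in enumerate(neighbours):
        counts[code] += 1
        if strategy == "equal":
            totals[code] += 1.0
        elif strategy == "rank":
            totals[code] += n - rank          # weights n, n-1, ..., 1
        elif strategy == "similarity":
            totals[code] += similarity
        else:
            raise ValueError("unknown strategy: " + strategy)
    best = max(totals, key=totals.get)
    if strategy == "equal":
        agreed = totals[best] > n / 2          # more than half of the votes
    elif strategy == "rank":
        agreed = totals[best] > 0.66 * sum(range(1, n + 1))
    else:
        agreed = totals[best] / counts[best] > 0.15
    return best if agreed else None

# Hypothetical 10 nearest neighbours with toy codes "A", "B", "C".
neighbours = [("A", 0.30), ("A", 0.28), ("A", 0.25), ("B", 0.22), ("A", 0.20),
              ("A", 0.18), ("B", 0.15), ("A", 0.12), ("C", 0.10), ("A", 0.09)]
results = {s: vote(neighbours, s) for s in ("equal", "rank", "similarity")}
# An even split between two codes yields no agreement under equal voting.
undecided = vote([("A", 0.2)] * 5 + [("B", 0.2)] * 5, "equal")
```
]]></preformat>
          <p>A result of None corresponds to leaving the level, and with it all further levels, unclassified.</p>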
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>The results show clearly that visual retrieval with GIFT is no longer state of the art and
that more specific techniques can achieve much better retrieval results. Still, the GIFT runs serve
as a baseline: they can be reproduced easily as the software is open source, and they have been
used in ImageCLEF since 2004, which clearly shows the improvement of the techniques participating
in ImageCLEF since then.</p>
      <p>The text retrieval approach shows that extracting MeSH terms from documents and
queries and then performing retrieval based on these terms works well. There is a bias towards the
English terms, with the majority of documents being in English and the relevance judges
all being native English speakers.</p>
      <p>Combining visual and textual retrieval remains difficult, and in our case no combined result is as good as
the English text results alone. Much potential still seems to lie in this combination of media.</p>
      <p>For the classification of images our extremely simple approach was mainly hindered by the simple base
features that were used.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This study was partially supported by the Swiss National Science Foundation (Grants 3200-065228
and 205321-109304/1) and the European Union (SemanticMining Network of Excellence,
INFS-CT-2004-507505) via OFES Grant (No 03.0399).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>Paul</given-names> <surname>Clough</surname></string-name>,
          <string-name><given-names>Henning</given-names> <surname>Müller</surname></string-name>, and
          <string-name><given-names>Mark</given-names> <surname>Sanderson</surname></string-name>.
          <article-title>The CLEF 2004 cross language image retrieval track</article-title>.
          In C. Peters, P. Clough, J. Gonzalo, G. Jones, M. Kluck, and B. Magnini, editors,
          <source>Multilingual Information Access for Text, Speech and Images: Results of the Fifth CLEF Evaluation Campaign</source>,
          pages <fpage>597</fpage>–<lpage>613</lpage>.
          Lecture Notes in Computer Science (LNCS), Springer, Volume <volume>3491</volume>,
          <year>2005</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>Paul</given-names> <surname>Clough</surname></string-name>,
          <string-name><given-names>Henning</given-names> <surname>Müller</surname></string-name>, and
          <string-name><given-names>Mark</given-names> <surname>Sanderson</surname></string-name>.
          <article-title>The CLEF cross-language image retrieval track (ImageCLEF) 2004</article-title>.
          In Carol Peters, Paul Clough, Julio Gonzalo, Gareth J. F. Jones, Michael Kluck, and Bernardo Magnini, editors,
          <source>Multilingual Information Access for Text, Speech and Images: Result of the fifth CLEF evaluation campaign</source>,
          volume <volume>3491</volume> of Lecture Notes in Computer Science (LNCS),
          pages <fpage>597</fpage>–<lpage>613</lpage>, Bath, UK,
          <year>2005</year>. Springer.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>Thomas</given-names> <surname>Deselaers</surname></string-name>,
          <string-name><given-names>Allan</given-names> <surname>Hanbury</surname></string-name>, et al.
          <article-title>Overview of the ImageCLEF 2007 object retrieval task</article-title>.
          In <source>Working Notes of the 2007 CLEF Workshop</source>, Budapest, Hungary,
          September <year>2007</year>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>Tobias</given-names> <surname>Gass</surname></string-name>,
          <string-name><given-names>Antoine</given-names> <surname>Geissbuhler</surname></string-name>, and
          <string-name><given-names>Henning</given-names> <surname>Müller</surname></string-name>.
          <article-title>Learning a frequency-based weighting for medical image classification</article-title>.
          In <source>Medical Imaging and Medical Informatics (MIMI) 2007</source>, Beijing, China,
          <year>2007</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>Julien</given-names> <surname>Gobeill</surname></string-name>,
          <string-name><given-names>Henning</given-names> <surname>Müller</surname></string-name>, and
          <string-name><given-names>Patrick</given-names> <surname>Ruch</surname></string-name>.
          <article-title>Translation by text categorization: Medical image retrieval in ImageCLEFmed 2006</article-title>.
          In <source>CLEF 2006 Proceedings</source>, volume <volume>4730</volume> of Springer Lecture Notes in Computer Science, Alicante, Spain,
          <year>2007</year>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
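          [6]
          <string-name><given-names>Michael</given-names> <surname>Grubinger</surname></string-name>,
          <string-name><given-names>Paul</given-names> <surname>Clough</surname></string-name>,
          <string-name><given-names>Allan</given-names> <surname>Hanbury</surname></string-name>, and
          <string-name><given-names>Henning</given-names> <surname>Müller</surname></string-name>.
          <article-title>Overview of the ImageCLEF 2007 photographic retrieval task</article-title>.
          In <source>Working Notes of the 2007 CLEF Workshop</source>, Budapest, Hungary,
          September <year>2007</year>.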
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>Henning</given-names> <surname>Müller</surname></string-name>,
          <string-name><given-names>Thomas</given-names> <surname>Deselaers</surname></string-name>,
          <string-name><given-names>Eugene</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>Jayashree</given-names> <surname>Kalpathy-Cramer</surname></string-name>,
          <string-name><given-names>Thomas M.</given-names> <surname>Deserno</surname></string-name>,
          <string-name><given-names>Paul</given-names> <surname>Clough</surname></string-name>, and
          <string-name><given-names>William</given-names> <surname>Hersh</surname></string-name>.
          <article-title>Overview of the ImageCLEFmed 2007 medical retrieval and annotation tasks</article-title>.
          In <source>Working Notes of the 2007 CLEF Workshop</source>, Budapest, Hungary,
          September <year>2007</year>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>Henning</given-names> <surname>Müller</surname></string-name>,
          <string-name><given-names>Thomas</given-names> <surname>Deselaers</surname></string-name>,
          <string-name><given-names>Thomas</given-names> <surname>Lehmann</surname></string-name>,
          <string-name><given-names>Paul</given-names> <surname>Clough</surname></string-name>, and
          <string-name><given-names>William</given-names> <surname>Hersh</surname></string-name>.
          <article-title>Overview of the ImageCLEFmed 2006 medical retrieval and annotation tasks</article-title>.
          In <source>CLEF working notes</source>, Alicante, Spain, Sep.
          <year>2006</year>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>Patrick</given-names> <surname>Ruch</surname></string-name>.
          <article-title>Automatic assignment of biomedical categories: toward a generic approach</article-title>.
          <source>Bioinformatics</source>,
          <volume>22</volume>(<issue>6</issue>):<fpage>658</fpage>–<lpage>664</lpage>,
          <year>2006</year>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><given-names>Patrick</given-names> <surname>Ruch</surname></string-name>,
          <string-name><given-names>Robert H.</given-names> <surname>Baud</surname></string-name>, and
          <string-name><given-names>Antoine</given-names> <surname>Geissbuhler</surname></string-name>.
          <article-title>Learning-free text categorization</article-title>.
          In Michel Dojat, Elpida T. Keravnou, and Pedro Barahona, editors,
          <source>AIME</source>, volume <volume>2780</volume> of Lecture Notes in Computer Science,
          pages <fpage>199</fpage>–<lpage>208</lpage>. Springer,
          <year>2003</year>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>Patrick</given-names> <surname>Ruch</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Jimeno Yepes</surname></string-name>,
          <string-name><given-names>Frédéric</given-names> <surname>Ehrler</surname></string-name>,
          <string-name><given-names>Julien</given-names> <surname>Gobeill</surname></string-name>, and
          <string-name><given-names>Imad</given-names> <surname>Tbahriti</surname></string-name>.
          <article-title>Report on the TREC 2006 experiment: Genomics track</article-title>.
          In <source>TREC</source>,
          <year>2006</year>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>Jacques</given-names> <surname>Savoy</surname></string-name>.
          <article-title>Report on CLEF-2001 experiments</article-title>.
          In <source>Report on the CLEF Conference 2001 (Cross Language Evaluation Forum)</source>,
          pages <fpage>27</fpage>–<lpage>43</lpage>, Darmstadt, Germany,
          <year>2002</year>. Springer LNCS 2406.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><given-names>David McG.</given-names> <surname>Squire</surname></string-name>,
          <string-name><given-names>Wolfgang</given-names> <surname>Müller</surname></string-name>,
          <string-name><given-names>Henning</given-names> <surname>Müller</surname></string-name>, and
          <string-name><given-names>Thierry</given-names> <surname>Pun</surname></string-name>.
          <article-title>Content-based query of image databases: inspirations from text retrieval</article-title>.
          <source>Pattern Recognition Letters (Selected Papers from The 11th Scandinavian Conference on Image Analysis SCIA '99)</source>,
          <volume>21</volume>(<issue>13-14</issue>):<fpage>1193</fpage>–<lpage>1198</lpage>,
          <year>2000</year>.
          B.K. Ersboll, P. Johansen, Eds.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name><given-names>Michael J.</given-names> <surname>Swain</surname></string-name> and
          <string-name><given-names>Dana H.</given-names> <surname>Ballard</surname></string-name>.
          <article-title>Color indexing</article-title>.
          <source>International Journal of Computer Vision</source>,
          <volume>7</volume>(<issue>1</issue>):<fpage>11</fpage>–<lpage>32</lpage>,
          <year>1991</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>