<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Merging results from different media: LIC2M experiments at ImageCLEF 2005</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Romaric Besancon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christophe Millet</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CEA-LIST/LIC2M</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the ImageCLEF 2005 campaign, the LIC2M participated in the ad hoc task, the medical task and the annotation task. For both the ad hoc and medical tasks, we performed experiments on merging the results of two independent search systems: a cross-language information retrieval system exploiting the text part of the query, and a content-based image retrieval system exploiting the example images given with the query. The results show that a well-tuned merging may improve performance, but the tuning is made difficult because the performance of each system highly depends on the corpus and the queries. The annotation task has been performed using a KNN classifier with the image indexes of our CBIR system.</p>
      </abstract>
      <kwd-group>
        <kwd>H.3 [Information Storage and Retrieval]</kwd>
        <kwd>H.3.1 Content Analysis and Indexing</kwd>
        <kwd>H.3.3 Information Search and Retrieval</kwd>
        <kwd>H.3.4 Systems and Software</kwd>
        <kwd>H.3.7 Digital Libraries</kwd>
        <kwd>I.4.7 [Image Processing and Computer Vision]: Feature Measurement</kwd>
        <kwd>Linguistic Processing</kwd>
        <kwd>Cross-lingual Text Retrieval</kwd>
        <kwd>Content Based Image Retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The ImageCLEF campaign aims at studying cross-language image retrieval, which potentially uses both text and image matching techniques. The LIC2M participated in ImageCLEF 2005 to perform experiments on merging strategies that integrate the results obtained from the cross-language text retrieval system and the content-based image retrieval system developed in our lab.</p>
      <p>In both the ad hoc and medical tasks of the ImageCLEF 2005 campaign, textual and visual information were provided for the queries. In the ad hoc task, the basic query is textual (a title and a narrative are provided), but two example images are also given; in the medical task, query images are given and a short textual description gives details about the search goal. We applied the same strategy for the two tasks, using our general-domain systems for multilingual text retrieval and content-based image retrieval, taking into account both the textual and visual parts of the query and applying a posteriori merging strategies to the results provided independently by each system.</p>
      <p>We present in section 2 the retrieval systems for text and image and the merging strategies used. We then present the results obtained for the ad hoc task and the medical task in sections 3 and 4 respectively. The strategy and results for the annotation task are presented in section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Retrieval systems</title>
      <sec id="sec-2-1">
        <title>Multilingual Text Retrieval System</title>
        <p>
          The multilingual text retrieval system used for these experiments is basically the same as the one used for the previous CLEF campaigns; a more detailed description can be found in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The system has not been specially adapted to the text of the ImageCLEF corpora, and has simply been used as is. In particular, for both the ad hoc and medical corpora, no special treatment has been performed to take into account the structure of the documents (such as photographer's name, location and date for the captions, or description, diagnosis and clinical presentation in the medical annotations): all fields containing some text have been taken as is. No adaptation has been made to take into account the specificities of medical texts (specialized vocabulary). Notice that this system is not only cross-lingual but multilingual, because it integrates a concept-based merging technique to merge the results found in each target language. Its basic principle is briefly described here.
        </p>
        <p>Document and query processing The documents and queries are processed by a linguistic analyzer that performs, in particular, part-of-speech tagging and lemmatization, and extracts compounds and named entities from the text. The elements extracted from the documents are indexed into inverted files. The elements extracted from the queries are used as query "concepts". Each concept is reformulated into a set of search terms for each target language, either using a monolingual expansion dictionary (that introduces synonyms and related words) or using a bilingual dictionary.</p>
        <p>Document Retrieval Each search term is searched in the index, and the documents containing the term are retrieved. All retrieved documents are then associated with a concept profile, indicating the presence of query concepts in the document. This concept profile depends on the query concepts and is language-independent (which allows merging results from different languages). Documents sharing the same concept profile are clustered together, and a weight is associated with each cluster according to its concept profile and to the weight of the concepts (the weight of a concept depends on the weight of each of its reformulated terms in the retrieved documents). The clusters are sorted according to their weights, and the first 1000 documents in this sorted list are retrieved.</p>
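<p>The cluster ranking described above can be sketched as follows (a hypothetical simplification, not the actual system: here a cluster's weight is simply the sum of the weights of the concepts in its profile, whereas the real system also derives concept weights from the reformulated terms):</p>

```python
from collections import defaultdict

def cluster_by_concept_profile(retrieved, concept_weights):
    """Group retrieved documents by their concept profile (the set of
    query concepts they contain) and rank the clusters by the summed
    weights of the concepts in each profile."""
    clusters = defaultdict(list)
    for doc_id, concepts in retrieved.items():
        profile = frozenset(concepts)          # language-independent profile
        clusters[profile].append(doc_id)
    # Sort clusters by total concept weight, highest first
    ranked = sorted(
        clusters.items(),
        key=lambda item: sum(concept_weights[c] for c in item[0]),
        reverse=True,
    )
    # Flatten the sorted clusters into a single ranked document list
    return [doc for _, docs in ranked for doc in docs]

docs = {
    "d1": {"lion", "savanna"},
    "d2": {"lion"},
    "d3": {"savanna"},
}
weights = {"lion": 2.0, "savanna": 1.0}
print(cluster_by_concept_profile(docs, weights))
# d1 (profile weight 3.0) ranks before d2 (2.0) and d3 (1.0)
```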
      </sec>
      <sec id="sec-2-2">
        <title>Content-based Image Retrieval System</title>
        <p>
          The content-based image retrieval system we used in ImageCLEF 2005 is the PIRIA system (Program for the Indexing and Research of Images by Affinity) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], developed in our lab. The query image is submitted to the system, which returns a list of images ranked by their similarity to the query image. The similarity is obtained by a metric distance that operates on the image signatures. The indexed images are compared according to several indexers: principally Color, Texture and Form, if the segmentation of the images is relevant. The system takes into account geometric transformations and variations like rotation, symmetry, mirroring, etc. PIRIA is a global one-pass system; feedback or "relevant/non-relevant" learning methods are not used.
Color Indexing This indexer first quantifies the image and then, for each quantified color, computes how connected this color is. It can also be described as a border/interior pixel classification [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The distance used for the color indexing is a classical L2 norm.
Texture Indexing A global texture histogram is used for the texture analysis. The histogram
is computed from the Local Edge Pattern descriptors [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. These descriptors describe the local structure according to the edge image computed with a Sobel filtering. We obtain a 512-bin texture histogram, which is associated with a 64-bin color histogram where each plane of the RGB color space is quantized into 4 colors. Distances are computed with an L1 norm.
Form Indexing The form indexer consists of a projection of the edge image along its horizontal and vertical axes. The image is first resized to 100x100. Then, the Sobel edge image is computed and divided into four equal-sized squares (up left, up right, bottom left and bottom right). Each 50x50 part is projected along its vertical and horizontal axes, thus giving a 400-bin histogram. The L2 distance is used to compare two histograms.
        </p>
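<p>The form indexer described above can be sketched as follows (a sketch under stated assumptions: a numpy gradient magnitude stands in for the Sobel filtering, a nearest-neighbour resize stands in for the actual resizing, and the function names are ours):</p>

```python
import numpy as np

def form_signature(image):
    """Sketch of the form indexer: resize to 100x100, compute a
    gradient-magnitude edge image (stand-in for Sobel), split it into
    four 50x50 quadrants, and project each quadrant on its horizontal
    and vertical axes -> a 400-bin histogram (4 x 2 x 50)."""
    img = np.asarray(image, dtype=float)
    # Nearest-neighbour resize to 100x100 (stand-in for the real resize)
    ys = np.arange(100) * img.shape[0] // 100
    xs = np.arange(100) * img.shape[1] // 100
    img = img[np.ix_(ys, xs)]
    # Simple gradient-magnitude edge image (stand-in for Sobel filtering)
    gy, gx = np.gradient(img)
    edges = np.hypot(gx, gy)
    bins = []
    for r in (slice(0, 50), slice(50, 100)):      # up / bottom halves
        for c in (slice(0, 50), slice(50, 100)):  # left / right halves
            quad = edges[r, c]
            bins.append(quad.sum(axis=0))  # projection on horizontal axis
            bins.append(quad.sum(axis=1))  # projection on vertical axis
    return np.concatenate(bins)

def form_distance(h1, h2):
    """L2 distance between two form histograms."""
    return float(np.linalg.norm(h1 - h2))

sig = form_signature(np.random.rand(120, 80))
print(sig.shape)  # (400,)
```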
      </sec>
      <sec id="sec-2-3">
        <title>Search and Merging Strategy</title>
        <p>For both the ad hoc and medical tasks, the queries contain textual and visual information. The textual information is used to search for relevant text documents with the multilingual text retrieval system. For the ad hoc task, each text document corresponds to a single image: the images corresponding to the relevant texts are then given as results. For the medical task, a text document may be associated with several images. In that case, the score obtained by the text document is given to each image it is associated with: the first 1000 images in this image list are kept.</p>
        <p>Since the PathoPic corpus of the medical task contains annotations in English and German that are associated with the same image, the multilingual retrieval system may return both the English and the German annotation as relevant documents (possibly with different scores), creating duplicate elements in the result list. In this case, the score associated with the corresponding image is the best score returned. To make sure that the number of retrieved images is 1000, we set the number of retrieved documents for the text retrieval system to 2000 for the medical task1.</p>
        <p>Independently, the visual information is used by the CBIR system to retrieve similar images. Queries contain several images, so a first merging is performed to obtain a single image list from the results of each query image: the score associated with a result image is set to the max of the scores obtained for each query image.</p>
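<p>This max-based merging of the per-query-image result lists can be sketched as follows (the function name and identifiers are ours, for illustration):</p>

```python
def merge_query_images(results_per_query_image):
    """Combine the ranked lists obtained for each example image of a
    query into a single list, the score of each result image being the
    max over all query images that retrieved it."""
    merged = {}
    for results in results_per_query_image:     # one dict per query image
        for img, score in results.items():
            merged[img] = max(merged.get(img, 0.0), score)
    return merged

r1 = {"a": 0.9, "b": 0.2}   # results for the first query image
r2 = {"b": 0.7, "c": 0.5}   # results for the second query image
print(merge_query_images([r1, r2]))  # {'a': 0.9, 'b': 0.7, 'c': 0.5}
```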
        <p>Merging the results obtained by each system is simply done by a weighted sum of the scores obtained by each system. To be comparable, the scores of each system are normalized, for each query, by the highest score obtained for the query. This merging is parameterized by a merging coefficient α: for a query q and an image document d ∈ Ret(q) retrieved for this query, the merging score is</p>
        <p>s(d) = α · sT(d) / max{d' ∈ RetT(q)} sT(d') + (1 − α) · sI(d) / max{d' ∈ RetI(q)} sI(d')</p>
        <p>where sT(d) is the score given by the text retrieval system and sI(d) the score given by the image retrieval system.</p>
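<p>A minimal sketch of this weighted-sum merging (the document identifiers and scores are invented for illustration):</p>

```python
def merge_scores(text_scores, image_scores, alpha):
    """Normalize each system's scores by its per-query maximum, then
    combine them with coefficient alpha (alpha=1: text only,
    alpha=0: image only). Returns documents sorted by merged score."""
    max_t = max(text_scores.values(), default=1.0) or 1.0
    max_i = max(image_scores.values(), default=1.0) or 1.0
    merged = {}
    for d in set(text_scores) | set(image_scores):
        s_t = text_scores.get(d, 0.0) / max_t
        s_i = image_scores.get(d, 0.0) / max_i
        merged[d] = alpha * s_t + (1 - alpha) * s_i
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

text = {"img1": 10.0, "img2": 4.0}   # raw text retrieval scores
image = {"img2": 0.8, "img3": 0.4}   # raw image retrieval scores
print(merge_scores(text, image, alpha=0.7))
# img1 ranks first (text only), img2 second (found by both systems)
```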
        <p>A conservative merging strategy has also been tested: by conservative, we mean that we use the results obtained by one system only to reorder the results obtained by the other (results can be added at the end of the list if the number of documents retrieved by the main system is less than 1000). The score of a document is modified using the same merging coefficient α. For example, if the merging is conservative with respect to the text results:</p>
        <p>s'(d) = s(d) if sT(d) ≠ 0, and s'(d) = 0 otherwise.</p>
        <p>The results we obtained in ImageCLEF 2004 tend to show that this kind of conservative merging strategy gives good performance. We will use the term expansionist merging strategy to denote the standard merging strategy, as opposed to the conservative one.</p>
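<p>The conservative variant can be sketched in the same way (again with invented scores: documents absent from the text results drop to score 0 and can only be appended at the end of the list):</p>

```python
def conservative_merge(merged_scores, text_scores):
    """Conservative merging with respect to the text results:
    s'(d) = s(d) if the text system retrieved d (sT(d) != 0), else 0,
    so the image scores only reorder the text results."""
    return {d: (s if text_scores.get(d, 0.0) != 0.0 else 0.0)
            for d, s in merged_scores.items()}

merged = {"img1": 0.70, "img2": 0.58, "img3": 0.15}  # weighted-sum scores
text = {"img1": 10.0, "img2": 4.0}   # img3 not found by the text search
print(conservative_merge(merged, text))
# img3 drops to 0.0; img1 and img2 keep their merged scores
```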
        <p>1 This duplication of results was not detected before the submission of the runs, but the technique we used for merging text and image results removes the duplicate documents.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results for the Ad hoc task</title>
      <p>In the ad hoc task, we used textual queries in English, French and Spanish. We tried using the title only (T) or the title and the narrative (T+N). Comparative results for textual retrieval only, using either T or T+N, are given in Table 1. These results show that the average precision is better when using the title only, but the number of relevant documents is generally better when also using the narrative part (except for French, for which it is a bit worse). This can be explained by the fact that the narrative introduces more words, which increases the total number of documents retrieved (for English and Spanish, there are 6 queries for which the system does not find 1000 documents matching the title only, against only 3 for French) and the number of relevant documents. But the narrative also introduces more noise, which makes the precision decrease.</p>
      <p>[Table 1: results of text-only retrieval (map, relret, r1000) for T and T+N queries.]</p>
      <p>We present in Table 2 the results obtained by merging the two systems, using the texture indexer for the CBIR system. The results are presented for the conservative and expansionist strategies and for different values of the merging coefficient α (when α = 1, the search is based only on text; when α = 0, the search is based only on images). Values of α below 0.5 are not presented but do not give better results. For the expansionist strategy, the results are given for the mean average precision (map) and the number of relevant documents retrieved (relret); for the conservative strategy, only the map is presented (relret is constant).</p>
      <p>These results show that this simple merging of text and image results based on a weighted sum of the scores can increase the mean average precision (a gain of 17 or 18%), and that the best value for α is around 0.7 (though the differences with surrounding values are small).</p>
      <p>Concerning the conservative/expansionist strategies, our previous experiments in ImageCLEF showed that the StAndrews collection, composed of old photographs, is not well adapted to the kind of image indexers we use, which rely mostly on color for segmentation. We therefore chose the text retrieval as the base for the conservative merging. Looking at the relevant documents retrieved proved us right: text retrieval retrieved 1246 relevant documents, whereas image retrieval only retrieved 367 relevant documents (239 of which were also found by the text retrieval system). However, the two merging strategies give comparable results, even though, as one can expect, the performance of the expansionist strategy decreases faster as α decreases.</p>
      <p>Similar results are presented in Table 3, using the color indexer for the CBIR system. The results are comparable: for this corpus, the two image indexers tend to retrieve similar documents (2/3 of the relevant documents retrieved by the two systems are identical).</p>
    </sec>
    <sec id="sec-4">
      <title>Results for the Medical task</title>
      <p>In the medical task, we tested text retrieval using queries in English, French and German (searching
for each in all target languages).</p>
      <p>Based on our experiments in ImageCLEF 2004, we assumed that image retrieval for the medical task gives good results. The runs submitted for the medical task in the ImageCLEF 2005 campaign include runs based on visual queries only (texture and color indexers) and, for English, French and German, a conservative merging of the image results based on the texture indexer and the text results, with α = 0.9. Unfortunately, the use of the texture or color indexer with the ImageCLEFmed 2005 visual queries gave poor results, and the conservative merging based on these results does not give much better results2.</p>
      <p>We present in Table 4 the results obtained by merging the text and image systems, using the texture indexer for the CBIR system, with different values of the merging coefficient α, and for the conservative and expansionist merging strategies (the conservative strategy being based on the text results).</p>
      <p>Except for German (for which our linguistic processing is clearly not well adapted to medical text), the conservative merging strategy improves performance (the best merging coefficient seems to be around 0.5). Expansionist merging gives comparable results: the improvement in mean average precision is smaller, but the number of relevant documents retrieved is generally improved, which tends to prove that the two systems retrieve different documents3: conservative merging improves the ordering of the documents retrieved by one system, whereas expansionist merging improves the number of documents retrieved.</p>
      <p>We present in Table 5 similar results using the color indexer for the visual retrieval. The results are slightly worse, but the same kinds of tendencies as for the texture indexer can be noticed.</p>
    </sec>
    <sec id="sec-5">
      <title>Annotation task</title>
      <p>For the automatic annotation task, we submitted three runs, each corresponding to one of the
three indexers described in section 2 (Color, Texture and Form).</p>
      <p>All images are first indexed with the chosen indexer. Then, a k-Nearest Neighbor classifier is used to classify the indexed images. Odd numbers from 3 to 13 have been tested for k for each indexer, and evaluated with the leave-one-out method. The best k were 3 for the form indexer and 9 for the color indexer and the texture indexer.</p>
      <p>2 Furthermore, we detected a bug in the submitted runs, concerning the document identifier matching (1 vs. 0000001), that caused the Peir corpus documents to be ignored in the text retrieval results.</p>
      <p>3 We verified that the text results with English queries contain 999 relevant images, the image results with the texture indexer contain 822 relevant images, and only 218 images were common to the two systems.</p>
      <p>The attributed class is decided by a majority vote of the nearest neighbors. In case of ties, the distances to the nearest neighbors are used (for example, in 9-NN, if 4 neighbors are from a class A, 4 neighbors are from a class B, and 1 is from another class, we use the distances between the query image and its neighbors to select the nearest class).</p>
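<p>This vote-then-distance tie-breaking can be sketched as follows (an illustrative sketch: the neighbor labels and distances are invented):</p>

```python
from collections import Counter

def knn_classify(neighbors):
    """Majority vote over the k nearest neighbors; `neighbors` is a
    list of (class_label, distance) pairs. Ties between the most-voted
    classes are broken by the smallest distance among them."""
    votes = Counter(label for label, _ in neighbors)
    top = votes.most_common()
    best_count = top[0][1]
    tied = {label for label, count in top if count == best_count}
    if len(tied) == 1:
        return top[0][0]
    # Tie: pick the tied class owning the nearest neighbor
    return min((d, label) for label, d in neighbors if label in tied)[1]

# 9-NN example from the text: 4 votes for A, 4 for B, 1 for C
neighbors = [("A", 0.9), ("A", 1.1), ("A", 1.2), ("A", 1.3),
             ("B", 0.5), ("B", 1.4), ("B", 1.5), ("B", 1.6),
             ("C", 2.0)]
print(knn_classify(neighbors))  # "B": its nearest neighbor is closest
```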
      <p>We present in Table 6 the results obtained for each of the indexers. It is not a surprise that the form indexer performed better than the others, as all the images in the database were in grey levels, and the form indexer is designed for such images, whereas the color and texture indexers are not well adapted to them (remember that the texture indexer includes a 64-bin color histogram).</p>
      <p>[Table 6: error rates. 9-NN Color: 46.0%; 9-NN Texture: 42.5%; 3-NN Form: 36.9%.]</p>
      <sec id="sec-5-1">
        <title>Conclusion</title>
        <p>The experiments performed by the LIC2M in the ImageCLEF 2005 campaign show that merging results from different media may increase the performance of a search system: a well-tuned a posteriori merging of the results obtained by two general-purpose systems (no particular adaptation of the systems was made for the two tasks) can improve the mean average precision by at least 15%.</p>
        <p>The difficulty lies in the tuning of the merging strategy. We used a simple weighted sum of the scores given by each system, but the importance given to each system should depend on the performance of the system on the particular corpus, which is not easily predicted (the best strategy for the ImageCLEF 2004 medical task appears to be the opposite of the best strategy for the ImageCLEF 2005 medical task, which has a more varied corpus and more difficult visual queries).</p>
        <p>Further experiments will be undertaken to try to make the systems give a confidence score associated with their results, and to adapt the merging strategy according to this confidence. Other, more sophisticated merging strategies will also be considered.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Romaric</given-names>
            <surname>Besancon</surname>
          </string-name>
          , Gael de Chalendar, Olivier Ferret, Christian Fluhr, Olivier Mesnard, and
          <string-name>
            <given-names>Hubert</given-names>
            <surname>Naets</surname>
          </string-name>
          .
          <article-title>Concept-based searching and merging for multilingual information retrieval: First experiments at clef 2003</article-title>
          . In Carol Peters, Julio Gonzalo,
          <string-name>
            <given-names>Martin</given-names>
            <surname>Braschler</surname>
          </string-name>
          , and Michael Kluck, editors,
          <source>Comparative Evaluation of Multilingual Information Access Systems</source>
          , pages
          <fpage>174</fpage>
          -
          <lpage>184</lpage>
          . Springer,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Ya-Chun</given-names>
            <surname>Cheng</surname>
          </string-name>
          and
          <string-name>
            <given-names>Shu-Yuan</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Image classification using color, texture and regions</article-title>
          .
          <source>Image and Vision Computing</source>
          ,
          <volume>21</volume>
          (
          <issue>9</issue>
          ),
          <year>September 2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Magali</given-names>
            <surname>Joint</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pierre-Alain</given-names>
            <surname>Moellic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Hede</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Pascal</given-names>
            <surname>Adam</surname>
          </string-name>
          .
          <article-title>PIRIA: a general tool for indexing, search and retrieval of multimedia content</article-title>
          . In
          <source>SPIE Electronic Imaging 2004</source>
          , San Jose, California, USA,
          <year>January 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Renato O.</given-names>
            <surname>Stehling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mario A.</given-names>
            <surname>Nascimento</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alexandre X.</given-names>
            <surname>Falcão</surname>
          </string-name>
          .
          <article-title>A compact and efficient image retrieval approach based on border/interior pixel classification</article-title>
          . In
          <source>CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management</source>
          , McLean, Virginia, USA,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>