<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Text and Image Retrieval Systems: Lic2m experiments at ImageCLEF 2006</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Romaric Besançon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christophe Millet</string-name>
          <email>milletc@zoe.cea.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CEA-LIST/LIC2M</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2006</year>
      </pub-date>
      <abstract>
        <p>In the ImageCLEF 2006 campaign, the LIC2M participated in the ImageCLEFphoto ad hoc task. We performed experiments on merging the results of two independent search systems: a cross-language information retrieval system exploiting the text part of the query, and a content-based image retrieval (CBIR) system exploiting the example images given with the query. The merging is performed a posteriori, using a weighted sum of the scores given by each system. This kind of merging can improve the results, but the gain in the submitted runs remains quite small compared with our experiments in previous campaigns, owing to the relatively poor results of the CBIR part. A first analysis gives some hints on possible improvements: example images are often chosen to be visually different in order to show several aspects of possible relevant images for the chosen topic, so merging the CBIR results obtained for the different example images can be irrelevant. However, using only one example image (provided the best one can be chosen), or merging the CBIR results of each example image with the text results independently, can improve the results.</p>
      </abstract>
      <kwd-group>
        <kwd>H.3 [Information Storage and Retrieval]</kwd>
        <kwd>H.3.1 Content Analysis and Indexing</kwd>
        <kwd>H.3.3 Information Search and Retrieval</kwd>
        <kwd>H.3.4 Systems and Software</kwd>
        <kwd>H.3.7 Digital Libraries</kwd>
        <kwd>I.4.7 [Image Processing and Computer Vision] Feature Measurement</kwd>
        <kwd>Linguistic Processing</kwd>
        <kwd>Cross-lingual Text Retrieval</kwd>
        <kwd>Content Based Image Retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The CEA-LIST/LIC2M laboratory participated in ImageCLEF 2006 to perform experiments on
merging strategies to integrate the results obtained from the cross-language text retrieval system
and the content-based image retrieval (CBIR) system that are developed in our lab, using a simple
merging strategy similar to the one used in previous ImageCLEF campaigns [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>We use text and visual information from the queries: the title provided for the text retrieval
system, and the example images for the CBIR system. Both systems are general-domain systems
and are used independently on each part of the query. Then, a posteriori merging strategies are
applied on the results provided by each system.</p>
      <p>Section 2 presents the text and image retrieval systems and the merging strategies used; section 3 presents the results obtained.</p>
    </sec>
    <sec id="sec-2">
      <title>Retrieval systems</title>
      <p>
        Both the text retrieval system and the CBIR system are the same as those used in previous ImageCLEF campaigns [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The basic principles of both systems are presented here.
      </p>
      <sec id="sec-2-1">
        <title>Multilingual Text Retrieval System</title>
        <p>The multilingual text retrieval system has not been specially adapted to the text of the ImageCLEF corpora and has simply been used as is: no special treatment has been performed to take the structure of the documents (title, description, location, date) into account; all fields containing text have been indexed as is. The system works as follows:</p>
        <p>Document and query processing. The documents and queries are processed by a linguistic analyzer that extracts relevant linguistic elements such as lemmas, named entities and compounds. The elements extracted from the documents are indexed into inverted files. The elements extracted from the queries are used as query “concepts”. Each concept is reformulated into a set of search terms for each target language (in the case of ImageCLEFphoto, only one target language was used), either using a monolingual expansion dictionary (which introduces synonyms and related words) or using a bilingual dictionary.</p>
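<p>As an illustration of this reformulation step, the following sketch turns each query concept into a set of search terms, either through a bilingual dictionary or through monolingual expansion. The dictionary contents are invented for illustration; the actual LIC2M lexical resources are not shown here.</p>

```python
# Toy sketch of query-concept reformulation. The dictionaries are
# invented examples, not the real system's resources.

# monolingual expansion dictionary: lemma -> synonyms / related words
EXPANSION = {"church": ["chapel", "cathedral"]}
# bilingual dictionary: source-language lemma -> target-language terms
BILINGUAL = {"eglise": ["church"]}

def reformulate(concepts, expansion, bilingual=None):
    """Map each query concept to its set of search terms."""
    terms = {}
    for concept in concepts:
        # translate if a bilingual dictionary is given, else keep the lemma
        if bilingual and concept in bilingual:
            candidates = set(bilingual[concept])
        else:
            candidates = {concept}
        expanded = set(candidates)
        for term in candidates:
            # add synonyms and related words from the expansion dictionary
            expanded.update(expansion.get(term, []))
        terms[concept] = expanded
    return terms

# cross-lingual query: a French concept reformulated into English terms
print(reformulate(["eglise"], EXPANSION, BILINGUAL))
```

<p>The same function covers the monolingual case by omitting the bilingual dictionary, in which case only synonym expansion applies.</p>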
        <p>Document retrieval. Each search term is searched in the index, and the documents containing the term are retrieved. Every retrieved document is then associated with a concept profile indicating which query concepts are present in the document. This profile depends only on the query concepts and is language-independent (which allows merging results from different languages). Documents sharing the same concept profile are clustered together, and a weight is associated with each cluster according to its concept profile and to the weight of the concepts (the weight of a concept depends on the weight of each of its reformulated terms in the retrieved documents). The clusters are sorted by weight, and the first 1000 documents in this sorted list are retrieved.</p>
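<p>The clustering by concept profile can be sketched as follows. This is a simplified model: here each concept's weight is a fixed number, whereas in the actual system it is derived from the weights of the concept's reformulated terms in the retrieved documents.</p>

```python
from collections import defaultdict

def rank_by_concept_profile(postings, concept_weights, limit=1000):
    """Cluster retrieved documents by concept profile and flatten the
    clusters, best first.  postings maps each query concept to the set
    of documents containing one of its search terms."""
    # concept profile: which query concepts each document contains
    profiles = defaultdict(set)
    for concept, docs in postings.items():
        for doc in docs:
            profiles[doc].add(concept)
    # cluster documents sharing the same profile
    clusters = defaultdict(list)
    for doc, profile in profiles.items():
        clusters[frozenset(profile)].append(doc)
    # weight each cluster by the total weight of the concepts it covers
    ordered = sorted(clusters.items(),
                     key=lambda item: sum(concept_weights[c] for c in item[0]),
                     reverse=True)
    flat = [doc for _, docs in ordered for doc in sorted(docs)]
    return flat[:limit]
```

<p>A document matching both query concepts outranks any document matching only one, regardless of raw term statistics, which is what makes the profiles comparable across languages.</p>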
      </sec>
      <sec id="sec-2-2">
        <title>Content-based Image Retrieval System</title>
        <p>
          The content-based image retrieval system we used in ImageCLEF 2006 is PIRIA (Program for the Indexing and Research of Images by Affinity) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], developed in our lab. The query image is submitted to the system, which returns a list of images ranked by their similarity to the query image. The similarity is obtained by a metric distance operating on the image signatures. The indexed images are compared with several indexers, principally color, texture and form (the latter when the segmentation of the images is relevant). The system takes geometric transformations and variations such as rotation, symmetry and mirroring into account. PIRIA is a global one-pass system; no feedback or relevant/non-relevant learning methods are used.</p>
        <p>Color indexing. This indexer first quantifies the image and then computes, for each quantified color, how connected this color is. It can also be described as a border/interior pixel classification [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The distance used for color indexing is a classical L2 norm.</p>
        <p>Texture indexing. A global texture histogram is used for the texture analysis. The histogram is computed from Local Edge Pattern descriptors [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which describe the local structure of the edge image computed with a Sobel filter. We obtain a 512-bin texture histogram, which is associated with a 64-bin color histogram in which each plane of the RGB color space is quantized into 4 colors. Distances are computed with an L1 norm.</p>
        <p>Form indexing. The form indexer projects the edge image along its horizontal and vertical axes. The image is first resized to 100x100. The Sobel edge image is then computed and divided into four equal-sized squares (top left, top right, bottom left and bottom right). Each 50x50 part is projected along its vertical and horizontal axes, giving a 400-bin histogram. The L2 distance is used to compare two histograms.</p>
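<p>The form signature can be sketched as follows. This is a simplified version: a plain image gradient stands in for the Sobel filter, and the resize is nearest-neighbour.</p>

```python
import numpy as np

def form_signature(image):
    """Form-indexer sketch: resize to 100x100, compute an edge image,
    split it into four 50x50 quadrants, and project each quadrant along
    its vertical and horizontal axes (4 * (50 + 50) = 400 bins)."""
    h, w = image.shape
    # nearest-neighbour resize to 100x100
    img = image[np.ix_(np.arange(100) * h // 100,
                       np.arange(100) * w // 100)].astype(float)
    # simple gradient magnitude stands in for the Sobel edge image
    gy, gx = np.gradient(img)
    edges = np.hypot(gx, gy)
    bins = []
    for quad in (edges[:50, :50], edges[:50, 50:],
                 edges[50:, :50], edges[50:, 50:]):
        bins.append(quad.sum(axis=0))  # projection on the horizontal axis
        bins.append(quad.sum(axis=1))  # projection on the vertical axis
    return np.concatenate(bins)        # 400-bin signature

def form_distance(sig1, sig2):
    """L2 distance between two form signatures."""
    return np.linalg.norm(sig1 - sig2)
```
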
      </sec>
      <sec id="sec-2-3">
        <title>Search and Merging Strategy</title>
        <p>Both systems are used independently to retrieve documents from the textual and visual information. For the CBIR results, since queries contain several example images, a first merging is performed to obtain a single image list from the results for each query image: the score associated with a result image is set to the maximum of the scores it obtained for the individual query images.</p>
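<p>This first merging step can be sketched as follows, taking each result list as a dictionary mapping result images to similarity scores:</p>

```python
def merge_query_images(results_per_image):
    """Merge the CBIR result lists obtained for the example images of one
    topic into a single list: each result image keeps the maximum of the
    scores it obtained for the individual query images."""
    merged = {}
    for results in results_per_image:          # one dict per example image
        for image, score in results.items():
            merged[image] = max(score, merged.get(image, score))
    # return a single ranked list, best scores first
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)
```
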
        <p>
          The results obtained by the two systems are then merged using a weighted sum of the scores given by each system. To make the scores of the different systems comparable, we tried several normalization functions, presented in Table 1, where αi is the weight associated with the scores of the ith system, RSVmax is the highest score obtained for a query, RSVmin the lowest score, RSVavg the average score and RSVδ the standard deviation of the scores. These functions were tested, for instance, by [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for data fusion in the multilingual tracks of previous CLEF campaigns. The submitted runs used the normRSV merging function.
        </p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Normalization functions used to merge the scores of the two systems.</p>
          </caption>
          <table>
            <tbody>
              <tr><td>sumRSV</td><td>∑ αi ∗ RSVi</td></tr>
              <tr><td>normRSVMax</td><td>∑ αi ∗ RSVi/RSVmax</td></tr>
              <tr><td>normRSV</td><td>∑ αi ∗ (RSVi − RSVmin)/(RSVmax − RSVmin)</td></tr>
              <tr><td>Zscore</td><td>∑ αi ∗ [(RSVi − RSVavg)/RSVδ + (RSVavg − RSVmin)/RSVδ]</td></tr>
            </tbody>
          </table>
        </table-wrap>
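<p>The normalization functions and the weighted-sum merging can be sketched as follows. This is a minimal model assuming scores are given as dictionaries mapping documents to RSVs and that the score range of each result list is non-degenerate.</p>

```python
from statistics import mean, pstdev

def normalize(scores, method="normRSV"):
    """Apply one of the Table 1 normalization functions to the raw
    scores (doc -> RSV) of a single system for a single query."""
    values = list(scores.values())
    mx, mn = max(values), min(values)
    avg, sd = mean(values), pstdev(values)
    def norm(v):
        if method == "sumRSV":            # raw score, no normalization
            return v
        if method == "normRSVMax":        # divide by the best score
            return v / mx
        if method == "normRSV":           # min-max normalization
            return (v - mn) / (mx - mn)
        if method == "Zscore":            # shifted z-score
            return (v - avg) / sd + (avg - mn) / sd
        raise ValueError(method)
    return {doc: norm(v) for doc, v in scores.items()}

def merge(text_scores, image_scores, alpha=0.7, method="normRSV"):
    """Weighted sum of normalized scores: alpha on text, 1 - alpha on image."""
    text = normalize(text_scores, method)
    image = normalize(image_scores, method)
    return {doc: alpha * text.get(doc, 0.0) + (1 - alpha) * image.get(doc, 0.0)
            for doc in set(text) | set(image)}
```

<p>A document missing from one result list simply contributes a zero score for that system, so documents found by both systems are favored.</p>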
        <p>Based on the results of the previous campaigns, we also considered a conservative merging strategy: the results obtained by one system are used only to reorder the results obtained by the other; the score of a document is modified using the same merging coefficients.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results for the ImageCLEFphoto task</title>
      <p>For the text retrieval part, we used textual queries in English, Spanish, French and German. We used English and German as independent target languages (since the annotations in the two languages refer to the same images and thus form an aligned corpus, it did not seem interesting to use both languages as a single multilingual corpus). We only submitted runs with English as the target language. For the CBIR system, we tested the color and texture indexers.</p>
      <p>We present in Table 2 the results obtained by the CBIR system alone and the text system
alone.</p>
      <p>We present in Table 3 the results obtained by merging the two systems with the normRSV merging scheme, for different values of α (α being the weight associated with the text results and 1 − α with the image results). In this merging, we used the English topics with the English annotations and the color indexer. The submitted merged runs used α = 0.7.</p>
      <p>Results are given for the mean average precision (map) and the number of relevant documents retrieved (relret).</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>CBIR results</p>
        </caption>
        <table>
          <thead>
            <tr><th>indexer</th><th>map</th></tr>
          </thead>
          <tbody>
            <tr><td>color</td><td>0.0468</td></tr>
            <tr><td>texture</td><td>0.0363</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>We see from these results that this simple a posteriori merging of text and image results based on a weighted sum of the scores can increase the mean average precision and the number of relevant documents retrieved (the best value of α is around 0.8 or 0.9).</p>
      <p>However, the gain is still small, because the CBIR results are surprisingly poor. On the one hand, we are investigating in more detail the flaws of the indexers on this new image corpus. On the other hand, a first analysis of the results shows that merging the CBIR results for the example images before merging with the text results is not a good idea: when several example images are given, they can illustrate different aspects of what the results should look like, and can therefore be as different as possible within the range of relevant images. Merging results based on purely visual similarity can be irrelevant in this case. The analysis of the CBIR results shows that the rate of images common to the results for the different example images of a same topic does not exceed, on average, 16 to 18%. Table 4 presents the results obtained using only one example image for each topic. The best example image was chosen (according to the reference), and the gain in mean average precision in this case is more than 11%. The problem remains to find the best example image automatically (in our results, the average precision obtained for each example image does not correlate well with the average score given by the CBIR system).</p>
      <p>Another solution to this problem is to consider the result list of each example image as an independent result, and to merge all result lists according to the same scheme. In this case, the weight associated with the text result is α and the weight associated with each CBIR result is α/n, where n is the number of example images. Table 5 presents the results obtained with this method. The gain in this case is around 8%, but this method does not require determining the best example image a priori.</p>
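<p>This per-example-image merging can be sketched as follows, assuming scores are already normalized, with weight α on the text result and α/n on each example image's CBIR result, as described above:</p>

```python
def merge_per_example(text_scores, per_image_scores, alpha=0.7):
    """Merge the text result with the CBIR result of each example image
    taken as an independent run: weight alpha on the text scores and
    alpha / n on each of the n CBIR result lists."""
    n = len(per_image_scores)
    docs = set(text_scores).union(*(set(r) for r in per_image_scores))
    merged = {}
    for doc in docs:
        score = alpha * text_scores.get(doc, 0.0)
        for results in per_image_scores:
            score += alpha / n * results.get(doc, 0.0)
        merged[doc] = score
    return merged
```

<p>Unlike the max-merge of the CBIR lists, a document here is rewarded only in proportion to how many example images actually retrieve it together with the text result.</p>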
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>The experiments performed by the LIC2M in the ImageCLEF 2006 campaign show that merging results from different media may increase the performance of a search system: a well-tuned a posteriori merging of the results obtained by two general-purpose systems (no particular adaptation of the systems was made for the task) can improve the mean average precision. An analysis of the CBIR results shows that merging the results obtained for the different example images can increase the noise in the global results, since example images are often chosen to be visually different in order to show several aspects of possible relevant images. Some solutions are proposed to cope with this issue, such as using only one example image (the best one), or using all example images but merging each with the text results only (not with each other). Both solutions lead to better results in terms of mean average precision.</p>
      <p>More sophisticated solutions could be considered, such as working on the image analysis to determine the similarities between the example images and to find similar images in the collection based on these similarities, instead of considering each example image independently.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Romaric</given-names>
            <surname>Besançon</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christophe</given-names>
            <surname>Millet</surname>
          </string-name>
          .
          <article-title>Data fusion of retrieval results from different media: experiments at ImageCLEF 2005</article-title>
          . In Carol Peters, Fredric C. Gey, Julio Gonzalo, Gareth J.F. Jones, Michael Kluck, Bernardo Magnini, Henning Müller, and Maarten de Rijke, editors,
          <source>Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005</source>
          . Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Ya-Chun</given-names>
            <surname>Cheng</surname>
          </string-name>
          and
          <string-name>
            <given-names>Shu-Yuan</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Image classification using color, texture and regions</article-title>
          .
          <source>Image and Vision Computing</source>
          ,
          <volume>21</volume>
          (
          <issue>9</issue>
          ),
          <year>September 2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Magali</given-names>
            <surname>Joint</surname>
          </string-name>
          , Pierre-Alain Moëllic, Patrick Hède, and Pascal Adam.
          <article-title>PIRIA: A general tool for indexing, search and retrieval of multimedia content</article-title>
          . In
          <source>SPIE Electronic Imaging 2004</source>
          , San Jose, California, USA,
          <year>January 2004</year>
          .
        </mixed-citation>
      </ref>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Jacques</given-names>
            <surname>Savoy</surname>
          </string-name>
          and
          <string-name>
            <given-names>Pierre-Yves</given-names>
            <surname>Berger</surname>
          </string-name>
          .
          <article-title>Report on CLEF-2005 evaluation campaign: monolingual, bilingual, and GIRT information retrieval</article-title>
          . In Carol Peters, editor,
          <source>Working Notes for the CLEF 2005 Workshop</source>
          , Vienna, Austria,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Renato O.</given-names>
            <surname>Stehling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mario A.</given-names>
            <surname>Nascimento</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alexandre X.</given-names>
            <surname>Falcão</surname>
          </string-name>
          .
          <article-title>A compact and efficient image retrieval approach based on border/interior pixel classification</article-title>
          . In
          <source>CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management</source>
          , McLean, Virginia, USA,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>