<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Xtrieval Framework at CLEF 2008: ImageCLEF photographic retrieval task</article-title>
      </title-group>
      <contrib-group>
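        <contrib contrib-type="author">
          <string-name>Thomas Wilhelm</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jens Kürsten</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maximilian Eibl</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chemnitz University of Technology, Faculty of Computer Science, Dept. Computer Science and Media</institution>
          <addr-line>09107 Chemnitz, Germany [ thomas.wilhelm</addr-line>
        </aff>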
      </contrib-group>
      <abstract>
        <p>This paper describes our participation in the ImageCLEF photographic retrieval task. We used our Xtrieval framework to prepare and run the experiments. This year, we submitted four experiments in total. The experiments showed that our thesaurus-based query expansion works well for improving the geometric mean average precision (GMAP) and the binary preference (BPREF), but cancels much of the improvement gained by adding content-based image retrieval. The text-only baseline scored a mean average precision (MAP) of 0.0998. Combining text and image retrieval raised the MAP by 37 percent to 0.1364. After query expansion was applied to both experiments, the MAP of the text-only run increased to 0.1081, while the MAP of the combined text and image run decreased to 0.1140. By implementing an interface to the PostgreSQL database, we were able to speed up retrieval and vector comparison operations.</p>
      </abstract>
      <kwd-group>
        <kwd>Evaluation</kwd>
        <kwd>Content-based Image Retrieval</kwd>
        <kwd>Query Expansion</kwd>
        <kwd>Experimentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>This year the data set was reduced again: all languages except English were removed, and only a subset of last
year's topics was supplied, again in English only. As a result, the task lost its multilingual character. Our
experiments this year therefore essentially repeat last year's monolingual runs, with tuned parameters and a new
database backend for storing the MPEG-7 descriptors.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Experiment Setup</title>
      <p>
        Last year's base system (see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) was reused with the following setup: Apache Lucene, a customized analyzer with positional
stopword<sup>1</sup> removal, and the Snowball stemmer<sup>2</sup>. For content-based image retrieval we used
Caliph &amp; Emir, which is described below.
      </p>
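      <p>The analyzer setup above can be illustrated with a small sketch. It assumes that "positional" stopword removal
means dropping stopwords while keeping every remaining term at its original token position (the idea Lucene expresses
through position increments), so that a phrase query cannot match across a removed stopword. The stopword list, the
tokenizer, and the crude suffix stripper standing in for the Snowball stemmer are hypothetical simplifications, not
the analyzer used in the runs.</p>
      <preformat>
```python
import re

# Hypothetical mini stopword list; the actual runs used the lists linked in footnote 1.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "with"}


def crude_stem(token: str) -> str:
    """Stand-in for the Snowball stemmer: strips a few English suffixes only."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[tuple[int, str]]:
    """Lowercase, tokenize, drop stopwords and stem; kept terms retain their original positions."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [(pos, crude_stem(tok)) for pos, tok in enumerate(tokens) if tok not in STOPWORDS]


def phrase_match(terms: list[tuple[int, str]], phrase: list[str]) -> bool:
    """True if the phrase terms occur at consecutive original positions."""
    by_pos = dict(terms)
    return any(
        all(by_pos.get(start + i) == word for i, word in enumerate(phrase))
        for start, _ in terms
    )


print(analyze("church with towers"))  # [(0, 'church'), (2, 'tower')]
print(phrase_match(analyze("church towers"), ["church", "tower"]))  # True
print(phrase_match(analyze("church with towers"), ["church", "tower"]))  # False
```
      </preformat>
      <p>If the kept terms were renumbered consecutively instead, the last phrase check would wrongly succeed, because
the removed stopword would no longer leave a gap.</p>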
      <p>As last year, a thesaurus was used for query expansion. Last year's parameters were tuned further to reduce the
number of unfitting synonyms. As the source of the thesauri we again used OpenOffice.org<sup>3</sup>.</p>
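      <p>A minimal sketch of thesaurus-based expansion follows. The thesaurus is a hypothetical Python dictionary standing
in for the OpenOffice.org data, and the three knobs (<monospace>max_synonyms</monospace>, <monospace>boost</monospace>,
<monospace>single_words_only</monospace>) are illustrative assumptions of the kind of parameters that can suppress
unfitting synonyms; they are not the tuned parameters of the runs. Each query term is OR-ed with a few down-weighted
synonyms in Lucene's classic query syntax.</p>
      <preformat>
```python
# Hypothetical thesaurus excerpt: term -> candidate synonyms.
THESAURUS = {
    "church": ["chapel", "house of prayer", "kirk"],
    "tower": ["column", "pillar", "tower block"],
}


def expand_query(terms, thesaurus, max_synonyms=2, boost=0.5, single_words_only=True):
    """OR each term with at most max_synonyms synonyms, each down-weighted by a Lucene boost."""
    clauses = []
    for term in terms:
        synonyms = thesaurus.get(term, [])
        if single_words_only:
            # Multi-word entries would need phrase handling; this sketch simply skips them.
            synonyms = [s for s in synonyms if " " not in s]
        picked = synonyms[:max_synonyms]
        if picked:
            alternatives = [term] + [f"{s}^{boost}" for s in picked]
            clauses.append("(" + " OR ".join(alternatives) + ")")
        else:
            clauses.append(term)
    return " ".join(clauses)


print(expand_query(["church", "tower", "bridge"], THESAURUS))
# (church OR chapel^0.5 OR kirk^0.5) (tower OR column^0.5 OR pillar^0.5) bridge
```
      </preformat>
      <p>Down-weighting the synonyms keeps the original term dominant in the ranking, while capping their number limits
how far a single ambiguous thesaurus entry can drift the query.</p>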
      <sec id="sec-2-1">
        <p><sup>1</sup> http://members.unine.ch/jacques.savoy/clef/index.html; <sup>2</sup> http://snowball.tartarus.org/; <sup>3</sup> http://wiki.services.openoffice.org/wiki/Dictionaries</p>
        <p>
          The MPEG-7 features were calculated by Caliph &amp; Emir (see [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]). Unlike in last year's experiments, the MPEG-7 descriptors were not stored as text representations in
Lucene<sup>4</sup> but as vectors in a PostgreSQL<sup>5</sup> database. PostgreSQL was chosen because it supports
arrays as data types, and the actual size of the arrays does not have to be known at design time. This approach is
expected to achieve a much higher retrieval speed, and it also makes it possible to use descriptors of Caliph &amp; Emir
for which no string representation is implemented (e.g. the dominant color descriptor).
        </p>
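        <p>The array-based storage can be sketched as follows. The table layout, the column names, and the way a dominant
color descriptor is flattened into one list of numbers (a percentage followed by three color components per dominant
color) are hypothetical; the point is that a single <monospace>double precision[]</monospace> column accepts vectors
of any length, so images with two or five dominant colors fit the same schema. The helper renders a vector in
PostgreSQL's array input syntax.</p>
        <preformat>
```python
# Hypothetical schema: one row per image and descriptor; the array length is not declared.
DDL = """
CREATE TABLE descriptor (
    image_id text NOT NULL,
    name     text NOT NULL,
    vec      double precision[] NOT NULL,
    PRIMARY KEY (image_id, name)
);
"""


def flatten_dominant_colors(colors):
    """Flatten (percentage, c1, c2, c3) tuples into one list; its length varies per image."""
    return [float(value) for entry in colors for value in entry]


def pg_array_literal(values):
    """Render a flat numeric list in PostgreSQL array input syntax, e.g. {0.5,12.0}."""
    return "{" + ",".join(repr(float(v)) for v in values) + "}"


two_colors = flatten_dominant_colors([(0.6, 200, 30, 30), (0.4, 10, 10, 240)])
one_color = flatten_dominant_colors([(1.0, 128, 128, 128)])
print(pg_array_literal(two_colors))  # {0.6,200.0,30.0,30.0,0.4,10.0,10.0,240.0}
print(pg_array_literal(one_color))   # {1.0,128.0,128.0,128.0}
```
        </preformat>
        <p>With a driver such as psycopg2, a Python list passed as a query parameter is converted to a PostgreSQL array
automatically, so the literal form is mainly useful for inspection or hand-written SQL.</p>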
        <p>The distance computations were moved into the PostgreSQL database by implementing the algorithms as stored
procedures in PL/pgSQL<sup>6</sup>, PostgreSQL's built-in procedural language, which extends SQL with control
structures and other procedural logic. The following measures are implemented so far: cosine similarity, Dice
coefficient, Euclidean distance, intersection, and Jaccard similarity coefficient. The main advantage is that extra
round trips between our application and the database server are avoided; on the other hand, speed may suffer because
PL/pgSQL is an interpreted language.
All topics were preprocessed to gather all resources needed for the experiments; in particular, the example images
were retrieved and analyzed in advance.</p>
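        <p>For reference, the five measures listed above can be written down in a few lines. The sketch uses common
real-valued forms (Jaccard in its Tanimoto form, intersection as unnormalized histogram intersection); the exact
variants and any normalization used in the stored procedures are not stated, so these formulas are assumptions. In the
database, each function would correspond to a PL/pgSQL procedure looping over two array values; Python is used here
only to make the arithmetic explicit and testable.</p>
        <preformat>
```python
import math


def _same_length(x, y):
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")


def _dot(x, y):
    _same_length(x, y)
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    denom = math.sqrt(_dot(x, x)) * math.sqrt(_dot(y, y))
    return _dot(x, y) / denom if denom else 0.0  # zero vector: treat as no similarity


def dice(x, y):
    denom = _dot(x, x) + _dot(y, y)
    return 2 * _dot(x, y) / denom if denom else 0.0


def euclidean(x, y):
    _same_length(x, y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def intersection(x, y):
    _same_length(x, y)
    return sum(min(a, b) for a, b in zip(x, y))


def jaccard(x, y):
    xy = _dot(x, y)
    denom = _dot(x, x) + _dot(y, y) - xy
    return xy / denom if denom else 0.0


x = [1.0, 0.0, 1.0]
y = [1.0, 1.0, 0.0]
for measure in (cosine, dice, euclidean, intersection, jaccard):
    print(measure.__name__, round(measure(x, y), 4))
# cosine 0.5, dice 0.5, euclidean 1.4142, intersection 1.0, jaccard 0.3333
```
        </preformat>
        <p>Note that four of the measures are similarities (higher means closer) while the Euclidean metric is a distance
(lower means closer), so a ranking query has to sort in the opposite direction for it.</p>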
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Results</title>
      <p>Because of the data reduction mentioned above, we conducted only four experiments. With 100 submissions from 25
participants, i.e. four runs per participant on average, this number is typical. In the following table all four
experiments are compared.
The results show that query expansion improves the mean average precision (MAP) only for text-only retrieval (from
0.0998 to 0.1081). In combination with content-based image retrieval, it even lowers the MAP (from 0.1364 to 0.1140).
However, the geometric mean average precision (GMAP) and the binary preference (BPREF) improve in all cases. This
indicates that the thesaurus-based query expansion improves the results in terms of recall but degrades them in terms
of precision. The additional image information is most effective when no query expansion is applied, where it raises
the MAP by a considerable 37 percent (from 0.0998 to 0.1364); with query expansion, its MAP gain shrinks to about
5 percent (from 0.1081 to 0.1140).</p>
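      <p>The relative MAP changes between the four runs follow directly from the values reported in the abstract, and the
different behavior of MAP and GMAP can be made concrete with a small computation. The per-topic average precision
values at the end are invented for illustration only. GMAP is computed here as the geometric mean of per-topic AP with
a small floor for topics scoring zero (the floor of 1e-5 is an assumption), which is why it rewards gains on poorly
performing topics more strongly than MAP does.</p>
      <preformat>
```python
import math

# MAP values as reported in the abstract of this paper.
MAP = {"text": 0.0998, "text+image": 0.1364, "text+QE": 0.1081, "text+image+QE": 0.1140}


def relative_change(old, new):
    return (new - old) / old


def mean_ap(aps):
    return sum(aps) / len(aps)


def gmap(aps, floor=1e-5):
    """Geometric mean of per-topic AP; the floor keeps one zero-AP topic from zeroing the result."""
    return math.exp(sum(math.log(max(ap, floor)) for ap in aps) / len(aps))


print(round(relative_change(MAP["text"], MAP["text+image"]), 3))           # 0.367
print(round(relative_change(MAP["text"], MAP["text+QE"]), 3))              # 0.083
print(round(relative_change(MAP["text+image"], MAP["text+image+QE"]), 3))  # -0.164
print(round(relative_change(MAP["text+QE"], MAP["text+image+QE"]), 3))     # 0.055

# Invented per-topic AP values: run_b gives up a little on easy topics but gains on the hardest one.
run_a = [0.60, 0.30, 0.01]
run_b = [0.50, 0.28, 0.08]
print(mean_ap(run_a) > mean_ap(run_b), gmap(run_b) > gmap(run_a))  # True True
```
      </preformat>
      <p>The invented runs show the pattern one would expect from an expansion that rescues hard topics while adding noise
elsewhere: MAP drops slightly, yet GMAP rises.</p>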
      <sec id="sec-3-1">
        <p><sup>4</sup> http://lucene.apache.org; <sup>5</sup> http://www.postgresql.org; <sup>6</sup> http://www.postgresql.org/docs/current/static/plpgsql.html</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wilhelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kürsten</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Eibl</surname>
          </string-name>
          , “
          <article-title>Experiments for the ImageCLEF 2007 Photographic Retrieval Task</article-title>
          ,” http://clef-campaign.org/2007/working_notes/wilhelmCLEF2007.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kürsten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wilhelm</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Eibl</surname>
          </string-name>
          , “
          <article-title>The Xtrieval Framework at CLEF 2007: Domain-Specific Track</article-title>
          ,” <source>LNCS - Advances in Multilingual and Multimodal Information Retrieval</source>,
          <string-name>
            <given-names>C.</given-names>
            <surname>Peters</surname>
          </string-name>
          et al., ed., Berlin: Springer Verlag,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wilhelm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kürsten</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Eibl</surname>
          </string-name>
          , “
          <article-title>Extensible Retrieval and Evaluation Framework: Xtrieval</article-title>
          ,” <source>LWA 2008: Lernen - Wissen - Adaption</source>, Würzburg,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Klieber</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Granitzer</surname>
          </string-name>
          , “
          <article-title>Caliph &amp; Emir: Semantics in Multimedia Retrieval and Annotation</article-title>
          ,” <source>19th International CODATA Conference</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>