<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Xtrieval Framework at CLEF 2008: Domain-Specific Track</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Jens Kursten, Thomas Wilhelm and Maximilian Eibl Chemnitz University of Technology Faculty of Computer Science, Dept.</institution>
          <addr-line>Computer Science and Media 09107 Chemnitz, Germany [ jens.kuersten</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article describes our participation in the Domain-Specific track. We used the Xtrieval framework [2], [3] to prepare and run the experiments. The topics for the cross-lingual experiments were translated with a plug-in that accesses the Google AJAX Language API. This year, we submitted 20 experiments in total. In all experiments we applied a standard top-k pseudo-relevance feedback algorithm. In addition, all of our submissions were merged experiments, in which multiple stemming approaches for each language were combined to improve retrieval performance. The evaluation showed that combining stemming methods works very well. Translating the topics for the bilingual experiments reduced retrieval effectiveness by only 8 to 15 percent compared with our best monolingual experiments.</p>
      </abstract>
      <kwd-group>
        <kwd>Evaluation</kwd>
        <kwd>Cross-Language Information Retrieval</kwd>
        <kwd>Domain-Specific Retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction and Outline</title>
    </sec>
    <sec id="sec-2">
      <title>Experimental Setup</title>
      <p>
        The approach we used for this year's participation is mainly based on the following ideas. First, we combined
several stemming methods for each language in the retrieval stage. The results were combined
by our implementation of the Z-Score operator [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Second, we compared standard retrieval experiments with experiments that used query
expansion based on the provided domain-specific thesauri, to investigate the impact of the thesauri on retrieval
effectiveness.
      </p>
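      <p>As an illustration of the result combination step, the following minimal Python sketch merges two result lists by Z-score normalisation. It assumes a commonly described variant in which each score is standardised by the mean and standard deviation of its own list and shifted so that the lowest-scored document receives zero, and in which all lists are weighted equally. The function names, the toy scores and the use of the population standard deviation are assumptions for illustration and do not reproduce the Xtrieval implementation.</p>
      <preformat>
```python
from collections import defaultdict
from statistics import mean, pstdev


def z_score_normalise(scores):
    """Map the raw scores of one result list to shifted z-scores."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All scores identical: this list carries no ranking information.
        return {doc: 0.0 for doc in scores}
    # Shift so that the lowest-scored document gets 0 instead of a negative value.
    delta = (mu - min(values)) / sigma
    return {doc: (s - mu) / sigma + delta for doc, s in scores.items()}


def z_score_fusion(result_lists):
    """Sum the normalised scores of each document over all result lists."""
    fused = defaultdict(float)
    for scores in result_lists:
        for doc, z in z_score_normalise(scores).items():
            fused[doc] += z
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores from two runs that differ only in the stemmer used.
porter_run = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
krovetz_run = {"d2": 0.9, "d3": 0.8, "d4": 0.1}
merged = z_score_fusion([porter_run, krovetz_run])
print([doc for doc, _ in merged])
```
      </preformat>
      <p>With these toy scores, d2 is ranked first because it scores well in both lists, whereas d1 is retrieved by only one of the two runs.</p>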
    </sec>
    <sec id="sec-3">
      <title>Configurations and Results</title>
      <p>The detailed setup of our experiments and the results of the evaluation are presented in the following
subsections.</p>
      <sec id="sec-3-1">
        <title>Monolingual Experiments</title>
        <p>
          We submitted 5 monolingual experiments in total, 2 each for the English and German subtasks and 1 for the
Russian subtask. For all experiments a language-specific stopword list was applied<sup>4</sup>. We used different
stemmers for each language: Porter<sup>5</sup> and Krovetz [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] for English, Snowball<sup>5</sup> and an n-gram variant decompounding
stemmer<sup>6</sup> for German, and a Java implementation of a stemmer<sup>4</sup> for Russian. For two experiments the
provided thesauri were used for query expansion (tqe), and in all experiments a standard pseudo-relevance
feedback algorithm for top-k documents was used. Table 1 presents the retrieval performance of our experiments
in terms of mean average precision (MAP) and the absolute rank of the experiment in the evaluation.
Our experiments on the German and English collections had very strong overall performance. In contrast,
our experiment on the Russian collection performed very poorly. The thesaurus-based
query expansion did not improve retrieval performance, but it also did not significantly reduce
effectiveness.
        </p>
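        <p>To make the feedback step concrete, the following minimal Python sketch shows one simple form of top-k pseudo-relevance feedback: the k best documents of an initial run are assumed to be relevant, and their most frequent new terms are appended to the query. Selecting terms by raw frequency, the parameter names k and n_terms, and the toy documents are assumptions for illustration; the term weighting actually used in our experiments is not specified here.</p>
        <preformat>
```python
from collections import Counter


def expand_query(query_terms, ranked_docs, k=10, n_terms=5, stopwords=frozenset()):
    """Append the n_terms most frequent new terms from the top-k documents."""
    counts = Counter()
    for doc_terms in ranked_docs[:k]:
        counts.update(
            t for t in doc_terms
            if t not in stopwords and t not in query_terms
        )
    # Sort by descending frequency, then alphabetically for a stable result.
    best = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in best[:n_terms]]


# Hypothetical ranked list of already tokenised and stemmed documents.
ranked = [
    ["labour", "market", "unemployment", "policy"],
    ["unemployment", "youth", "policy"],
    ["football", "league"],  # outside the top k=2, therefore ignored
]
expanded = expand_query(["unemployment"], ranked, k=2, n_terms=2)
print(expanded)
```
        </preformat>
        <p>The expanded query is then submitted to the index again, and the second result list is the one that is evaluated.</p>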
      </sec>
      <sec id="sec-3-2">
        <title>Bilingual Experiments</title>
        <p>We submitted 12 experiments in total for the bilingual subtask, i.e. 4 experiments for each
target language collection. We compared translation from different source languages, and the performance
of pure topic translation with that of combined translation. For the combined translation we started from the pure topic
translation and tried to improve it with the help of the bilingual thesauri: for every term
occurring in the bilingual thesauri we added its provided translation to the topic. Again, we used a standard
pseudo-relevance feedback algorithm to improve retrieval effectiveness. In Table 2 we compare each of the
bilingual experiments with the corresponding monolingual experiment.</p>
        <p>Probably due to the quality of Google's translation service and the strong performance of our monolingual
runs, the retrieval effectiveness of our bilingual experiments is also very good. Surprisingly, one of our bilingual
experiments on the Russian target collection performed best, although our monolingual experiment had the
worst overall performance. This is thought to be due to the smaller number of submissions for the bilingual</p>
        <p><sup>4</sup> http://members.unine.ch/jacques.savoy/clef/index.html
<sup>5</sup> http://snowball.tartarus.org
<sup>6</sup> http://www-user.tu-chemnitz.de/~wags/cv/clr.pdf</p>
        <p>The following list summarizes the analysis of our retrieval experiments for the Domain-Specific
track at CLEF 2008:</p>
        <p>Monolingual: The performance of our monolingual experiments was very good for the German and
English collections and worse for the Russian collection. Interestingly, retrieval effectiveness could
not be improved by using the provided domain-specific thesauri for query expansion.</p>
        <p><sup>7</sup> http://www.clef-campaign.org/2005/working notes/workingnotes2005/appendix a.pdf - p. 61
<sup>8</sup> http://www.clef-campaign.org/2006/working notes/workingnotes2006/Appendix Domain Specific.pdf - p. 63
<sup>9</sup> http://www.clef-campaign.org/2007/working notes/AppendixC.pdf - p. 206</p>
        <p>Bilingual: Probably owing to the translation service used, our bilingual experiments performed very well
and achieved the best results on each target collection. Surprisingly, we could not improve retrieval
performance by using the provided bilingual thesauri.</p>
        <p>Multilingual: Again, mainly due to the quality of the translation and the result list combination
capabilities of the Xtrieval framework, we achieved very strong results in terms of retrieval effectiveness.</p>
        <p>There was no significant difference between the experiments with English and German topics.</p>
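        <p>The combined translation used in the bilingual runs, in which the translation of every topic term found in the bilingual thesaurus is added to the machine-translated topic, can be sketched in Python as follows. The dictionary layout of the thesaurus, the single-term lookup, the lower-casing and the suppression of duplicates are assumptions made for this illustration, and the thesaurus entries are invented.</p>
        <preformat>
```python
def combine_translation(source_terms, translated_terms, thesaurus):
    """Append thesaurus translations of source terms to the translated topic."""
    combined = list(translated_terms)
    for term in source_terms:
        for translation in thesaurus.get(term.lower(), []):
            # Assumption: a translation already in the topic is not repeated.
            if translation not in combined:
                combined.append(translation)
    return combined


# Toy German-to-English thesaurus; the entries are invented for illustration.
thesaurus = {
    "arbeitslosigkeit": ["unemployment"],
    "jugend": ["youth", "adolescence"],
}
topic_de = ["Jugend", "Arbeitslosigkeit", "Ostdeutschland"]
# Stands in for the output of the machine translation service.
topic_en = ["youth", "unemployment", "east", "germany"]
print(combine_translation(topic_de, topic_en, thesaurus))
```
        </preformat>
        <p>Only terms that occur in the thesaurus contribute additional translations, so a topic without thesaurus terms is identical to its pure translation.</p>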
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>This year, we achieved very good retrieval performance in almost all subtasks of the Domain-Specific track.
Since our main research focus has shifted to Multimedia Information Retrieval, this work makes no notable new
contributions to the retrieval community, except for the finding that combining different stemming approaches
helped to improve retrieval performance. Another important observation in all our experiments for this year's
CLEF campaign was that the translation service provided by Google appears to be clearly superior to any
other approach or system. This should motivate the cross-language community to investigate and improve
their current approaches.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We would like to thank Jacques Savoy and his co-workers for providing numerous resources for language
processing. We would also like to thank Giorgio M. Di Nunzio and Nicola Ferro for developing and operating
the DIRECT system.</p>
      <p>This work was partially accomplished in conjunction with the project sachsMedia, which is funded by the
Entrepreneurial Regions program of the German Federal Ministry of Education and Research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Robert</given-names>
            <surname>Krovetz</surname>
          </string-name>
          .
          <article-title>Viewing morphology as an inference process</article-title>
          .
          <source>In SIGIR '93: Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>191</fpage>
          –
          <lpage>202</lpage>
          , New York, NY, USA,
          <year>1993</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Jens</given-names>
            <surname>Kürsten</surname>
          </string-name>
          , Thomas Wilhelm, and
          <string-name>
            <given-names>Maximilian</given-names>
            <surname>Eibl</surname>
          </string-name>
          .
          <article-title>The Xtrieval framework at CLEF 2007: Domain-specific track</article-title>
          . In C. Peters,
          <string-name>
            <given-names>V.</given-names>
            <surname>Jijkoun</surname>
          </string-name>
          , Th. Mandl, H. Müller,
          <string-name>
            <given-names>D.W.</given-names>
            <surname>Oard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Peñas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Petras</surname>
          </string-name>
          , and D. Santos, editors,
          <source>LNCS - Advances in Multilingual and Multimodal Information Retrieval</source>
          , volume
          <volume>5152</volume>
          , Berlin,
          <year>2008</year>
          . Springer Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Jens</given-names>
            <surname>Kürsten</surname>
          </string-name>
          , Thomas Wilhelm, and
          <string-name>
            <given-names>Maximilian</given-names>
            <surname>Eibl</surname>
          </string-name>
          .
          <article-title>Extensible retrieval and evaluation framework: Xtrieval</article-title>
          . LWA 2008: Lernen - Wissen - Adaption, Würzburg,
          <year>October 2008</year>
          , Workshop Proceedings,
          <year>October 2008</year>
          , to appear.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Jacques</given-names>
            <surname>Savoy</surname>
          </string-name>
          .
          <article-title>Data fusion for effective European monolingual information retrieval</article-title>
          .
          <source>Working Notes for the CLEF 2004 Workshop</source>
          , Bath, UK.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>