<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Processing &amp; Management</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <pub-date>
        <year>2000</year>
      </pub-date>
      <volume>36</volume>
      <issue>341359</issue>
      <fpage>1</fpage>
      <lpage>2</lpage>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>These fields have been concatenated into one single text, and all the nouns, noun phrases and prepositional phrases it contains have been extracted by means of TreeTagger. TreeTagger is a tool for annotating text with part-of-speech and lemma information, developed at the Institute for Computational Linguistics of the University of Stuttgart<sup>1</sup>. Once nouns and phrases are identified, they are taken to compose the query, preserving phrases thanks to Google’s query syntax.</p>
      <p>French documents have been processed in a similar way, but using the OR operator to join the phrases found for the generated Google query. This has been done because of the smaller number of indexed web pages in French. Since we expect to recover 100 snippets, we have found that this is possible with the OR operator, even though low-quality texts are then considered in producing the final expanded query.</p>
      <p>The next step is to execute both the original and the Google queries on the Lemur information retrieval system. The collection dataset has been indexed using the Lemur IR system<sup>2</sup>. It is a toolkit that supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The toolkit is being developed […]</p>
      <p>[Results table: only the gm-ap values 0.10, 0.11, 0.13 and 0.12 survive; the row and column labels are not recoverable.]</p>
      <p>We have reported on our experimentation for the Ad-Hoc Robust Multilingual track task at CLEF, involving web-based query generation for the English and French collections. The generation of a final list of results by merging the search results obtained from two different queries has been studied. These two queries are the original one and a new one generated from Google results. Both lists are joined by means of logistic regression, instead of using an expanded query as we did last year. The results are disappointing. While results for training data are very promising, there is no improvement for test data. This question must be investigated, and we hope to understand why the performance is so poor for test data by analyzing, for instance, side effects of the regression approach.</p>
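      <p>As an illustration of how two ranked lists might be joined by logistic regression, the following minimal sketch fits a model on training judgments and then ranks the union of both lists by predicted relevance. It is not the implementation used in these experiments: the choice of features (one normalized retrieval score per list, with 0 for a document missing from a list), the toy training data, and the names <monospace>train_logistic</monospace> and <monospace>merge_lists</monospace> are all assumptions made for the example.</p>
      <preformat>
```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit [bias, w_orig, w_google] by batch gradient descent on log-loss."""
    w = [0.0] * (len(features[0]) + 1)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, labels):
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            err = p - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def merge_lists(orig_scores, google_scores, w):
    """Rank the union of both result lists by predicted relevance probability."""
    docs = set(orig_scores) | set(google_scores)
    scored = []
    for d in docs:
        # Assumption: a document absent from one list gets score 0.0 there.
        x_orig = orig_scores.get(d, 0.0)
        x_google = google_scores.get(d, 0.0)
        prob = sigmoid(w[0] + w[1] * x_orig + w[2] * x_google)
        scored.append((prob, d))
    # Highest probability first; ties broken by document id.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [d for _, d in scored]


# Hypothetical training data: (original-query score, Google-query score)
# for judged documents, with 1 = relevant and 0 = not relevant.
train_X = [(0.9, 0.8), (0.8, 0.1), (0.2, 0.9), (0.1, 0.1), (0.7, 0.7), (0.3, 0.2)]
train_y = [1, 1, 0, 0, 1, 0]
weights = train_logistic(train_X, train_y)

# Hypothetical result lists for one topic, keyed by document id.
orig_run = {"d1": 0.9, "d2": 0.4}
google_run = {"d2": 0.6, "d3": 0.8}
merged = merge_lists(orig_run, google_run, weights)
print(merged)
```
      </preformat>
      <p>Because the weights are learned only from the training judgments, a model of this kind can fit the training topics closely and still transfer poorly to unseen topics, which is one way a gap between training and test results could arise.</p>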
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>