
These fields have been concatenated into one single text, and all contained nouns, noun phrases and prepositional phrases have been extracted by means of TreeTagger. TreeTagger is a tool for annotating text with part-of-speech and lemma information which has been developed at the Institute for Computational Linguistics of the University of Stuttgart. Once nouns and phrases are identified, they are taken to compose the query, preserving phrases thanks to Google's query syntax.

French documents have been processed in a similar way, but using the OR operator to join the phrases found for the generated Google query. This has been done due to the smaller number of indexed web pages in the French language. Since we expect to recover 100 snippets, we have found that this is possible with this operator, although low-quality texts are then considered when producing the final expanded query.

The next step is to execute both the original and the Google queries on the Lemur information retrieval system. The collection dataset has been indexed using the Lemur IR system. It is a toolkit that supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The toolkit is being developed [...]

[Results table residue: 0.10 0.11 gm-ap 0.13 0.12]

We have reported on our experimentation for the Ad-Hoc Robust Multilingual track CLEF task involving web-based query generation for English and French collections. The generation of a final list of results by merging search results obtained from two different queries has been studied. These two queries are the original one and a new one generated from Google results. Both lists are joined by means of logistic regression, instead of using an expanded query as we did last year. The results are disappointing. While results for training data are very promising, there is no improvement for test data. The cause must be found, and we hope to understand why the performance is so poor for test data by analyzing, for instance, side effects of the regression approach.
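As an illustration of the merging step summarized above, the following is a minimal sketch of joining two ranked lists with a logistic model. The text does not state which features or coefficients were used, so everything specific here is an assumption: the features (natural logarithm of the rank plus the retrieval score), the coefficient values, the rule of keeping the highest probability when a document appears in both lists, and the names `relevance_probability` and `merge_lists` are all hypothetical. In practice the coefficients would be fitted on training topics with relevance judgments.

```python
import math
from typing import Dict, List, Tuple

Coefficients = Tuple[float, float, float]  # (intercept, weight of ln(rank), weight of score)


def relevance_probability(rank: int, score: float, coef: Coefficients) -> float:
    """Logistic estimate of relevance from rank and score (assumed features)."""
    b0, b1, b2 = coef
    z = b0 + b1 * math.log(rank) + b2 * score
    return 1.0 / (1.0 + math.exp(-z))


def merge_lists(
    runs: Dict[str, List[Tuple[str, float]]],
    coefs: Dict[str, Coefficients],
) -> List[Tuple[str, float]]:
    """Merge ranked lists by estimated relevance probability.

    Each run maps a query name to (doc_id, score) pairs in rank order.
    A document found in several runs keeps its highest probability
    (an assumption made for this sketch).
    """
    best: Dict[str, float] = {}
    for name, results in runs.items():
        for rank, (doc_id, score) in enumerate(results, start=1):
            p = relevance_probability(rank, score, coefs[name])
            if p > best.get(doc_id, 0.0):
                best[doc_id] = p
    # Final list: highest estimated probability first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data; scores and coefficients are invented for illustration only.
    runs = {
        "original": [("d1", 0.9), ("d2", 0.5)],
        "google": [("d3", 0.8), ("d1", 0.4)],
    }
    coefs = {"original": (0.0, -1.0, 1.0), "google": (-0.5, -1.0, 1.0)}
    for doc_id, p in merge_lists(runs, coefs):
        print(doc_id, round(p, 3))
```

Because each list has its own coefficients, a model fitted on training topics can weight the two queries differently. That extra flexibility is one place where the gap between training and test results could arise.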