
CLEF 2008 Ad-Hoc Track: On-line Processing Experiments with Xtrieval

Jens Kürsten, Thomas Wilhelm and Maximilian Eibl
Chemnitz University of Technology, Faculty of Computer Science, Dept. Computer Science and Media
09107 Chemnitz, Germany [ jens.kuersten

This article describes our first participation in the Ad-Hoc track. We used the Xtrieval framework [2], [3] for the preparation and execution of the experiments. We regard our experiments as on-line or live experiments, since the preparation of all results, including indexing and retrieval, took us less than 4 hours in total. This year, we submitted 18 experiments in total, of which only 4 were pure monolingual runs. In all our experiments we applied a standard top-k pseudo-relevance feedback algorithm. The topics for the multilingual experiments were translated with a plug-in that accesses the Google AJAX language API². The performance of our monolingual experiments was slightly below the average for the German and French collections and in the top 5 for the English collection. Our bilingual experiments performed very well (at least in the top 3) for all target collections.

Evaluation, Experimentation, Cross-Language Information Retrieval

1 Introduction and outline

the evaluation, because we had to rebuild all three indexes within a few hours.

The remainder of the paper is organized as follows. Section 2 describes the general setup of our system. The individual configurations and the results of our submitted experiments are presented in section 3. In sections 4 and 5 we summarize the results and sum up our observations.

2 Experimental setup

We think that our experiments for this year's Ad-Hoc track could be called on-line or live retrieval experiments. As already mentioned in the introduction, we used the wrong document identifiers for indexing, which resulted in completely useless experiments. We had 6 hours to fix this problem and to re-run all or at least some feasible experiments. Therefore we had to rectify and verify the indexing process. Additionally, we had to implement a simple retrieval algorithm, because our more sophisticated approach using language detection stored all language-specific information at indexing time and thus was not available for our final experiments.

Nevertheless, we used different stemming approaches for German and English and combined the results in the retrieval stage by applying our implementation of the Z-Score operator [4]. We also used a standard top-k pseudo-relevance feedback algorithm in the retrieval stage. Our baseline retrieval experiment was compared to three additional experiments for each monolingual subtask and one additional experiment for each bilingual subtask.
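The combination of result lists from different stemmers can be illustrated with a small sketch. It assumes one common formulation of Z-Score normalization, in which each score is standardized by the mean and standard deviation of its own result list and shifted so that the lowest score maps to zero, and it assumes that the normalized scores are summed with equal weight. The function names and the score values are hypothetical and are not taken from Xtrieval.

```python
from statistics import mean, pstdev


def z_score_normalize(scores):
    """Map the raw scores of one result list to shifted z-scores.

    Assumed formulation: (score - mean) / std + (mean - min) / std,
    so the lowest-scored document of each list receives 0.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All scores are identical, so the list carries no ranking evidence.
        return {doc: 0.0 for doc in scores}
    delta = (mu - min(values)) / sigma
    return {doc: (score - mu) / sigma + delta for doc, score in scores.items()}


def z_score_fuse(result_lists):
    """Sum the normalized scores of each document over all result lists."""
    fused = {}
    for scores in result_lists:
        for doc, z in z_score_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + z
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores from two runs that differ only in the stemmer used.
snowball_run = {"d1": 3.0, "d2": 1.0}
ngram_run = {"d2": 4.0, "d3": 2.0, "d1": 3.0}
fused = z_score_fuse([snowball_run, ngram_run])
```

Because every list is normalized on its own scale before summing, a run with systematically larger raw scores does not dominate the merged ranking.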

3 Configurations and Results

The detailed setup of our experiments is presented in the following subsections.

3.1 Monolingual Experiments

We submitted 12 monolingual experiments in total, 4 for each of the target collections in German, English and French. For all experiments a language-specific stopword list was applied⁴. We used different stemmers for each language: Porter⁵ and Krovetz [1] for English, Snowball⁵ and an n-gram variant decompounding stemmer⁶ for German, and again the Snowball⁵ implementation of a stemmer for French. We applied top-k (k = 10) pseudo-relevance feedback in all our experiments.
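A generic form of top-k pseudo-relevance feedback can be sketched as follows. The sketch assumes that the expansion terms are chosen by their document frequency within the k top-ranked documents of an initial retrieval run. The term selection actually used in our system and the number of expansion terms are not specified here, so the `n_terms` parameter and the function name are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, k=10, n_terms=5):
    """Add the most frequent terms of the top-k documents to the query.

    ranked_docs is the initial ranking, best document first, where each
    document is a list of already stemmed, stopword-free terms.
    """
    counts = Counter()
    for doc_terms in ranked_docs[:k]:
        # Count each term once per document (document frequency
        # within the feedback set).
        counts.update(set(doc_terms))
    for term in query_terms:
        # Terms that are already part of the query are not added again.
        counts.pop(term, None)
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return list(query_terms) + expansion


# Hypothetical initial ranking for the one-term query "library".
initial_ranking = [
    ["library", "catalog", "history"],
    ["library", "history"],
    ["history", "map"],
]
expanded = expand_query(["library"], initial_ranking, k=2, n_terms=1)
```

The expanded query is then submitted to the index a second time, and the ranking of that second run is returned as the final result.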

Besides a baseline experiment, which simply returns everything regardless of the language in which the description of the library record is stored, we also tried to implement a more sophisticated retrieval algorithm. In that retrieval algorithm we translate the query into the top 10 languages (in terms of occurrence) and merge these multilingual terms into a single query. We used three different weightings for this query. In the first setup we weighted all topic languages equally. For the second and third configuration we used the share x of each language in the corresponding collection: in the second we weighted the topic languages with x, and in the last configuration we used 1-x. With x as language weight, we want to boost documents in languages with high occurrence frequency, since they will probably contain more relevant documents for a specific topic. In contrast, with 1-x as language weight we assume that documents in all languages might be relevant, and we therefore push up documents in languages with low occurrence frequency in the whole collection.
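The three weighting schemes can be made concrete with a short sketch. The scheme names follow the experiment ids (w1, wx, w1minusx). The function names, the language shares and the translated terms are hypothetical, and the translations are passed in as plain data rather than fetched from a translation service.

```python
def language_weights(distribution, scheme, top_n=10):
    """Return a weight per language for the top_n most frequent languages.

    distribution maps a language code to its share x of the collection.
    """
    top = sorted(distribution.items(), key=lambda item: item[1], reverse=True)[:top_n]
    if scheme == "w1":
        # Every topic language gets the same weight.
        return {lang: 1.0 for lang, _ in top}
    if scheme == "wx":
        # Frequent languages are boosted.
        return {lang: x for lang, x in top}
    if scheme == "w1minusx":
        # Rare languages are boosted.
        return {lang: 1.0 - x for lang, x in top}
    raise ValueError(f"unknown weighting scheme: {scheme}")


def build_weighted_query(translations, weights):
    """Merge the translated terms into one list of (term, weight) pairs."""
    return [
        (term, weights[lang])
        for lang, terms in translations.items()
        if lang in weights
        for term in terms
    ]


# Hypothetical language shares and translations of a one-term topic.
shares = {"de": 0.6, "en": 0.3, "fr": 0.1}
translations = {"de": ["bibliothek"], "en": ["library"], "fr": ["bibliotheque"]}
query_wx = build_weighted_query(translations, language_weights(shares, "wx"))
query_w1minusx = build_weighted_query(translations, language_weights(shares, "w1minusx"))
```

With these illustrative shares, the German term carries the largest weight under wx and the French term the largest weight under w1minusx.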

In table 1, the retrieval performance of our experiments is presented in terms of mean average precision (MAP) and the absolute rank of the experiment in the evaluation. We compare the baseline run with experiments using different language weights (lw).
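For reference, mean average precision can be computed as in the following sketch, which uses the standard uninterpolated definition with binary relevance judgments. The function names, topic ids and document ids are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at the ranks of the relevant documents."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """Mean of the average precision over all judged topics."""
    if not qrels:
        return 0.0
    total = sum(average_precision(run.get(topic, []), rel) for topic, rel in qrels.items())
    return total / len(qrels)


# Hypothetical run and relevance judgments for two topics.
run = {"t1": ["d1", "d2", "d3"], "t2": ["d4"]}
qrels = {"t1": {"d1", "d3"}, "t2": {"d5"}}
map_score = mean_average_precision(run, qrels)
```

In this example topic t1 has an average precision of (1/1 + 2/3) / 2 and topic t2 of 0, so the mean over both topics is 5/12.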

Table 1: Experiment ids and target collections of the monolingual experiments

id                              lang
cut merged simple               DE
cut multi10 wx plusplus         DE
cut multi10 w1 plusplus         DE
cut multi10 w1minusx plusplus   DE
cut merged simple               EN
cut multi10 wx plusplus         EN
cut multi10 w1minusx plusplus   EN
cut multi10 w1 plusplus         EN
cut merged simple               FR
cut multi10 wx plusplus         FR
cut multi10 w1minusx plusplus   FR
cut multi10 w1 plusplus         FR

The results show that our simple (and purely monolingual) configuration always outperformed the experiments with translation and language weights. The overall performance of our experiments is also not very promising, except for one monolingual English experiment. The results also show that the experiments with lw=x, where the weight equals the share of the language in the collection, significantly outperformed the other weighting schemes for all collections.

The evaluation results of our bilingual experiments show strong performance for our baseline configurations. For these experiments the decrease in retrieval performance varies between 4 and 12 percent in comparison to the corresponding monolingual experiment. This is probably due to the quality of the translation. Another interesting observation can be made by analyzing our experiments on the language weights. The bilingual experiments perform just as well as the monolingual experiments, which is what we expected. Only the experiment on the French collection achieved a remarkably better performance just by translating from English (instead of French) to all nine other languages.

1 http://lucene.apache.org
2 http://code.google.com/apis/ajaxlanguage/documentation
3 http://json.org
4 http://members.unine.ch/jacques.savoy/clef/index.html
5 http://snowball.tartarus.org
6 http://www-user.tu-chemnitz.de/~wags/cv/clr.pdf

Table 2: Experiment ids and language pairs of the bilingual experiments and their monolingual counterparts

id                                    lang
cut merged simple                     DE
cut merged simple en2de               EN→DE
cut multi10 w1 plusplus               DE
cut merged simple multi10 w1 en2de    EN→DE
cut merged simple                     EN
cut merged simple de2en               DE→EN
cut multi10 w1 plusplus               EN
cut merged simple multi10 w1 de2en    DE→EN
cut merged simple                     FR
cut merged simple en2fr               EN→FR
cut multi10 w1 plusplus               FR
cut merged simple multi10 w1 en2fr    EN→FR

4 Result Analysis - Summary

The following list provides a summary of the analysis of our retrieval experiments for the Ad-Hoc track at CLEF 2008:

On-line Processing for Retrieval: Running (= indexing and retrieving) all listed experiments in less than 4 hours was one of the most interesting experiences for us in this year's evaluation. It demonstrates the performance and adaptability of the Xtrieval framework.

Monolingual: The performance of our monolingual experiments was slightly below the average for the German and French collections and very good for the English collection. The multilingual experiments (++) performed quite badly, mainly because we used 10 languages for querying the multilingual collections.

Bilingual: Probably due to the translation service we used, our bilingual experiments performed very well and achieved top results on each target collection. The performance of some multilingual experiments could be improved just by using another query language. However, most of these experiments produced almost the same results as they did when the language of the query and the language of the target collection were the same.

5 Conclusion and Future Work

This year, we participated in the Ad-Hoc track for the first time, and we had to tackle a serious problem on the day of the submission deadline. Therefore, we regard our experiments as on-line or live experiments. An important observation in all our experiments for this year's CLEF campaign was that the translation service provided by Google seems to be far superior to any other approach or system. This should motivate the cross-language community to investigate and improve their current approaches. In the future we will try to use only 3 or 4 main languages for multilingual experiments on the collections, and we assume that we can outperform our best experimental result from this work. Furthermore, we will rebuild our indexes with the help of language detection, as we had planned and completed for this year's participation.

Acknowledgments

We would like to thank Jacques Savoy and his co-workers for providing numerous resources for language processing. We would also like to thank Giorgio M. Di Nunzio and Nicola Ferro for developing and operating the DIRECT system⁷.

This work was partially accomplished in conjunction with the project sachsMedia, which is funded by the Entrepreneurial Regions⁸ program of the German Federal Ministry of Education and Research.

[1] Robert Krovetz. Viewing morphology as an inference process. In SIGIR '93: Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval, pages 191-202, New York, NY, USA, 1993. ACM.

[2] Jens Kürsten, Thomas Wilhelm, and Maximilian Eibl. The Xtrieval framework at CLEF 2007: Domain-specific track. In C. Peters, V. Jijkoun, Th. Mandl, H. Müller, D.W. Oard, A. Peñas, V. Petras, and D. Santos, editors, LNCS - Advances in Multilingual and Multimodal Information Retrieval, volume 5152, Berlin, 2008. Springer Verlag.

[3] Jens Kürsten, Thomas Wilhelm, and Maximilian Eibl. Extensible retrieval and evaluation framework: Xtrieval. LWA 2008: Lernen - Wissen - Adaption, Würzburg, October 2008, Workshop Proceedings, October 2008, to appear.

[4] Jacques Savoy. Data fusion for effective European monolingual information retrieval. Working Notes for the CLEF 2004 Workshop, Bath, UK.