Robust Retrieval Experiments at the University of Hildesheim

Ben Heuwing, Thomas Mandl
University of Hildesheim, Information Science
Marienburger Platz 22
D-31141 Hildesheim, Germany
mandl@uni-hildesheim.de

Abstract

This paper reports on the experiments submitted for the robust task at CLEF 2007. We applied a system previously tested for ad-hoc retrieval. The experiments focused on the effect of blind relevance feedback and named entities. Experiments for monolingual English and French are presented.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software

General Terms

Measurement, Performance, Experimentation

Keywords

Multilingual Retrieval, Robust Retrieval, Evaluation Measures

1 Introduction

We intended to provide a baseline for the robust task at CLEF 2007. Our basic system has been used in previous CLEF campaigns (Hackl et al. 2005). For the baseline experiments, we optimized the blind relevance feedback (BRF) parameters. The underlying retrieval engine of the system is the open source search engine Apache Lucene.

2 System Setup

Five runs for the English and three for the French monolingual data were submitted. The results for the test and training topics are shown in Table 1 and Table 2, respectively. Optimization of the BRF parameters on the English training topics of 2006 showed the best results when the query was expanded with 30 terms from the top 10 documents and the query expansion was given a relative weight of 0.05 compared to the rest of the query. The same improvements (compared to the base run) can be seen on a smaller scale for the submitted runs. The use of named entities from the English documents did not have an effect on the retrieval quality.
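The blind relevance feedback setting described above (30 expansion terms from the top 10 documents, with a relative weight of 0.05) can be illustrated with a minimal Python sketch. It is not the Lucene-based implementation used for the runs. The function name expand_query, its parameters, and the selection of expansion terms by raw frequency are assumptions made for illustration only; the term selection criterion actually used is not stated here.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, num_docs=10, num_terms=30, weight=0.05):
    """Return a {term: weight} mapping for the query plus BRF expansion terms.

    ranked_docs is a list of tokenized documents, ordered by the score of
    an initial retrieval run (best document first).
    """
    # Original query terms keep the full weight of 1.0.
    weighted = {term: 1.0 for term in query_terms}

    # Count terms in the top-ranked documents, which are assumed to be
    # relevant without any user judgment (hence "blind" feedback).
    counts = Counter()
    for doc in ranked_docs[:num_docs]:
        counts.update(term for term in doc if term not in weighted)

    # Assumption: pick the most frequent new terms as expansion terms and
    # give them the reduced relative weight.
    for term, _ in counts.most_common(num_terms):
        weighted[term] = weight
    return weighted
```

With the defaults, the call corresponds to the 0.05-10-30 setting; passing weight=1.0, num_docs=5 and num_terms=50 would correspond to the 1.0-5-50 setting. A low expansion weight keeps the original query terms dominant, which limits the damage when the top-ranked documents are not in fact relevant.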
For the French runs, a heavily weighted query expansion (equal in weight to the rest of the query) with 50 terms from the top 5 documents yielded the best BRF parameters, even though the base run performed better on the test topics.

Table 1. Results for Submitted Runs

Run          Language  Stemming  BRF (weight-docs-terms)  NE   MAP     R-Precision  Precision@10
HiMoEnBase   English   snowball  -                        -    0.0527  0.0769       0.1555
HiMoEnBrf1   English   snowball  1.0-10-30                -    0.0580  0.0888       0.3032
HiMoEnBrf2   English   snowball  0.05-10-30               -    0.0586  0.0858       0.1683
HiMoEnBrfNe  English   snowball  0.05-10-30               1.0  0.0588  0.0858       0.1683
HiMoEnNe     English   snowball  -                        2.0  0.0527  0.0769       0.1555
HiMoFrBase   French    lucene    -                        -    0.1954  0.1974       3.8915
HiMoFrBrf    French    lucene    0.5-5-25                 -    0.1571  0.2024       0.3080
HiMoFrBrf2   French    lucene    1.0-5-50                 -    0.1630  0.2121       0.3992

Table 2. Results for Training Topics

Run          Language  Stemming  BRF (weight-docs-terms)  NE   MAP
HiMoEnBase   English   snowball  -                        -    0.1634
HiMoEnBrf1   English   snowball  1.0-10-30                -    0.1489
HiMoEnBrf2   English   snowball  0.05-10-30               -    0.1801
HiMoEnBrfNe  English   snowball  0.05-10-30               1.0  0.1801
HiMoEnNe     English   snowball  -                        2.0  0.1634
HiMoFrBase   French    lucene    -                        -    0.2081
HiMoFrBrf    French    lucene    0.5-5-25                 -    0.2173
HiMoFrBrf2   French    lucene    1.0-5-50                 -    0.2351

Only the runs for French have reached a competitive level of above 0.2 MAP. The results for the geometric average for the English topics are worse, because low performance on several topics leads to a sharp drop in performance according to this measure.

3 Future Work

For future experiments, we intend to exploit the knowledge on the impact of named entities on the retrieval process (Mandl & Womser-Hacker 2005) as well as selective relevance feedback strategies in order to improve robustness.

References

Mandl, Thomas; Hackl, René; Womser-Hacker, Christa (2007): Robust Ad-hoc Retrieval Experiments with French and English at the University of Hildesheim. In: Peters, Carol et al. (Eds.): 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, Alicante, Spain, Revised Selected Papers. Berlin et al.: Springer [Lecture Notes in Computer Science 4730], pp. 127-128.

Mandl, Thomas; Womser-Hacker, Christa (2005): The Effect of Named Entities on Effectiveness in Cross-Language Information Retrieval Evaluation. In: Applied Computing 2005: Proc. ACM SAC Symposium on Applied Computing (SAC). Information Access and Retrieval (IAR) Track. Santa Fe, New Mexico, USA, March 13-17, 2005, pp. 1059-1064.

Peters, Carol et al. (2007): Overview of the ad-hoc Track at CLEF. In this volume.