=Paper=
{{Paper
|id=Vol-1172/CLEF2006wn-adhoc-MandlEt2006
|storemode=property
|title=Robust Ad-hoc Retrieval Experiments with French and English at the University of Hildesheim
|pdfUrl=https://ceur-ws.org/Vol-1172/CLEF2006wn-adhoc-MandlEt2006.pdf
|volume=Vol-1172
|dblpUrl=https://dblp.org/rec/conf/clef/MandlHW06a
}}
==Robust Ad-hoc Retrieval Experiments with French and English at the University of Hildesheim==
Thomas Mandl, René Hackl, Christa Womser-Hacker
University of Hildesheim, Information Science
Marienburger Platz 22, D-31141 Hildesheim, Germany
mandl@uni-hildesheim.de

Abstract

This paper reports on experiments submitted to the robust task at CLEF 2006 and intended to provide a baseline for other runs in the robust task. We applied a system previously tested for ad-hoc retrieval. Runs for monolingual English and French were submitted. Results on both training and test topics are reported. Only for French were positive results above 0.2 MAP achieved.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software

General Terms
Measurement, Performance, Experimentation

Keywords
Multilingual Retrieval, Robust Retrieval, Evaluation Measures

1 Introduction

We intended to provide a baseline for the robust task at CLEF 2006. Our system, previously applied to the ad-hoc CLEF 2005 data (Hackl et al. 2005), is an adaptive fusion system based on the MIMOR model (Mandl & Womser-Hacker 2004). For the baseline experiments, we solely optimized the blind relevance feedback (BRF) parameters, following the information-theoretic strategy developed by Carpineto et al. (2001). The basic retrieval engine is Lucene.

2 System Setup

Two runs for the English and two runs for the French monolingual data were submitted. The results for the test and training topics are shown in Tables 1 and 2, respectively.

Table 1. Results for submitted monolingual runs (test topics)

Run       Language  Stemming  BRF (docs-terms)  GeoAve  MAP
uhienmo1  English   Lucene    5-30               0.01%   7.98%
uhienmo2  English   Lucene    15-30              0.01%   7.12%
uhifrmo1  French    Lucene    5-30               5.76%  28.50%
uhifrmo2  French    Lucene    15-30              6.25%  29.85%

Table 2. Results for submitted monolingual runs (training topics)

Run       Language  Stemming  BRF (docs-terms)  GeoAve  MAP
uhienmo1  English   Lucene    5-30               0.01%   7.16%
uhienmo2  English   Lucene    15-30              0.01%   6.33%
uhifrmo1  French    Lucene    5-30               8.58%  25.26%
uhifrmo2  French    Lucene    15-30              9.88%  28.47%

Only the runs for French reached a competitive level of above 0.2 MAP. The geometric average for the English topics is far lower because low performance on several topics causes a sharp drop in this measure.

3 Future Work

For future experiments, we intend to exploit our findings on the impact of named entities on the retrieval process (Mandl & Womser-Hacker 2005) as well as selective relevance feedback strategies in order to improve robustness (Kwok 2005).
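The BRF column in the tables above gives the number of pseudo-relevant documents and the number of expansion terms per run (e.g. 5-30 = top 5 documents, 30 terms). As an illustration only, the following is a minimal sketch of the information-theoretic expansion-term scoring of Carpineto et al. (2001) that these parameters control; it is not the actual Hildesheim/Lucene implementation, and the plain token lists and function names are assumptions.

```python
from collections import Counter
from math import log

def brf_expansion_terms(feedback_docs, collection_docs, n_terms=30):
    """Rank candidate expansion terms with the Kullback-Leibler term score
    of Carpineto et al. (2001): score(t) = p_R(t) * log(p_R(t) / p_C(t)),
    where p_R is the term distribution over the pseudo-relevant (feedback)
    documents and p_C the distribution over the whole collection.
    Documents are plain lists of already stemmed tokens (assumption)."""
    rel_counts = Counter(t for doc in feedback_docs for t in doc)
    col_counts = Counter(t for doc in collection_docs for t in doc)
    rel_total = sum(rel_counts.values())
    col_total = sum(col_counts.values())
    scores = {}
    for term, freq in rel_counts.items():
        p_rel = freq / rel_total
        p_col = col_counts[term] / col_total  # > 0: feedback docs are part of the collection
        scores[term] = p_rel * log(p_rel / p_col)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_terms]

# A run labelled "BRF 5-30" would pass the top 5 retrieved documents as
# feedback_docs and append the returned 30 terms to the query before re-searching.
```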
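The gap between MAP and the geometric average (GeoAve) for the English runs can be made concrete with a small, self-contained computation; the per-topic average precision values below are invented for illustration, not the actual CLEF 2006 scores.

```python
from math import exp, log

def map_and_gmap(average_precisions, eps=1e-5):
    """Arithmetic mean (MAP) and geometric mean (GMAP) of per-topic average
    precision. Zero scores are raised to a small epsilon (a common convention
    in robust-track evaluation) so that the geometric mean stays defined."""
    aps = [max(ap, eps) for ap in average_precisions]
    map_score = sum(aps) / len(aps)
    gmap_score = exp(sum(log(ap) for ap in aps) / len(aps))
    return map_score, gmap_score

# Ten hypothetical topics: a few complete failures are enough to pull the
# geometric average close to zero while the arithmetic MAP stays moderate.
aps = [0.35, 0.28, 0.0, 0.41, 0.0, 0.22, 0.30, 0.0, 0.18, 0.25]
print(map_and_gmap(aps))  # roughly (0.199, 0.013)
```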
References

Carpineto, C.; de Mori, R.; Romano, G.; Bigi, B. (2001): An Information-Theoretic Approach to Automatic Query Expansion. In: ACM Transactions on Information Systems 19 (1), pp. 1-27.

Hackl, René; Mandl, Thomas; Womser-Hacker, Christa (2005): Mono- and Cross-lingual Retrieval Experiments at the University of Hildesheim. In: Peters, Carol; Clough, Paul; Gonzalo, Julio; Kluck, Michael; Jones, Gareth; Magnini, Bernardo (eds.): Multilingual Information Access for Text, Speech and Images: Results of the Fifth CLEF Evaluation Campaign. Berlin et al.: Springer [LNCS 3491], pp. 165-169.

Kwok, K.L. (2005): An Attempt to Identify Weakest and Strongest Queries. In: ACM SIGIR 2005 Workshop: Predicting Query Difficulty - Methods and Applications. Salvador - Bahia - Brazil, August 19, 2005. http://www.haifa.ibm.com/sigir05-qp/papers/kwok.pdf

Mandl, Thomas; Womser-Hacker, Christa (2004): A Framework for Long-Term Learning of Topical User Preferences in Information Retrieval. In: New Library World 105 (5/6), pp. 184-195.

Mandl, Thomas; Womser-Hacker, Christa (2005): The Effect of Named Entities on Effectiveness in Cross-Language Information Retrieval Evaluation. In: Applied Computing 2005: Proc. ACM SAC Symposium on Applied Computing (SAC), Information Access and Retrieval (IAR) Track. Santa Fe, New Mexico, USA, March 13-17, 2005, pp. 1059-1064.