          Robust Ad-hoc Retrieval Experiments with French and English
                       at the University of Hildesheim

                              Thomas Mandl, René Hackl, Christa Womser-Hacker

                                 University of Hildesheim, Information Science
                                             Marienburger Platz 22
                                      D-31141 Hildesheim, Germany
                                           mandl@uni-hildesheim.de


                                                   Abstract
             This paper reports on experiments submitted for the robust task at CLEF 2006 and
             intended to provide a baseline for other runs in the robust task. We applied a
             system previously tested in ad-hoc retrieval. Runs for monolingual English and
             French were submitted. Results on both training and test topics are reported.
             Only for French were results above 0.2 MAP achieved.


Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and
Retrieval; H.3.4 Systems and Software
General Terms
Measurement, Performance, Experimentation
Keywords
Multilingual Retrieval, Robust Retrieval, Evaluation Measures


1    Introduction

We intended to provide a baseline for the robust task at CLEF 2006. Our system, previously applied to the CLEF
2005 ad-hoc data (Hackl et al. 2005), is an adaptive fusion system based on the MIMOR model (Mandl & Womser-Hacker
2004). For the baseline experiments, we solely optimized the blind relevance feedback (BRF) parameters, following
the information-theoretic query expansion strategy of Carpineto et al. (2001). The basic retrieval engine is Lucene.
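
In a BRF run, the top-ranked documents from an initial retrieval pass are assumed relevant, and the
highest-scoring terms from those documents are added to the query for a second pass; the two tuned parameters
are the number of feedback documents and the number of expansion terms (e.g. 5-30 or 15-30 in the runs below).
The following Java sketch illustrates the information-theoretic term scoring of Carpineto et al. (2001) that
such a step could use. The class, method, and term-frequency maps are hypothetical illustrations, not the
actual system code.

    import java.util.*;
    import java.util.stream.Collectors;

    public class BrfExpansion {
        // Score candidate expansion terms by their contribution to the
        // Kullback-Leibler divergence between the term distribution of the
        // pseudo-relevant documents and that of the whole collection:
        //   score(t) = pR(t) * log(pR(t) / pC(t))
        static List<String> expansionTerms(Map<String, Integer> feedbackTf,
                                           Map<String, Integer> collectionTf,
                                           int numTerms) {
            long fbTotal = feedbackTf.values().stream().mapToLong(Integer::longValue).sum();
            long collTotal = collectionTf.values().stream().mapToLong(Integer::longValue).sum();
            Map<String, Double> score = new HashMap<>();
            for (Map.Entry<String, Integer> e : feedbackTf.entrySet()) {
                double pR = (double) e.getValue() / fbTotal;
                // unseen collection frequency is smoothed to 1 to avoid division by zero
                double pC = (double) collectionTf.getOrDefault(e.getKey(), 1) / collTotal;
                score.put(e.getKey(), pR * Math.log(pR / pC));
            }
            return score.entrySet().stream()
                        .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                        .limit(numTerms)
                        .map(Map.Entry::getKey)
                        .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            // toy data: term frequencies in the top-ranked documents vs. the collection
            Map<String, Integer> fb = Map.of("nuclear", 12, "plant", 9, "the", 40);
            Map<String, Integer> coll = Map.of("nuclear", 50, "plant", 80, "the", 100000);
            System.out.println(expansionTerms(fb, coll, 2));  // terms frequent everywhere rank low
        }
    }

The scoring favours terms that are frequent in the feedback documents but rare in the collection, which is
why common function words receive low (here even negative) scores.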


2    System Setup

Two runs each were submitted for the English and the French monolingual data. The results for the test and
training topics are shown in Tables 1 and 2, respectively.

                                 Table 1. Results for Submitted Monolingual Runs
                     Run        Language   Stemming   BRF (docs-terms)   GeoAve     MAP
                     uhienmo1   English    Lucene          5-30           0.01%    7.98%
                     uhienmo2   English    Lucene         15-30           0.01%    7.12%
                     uhifrmo1   French     Lucene          5-30           5.76%   28.50%
                     uhifrmo2   French     Lucene         15-30           6.25%   29.85%
                           Table 2. Results for Training Topics for Submitted Monolingual Runs
                     Run        Language   Stemming   BRF (docs-terms)   GeoAve     MAP
                     uhienmo1   English    Lucene          5-30           0.01%    7.16%
                     uhienmo2   English    Lucene         15-30           0.01%    6.33%
                     uhifrmo1   French     Lucene          5-30           8.58%   25.26%
                     uhifrmo2   French     Lucene         15-30           9.88%   28.47%

Only the French runs reached a competitive level above 0.2 MAP. The geometric average for the English runs is
far lower, because low performance on several individual topics leads to a sharp drop in this measure.
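
As a toy illustration of this effect (the average-precision values below are invented, not actual per-topic
scores), compare the arithmetic mean (MAP) and the geometric mean over five topics, two of which nearly fail:

    import java.util.Arrays;

    public class RobustMeasures {
        public static void main(String[] args) {
            // five topics, two of them near-failures
            double[] ap = {0.30, 0.25, 0.28, 0.001, 0.002};
            double map = Arrays.stream(ap).average().orElse(0.0);
            // geometric mean computed via logs to avoid numerical underflow
            double geoAve = Math.exp(Arrays.stream(ap).map(Math::log).average().orElse(0.0));
            System.out.printf("MAP    = %.4f%n", map);     // 0.1666
            System.out.printf("GeoAve = %.4f%n", geoAve);  // ~0.0335
        }
    }

Two near-failing topics cut the geometric mean to roughly a fifth of the arithmetic mean, which is why the
robust task uses it to reward consistent performance across all topics.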


3        Future Work

For future experiments, we intend to exploit knowledge about the impact of named entities on the retrieval
process (Mandl & Womser-Hacker 2005), as well as selective relevance feedback strategies, in order to improve
robustness (Kwok 2005).


References

Carpineto, C.; de Mori, R.; Romano, G.; Bigi, B. (2001): An Information-Theoretic Approach to Automatic Query
  Expansion. In: ACM Transactions on Information Systems 19 (1), pp. 1-27.
Hackl, René; Mandl, Thomas; Womser-Hacker, Christa (2005): Mono- and Cross-lingual Retrieval Experiments at the
  University of Hildesheim. In: Peters, Carol; Clough, Paul; Gonzalo, Julio; Kluck, Michael; Jones, Gareth; Magnini,
  Bernard (eds.): Multilingual Information Access for Text, Speech and Images: Results of the Fifth CLEF Evaluation
  Campaign. Berlin et al.: Springer [LNCS 3491], pp. 165-169.
Kwok, K.L. (2005): An Attempt to Identify Weakest and Strongest Queries. In: ACM SIGIR 2005 Workshop: Predicting
  Query Difficulty - Methods and Applications. Salvador - Bahia - Brazil, August 19, 2005,
  http://www.haifa.ibm.com/sigir05-qp/papers/kwok.pdf
Mandl, Thomas; Womser-Hacker, Christa (2004): A Framework for long-term Learning of Topical User Preferences in
  Information Retrieval. In: New Library World vol. 105 (5/6) pp. 184-195.
Mandl, Thomas; Womser-Hacker, Christa (2005): The Effect of Named Entities on Effectiveness in Cross-Language
  Information Retrieval Evaluation. In: Applied Computing 2005: Proc. ACM SAC Symposium on Applied Computing
  (SAC), Information Access and Retrieval (IAR) Track. Santa Fe, New Mexico, USA, March 13-17, 2005, pp. 1059-1064.