=Paper=
{{Paper
|id=Vol-1173/CLEF2007wn-adhoc-HeuwingEt2007
|storemode=property
|title=Robust Retrieval Experiments at the University of Hildesheim
|pdfUrl=https://ceur-ws.org/Vol-1173/CLEF2007wn-adhoc-HeuwingEt2007.pdf
|volume=Vol-1173
|dblpUrl=https://dblp.org/rec/conf/clef/HeuwingM07a
}}
==Robust Retrieval Experiments at the University of Hildesheim==
Ben Heuwing, Thomas Mandl
University of Hildesheim, Information Science
Marienburger Platz 22
D-31141 Hildesheim, Germany
mandl@uni-hildesheim.de
Abstract
This paper reports on experiments submitted for the robust task at CLEF 2007. We
applied a system previously tested for ad-hoc retrieval. The experiments focused
on the effect of blind relevance feedback and named entities. Experiments for
monolingual English and French are presented.
Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and
Retrieval; H.3.4 Systems and Software
General Terms
Measurement, Performance, Experimentation
Keywords
Multilingual Retrieval, Robust Retrieval, Evaluation Measures
1 Introduction
We intended to provide a baseline for the robust task at CLEF 2007. Our basic system had been used in previous CLEF
campaigns (Hackl et al. 2005).
For the baseline experiments, we optimized the blind relevance feedback (BRF) parameters. The underlying retrieval
engine of the system is the open-source search engine Apache Lucene.
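Blind relevance feedback treats the top-ranked documents of an initial retrieval as relevant and adds frequent terms from them to the query. The Python sketch below illustrates only the term-selection step. The function name `select_expansion_terms` is hypothetical, and scoring candidates by raw term frequency is an assumption, because the term-weighting scheme of the actual system is not described here. Documents are assumed to be already tokenized and stemmed.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def select_expansion_terms(
    ranked_docs: Sequence[List[str]],
    query_terms: Sequence[str],
    n_docs: int,
    n_terms: int,
) -> List[Tuple[str, int]]:
    """Return the n_terms most frequent terms in the top n_docs documents.

    Assumption: candidates are scored by raw term frequency summed over the
    feedback documents; ties keep the order in which terms were first seen.
    """
    counts: Counter = Counter()
    for doc_tokens in ranked_docs[:n_docs]:  # blindly treat the top n_docs as relevant
        counts.update(doc_tokens)
    for term in query_terms:  # terms already in the query are not re-added
        counts.pop(term, None)
    return counts.most_common(n_terms)


if __name__ == "__main__":
    # Hypothetical, already tokenized ranking for a single query.
    ranking = [
        ["lucene", "index", "search", "index"],
        ["search", "engine", "index"],
        ["cooking", "recipe"],
    ]
    # A setting such as 0.05-10-30 would correspond to n_docs=10, n_terms=30.
    print(select_expansion_terms(ranking, ["search"], n_docs=2, n_terms=2))
```

With `n_docs=2`, the third document is ignored entirely, which is the defining property of blind feedback: only rank position, not actual relevance, decides which documents contribute terms.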
2 System Setup
We submitted five runs for the English and three runs for the French monolingual data. The results for the test
and training topics are shown in Tables 1 and 2, respectively.
Optimizing the BRF parameters on the English training topics of 2006 gave the best results when the query was
expanded with 30 terms from the top 10 documents and the expansion terms were given a relative weight of 0.05
compared to the rest of the query. The same improvement over the base run can be seen, on a smaller scale, for the
submitted runs. Using named entities from the English documents did not have an effect on retrieval quality.
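The BRF settings in the tables are written as weight-docs-terms, so 0.05-10-30 means 30 expansion terms from the top 10 documents, weighted at 0.05 relative to the original query. The following is a minimal sketch, assuming the expansion terms are attached as per-term boosts in Lucene's classic query syntax (`term^boost`), while the unboosted original terms keep the default boost of 1.0. The helper names are hypothetical, and the actual system may have distributed the expansion weight differently.

```python
from typing import List, Tuple


def parse_brf_setting(label: str) -> Tuple[float, int, int]:
    """Split a table label such as '0.05-10-30' into (weight, docs, terms)."""
    weight, docs, terms = label.split("-")
    return float(weight), int(docs), int(terms)


def build_expanded_query(query_terms: List[str], expansion_terms: List[str], weight: float) -> str:
    """Join original and expansion terms into a Lucene classic query string.

    Assumption: original terms are left unboosted (default 1.0) and every
    expansion term receives the relative weight as an explicit boost.
    Terms are assumed to contain no query-syntax special characters.
    """
    boosted = [f"{term}^{weight:g}" for term in expansion_terms]
    return " ".join(list(query_terms) + boosted)


if __name__ == "__main__":
    weight, n_docs, n_terms = parse_brf_setting("0.05-10-30")
    print(weight, n_docs, n_terms)
    print(build_expanded_query(["robust", "retrieval"], ["lucene", "index"], weight))
```

A weight of 1.0, as in the French setting 1.0-5-50, makes expansion terms count as much as the original terms, which is what the text calls a heavy-weighted expansion.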
For the French runs, a heavily weighted query expansion (with a weight equal to that of the rest of the query) using
50 terms from the top 5 documents came out as the best BRF setting, even though the base run performed better on the
test topics.
Table 1. Results for Submitted Runs
{| class="wikitable"
! Run !! Language !! Stemming !! BRF (weight-docs-terms) !! NE !! MAP !! R-Precision !! Precision@10
|-
| HiMoEnBase || English || snowball || - || - || 0.0527 || 0.0769 || 0.1555
|-
| HiMoEnBrf1 || English || snowball || 1.0-10-30 || - || 0.0580 || 0.0888 || 0.3032
|-
| HiMoEnBrf2 || English || snowball || 0.05-10-30 || - || 0.0586 || 0.0858 || 0.1683
|-
| HiMoEnBrfNe || English || snowball || 0.05-10-30 || 1.0 || 0.0588 || 0.0858 || 0.1683
|-
| HiMoEnNe || English || snowball || - || 2.0 || 0.0527 || 0.0769 || 0.1555
|-
| HiMoFrBase || French || lucene || - || - || 0.1954 || 0.1974 || 3.8915
|-
| HiMoFrBrf || French || lucene || 0.5-5-25 || - || 0.1571 || 0.2024 || 0.3080
|-
| HiMoFrBrf2 || French || lucene || 1.0-5-50 || - || 0.1630 || 0.2121 || 0.3992
|}
Table 2. Results for Training Topics
{| class="wikitable"
! Run !! Language !! Stemming !! BRF (weight-docs-terms) !! NE !! MAP
|-
| HiMoEnBase || English || snowball || - || - || 0.1634
|-
| HiMoEnBrf1 || English || snowball || 1.0-10-30 || - || 0.1489
|-
| HiMoEnBrf2 || English || snowball || 0.05-10-30 || - || 0.1801
|-
| HiMoEnBrfNe || English || snowball || 0.05-10-30 || 1.0 || 0.1801
|-
| HiMoEnNe || English || snowball || - || 2.0 || 0.1634
|-
| HiMoFrBase || French || lucene || - || - || 0.2081
|-
| HiMoFrBrf || French || lucene || 0.5-5-25 || - || 0.2173
|-
| HiMoFrBrf2 || French || lucene || 1.0-5-50 || - || 0.2351
|}
Only the French runs reached a competitive level of above 0.2 MAP. For the English topics, the results under the
geometric average (GMAP) are worse, because low performance on several topics causes a sharp drop in this
measure.
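The gap between the two averages follows from how they aggregate per-topic average precision (AP): MAP is the arithmetic mean, while GMAP is the geometric mean, so a single topic with AP near zero pulls GMAP down far more than MAP. The sketch below computes both. The small constant `eps` added before taking logarithms is an assumption; some such constant is needed so that one zero-AP topic does not force the whole geometric mean to zero, and the exact value used in the official evaluation may differ.

```python
import math
from typing import Sequence, Set


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at the rank of each relevant document.

    Relevant documents that are never retrieved contribute 0, because the
    sum is divided by the total number of relevant documents.
    """
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_ap(ap_values: Sequence[float]) -> float:
    """Arithmetic mean of per-topic AP (MAP)."""
    return sum(ap_values) / len(ap_values)


def geometric_map(ap_values: Sequence[float], eps: float = 1e-5) -> float:
    """Geometric mean of per-topic AP (GMAP); eps is an assumed smoothing constant."""
    return math.exp(sum(math.log(ap + eps) for ap in ap_values) / len(ap_values))


if __name__ == "__main__":
    uniform = [0.4, 0.4, 0.4, 0.4]   # hypothetical: every topic equally good
    skewed = [0.8, 0.6, 0.2, 0.0]    # hypothetical: one topic fails completely
    for name, aps in (("uniform", uniform), ("skewed", skewed)):
        print(name, round(mean_ap(aps), 4), round(geometric_map(aps), 4))
```

Both hypothetical topic sets have a MAP of 0.4, but GMAP drops from about 0.4 for the uniform set to roughly 0.03 for the skewed one, which is why a system with several very weak topics looks much worse under GMAP.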
3 Future Work
For future experiments, we intend to exploit knowledge about the impact of named entities on the retrieval
process (Mandl & Womser-Hacker 2005) as well as selective relevance feedback strategies in order to improve
robustness.
References
Mandl, Thomas; Hackl, René; Womser-Hacker, Christa (2007): Robust Ad-hoc Retrieval Experiments with French and
English at the University of Hildesheim. In: Peters, Carol et al. (Eds.). 7th Workshop of the Cross-Language Evaluation
Forum, CLEF 2006, Alicante, Spain, Revised Selected Papers. Berlin et al.: Springer [Lecture Notes in Computer Science
4730] pp. 127-128.
Mandl, Thomas; Womser-Hacker, Christa (2005): The Effect of Named Entities on Effectiveness in Cross-Language
Information Retrieval Evaluation. In: Applied Computing 2005: Proc. ACM SAC Symposium on Applied Computing
(SAC). Information Access and Retrieval (IAR) Track. Santa Fe, New Mexico, USA, March 13-17, 2005. pp. 1059-1064.
Peters, Carol et al. (2007): Overview of the ad-hoc Track at CLEF. In this volume.