Mono- and Bilingual Retrieval Experiments with a Social Science Document Corpus

René Hackl University of Hildesheim

Information Science Marienburger Platz 22 D-31141 Hildesheim Germany

Thomas Mandl mandl@uni-hildesheim.de University of Hildesheim

Information Science Marienburger Platz 22 D-31141 Hildesheim Germany

Categories and Subject Descriptors: H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software

General Terms: Measurement, Performance, Experimentation

Keywords: Domain specific, Social Science, Bilingual retrieval, Thesaurus

This paper reports on our participation in the domain-specific retrieval track of CLEF 2005. The experiments built on previous experience with the GIRT document corpus and were run in parallel with our multilingual experiments for CLEF 2005. We optimized the system parameters on one corpus from 2004 and applied these settings to the domain-specific task. In this way, we assessed the robustness of our approach across different document collections.

Introduction

In previous CLEF campaigns, we tested an adaptive fusion system based on the MIMOR model (Mandl & Womser-Hacker 2004) within the domain-specific GIRT track (Hackl et al. 2003). For CLEF 2005, the parameter optimization was based on a French document collection. The resulting parameter settings were applied to the four-language document collection of the multilingual task of CLEF 2005 (Hackl et al. 2005). In addition, we applied almost the same settings to the domain-specific track in order to test the robustness of our system across different collections.
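
To illustrate what fusing several retrieval systems can look like, the following is a minimal sketch of a weighted linear combination of normalized scores. It is a generic illustration, not the MIMOR implementation; the system names, scores, weights and the min-max normalization are all assumptions made for this example.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Scale one system's scores to [0, 1]; a constant list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(result_lists, weights):
    """Weighted linear combination of normalized scores from several systems."""
    fused = defaultdict(float)
    for system, scores in result_lists.items():
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += weights[system] * s
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists (document id -> retrieval score) and weights.
runs = {
    "system_a": {"d1": 2.0, "d2": 1.0, "d3": 0.0},
    "system_b": {"d2": 0.9, "d3": 0.6, "d4": 0.3},
}
weights = {"system_a": 0.7, "system_b": 0.3}

ranking = fuse(runs, weights)
for doc, score in ranking:
    print(doc, round(score, 3))
```

In an adaptive setting, the weights would be tuned on a training collection and then reused on another collection, which is the kind of transfer whose robustness is examined here.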

Robustness has recently become an issue in information retrieval research. It has often been noted that the variance between queries is larger than the variance between systems. Some queries are very difficult: few systems solve them well, and most systems return very poor results for them (Harman & Buckley 2004). Thorough failure analysis can lead to substantial improvement. For example, the absence of named entities is a factor that can make queries more difficult overall (Mandl & Womser-Hacker 2004). As a consequence, a new evaluation track for robust retrieval has been established at the Text Retrieval Conference (TREC). This track not only measures the average precision over all queries but also emphasizes system performance on difficult queries. To perform well in this track, it is more important for a system to retrieve at least a few relevant documents for difficult queries than to improve its average performance (Voorhees 2005). A robustness-based evaluation requires more queries than a normal ad-hoc track. The concept of robustness is extended in TREC 2005: systems need to perform well over different tracks and tasks (Voorhees 2005).
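
The difference between rewarding average performance and rewarding stable performance can be made concrete with a small sketch. One measure commonly used for robustness is the geometric mean of the per-topic average precision (GMAP), which penalizes very poor topics far more than the arithmetic mean (MAP) does. The per-topic values below are invented for illustration, and the floor value epsilon is an assumption of this sketch.

```python
import math


def mean_average_precision(ap_values):
    """Arithmetic mean of per-topic average precision (MAP)."""
    return sum(ap_values) / len(ap_values)


def geometric_map(ap_values, epsilon=1e-5):
    """Geometric mean of per-topic average precision (GMAP).

    epsilon is an assumed floor that avoids log(0) for topics with AP = 0.
    """
    log_sum = sum(math.log(max(ap, epsilon)) for ap in ap_values)
    return math.exp(log_sum / len(ap_values))


# Hypothetical per-topic average precision for two systems.
system_a = [0.50, 0.50, 0.50, 0.02]  # strong on three topics, fails on one
system_b = [0.40, 0.40, 0.40, 0.32]  # weaker on average per topic, but stable

map_a, map_b = mean_average_precision(system_a), mean_average_precision(system_b)
gmap_a, gmap_b = geometric_map(system_a), geometric_map(system_b)

print(f"MAP:  A={map_a:.3f}  B={map_b:.3f}")
print(f"GMAP: A={gmap_a:.3f}  B={gmap_b:.3f}")
```

Both hypothetical systems have the same MAP, but the geometric mean ranks the stable system B clearly above system A.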

For multilingual retrieval, robustness would also be an interesting evaluation concept because the performance between queries differs greatly (Mandl & Womser-Hacker 2004). Robustness in multilingual retrieval could be interpreted in three ways:

• Stable performance over all topics instead of high average performance (like at TREC)
• Stable performance over different tasks (like at TREC)
• Stable performance over different languages (focus of CLEF)

For our participation in the domain-specific track in 2005, we tested how stable our ad-hoc system remains when it is applied to the domain-specific collection.

Domain Specific Mono- and Cross-lingual Retrieval Experiments

Our system was optimized with the French collection of CLEF 2004. The optimization procedure is described in detail in Hackl et al. (2005). The GIRT runs were produced with only slightly different settings.

Previous experience with the GIRT corpus showed that blind relevance feedback does not lead to good results (Kluck 2004). Our test runs confirmed this, and blind relevance feedback was not applied for the submitted runs. Instead, term expansion was based on the thesaurus available for the GIRT data. This thesaurus was developed by the Social Science Information Centre (Kluck 2004). For the query terms, the fields Broader, Narrower and Related term were extracted from the thesaurus and added to the query for the second run. The topic title weights were set to ten, the topic description weights to three, and the thesaurus terms were weighted with one. This weighting scheme was adopted from the ad-hoc task.
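
The expansion and weighting scheme described above can be sketched as follows. This is an illustrative reconstruction, not the code used for the submitted runs: the thesaurus excerpt, the topic terms and the function names are hypothetical, and expressing the weights as boosts in Lucene's term^boost query syntax is an assumption.

```python
TITLE_WEIGHT, DESCRIPTION_WEIGHT, THESAURUS_WEIGHT = 10, 3, 1

# Hypothetical thesaurus excerpt; the real GIRT thesaurus is not reproduced here.
THESAURUS = {
    "migration": {
        "broader": ["mobility"],
        "narrower": ["emigration", "immigration"],
        "related": ["foreigner"],
    },
}


def expand(terms, thesaurus):
    """Collect broader, narrower and related terms for the given query terms."""
    expanded = []
    for term in terms:
        entry = thesaurus.get(term, {})
        for relation in ("broader", "narrower", "related"):
            for candidate in entry.get(relation, []):
                # Skip terms already in the query and duplicates.
                if candidate not in terms and candidate not in expanded:
                    expanded.append(candidate)
    return expanded


def build_query(title_terms, description_terms, thesaurus):
    """Return a boosted query string in term^boost form."""
    parts = [f"{t}^{TITLE_WEIGHT}" for t in title_terms]
    parts += [f"{t}^{DESCRIPTION_WEIGHT}" for t in description_terms]
    query_terms = title_terms + description_terms
    parts += [f"{t}^{THESAURUS_WEIGHT}" for t in expand(query_terms, thesaurus)]
    return " ".join(parts)


query = build_query(["migration"], ["labour", "market"], THESAURUS)
print(query)
```

Multi-word thesaurus entries would need to be quoted as phrases before boosting; the sketch only handles single-word terms.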

For the second mono-lingual run, UHIGIRT2, we added terms from the multilingual European terminology database Eurodicautom¹, which was also used for the ad-hoc experiments. However, Eurodicautom contributed terms for very few queries. Most often, it returned "out of vocabulary".

As our bilingual GIRT run, we submitted one English-to-German run. The query and the thesaurus terms were translated by ImTranslator². In addition, the document field "english-translation" was indexed.

Conclusion and Outlook

For next year, we intend to implement multilingual runs for the domain-specific task. The use of the thesaurus led to a drop in performance (average precision of 0.193 for UHIGIRT2 compared with 0.220 for UHIGIRT1). For the future, we intend to develop a more sophisticated strategy for applying thesaurus terms.

Although our system has been tested with Russian data at earlier CLEF campaigns and at the ad-hoc task this year, the Russian social science RSSC collection could not be used because it was provided later than the rest of the data.

Table 1. Results from the CLEF 2005 Workshop. EDA = Eurodicautom

RunID     Languages           Run Type                                         Fields used  Retrieved  Relevant docs.  Avg. Prec.
UHIGIRT1  Monolingual German  Lucene stemmer                                   TD           1400       2682            0.220
UHIGIRT2  Monolingual German  Lucene stemmer, IZ thesaurus, EDA                TD           1335       2682            0.193
UHIGIRT3  English-German      Lucene stemmer, IZ thesaurus, EDA, ImTranslator  TD           1159       2682            0.178

¹ http://europa.eu.int/eurodicautom/Controller
² http://freetranslation.paralink.com/

References

Hackl, René; Kölle, Ralph; Mandl, Thomas; Womser-Hacker, Christa (2003): Domain Specific Retrieval Experiments at the University of Hildesheim with the MIMOR System. In: Advances in Cross-Language Information Retrieval: Third Workshop of the Cross-Language Evaluation Forum, CLEF 2002, Rome, Italy, September 2002. Berlin et al.: Springer [LNCS 2785].

Hackl, René; Mandl, Thomas; Womser-Hacker, Christa (2005): Ad-hoc Mono- and Multilingual Retrieval Experiments at the University of Hildesheim. In this volume.

Harman, Donna; Buckley, Chris (2004): The NRRC reliable information access (RIA) workshop. In: Proceedings of the 27th annual international conference on Research and development in information retrieval (SIGIR).

Kluck, Michael (2004): The GIRT Data in the Evaluation of CLIR Systems - from 1997 until 2003. In: Comparative Evaluation of Multilingual Information Access Systems: 4th Workshop of the Cross-Language Evaluation Forum, CLEF 2003, Trondheim, Norway, August 21-22, 2003, Revised Selected Papers.

Mandl, Thomas; Womser-Hacker, Christa (2004): A Framework for long-term Learning of Topical User Preferences in Information Retrieval. In: New Library World 105 (5/6).

Mandl, Thomas; Womser-Hacker, Christa (2005): The Effect of Named Entities on Effectiveness in Cross-Language Information Retrieval Evaluation. In: Proc ACM SAC Symposium on Applied Computing (SAC), Information Access and Retrieval (IAR) Track, Santa Fe, New Mexico, USA, March 13.-17. 2005.

Voorhees, Ellen (2005): The TREC robust retrieval track. In: ACM SIGIR Forum 39 (1).