                         Mono- and Bilingual Retrieval Experiments
                          with a Social Science Document Corpus

                                             René Hackl, Thomas Mandl

                                  University of Hildesheim, Information Science
                             Marienburger Platz 22, D-31141 Hildesheim, Germany
                                            mandl@uni-hildesheim.de


                                                    Abstract
              This paper reports on our participation in the CLEF 2005 domain-specific retrieval
              track. The experiments were based on previous experience with the GIRT
              document corpus and were run in parallel to the multilingual experiments for CLEF
              2005. We optimized the parameters of the system with one corpus from 2004 and
              applied these settings to the domain specific task. In that manner, the robustness of
              our approach over different document collections was assessed.


Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and
Retrieval; H.3.4 Systems and Software
General Terms
Measurement, Performance, Experimentation
Keywords
Domain specific, Social Science, Bilingual retrieval, Thesaurus


1    Introduction

In previous CLEF campaigns, we tested an adaptive fusion system based on the MIMOR model (Mandl &
Womser-Hacker 2004) within the domain specific GIRT track (Hackl et al. 2003). For CLEF 2005, the
parameter optimization was based on a French document collection. The parameter settings were applied to the
four language document collection of the multilingual task of CLEF 2005 (Hackl et al. 2005).
In addition, we applied almost the same settings to the domain specific track in order to test the robustness of our
system over different collections.
    Robustness has recently become an issue in information retrieval research. It has often been noted that the
variance between queries is greater than the variance between systems. There are often very difficult queries
which few systems solve well and which lead to very bad results for most systems (Harman & Buckley 2004).
Thorough failure analysis can lead to substantial improvement. For example, the absence of named entities is a
factor which can make queries more difficult overall (Mandl & Womser-Hacker 2005). As a consequence, a new
evaluation track for robust retrieval has been established at the Text REtrieval Conference (TREC). This track
not only measures the average precision over all queries but also emphasizes the performance of the systems
on difficult queries. In this track, it is more important for a system to retrieve at least a few
documents for difficult queries than to improve its average performance (Voorhees 2005). In order to allow a
system evaluation based on robustness, more queries than in a normal ad-hoc track are necessary. The concept
of robustness is extended in TREC 2005: systems need to perform well over different tracks and tasks (Voorhees
2005).
    For multilingual retrieval, robustness would also be an interesting evaluation concept because
performance differs greatly between queries (Mandl & Womser-Hacker 2005). Robustness in multilingual
retrieval could be interpreted in three ways:
         • Stable performance over all topics instead of high average performance (like at TREC)
         • Stable performance over different tasks (like at TREC)
         • Stable performance over different languages (focus of CLEF)
For our participation in 2005, we tested the stability of our ad-hoc system on the domain specific track.


2       Domain Specific Mono- and Cross-lingual Retrieval Experiments

Our system was optimized with the French collection of CLEF 2004. The optimization procedure is described in
detail in Hackl et al. 2005. The GIRT runs were produced with only slightly different settings.
    Previous experience with the GIRT corpus showed that blind relevance feedback does not lead to good
results (Kluck 2004). Our test runs confirmed this, and blind relevance feedback was not applied for the
submitted runs. Instead, term expansion was based on the thesaurus available for the GIRT data. This thesaurus was
developed by the Social Science Information Centre (Kluck 2004). For the query terms, the fields Broader,
Narrower and Related term were extracted from the thesaurus and added to the query for the second run. The
topic title weights were set to ten, topic description weights to three and the thesaurus terms were weighted with
one. This weighting scheme was adopted from the ad-hoc task.
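The expansion and weighting scheme described above can be sketched as follows. This is an illustrative Python sketch, not the actual system (which was built on Lucene in Java); the dict-based thesaurus lookup is a hypothetical stand-in for the IZ thesaurus, and the example terms are invented:

```python
# Sketch of the weighted query expansion. Assumptions: a dict-based
# stand-in for the IZ thesaurus; the real system used Lucene in Java.

# Hypothetical thesaurus entry: term -> Broader/Narrower/Related terms.
girt_thesaurus = {
    "arbeitslosigkeit": ["erwerbslosigkeit", "beschaeftigung", "arbeitsmarkt"],
}

# Weights adopted from the ad-hoc task.
TITLE_WEIGHT, DESC_WEIGHT, THESAURUS_WEIGHT = 10, 3, 1

def expand_query(title_terms, desc_terms):
    """Build a list of (term, boost) pairs for a Lucene-style query."""
    weighted = [(t, TITLE_WEIGHT) for t in title_terms]
    weighted += [(t, DESC_WEIGHT) for t in desc_terms]
    # Add Broader, Narrower and Related terms for every query term.
    for t in title_terms + desc_terms:
        for expansion in girt_thesaurus.get(t, []):
            weighted.append((expansion, THESAURUS_WEIGHT))
    return weighted

def to_lucene_syntax(weighted):
    """Render the pairs in Lucene's boost syntax, e.g. term^10."""
    return " ".join(f"{term}^{boost}" for term, boost in weighted)

query = expand_query(["arbeitslosigkeit"], ["jugendliche"])
print(to_lucene_syntax(query))
# arbeitslosigkeit^10 jugendliche^3 erwerbslosigkeit^1 beschaeftigung^1 arbeitsmarkt^1
```

With this scheme, a thesaurus term contributes only a tenth of the weight of a title term, so expansion broadens recall without overwhelming the original query.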
    For the second mono-lingual run UHIGIRT2, we added terms from the multilingual European terminology
database Eurodicautom1 which was also used for the ad-hoc experiments. However, Eurodicautom contributed
terms for very few queries. Most often, it returned "out of vocabulary".
    As a bilingual GIRT run, we submitted one English-to-German run. The query and the thesaurus terms were
translated by ImTranslator2. In addition, the document field “english-translation” was indexed.
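The bilingual run can be sketched as below. Everything here is illustrative: `translate` is a hypothetical stand-in for the ImTranslator web service, and the field names and boost values are assumptions carried over from the monolingual scheme, not confirmed details of the system:

```python
# Sketch of the English-to-German bilingual run. Assumptions: translate()
# is a hypothetical stand-in for ImTranslator; field names ("text",
# "english-translation") and boosts are illustrative.

def translate(terms, source="en", target="de"):
    """Hypothetical stand-in for the ImTranslator web service."""
    dictionary = {"unemployment": "arbeitslosigkeit", "youth": "jugend"}
    return [dictionary.get(t, t) for t in terms]

def build_bilingual_query(title_terms, desc_terms):
    """Translate the query terms, then search both the German text field
    and the indexed english-translation document field."""
    de_title = translate(title_terms)
    de_desc = translate(desc_terms)
    clauses = [f"text:{t}^10" for t in de_title]
    clauses += [f"text:{t}^3" for t in de_desc]
    # Keep the original English terms for the "english-translation" field.
    clauses += [f"english-translation:{t}" for t in title_terms + desc_terms]
    return " ".join(clauses)

print(build_bilingual_query(["unemployment"], ["youth"]))
# text:arbeitslosigkeit^10 text:jugend^3 english-translation:unemployment english-translation:youth
```

Searching the english-translation field alongside the translated German query provides a fallback when the translation service misses a term.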


                          Table 1. Results from the CLEF 2005 Workshop. EDA = Eurodicautom

      RunID      Languages            Run Type                        Fields used   Retrieved   Relevant docs.   Avg. Prec.
      UHIGIRT1   Monolingual German   Lucene stemmer                  TD            1400        2682             0.220
      UHIGIRT2   Monolingual German   Lucene stemmer, IZ thesaurus,   TD            1335        2682             0.193
                                      EDA
      UHIGIRT3   English-German       Lucene stemmer, IZ thesaurus,   TD            1159        2682             0.178
                                      EDA, ImTranslator


Although our system had been tested with Russian data in earlier CLEF campaigns and in the ad-hoc task this
year, the Russian social science collection (RSSC) could not be used because it was provided later than the rest of
the data.


3          Conclusion and Outlook

For next year, we intend to submit multilingual runs for the domain specific task. The use of the thesaurus led
to a drop in performance; in the future, we intend to develop a more sophisticated strategy for applying thesaurus
terms.


References

Hackl, René; Kölle, Ralph; Mandl, Thomas; Womser-Hacker, Christa (2003): Domain Specific Retrieval Experiments at the
  University of Hildesheim with the MIMOR System. In: Peters, Carol; Braschler, Martin; Gonzalo, Julio; Kluck, Michael
  (Eds.): Advances in Cross-Language Information Retrieval: Third Workshop of the Cross-Language Evaluation Forum,
  CLEF 2002, Rome, Italy, September 2002. Berlin et al.: Springer [LNCS 2785] pp. 343-348.
Hackl, René; Mandl, Thomas; Womser-Hacker, Christa (2005): Ad-hoc Mono- and Multilingual Retrieval Experiments at the
  University of Hildesheim. In this volume

1 http://europa.eu.int/eurodicautom/Controller
2 http://freetranslation.paralink.com/
Harman, Donna; Buckley, Chris (2004): The NRRC reliable information access (RIA) workshop. In: Proceedings of the 27th
   annual international conference on Research and development in information retrieval (SIGIR). pp. 528-529.
Kluck, Michael (2004): The GIRT Data in the Evaluation of CLIR Systems - from 1997 until 2003. In: Comparative
   Evaluation of Multilingual Information Access Systems: 4th Workshop of the Cross-Language Evaluation Forum, CLEF
   2003, Trondheim, Norway, August 21-22, 2003, Revised Selected Papers. pp. 376-390
Mandl, Thomas; Womser-Hacker, Christa (2004): A Framework for long-term Learning of Topical User Preferences in
   Information Retrieval. In: New Library World vol. 105 (5/6) pp. 184-195.
Mandl, Thomas; Womser-Hacker, Christa (2005): The Effect of Named Entities on Effectiveness in Cross-Language
   Information Retrieval Evaluation. In: Proc ACM SAC Symposium on Applied Computing (SAC). Information Access and
   Retrieval (IAR) Track. Santa Fe, New Mexico, USA. March 13.-17. pp. 1059-1064.
Voorhees, Ellen (2005): The TREC robust retrieval track. In: ACM SIGIR Forum 39 (1) pp. 11-20.