REINA at CLEF 2006 Robust Task: Local Query
  Expansion Using Term Windows for Robust
                  Retrieval
             Angel Zazo, Carlos G. Figuerola, and José Luis A. Berrocal
                   REINA Research Group - Universidad de Salamanca
                  C/ Francisco de Vitoria 6-16, 37008 Salamanca, SPAIN
                                 http://reina.usal.es


                                           Abstract
     This paper describes our work at the CLEF 2006 Robust task. This task is an ad-hoc task
     that explores methods for stable retrieval by focusing on poorly performing topics. We
     have carried out experiments for all subtasks: monolingual (EN, ES, FR and IT), bilingual
     (IT→ES) and multilingual (ES→[EN ES FR IT]) retrieval.
         For monolingual retrieval we have focused our work on local query expansion, i.e.
     using only the information from retrieved documents. External corpora, such as the
     Web, were not used. Our document retrieval system is simple; it is based on the vector
     space model. Several local expansion techniques were applied to the training topics. The
     best improvement was achieved using association thesauri, which were constructed
     employing co-occurrence relations in term windows, not in the complete document. This
     technique is effective and can be easily implemented without the need to tune many
     parameters. Our mandatory runs (title+description topic fields) obtained good positions
     in all monolingual subtasks in which we participated.
         For bilingual retrieval two machine translation programs were used to translate the
     topics from Italian into Spanish. Both translations were joined before searching. The
     same expansion technique was also applied. Our mandatory run obtained the top rank in
     the bilingual subtask. For multilingual retrieval we used the same procedure to obtain
     the retrieval list for each target language, and we combined the lists with the MAX-MIN
     data fusion method. In this subtask, our mandatory run was in the lower part of the
     ranking of runs.

Categories and Subject Descriptors
H.3.1 [Content Analysis and Indexing]: Indexing methods, Thesauruses; H.3.3 [Information
Search and Retrieval]: Query formulation, Relevance feedback; H.3.4 [Systems and Software]:
Performance evaluation; I.2.7 [Natural Language Processing]: Machine Translation

General Terms
Measurement, Performance, Experimentation

Keywords
Robust Retrieval, Query Expansion, Term Windows, Association Thesauri, CLIR, Machine Translation
1     Introduction
Robust retrieval tries to obtain stable performance over all topics by focusing on poorly performing
topics. Robust tracks were carried out in TREC 2003, 2004 and 2005 (Voorhees, 2003, 2004, 2005)
for monolingual retrieval, but not for cross-language information retrieval.
    The users of an information retrieval system do not know concepts such as average precision,
recall, etc. They simply use the system, and they usually remember failures better than successes.
Failures decide whether a system will be used again. Robustness ensures that all topics obtain a
minimum level of effectiveness. In information retrieval the mean of the average precision (MAP)
has been used to measure system performance, but poorly performing topics have little influence
on MAP. At TREC, the geometric average (rather than MAP) turned out to be the most stable
evaluation method for robustness (Voorhees, 2004). The geometric average (GMAP) has the
desired effect of emphasizing scores close to 0.0 (the poor performers) while minimizing differences
between larger scores.
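    As an illustration of this effect, MAP and GMAP could be computed roughly as in the following
Python sketch; the example values and the small epsilon used to avoid log(0) are our assumptions
for the illustration, not part of the official evaluation:

    import math

    def map_score(ap_values):
        # Arithmetic mean of the per-topic average precision values.
        return sum(ap_values) / len(ap_values)

    def gmap_score(ap_values, eps=1e-5):
        # Geometric mean; eps guards against log(0) for topics with AP = 0.
        return math.exp(sum(math.log(max(ap, eps)) for ap in ap_values) / len(ap_values))

    aps = [0.40, 0.35, 0.50, 0.01]   # one very poor topic
    print(map_score(aps))            # ~0.315: hardly affected by the poor topic
    print(gmap_score(aps))           # ~0.163: pulled down much more strongly
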
    In the CLEF 2006 ad-hoc track a new robust task was introduced. Three subtasks were
designed for the robust task:

    • Monolingual: for all six document languages: Dutch (NL), English (EN), German (DE),
      French (FR), Italian (IT) and Spanish (ES).
    • Three bilingual: Italian→Spanish, French→Dutch and English→German.

    • Multilingual: All six languages are allowed as topic language.

   Our research group has participated in all subtasks. We have carried out monolingual (EN,
ES, FR, IT), bilingual (IT→ES) and multilingual (ES→[EN ES FR IT]) experiments. For each
subtask two runs were submitted, one with the title and description topic fields (mandatory) and
one with only the title field. All experiments were run with the same setup (except for
language-specific resources).


2     Experiments
We have focused our work on local query expansion, i.e. using only the information from retrieved
documents. At CLEF 2002 we used association and similarity thesauri to expand short queries: all
documents of the collection (i.e. global query expansion) were used to construct the thesauri (Zazo
et al., 2003). In later works (Zazo et al., 2002, 2005; Zazo, 2003) we have studied in depth several
query expansion techniques: local vs. global analysis, term reweighting, coefficients for expansion,
etc. Some conclusions we have drawn are the following:

    • Query expansion depends on the technique used to obtain relations between terms.
    • Performance improves if the terms added to the original query have a high relation value
      with all terms of the original query, not with only one of them separately (see the sketch
      after this list).
    • Expansion depends on the importance (weight) of the terms added to the original query.
    • Performance improvement is higher for short queries than for long queries. Long queries
      usually define the user’s information need well, and frequently additional terms are not
      necessary to improve performance.
    • In most cases the expansion techniques are based on local analysis, using the retrieved
      documents to obtain relations between terms. The performance of the first retrieval is
      fundamental to obtain a high improvement with the expansion: a good retrieval system
      (term weighting) is better than a good expansion technique.
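    The second point can be illustrated with a small Python sketch (not our actual implementation):
a candidate term is kept only if it has a non-zero relation value with every original query term, and
candidates are ranked by the sum of those values. The relation dictionary and the summation used
to combine values are assumptions made for the illustration:

    def select_expansion_terms(query_terms, relation, n_terms=10):
        # relation: dict mapping (query_term, candidate_term) -> relation value.
        candidates = {t for (q, t) in relation
                      if q in query_terms and t not in query_terms}
        ranked = []
        for t in candidates:
            values = [relation.get((q, t), 0.0) for q in query_terms]
            # Keep only terms related to *all* original query terms, not just one.
            if all(v > 0.0 for v in values):
                ranked.append((sum(values), t))
        ranked.sort(reverse=True)
        return [t for _, t in ranked[:n_terms]]
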
    Considering these points, a large number of experiments were carried out, using only training
topics (as mandated). Note that the topic collection of the robust task came from CLEF 2001
through CLEF 2003, but the document collections were those of CLEF 2003, and they were different
from the CLEF 2001 and 2002 collections. It is known that retrieval performance depends not only
on term weighting, but also on the topic and document collections; for the same document collection
and weighting scheme, two different topic collections obtain different performance. So we took a
daring decision: for our experiments we used only the training topics of the CLEF 2003 topic
collection.
    Our primary effort was monolingual retrieval. The steps of the monolingual subtask are
explained below. For bilingual and multilingual experiments we used machine translation (MT)
programs to translate the topics into the document language, and then performed monolingual
retrieval. The MAX-MIN data fusion method was used to merge the result lists in multilingual
retrieval.
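    As a rough illustration, the following Python sketch merges several monolingual result lists
after min-max normalisation of their scores; this is our reading of the MAX-MIN fusion step, and
the exact normalisation and merging used in our system may differ:

    def max_min_fusion(result_lists):
        # result_lists: one ranked list per target language, each a list of
        # (doc_id, score) pairs. Scores are normalised per list with
        # (score - min) / (max - min), then all lists are merged and
        # re-sorted by normalised score.
        merged = []
        for results in result_lists:
            scores = [s for _, s in results]
            lo, hi = min(scores), max(scores)
            for doc_id, s in results:
                norm = (s - lo) / (hi - lo) if hi > lo else 0.0
                merged.append((norm, doc_id))
        merged.sort(reverse=True)
        return [doc_id for _, doc_id in merged]
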

2.1    Monolingual Experiments
Our document retrieval system is simple. It is based on the vector space model. No additional
modules for word sense disambiguation or other linguistic techniques were used. We have focused
our work on local query expansion, i.e. using only the information from retrieved documents.
Neither the complete document collection nor external corpora, such as the Web, were used. First,
it is necessary to have a good term weighting scheme to use as a baseline, and to check whether
stop word removal or stemming improves robustness. Second, we applied several local query
expansion techniques to see which ones gave the best improvement on the least effective topics.
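    A minimal sketch of how co-occurrence relations might be collected from term windows over
the top retrieved documents is given below; the window size and the use of raw co-occurrence
counts as relation values are assumptions for the illustration, not necessarily the association
measure used in our runs:

    from collections import Counter
    from itertools import combinations

    def window_cooccurrences(documents, window_size=10):
        # documents: list of token lists (top retrieved documents, already
        # stop-word filtered and stemmed). Co-occurrences are counted inside
        # sliding windows of fixed size, not over the whole document.
        counts = Counter()
        for tokens in documents:
            for start in range(max(1, len(tokens) - window_size + 1)):
                window = set(tokens[start:start + window_size])
                for a, b in combinations(sorted(window), 2):
                    counts[(a, b)] += 1
                    counts[(b, a)] += 1   # store both directions for easy lookup
        return counts
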
    For each test we carried out, each topic was classified into one of three categories: “OK” if its
average precision was above MAP; “bad” if it was only above MAP/2; and “hard” if it was MAP/2
or less.
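    With these thresholds, the classification can be expressed as follows (the treatment of the
boundary values is our assumption):

    def classify_topic(ap, map_value):
        # "OK": above MAP; "bad": only above MAP/2; "hard": MAP/2 or less.
        if ap > map_value:
            return "OK"
        if ap > map_value / 2:
            return "bad"
        return "hard"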