Applying Query Expansion techniques to Ad Hoc Monolingual tasks with the IR-n system

Elisa Noguera and Fernando Llopis
Grupo de investigación en Procesamiento del Lenguaje Natural y Sistemas de Información
Departamento de Lenguajes y Sistemas Informáticos
University of Alicante, Spain
elisa,llopis@dlsi.ua.es

Abstract

This paper describes our participation in the Monolingual tasks at CLEF 2007. We submitted results for the following languages: Hungarian, Bulgarian and Czech. We focused on studying two query expansion techniques, Probabilistic Relevance Feedback (PRF) and Mutual Information Relevance Feedback (MI-RF), to improve retrieval performance. The analysis of our experiments and of the official results at CLEF 2007 shows that query expansion considerably improved scores for all three languages (Hungarian, Bulgarian and Czech).

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.3 Information Search and Retrieval

General Terms
Experimentation, Performance, Query Expansion

Keywords
Information Retrieval

1 Introduction

Query expansion (QE) is a technique commonly used in Information Retrieval (IR) [5] [2] to improve retrieval performance by reformulating the original query, adding new terms or re-weighting the original ones. Query expansion terms can be automatically extracted from documents or taken from knowledge resources.

In our seventh participation at CLEF, we focused on comparing two different query expansion strategies: Probabilistic Relevance Feedback (PRF) and Mutual Information Relevance Feedback (MI-RF). Specifically, we participated in the tasks for the following languages: Hungarian, Bulgarian and Czech. We used the IR-n system [3], a Passage Retrieval (PR) system which uses passages with a fixed number of sentences. This provides the passages with some syntactic content.
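As a minimal sketch of this kind of query reformulation (the example terms, weights, and the scaling factor `alpha` below are invented for illustration; they are not taken from the IR-n system):

```python
# Minimal illustration of query expansion by adding weighted terms.
# The actual IR-n system selects expansion terms from top-ranked
# passages or documents; here the expansion weights are given directly.

def expand_query(original_terms, expansion_weights, alpha=0.5):
    """Merge original query terms (weight 1.0) with weighted expansion
    terms, scaling the expansion weights by `alpha`."""
    query = {term: 1.0 for term in original_terms}
    for term, weight in expansion_weights.items():
        # A term appearing in both the query and the expansion set
        # accumulates both contributions.
        query[term] = query.get(term, 0.0) + alpha * weight
    return query

expanded = expand_query(["budget", "deficit"], {"fiscal": 0.8, "deficit": 0.4})
print(expanded)
```

The re-weighted query can then be submitted to the retrieval engine in place of the original one.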
This paper is organized as follows: the next section describes the query expansion techniques used by our system and the training carried out for CLEF 2007. The results obtained are then presented. Finally, we present our conclusions and future work.

2 Relevance Feedback

Query expansion techniques such as Relevance Feedback (RF) can substantially improve retrieval effectiveness, and most IR systems implement some form of them. RF is usually performed in the following way:

• A search using the original query is performed, and the n terms of the top-ranked documents are selected.
• The n terms are added to the original query to formulate a new query.
• The new query is run to produce a new ranked list of documents.

An important factor is how to weight the selected terms with respect to the terms of the initial query. In this work, we compare two formulas for calculating this weight (w_t): Probabilistic and Mutual Information.

2.1 Probabilistic Relevance Feedback (PRF)

This is the term relevance weighting formula proposed by Robertson and Sparck Jones in [4]. The relevance weight of term t is given by:

w_t = \frac{(m_t + 0.5) \cdot (n - n_t - m + m_t + 0.5)}{(m - m_t + 0.5) \cdot (n_t - m_t + 0.5)}    (1)

where n is the number of documents in the collection, m is the number of documents considered relevant (in this case, 10 documents), n_t is the number of documents in which term t appears, and m_t is the number of relevant documents in which term t appears. w_t is higher for those terms whose frequency in the relevant documents is greater than in the whole collection. Our system supports two query expansion techniques based on this formula: (1) query expansion based on the most relevant passages, or (2) based on the most relevant documents.

2.2 Mutual Information Relevance Feedback (MI-RF)

This technique is based on the idea that the co-occurrence of two terms can determine the semantic relation that exists between them [1].
The mutual information score grows with the frequency of word co-occurrence. If two words co-occur mainly by chance, their mutual information score will be close to zero; if they occur predominantly individually, their mutual information will be negative. The standard formula for calculating mutual information is:

MI(x, y) = \log \frac{P(x, y)}{P(x) \cdot P(y)}    (2)

where P(x, y) is the probability that words x and y occur together, and P(x) and P(y) are the probabilities that x and y occur individually. The relevance weight w_t of each term t is calculated by adding up the MI between t and each term of the query.

3 Experiments

This section describes the training process carried out in order to find the configuration that best improves the performance of the system. In CLEF 2007, our system participated in the following Monolingual tasks: Hungarian, Bulgarian and Czech. The aim of the experimental phase was to set the optimum values of the input parameters for each collection. The CLEF 2005 and 2006 collections (Hungarian and Bulgarian) were used for training, and query expansion techniques were evaluated for all languages. The input parameters of the system are:

• Passage size (sp): we established two passage sizes: 8 sentences (normal passage) or 30 sentences (big passage).
• Weighting model (wm): we used two weighting models: okapi and dfr.
• Dfr parameters: these are c and avg_ld.
• Query expansion parameters: if exp has value 1, PRF based on passages is used; if exp has value 2, PRF based on documents; and if exp has value 3, MI-RF. In addition, np denotes the number of best-ranked passages or documents considered, and nd the number of terms extracted from them to expand the original query.
• Evaluation measure: mean average precision (avgP) is the measure used to evaluate the experiments. This value was obtained with the 2006 collections.
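The two term-weighting formulas compared in these experiments (Equations 1 and 2) can be sketched in a few lines of Python. The function names and the probability inputs below are illustrative, not part of the IR-n system:

```python
import math

def prf_weight(n, m, n_t, m_t):
    """Probabilistic relevance weight (Eq. 1): n documents in the
    collection, m (pseudo-)relevant documents, n_t documents containing
    term t, m_t relevant documents containing term t."""
    return ((m_t + 0.5) * (n - n_t - m + m_t + 0.5)) / \
           ((m - m_t + 0.5) * (n_t - m_t + 0.5))

def mutual_information(p_xy, p_x, p_y):
    """Mutual information of two terms (Eq. 2): p_xy is the probability
    of co-occurrence, p_x and p_y the individual probabilities."""
    return math.log(p_xy / (p_x * p_y))

def mi_rf_weight(term, query_terms, p_joint, p_single):
    """MI-RF weight of a candidate term: the sum of its MI with each
    query term (the probabilities would be estimated from the collection;
    here they are supplied as plain dictionaries)."""
    return sum(mutual_information(p_joint[(term, q)], p_single[term], p_single[q])
               for q in query_terms)

# A term that is frequent among the relevant documents but rare in the
# collection receives a large PRF weight.
w = prf_weight(n=100000, m=10, n_t=120, m_t=6)
```

In both cases the resulting w_t is used to weight the expansion term relative to the original query terms.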
Table 1 shows the best configuration for each language:

Table 1: Best results obtained with the CLEF 2006 training data

language    sp   wm   c     avg_ld   exp   np   nd   avgP
Hungarian    8   dfr  2     300      -     -    -    0.3182
Hungarian    8   dfr  2     300      2     10   10   0.3602
Hungarian    8   dfr  2     300      3     10   10   0.3607
Bulgarian   30   dfr  1.5   300      -     -    -    0.1977
Bulgarian   30   dfr  1.5   300      2     10   10   0.2112
Bulgarian   30   dfr  1.5   300      3     10   10   0.2179

The best weighting scheme for both Hungarian and Bulgarian was dfr. For Hungarian we used a passage size of 8, and for Bulgarian a passage size of 30. Finally, the configuration used for Czech was dfr as weighting scheme and 30 as passage size.

4 Results at CLEF 2007

We submitted four runs for each language in our participation at CLEF 2007. In all cases we used the parameters that gave the best results in system training. The runs we submitted are named as follows:

• IRnxxyyyyN
  – xx is the language (BU, HU or CZ).
  – yyyy is the query expansion used (nexp: none, exp2: PRF, exp3: MI-RF).
  – N means that the narrative tag was used.

The official results for each run are shown in Table 2. Like other systems which use query expansion techniques, these models improve performance with respect to the base system: our results are appreciably above the baseline in all languages. The best percentage of improvement in avgP is +40.09%, obtained for Bulgarian.

5 Conclusions and Future Work

In this eighth CLEF evaluation campaign, we compared different query expansion techniques in our system for Hungarian, Bulgarian and Czech (see Table 1). Specifically, we compared two query

Table 2: CLEF 2007 official results. Monolingual tasks.
Language    Run                    avgP    Dif
Hungarian   IRnHUnexp (baseline)   33.90
            IRnHUexp2              38.94   +14.88%
            IRnHUexp3              39.42   +16.29%
            IRnHUexp2N             40.09   +18.26%
Bulgarian   IRnBUnexp (baseline)   21.19
            IRnBUexp2              25.97   +22.57%
            IRnBUexp3              26.35   +24.36%
            IRnBUexp2N             29.81   +40.09%
Czech       IRnCZnexp (baseline)   20.92
            IRnCZexp2              24.81   +18.61%
            IRnCZexp3              24.84   +18.76%
            IRnCZexp2N             27.68   +32.36%

expansion techniques: Probabilistic Relevance Feedback (PRF) and Mutual Information Relevance Feedback (MI-RF).

The results of this evaluation indicate that our approach proved effective for Hungarian, Bulgarian and Czech (see Table 2), since all runs score above the baseline. Comparing the two expansion techniques directly, MI-RF (exp3) outperformed PRF (exp2) for all languages, while the best overall results were obtained by PRF combined with the narrative field (exp2N) (see Table 2).

In the future we intend to test this approach on other languages such as Spanish. We also intend to study ways of integrating NLP knowledge and procedures into our basic IR system and to evaluate their impact.

Acknowledgements

This research has been partially supported by the project QALL-ME (FP6-IST-033860), funded under the 6th Framework Research Programme of the European Union (EU), by the Spanish Government project TEXT-MESS (TIN-2006-15265-C06-01), and by the Valencia Government under project number GV06-161.

References

[1] William A. Gale and Kenneth W. Church. Identifying word correspondence in parallel texts. In HLT '91: Proceedings of the workshop on Speech and Natural Language, pages 152–157, Morristown, NJ, USA, 1991. Association for Computational Linguistics.

[2] Donna Harman. Relevance feedback revisited. In SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, pages 1–10, New York, NY, USA, 1992. ACM Press.

[3] F. Llopis. IR-n: Un Sistema de Recuperación de Información Basado en Pasajes. PhD thesis, University of Alicante, 2003.

[4] Stephen E. Robertson and Karen Sparck Jones. Relevance weighting of search terms, pages 143–160.
Taylor Graham Publishing, London, UK, 1988. [5] Jinxi Xu and W. Bruce Croft. Query expansion using local and global document analysis. In SIGIR ’96: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, pages 4–11, New York, NY, USA, 1996. ACM Press.