
gba@fub.it Gianni Amati Fondazione Ugo Bordoni

2002

388 400

2 Indexing

3 Description of PROSIT

The framework is based on computing the information gain for each query term. The information gain is obtained by a combination of three distinct probabilistic processes: the probabilistic process computing the amount of the information content of the term with respect to the entire collection, the probabilistic process computing a conditional probability of occurrence of the term with respect to an "Elite" set of documents, which is the set of documents containing the query term, and the probabilistic process deriving the term frequency within the document normalized to the document average length. The framework thus consists of three independent components: [...]

3.1 Term weighting

The formulas use the following quantities: the size of the collection; the size of the elite set E_t of the term (see below); the total number of term occurrences in its elite set; the within-document term frequency (tf); the length of the document (l); and the average length of documents (avg_l).

[...] experiments and is called BE L2 (BE stands for Bose-Einstein and L for Laplace).

However, for both TREC-10 and CLEF collections we have introduced a parameter for document length normalization which enhances the retrieval outcome. The correcting factor c = 3 may be inserted into the term frequency normalization, giving

$$ tfn = tf \cdot \log_2\left(1 + \frac{c \cdot avg\_l}{l}\right) $$

Formulas 1, 2 and 3 produce a first ranking of possible relevant documents. The topmost ones are candidates to be assessed relevant and therefore we might consider them to constitute a second, different "Elite set T of documents", namely documents which best describe the content of the query. We have considered in our experiments only 3 documents as pseudo-relevant documents and extracted from them the first 10 most informative terms, which were added to the original query. The most informative terms are selected by using the information-theoretic Kullback-Leibler divergence function.
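Two steps of this section can be illustrated with a short sketch: the length-normalized term frequency tfn with the correcting factor c, and the selection of expansion terms from the pseudo-relevant documents. The Kullback-Leibler formula itself is not reproduced in the text, so the term score used here, p_R(t) log2(p_R(t) / p_C(t)), is an assumption based on the form commonly used for query expansion (cf. [6]). The function names and the toy data are hypothetical and are not taken from PROSIT.

```python
import math
from collections import Counter


def normalized_tf(tf, doc_len, avg_len, c=3.0):
    """Length-normalized term frequency: tfn = tf * log2(1 + c * avg_l / l)."""
    return tf * math.log2(1.0 + c * avg_len / doc_len)


def kl_expansion_terms(top_docs, collection_tf, collection_len, k=10):
    """Pick the k terms of the pseudo-relevant documents with the largest
    score p_R(t) * log2(p_R(t) / p_C(t)).

    ASSUMPTION: this score is the usual KL-based form, not necessarily the
    exact formula used in the paper.
    top_docs       -- list of token lists (the pseudo-relevant documents)
    collection_tf  -- dict term -> frequency in the whole collection
                      (must contain every term of top_docs)
    collection_len -- total number of tokens in the collection
    """
    counts = Counter(term for doc in top_docs for term in doc)
    total = sum(counts.values())
    scores = {}
    for term, freq in counts.items():
        p_rel = freq / total                           # p_R(t)
        p_col = collection_tf[term] / collection_len   # p_C(t)
        scores[term] = p_rel * math.log2(p_rel / p_col)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# A document of average length with tf = 2 and c = 3: 2 * log2(4) = 4.0
print(normalized_tf(2, doc_len=100, avg_len=100))

# Toy pseudo-relevant set (the paper uses the top 3 documents and k = 10).
top_docs = [["film", "regista", "film"], ["film", "anno"]]
collection_tf = {"film": 10, "regista": 2, "anno": 500}
print(kl_expansion_terms(top_docs, collection_tf, collection_len=10000, k=2))
```

With c = 3, a document of exactly average length doubles its raw term frequency, while longer documents receive a smaller boost. In the term selection, a word that is frequent in the pseudo-relevant documents but also frequent in the whole collection (here "anno") scores low.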

4 Augmenting PROSIT with bigrams

PROSIT, like most information retrieval systems, is based on index units consisting of single keywords, or unigrams. We attempted to improve its performance by using two-word index units (bigrams).

Bigrams are good for disambiguating terms and for handling topic drift, i.e., when the results of queries on specific aspects of wide topics contain documents that are relevant to the general topic but not to the requested aspect of it. This phenomenon can also be seen as some query terms matching out of context of their relationships to other terms [4]. For instance, using unigrams most returned documents for the CLEF query "Kaurismaki films" were about other famous films, whereas the use of bigrams considerably improved the precision of search.

On the other hand, some bigrams that are generated automatically may, in turn, over-emphasize concepts that are common to both relevant and nonrelevant documents [9]. So far, the results about the effectiveness of bigrams versus unigrams have not been conclusive.

We used a simple technique known as lexical affinities. Lexical affinities are identified by finding pairs of words that occur close to each other in a window of some predefined small size. For the CLEF experiments, we used the query title and chose a distance of 5 words. All the bigrams generated this way are seen as new index units and are used to increase the relevance of those documents that have the same pair of words occurring within the specified window. The score assigned to bigrams is computed using the same weighting function used for unigrams.

From an implementation point of view, in order to efficiently compute the bigram scores it is necessary to encode the information about the position of each term in each document into the inverted file. During query evaluation, for each bigram extracted from the query, the posting lists associated with the bigram words in the inverted file are merged and a new pseudo posting list is created that contains all documents that contain the bigram along with the relevant occurrence information.

The lexical affinity technique was reported to produce very good results on the web TREC collection, even better than those obtained using unigrams [5]. However, we were not able to obtain such good results on the CLEF collection. In fact, we found that the bigram performance was considerably worse than the unigram performance; even when combining the scores, the performance remained lower than that obtainable by using just unigram scores.

Based on these observations, we decided to use bigrams in addition to, not in place of, single words. Second, instead of running two separate ranking systems, one for unigrams and the other for bigrams, and then combining their scores, we tried to incorporate the bigram component directly into PROSIT's main algorithm.

The bigram scores were thus combined with the unigram score to produce the first-pass ranking of PROSIT. In this way one can hope to increase the quality of the documents on which the subsequent query expansion step is based. This may happen because more top relevant documents are retrieved or because the nonrelevant documents which contribute to query expansion are more similar to the query.

After the first ranking was computed using unigram and bigram scores, the top documents were used to generate the expanded query and PROSIT computed the second ranking as if it were just using unigrams. We chose not to expand the original query with two-word units due to the dimensionality problem, and we did not use the bigram method during the second-pass ranking of PROSIT because the order of words in the expanded query is not relevant.

We submitted one run to CLEF 2002, labelled as "fub02l", which was produced using PROSIT augmented with the bigrams procedure just described.
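The merging of positional posting lists into a pseudo posting list can be sketched as follows. This is only an illustration under stated assumptions: the index is a hypothetical in-memory dictionary rather than PROSIT's inverted file, bigrams are taken to be unordered pairs of distinct query-title words, and the "occurrence information" is reduced to a count of co-occurrences within the window.

```python
from itertools import combinations


def query_bigrams(title_terms):
    """Candidate bigrams: all unordered pairs of distinct query-title terms
    (ASSUMPTION: the paper does not state how pairs are enumerated)."""
    return list(combinations(sorted(set(title_terms)), 2))


def pseudo_posting_list(postings_a, postings_b, window=5):
    """Merge two positional posting lists {doc_id: [positions]} into a
    pseudo posting list {doc_id: count} for the bigram (a, b), where count
    is the number of position pairs at most `window` words apart."""
    merged = {}
    for doc_id in postings_a.keys() & postings_b.keys():
        count = sum(
            1
            for pos_a in postings_a[doc_id]
            for pos_b in postings_b[doc_id]
            if 0 < abs(pos_a - pos_b) <= window
        )
        if count:
            merged[doc_id] = count
    return merged


# Hypothetical positional index: term -> {doc_id: [word positions]}
index = {
    "kaurismaki": {1: [3, 40], 2: [7]},
    "film": {1: [5], 2: [30], 3: [1]},
}

for a, b in query_bigrams(["kaurismaki", "film"]):
    print((a, b), pseudo_posting_list(index[a], index[b], window=5))
```

Only document 1 ends up in the pseudo posting list: document 2 contains both words but more than 5 positions apart, and document 3 contains only one of them. The resulting counts can then be weighted with the same function as unigram frequencies, as described above.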

In the second retrieval we used the term weighting function: w = (6) al, that is according Formulas 2 and 3.

We tested PROSIT and its three variants (i.e., PROSIT with bigrams, PROSIT with coordination level matching, and PROSIT with both bigrams and coordination level matching) on the CLEF 2001 and CLEF 2002 Italian monolingual tasks. Table 1 shows the retrieval performance of the four systems on the two test collections using the average precision as evaluation measure.

Table 1. Retrieval performance (average precision) of the four systems on the two test collections. Only one row of the table survives:
PROSIT+bigrams    0.5208    0.5088

The results of Table 1 show that the performance of standard PROSIT was excellent on both test collections, with the value obtained for CLEF 2001 (0.5116) being much higher than the result of the best system at CLEF 2001 (0.4865). This result is a confirmation of the high effectiveness of the probabilistic ranking model implemented in PROSIT, which is exclusively based on simple document and query statistics.

Table 1 also shows that, in general, the variations in performance when passing from basic PROSIT to enhanced PROSIT were small. In particular, the use of bigrams improved performance across both test collections, whereas the use of coordination level matching was slightly beneficial for CLEF 2001 and detrimental for CLEF 2002. Combining both enhancements improved the retrieval performance over using CLM alone on both test collections but it was still worse than baseline performance on the CLEF 2002 collection.
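The text does not define the average precision measure reported in Table 1. As a reminder of how such a figure is typically obtained, the sketch below assumes the usual non-interpolated definition (precision averaged over the ranks of the relevant documents, then averaged over queries). The function names and the toy run are hypothetical.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Non-interpolated average precision of one ranked list (assumed definition)."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average_precision over (ranking, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy query: relevant documents at ranks 1 and 3, one relevant document missed.
print(average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3", "d9"}))
```

For the toy query the value is (1/1 + 2/3) / 3, roughly 0.56; a figure such as those in Table 1 would be the mean of this quantity over all queries of a test collection.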

It should also be noted that we experimented with other types of multi-word index units, by using windows of different size and by selecting a larger number of words. However, we found that using just two words with a window of size 5 was the optimal choice.
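Coordination level matching (CLM), one of the enhancements compared in Table 1, reranks documents by how many words of the query title they contain, using the best-matching score only to break ties. A minimal sketch of the two variants that were tried (full ordering by coordination level, and simply preferring documents that contain all title words) is given here. The function name, the size of the bonus and the toy data are assumptions; the bonus is only required to exceed any best-matching score.

```python
def clm_rerank(scores, doc_terms, title_terms, all_or_nothing=False, bonus=1000.0):
    """Rerank documents by coordination level with the query title.

    scores         -- dict doc_id -> best-matching similarity score
    doc_terms      -- dict doc_id -> set of terms occurring in the document
    all_or_nothing -- False: add a bonus per matched title term (full ordering
                      by coordination level); True: add the bonus only when
                      every title term is matched
    bonus          -- ASSUMPTION: any value larger than the largest score
    """
    title = set(title_terms)
    reranked = {}
    for doc_id, score in scores.items():
        matched = len(title & doc_terms[doc_id])
        if all_or_nothing:
            level = 1 if matched == len(title) else 0
        else:
            level = matched
        reranked[doc_id] = score + bonus * level
    return sorted(reranked, key=reranked.get, reverse=True)


scores = {"a": 6.0, "b": 5.0, "c": 7.0}
doc_terms = {"a": {"film"}, "b": {"film", "kaurismaki"}, "c": set()}
title = ["kaurismaki", "film"]

print(clm_rerank(scores, doc_terms, title))                       # full CLM ordering
print(clm_rerank(scores, doc_terms, title, all_or_nothing=True))  # all title words or nothing
```

In the toy data, document "b" matches both title words and is ranked first by both variants. The variants differ on "a" and "c": full CLM puts "a" (one matched word) above "c", whereas the all-or-nothing variant falls back to the best-matching scores for them.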

Consistent with earlier effectiveness results, most information retrieval systems are based on best matching algorithms between query and documents. However, the use of very short queries on the part of most of the users and the prevailing interest in precision rather than recall have fostered new research on exact matching retrieval, seen as an alternative or as a complementary technique to traditional best-matching retrieval. In particular, it has been shown that taking into account the number of query words matched by the documents to rerank retrieval results may improve performance in certain situations (e.g., [7], [3]).

For the CLEF experiments, we focused on the query title. The goal was to prefer documents that matched all of the query keywords above documents that matched all but one of the keywords, and so on. To implement this strategy, we modified the standard best-matching similarity score between query and documents, computed as explained in Section 2, by adding a much larger addendum to it which was proportional to the number of terms shared by the document and the query title. In this way, the documents were partially ordered according to their coordination level matching with the query title, with ties being broken using their best-matching similarity score to the query.

However, the results were somewhat disappointing. We obtained a much better retrieval effectiveness by simply preferring the documents that contained all the words of the query title, without paying attention to lower levels of coordination matching. This was our choice (run fub02b).

Finally, we submitted a fourth run by using the fully enhanced version of PROSIT, i.e., bigrams + coordination level matching (run fub02lb).

7 Conclusions

We have experimented with the PROSIT system on the Italian monolingual task and have explored the use of bigrams and coordination level matching within PROSIT's main algorithm. From our experimental evaluation, the following main conclusions can be drawn.

The novel probabilistic model implemented in PROSIT achieved high retrieval effectiveness on both the CLEF 2001 and CLEF 2002 Italian monolingual tasks. These results are even more remarkable considering that the system employs very simple indexing techniques and does not rely on any specialised or ad hoc natural language processing techniques.

Using bigrams in the place of unigrams hurt performance; the combination of bigram scores and unigram scores performed better but it was still inferior to the results obtained by using unigrams alone. However, using the bigram scores in the first-pass ranking, just to rank the documents used for query expansion, resulted in a performance improvement. These results held across both test collections.

Using coordination level matching to rerank the retrieval results did not, in general, improve performance. Favouring the documents that contained all the keywords in the query title worked better on one test collection and worse on the other collection, whereas ordering the documents according to their level of coordination matching hurt performance on both test collections.

[...] and worse than using bigrams alone on CLEF 2002.

Overall, the results about the enhanced versions of PROSIT are inconclusive. More work is needed to collect further evidence about their effectiveness, e.g., by using a more representative sample of performance measures or by considering other query scenarios. Besides a more robust evaluation of retrieval performance, a better understanding of why the use of bigrams in PROSIT's main algorithm yielded positive results in the experiments reported in this paper would be useful. This might be done, for instance, by analysing the variations in the quality of the top ranked documents used for query expansion or by performing a query-by-query analysis of concept drift in the final retrieved documents.

We regret that due to a tight schedule we were not able to test PROSIT on the other CLEF monolingual tasks. However, as the application of PROSIT to the Italian task did not require any special work, we are confident that with a small effort we could obtain similar results for the other languages. This is left for future work.

References

[1] Gianni Amati, Claudio Carpineto, and Giovanni Romano. FUB at TREC 10 web track: a probabilistic framework for topic relevance term weighting. In E.M. Voorhees and D.K. Harman, editors, Proceedings of the 10th Text Retrieval Conference TREC 2001, pages 182-191, Gaithersburg, MD, 2002. NIST Special Publication 500-250.

[2] Gianni Amati and Cornelis Joost van Rijsbergen. Probabilistic models of information retrieval based on measuring divergence from randomness. ACM Transactions on Information Systems, (to appear), 2002.

[3] E. Berenci, C. Carpineto, V. Giannini, S. Mizzaro. Effectiveness of keyword-based display and selection of retrieval results for interactive searches. International Journal on Digital Libraries, 3(3):249-260, 2000.

[4] D. Bodoff, A. Kambil. Partial coordination. I. The best of pre-coordination and post-coordination. JASIS, 49(14):1254-1269, 1998.

[6] C. Carpineto, R. De Mori, G. Romano, and B. Bigi. An information theoretic approach to automatic query expansion. ACM Transactions on Information Systems, 19(1):1-27, 2001.