<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>University of Hagen at CLEF2006: Reranking documents for the domain-specific task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>General Terms</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Experimentation, Performance, Measurement</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Johannes Leveling FernUniversität in Hagen (University of Hagen) Intelligent Information and Communication Systems (IICS) 58084 Hagen</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the participation of the IICS group at the domain-specific task (GIRT) of the CLEF campaign 2006. The focus of our retrieval experiments is on trying to increase precision by reranking documents in an initial result set. The reranking method is based on antagonistic terms, i.e. terms with a semantics different from the terms in a query, for example antonyms or cohyponyms of search terms. We analyzed GIRT data from 2005, i.e. the cooccurrence of search terms and antagonistic terms in documents assessed as relevant versus non-relevant documents, to derive values for recalculating document scores. Several experiments were performed, using different initial result sets as a starting point. A pre-test with GIRT 2004 data showed a significant increase in mean average precision (a change from 0.2446 to 0.2986 MAP). Precision for the official runs for the domain-specific task at CLEF 2006 did not change significantly, but the best experiment submitted included a reranking of result documents (0.3539 MAP). In an additional reranking experiment that was run on a result set with an already high MAP (provided by the Berkeley group), a significant decrease in precision was observed (MAP dropped from 0.4343 to 0.3653 after reranking). There are several explanations for these results: First, a simple and obvious explanation is that improving precision by reranking becomes more difficult the better the initial results already are. Second, our calculation of new scores includes a factor with a value that was probably chosen too high. We plan to perform additional experiments with more conservative values for this factor.</p>
      </abstract>
      <kwd-group>
        <kwd>H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Indexing methods, Linguistic processing</kwd>
        <kwd>H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Query formulation, Search process</kwd>
        <kwd>H.3.4 [Information Storage and Retrieval]: Systems and Software - Performance evaluation (efficiency and effectiveness)</kwd>
        <kwd>I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods - Semantic networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>There are several successful methods for improving performance in information retrieval (IR), such as
stemming search terms and document terms to increase recall or expanding a query with related terms to
increase precision. For our participation at the domain-specific task in CLEF 2006, a method for reranking
documents in the initial result set to increase precision was investigated. The method determines a set of
antagonistic terms, i.e. terms that are antonyms or cohyponyms of search terms, and it reduces the score
(and subsequently the rank) of documents containing these terms. As some search terms will frequently occur
in a text together with their corresponding antagonistic terms (e.g., “day and night”, “man and woman”,
“black and white”), the cooccurrence with antagonistic terms is also taken into account when calculating
the new scores.</p>
      <p>
        For the retrieval experiments, the setup from our previous participations at the domain-specific task was
used. It is described in more detail in [
        <xref ref-type="bibr" rid="ref8 ref9">9, 8</xref>
        ]. The setup includes a deep linguistic analysis, query
expansion with semantically related terms, blind feedback, an entry vocabulary module (EVM, see [
        <xref ref-type="bibr" rid="ref3 ref5">5,
3</xref>
        ]), and several retrieval functions implemented in the Cheshire II DBMS: tf-idf, Okapi/BM25 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and
Cori/InQuery [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For the bilingual experiments, a single online translation service, Promt (http://www.promt.ru/,
http://www.e-promt.com/), was employed to translate English topic titles and topic descriptions into German.
      </p>
    </sec>
    <sec id="sec-2">
      <title>The idea</title>
      <sec id="sec-2-1">
        <title>Reranking with information about antagonistic terms</title>
        <p>
          There has already been some research on reranking documents to increase precision in IR. Gey et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
describe experiments with Boolean filters and negative terms for TREC data. In general, this method does
not provide a significant improvement, but an analysis for specific topics shows potential for a performance
increase. Our group regards Boolean filters as too restrictive to help improve precision. Furthermore,
the case of a cooccurrence of a term and its filter term (or antagonistic term) in queries or documents is not
addressed.
        </p>
        <p>
          Kamps [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] describes a method for reranking based on a global dimensional analysis of the data. The
evaluation of this method is based on GIRT (German Indexing and Retrieval Testdatabase) and Amaryllis data.
The observed improvement for the GIRT data is lower than, but of the same order as, the increase in precision
observed in our pre-test (see Sect. 3). While this approach is promising, it relies on a controlled vocabulary
and will therefore not be portable between domains or even between different text corpora.
        </p>
      <p>For our experiments in the domain-specific task for CLEF 2006 (GIRT), the general idea is to rerank
documents in the result set (1000 documents) by combining information about semantic relations between
terms (here: antagonistic relations such as antonymy or cohyponymy) with statistical information about
the cooccurrence frequency of a term and its antagonistic terms (short: “a-terms”). Reranking consists of
decreasing a document score whenever a query term and one of its a-terms are found in the document.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Types of antagonistic relations</title>
      <p>We introduce the notion of antagonistic terms, meaning terms with a semantics different from search terms.
For a given search term t, the set of antagonistic terms a(t) contains terms that are antonyms of t (at_word),
terms that are antonyms of a member in the set of synonyms of t (at_synset), terms that are antonyms of
hyponyms of t (at_hypo), terms that are antonyms of hypernyms of t (at_hyper), and cohyponyms of t (at_cohypo).</p>
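      <p>To make these five categories concrete, the following sketch (our illustration, not code from the
paper) derives the a-term set from toy ANTO, SYNO, and SUB relation maps; the dictionary-based net and all
term entries are assumptions chosen to mirror the example in Figure 1 below.</p>
      <preformat>
# Sketch: deriving the set of antagonistic terms (a-terms) for a term t.
# ANTO, SYNO and SUB (hyponym to hypernym) are toy stand-ins for the
# semantic net described in this section.

ANTO = {"animal": {"plant"}, "plant": {"animal"}}
SYNO = {"mammal": {"mammalian"}, "mammalian": {"mammal"}}
SUB = {  # hyponym to set of hypernyms
    "vertebrate": {"animal"},
    "invertebrate": {"animal"},
    "mammal": {"vertebrate"},
    "reptile": {"vertebrate"},
}

def hyponyms(t):
    return {x for x, hypers in SUB.items() if t in hypers}

def a_terms(t):
    """Union of the five a-term categories defined above."""
    at = set(ANTO.get(t, set()))            # at_word: antonyms of t
    for s in SYNO.get(t, set()):            # at_synset: antonyms of synonyms
        at |= ANTO.get(s, set())
    for h in hyponyms(t):                   # at_hypo: antonyms of hyponyms
        at |= ANTO.get(h, set())
    for h in SUB.get(t, set()):             # at_hyper: antonyms of hypernyms
        at |= ANTO.get(h, set())
        at |= hyponyms(h) - {t}             # at_cohypo: cohyponyms of t
    return at

print(a_terms("vertebrate"))  # {'plant', 'invertebrate'}
</preformat>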
      <p>
        Figure 1 shows an excerpt of a semantic net consisting of semantic relations such as synonymy (SYNO),
antonymy (ANTO, including subsumed relations for converseness, contrariness, and complement) and
subordination (SUB). From this semantic net, it can be inferred that animal and plant (antonyms), reptile and
mammalian (antonym of synonym), vertebrate and plant (antonym of hypernym/hyponym), and vertebrate
and invertebrate (cohyponyms) are a-terms. (Note that in a more complete example, plant and animal would
also be cohyponyms of a more general concept, and vertebrate and invertebrate might be considered antonyms.)
We combine different semantic information resources to create the semantic net holding this background
knowledge, including the computer lexicon HaGenLex [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
a mapping of HaGenLex concepts to GermaNet synonym sets [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the GIRT-Thesaurus (for hyponym
relations) and semantic subordination relations semi-automatically extracted from German noun compounds
in text corpora.
      </p>
      <p>[Figure 1: Excerpt of the semantic net. English/German node pairs such as reptile (Kriechtier),
tetrapod (Landwirbeltier), vertebrate (Wirbeltier), mammal (Säugetier), fish (Fisch), animal (Tier),
plant (Pflanze), and invertebrate (wirbelloses Tier) are connected by SUB, SYNO, and ANTO relations.]</p>
      <p>Using queries and relevance assessments from the GIRT task in 2005, we created statistics on the
cooccurrence of query terms and their antagonistic terms in documents assessed as relevant and in other
(non-relevant) documents. Table 1 gives an overview of the difference in the percentage of term cooccurrence
between documents assessed as relevant and other (non-relevant) documents in the GIRT collection. These
statistics serve to determine by what amount the score of a document in the result set should be adjusted.
For example, a document D with a score SD that contains a search term A but does not contain its cohyponym
B will have its score increased. In general, document scores for documents containing neither a term Ai nor
its a-term Bj are decreased (last row of Table 1).</p>
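      <p>These statistics can be computed directly from the relevance assessments. The following sketch is our
illustration, not code from the paper; docs (document id to term set) and qrels (document id to relevance
flag) are assumed data structures.</p>
      <preformat>
# Sketch: cooccurrence statistics in the style of Table 1. For a query term
# and one of its a-terms, compare how often both occur together in relevant
# versus non-relevant documents.

def cooccurrence_rate(term, a_term, docs, doc_ids):
    """Fraction of the given documents containing both term and a_term."""
    if not doc_ids:
        return 0.0
    hits = sum(1 for d in doc_ids if {term, a_term}.issubset(docs[d]))
    return hits / len(doc_ids)

def cooccurrence_delta(term, a_term, docs, qrels):
    """Difference in cooccurrence percentage, relevant minus non-relevant."""
    relevant = [d for d, rel in qrels.items() if rel]
    others = [d for d, rel in qrels.items() if not rel]
    return 100.0 * (cooccurrence_rate(term, a_term, docs, relevant)
                    - cooccurrence_rate(term, a_term, docs, others))
</preformat>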
    </sec>
    <sec id="sec-4">
      <title>The reranking formula</title>
      <p>1. Let D be the initial document set (1000 documents).
2. Let q be the set of query terms Ai.
3. Let SDmax be the highest score of all documents in D.
4. For each Ai ∈ q:
   • Let B be the set of a-terms Bj for Ai.
   • For each Bj ∈ B:
     – For each Dk ∈ D: compute the new score Snew of document Dk according to Formula 1 and assign it to Dk.
5. Normalize all document scores SDk so that all values fall into the interval [0, SDmax].
6. Sort D according to the new values SDk and return the reranked result set.</p>
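      <p>A minimal sketch of this procedure follows. Formula 1 itself is not reproduced in these notes, so the
score update below (multiplying a score by 1 - c for each cooccurrence of a query term with one of its
a-terms in a document) is a hypothetical stand-in; only the loop structure and the normalization step follow
the listing above.</p>
      <preformat>
# Sketch of the reranking procedure (our illustration). a_terms() is the
# a-term lookup from Sect. 2.2, docs maps document ids to term sets, and
# results is the initial ranked list of (doc_id, score) pairs.

def rerank(results, query_terms, docs, a_terms, c=0.025):
    scores = dict(results)
    s_max = max(scores.values())              # SDmax (step 3)
    for a in query_terms:                     # step 4
        for b in a_terms(a):
            for d in scores:
                if {a, b}.issubset(docs[d]):
                    scores[d] *= 1.0 - c      # hypothetical stand-in for Formula 1
    top = max(scores.values())                # step 5: normalize into [0, SDmax]
    scores = {d: s / top * s_max for d, s in scores.items()}
    # step 6: sort by new scores and return the reranked result set
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
</preformat>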
      <sec id="sec-4-1">
        <title>A pre-test: reranking results from CLEF 2004</title>
        <p>In a pre-test of the reranking algorithm, the data consists of queries from GIRT 2004, the corresponding
relevance assessments, and the GIRT document corpus. Experiments with different values for the factor c
were performed. Table 2 shows the mean average precision (MAP) for our official run from 2004, and MAP
for the reranked result set for different values of the factor c. The precision was significantly increased from
0.2446 to 0.2986 MAP for c = 0.01.</p>
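        <p>A parameter sweep of this kind can be scripted around the reranking sketch from Sect. 2.3; here,
evaluate_map (computing MAP against the GIRT 2004 relevance assessments) is an assumed helper, as are the
results, query_terms, docs, and qrels_2004 inputs.</p>
        <preformat>
# Sketch: tuning the dampening factor c on GIRT 2004 data (our illustration).
# rerank() is the sketch from Sect. 2.3; evaluate_map is an assumed helper,
# e.g. wrapping trec_eval output.

for c in (0.005, 0.01, 0.025, 0.05, 0.1):
    reranked = rerank(results, query_terms, docs, a_terms, c=c)
    print(c, evaluate_map(reranked, qrels_2004))
</preformat>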
      </sec>
      <sec id="sec-4-2">
        <title>CLEF 2006: reranking results</title>
        <p>
          For the runs submitted for relevance assessment, we employed the experimental setup from the
domain-specific task at CLEF in 2005: query expansion with semantically related terms and blind feedback
for the topic fields title and description. For the bilingual experiments, queries were translated by the Promt
online machine translation service. Settings for the following parameters were varied:
• LA: obtain search terms by a linguistic analysis (see [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ])
• RS: rerank result set (as described in Sect. 2)
        </p>
        <p>For experiments with reranking, the factor c was set to 0.025. Table 3 and Table 4 show results for
official runs and additional runs.</p>
      </sec>
      <sec id="sec-4-3">
        <title>A post-test: reranking results of the Berkeley group</title>
        <p>
          We performed an additional experiment with an initial result set with an even higher MAP, using results
of an unofficial run from the Berkeley group (thanks to Vivien Petras at UC Berkeley for providing the data).
The experiments of the Berkeley group were based on the setup for their participation at the GIRT task in
2005 (see [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]). Reranking applied to the result set
found by Berkeley, which has a MAP of 0.4343 for the monolingual German task (3212 rel_ret),
significantly lowered performance to 0.3653 MAP.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>Discussion of results</title>
        <p>We performed different sets of experiments with reranking initial result sets for the domain-specific task at
CLEF. In a pre-test based on the data from CLEF 2004 and results submitted in 2004, reranking
increased the MAP from 0.2446 to 0.2986 (a +22.1% change). As a single result set is used as input for the
reranking experiments, recall is not affected.</p>
        <p>Results for the official experiments indicate that reranking does not significantly change the MAP. For
the monolingual run, MAP dropped from 0.3205 to 0.3179 in one pair of experiments and rose from 0.3525
to 0.3539 in another. For a bilingual pair of comparable experiments, MAP dropped from 0.2190 to 0.2180.</p>
        <p>An additional reranking experiment was based on data provided by the Berkeley group with their setup
from 2005. The MAP decreased from 0.4343 to 0.3653 when reranking was applied to this data.</p>
        <p>There are several explanations as to why precision is affected so differently:
• There may be different intrinsic characteristics for the domain-specific query topics in 2004 and
2006 (the GIRT data did not change), i.e. there may have been fewer antagonistic terms found for
the query terms in 2006. We did not have time to test this hypothesis.
• The dampening factor c was not fine-tuned for the retrieval method employed to obtain the initial
result set: for the experiments in 2004, we used a database management system with a tf-idf IR
model, while for GIRT 2006, the Okapi/BM25 IR model was applied. The corresponding result sets
show a different range and distribution of document scores. Thus, the effect of reranking documents
with the proposed method may depend on the retrieval method employed to obtain the initial results.
• Reranking will obviously become harder the better the initial precision already is. The results from
the Berkeley group will be more difficult to improve, as they already have a high precision.
• The dampening factor c should have been initialized with a lower value. Due to time constraints, our
group did not repeat the reranking experiments with different and more conservative values of c.</p>
      </sec>
      <sec id="sec-4-5">
        <title>Conclusion</title>
        <p>In this paper, a novel method to rerank documents was presented. It combines information about
antagonistic relations between terms in queries and documents with statistics on their cooccurrence. Different
evaluations of this method were presented, showing mixed results.</p>
        <p>For a pre-test with CLEF data from 2004, a performance increase in precision was observed. Official
results for CLEF 2006 show no major changes, and an additional experiment based on data from the
Berkeley group even shows a decrease in precision.</p>
        <p>While the pre-test showed that our reranking approach should work in general, the official and
additional experiments indicate that it becomes more difficult to increase precision the higher it already is. We
plan to complete reranking experiments with different settings and analyze differences in query topics for
GIRT 2004–2006.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Harald</surname>
          </string-name>
          <string-name>
            <surname>Baayen</surname>
          </string-name>
          , Richard Piepenbrock, and
          <string-name>
            <given-names>Leon</given-names>
            <surname>Gulikers</surname>
          </string-name>
          .
          <article-title>The CELEX Lexical Database. Release 2 (CD-ROM)</article-title>
          .
          <source>Linguistic Data Consortium</source>
          , University of Pennsylvania, Philadelphia, Pennsylvania,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>James</surname>
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Callan</surname>
            , Zhihong Lu, and
            <given-names>W. Bruce</given-names>
          </string-name>
          <string-name>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Searching distributed collections with inference networks</article-title>
          .
          <source>In Proceedings of the ACM SIGIR</source>
          <year>1995</year>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Fredric</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gey</surname>
          </string-name>
          , Michael Buckland, Aitao Chen, and
          <string-name>
            <surname>Ray</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Entry vocabulary - a technology to enhance digital search</article-title>
          .
          <source>In Proceedings of the First International Conference on Human Language Technology</source>
          ,
          <year>March 2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Fredric</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gey</surname>
          </string-name>
          , Aitao Chen, Jianzhang He,
          <string-name>
            <surname>Liangjie Xu</surname>
            ,
            <given-names>and Jason</given-names>
          </string-name>
          <string-name>
            <surname>Meggs</surname>
          </string-name>
          .
          <article-title>Term importance, Boolean conjunct training, negative terms, and foreign language retrieval: probabilistic algorithms at TREC-5</article-title>
          . In National Institute for Standards and Technology, editor,
          <source>Proceedings of TREC-5, the Fifth NISTDARPA Text REtrieval Conference</source>
          , pages
          <fpage>181</fpage>
          -
          <lpage>190</lpage>
          , Washington, DC,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Fredric</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gey</surname>
            , Hailing Jiang, Vivien Petras, and
            <given-names>Aitao</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Cross-language retrieval for the CLEF collections - comparing multiple methods of retrieval</article-title>
          . In C. Peters, editor,
          <source>Cross-Language Information Retrieval and Evaluation: Workshop of Cross-Language Evaluation Forum</source>
          ,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          <year>2000</year>
          , volume
          <volume>2069</volume>
          <source>of Lecture Notes in Computer Science (LNCS)</source>
          , pages
          <fpage>116</fpage>
          -
          <lpage>128</lpage>
          . Springer, Berlin,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Sven</given-names>
            <surname>Hartrumpf</surname>
          </string-name>
          , Hermann Helbig, and
          <string-name>
            <given-names>Rainer</given-names>
            <surname>Osswald</surname>
          </string-name>
          .
          <article-title>The semantically based computer lexicon HaGenLex - Structure and technological environment</article-title>
          .
          <source>Traitement automatique des langues</source>
          ,
          <volume>44</volume>
          (
          <issue>2</issue>
          ):
          <fpage>81</fpage>
          -
          <lpage>105</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Jaap</given-names>
            <surname>Kamps</surname>
          </string-name>
          .
          <article-title>Improving retrieval effectiveness by reranking documents based on controlled vocabulary</article-title>
          .
          <source>In Sharon McDonald and John Tait</source>
          , editors,
          <source>Advances in Information Retrieval: 26th European Conference on IR Research (ECIR</source>
          <year>2004</year>
          ), volume
          <volume>2997</volume>
          of Lecture Notes in Computer Science (LNCS), pages
          <fpage>283</fpage>
          -
          <lpage>295</lpage>
          . Springer, Heidelberg,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Leveling</surname>
          </string-name>
          .
          <article-title>A baseline for NLP in domains-pecific information retrieval</article-title>
          . In C. Peters,
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Gey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kluck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Magnini</surname>
          </string-name>
          , and M. de Rijke, editors,
          <source>CLEF 2005 Proceedings, Lecture Notes in Computer Science (LNCS)</source>
          . Springer, Berlin,
          <year>2006</year>
          . In print.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Leveling</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sven</given-names>
            <surname>Hartrumpf</surname>
          </string-name>
          . University of Hagen at CLEF 2004:
          <article-title>Indexing and translating concepts for the GIRT task</article-title>
          . In C. Peters,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kluck</surname>
          </string-name>
          , and B. Magnini, editors,
          <source>Multilingual Information Access for Text, Speech and Images: 5th Workshop of the Cross-Language Evaluation Forum</source>
          ,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          <year>2004</year>
          , volume
          <volume>3491</volume>
          of Lecture Notes in Computer Science (LNCS), pages
          <fpage>271</fpage>
          -
          <lpage>282</lpage>
          . Springer, Berlin,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Vivien</given-names>
            <surname>Petras</surname>
          </string-name>
          .
          <article-title>How one word can make all the difference - using subject metadata for automatic query expansion and reformulation</article-title>
          . In Carol Peters, editor,
          <source>Results of the CLEF 2005 Cross-Language System Evaluation Campaign, Working Notes for the CLEF 2005 Workshop</source>
          , Wien, Austria,
          <year>September 2005</year>
          . Centromedia.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Stephen</surname>
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Robertson</surname>
            , Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and
            <given-names>Mike</given-names>
          </string-name>
          <string-name>
            <surname>Gatford</surname>
          </string-name>
          .
          <article-title>Okapi at TREC-3</article-title>
          . In D. Harman, editor,
          <source>Proceedings of the Third Text REtrieval Conference (TREC3)</source>
          , pages
          <fpage>109</fpage>
          -
          <lpage>126</lpage>
          . National Institute of Standards and Technology (NIST),
          <source>Special Publication 500- 226</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>