<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>University of Hagen at CLEF2006: Reranking documents for the domain-specific task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>General Terms</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Experimentation, Performance, Measurement</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Johannes Leveling FernUniversität in Hagen (University of Hagen) Intelligent Information and Communication Systems (IICS) 58084 Hagen</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the participation of the IICS group at the domain-specific task (GIRT) of the CLEF campaign 2006. The focus of our retrieval experiments is on trying to increase precision by reranking documents in an initial result set. The reranking method is based on antagonistic terms, i.e. terms with a semantics different from the terms in a query, for example antonyms or cohyponyms of search terms. We analyzed GIRT data from 2005, i.e. the cooccurrence of search terms and antagonistic terms in documents assessed as relevant versus non-relevant documents, to derive values for recalculating document scores. Several experiments were performed, using different initial result sets as a starting point. A pre-test with GIRT 2004 data showed a significant increase in mean average precision (a change from 0.2446 to 0.2986 MAP). Precision for the official runs for the domain-specific task at CLEF 2006 did not change significantly, but the best experiment submitted included a reranking of result documents (0.3539 MAP). In an additional reranking experiment that was run on a result set with an already high MAP (provided by the Berkeley group), a significant decrease in precision was observed (MAP dropped from 0.4343 to 0.3653 after reranking). There are several explanations for these results: First, a simple and obvious explanation is that improving precision by reranking becomes more difficult the better the initial results already are. Second, our calculation of new scores includes a factor with a value that was probably chosen too high. We plan to perform additional experiments with more conservative values for this factor.</p>
      </abstract>
      <kwd-group>
        <kwd>H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Indexing methods, Linguistic processing</kwd>
        <kwd>H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Query formulation, Search process</kwd>
        <kwd>H.3.4 [Information Storage and Retrieval]: Systems and Software - Performance evaluation (efficiency and effectiveness)</kwd>
        <kwd>I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods - Semantic networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>There are several successful methods for improving performance in information retrieval (IR), such as
stemming search terms and document terms to increase recall or expanding a query with related terms to
increase precision. For our participation at the domain-specific task in CLEF 2006, a method for reranking
documents in the initial result set to increase precision was investigated. The method determines a set of
antagonistic terms, i.e. terms that are antonyms or cohyponyms of search terms, and it reduces the score
(and subsequently the rank) of documents containing these terms. As some search terms will frequently occur
in a text together with their corresponding antagonistic terms (e.g., “day and night”, “man and woman”,
“black and white”), the cooccurrence with antagonistic terms is also taken into account when calculating
the new scores.</p>
      <p>
        For the retrieval experiments, the setup from our previous participations at the domain-specific task was
used. It is described in more detail in [
        <xref ref-type="bibr" rid="ref8 ref9">9, 8</xref>
        ]. The setup includes a deep linguistic analysis, query
expansion with semantically related terms, blind feedback, an entry vocabulary module (EVM, see [
        <xref ref-type="bibr" rid="ref3 ref5">5,
3</xref>
        ]), and several retrieval functions implemented in the Cheshire II DBMS: tf-idf, Okapi/BM25 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and
Cori/InQuery [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For the bilingual experiments, a single online translation service, Promt (http://www.promt.ru/,
http://www.e-promt.com/), was employed to translate English topic titles and topic descriptions into German.
      </p>
    </sec>
    <sec id="sec-2">
      <title>The idea</title>
      <sec id="sec-2-1">
        <title>Reranking with information about antagonistic terms</title>
        <p>
          There has already been some research on reranking documents to increase precision in IR. Gey et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
describe experiments with Boolean filters and negative terms for TREC data. In general, this method does
not provide a significant improvement, but an analysis for specific topics shows potential for a performance
increase. Our group regards Boolean filters as too restrictive to help improve precision. Furthermore,
the case of a cooccurrence of a term and its filter term (or antagonistic term) in queries or documents is not
addressed.
        </p>
        <p>
          Kamps [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] describes a method for reranking based on a global dimensional analysis of the data. The
evaluation of this method is based on GIRT (German Indexing and Retrieval Testdatabase) and Amaryllis data.
The observed improvement for the GIRT data is lower than, but of the same order as, the increase in precision
observed in our pre-test (see Sect. 3). While this approach is promising, it relies on a controlled vocabulary
and will therefore not be portable between domains or even between different text corpora.
        </p>
      <p>For our experiments in the domain-specific task for CLEF 2006 (GIRT), the general idea is to rerank
documents in the result set (1000 documents) by combining information about semantic relations between
terms (here: antagonistic relations such as antonymy or cohyponymy) with statistical information about
the cooccurrence frequency of a term and its antagonistic terms (short: “a-terms”). Reranking consists of
decreasing a document score whenever a query term and one of its a-terms are found in the document.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Types of antagonistic relations</title>
      <p>We introduce the notion of antagonistic terms, meaning terms with a semantics different from search terms.
For a given search term t, the set of antagonistic terms a(t) contains terms that are antonyms of t (at_word),
terms that are antonyms of a member in the set of synonyms of t (at_synset), terms that are antonyms of
hyponyms of t (at_hypo), terms that are antonyms of hypernyms of t (at_hyper), and cohyponyms of t (at_cohypo).</p>
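      <p>To make these five categories concrete, the following sketch (our illustration, not code from the
paper) derives the a-term set from toy ANTO, SYNO, and SUB relation maps; the dictionary-based net and all
term entries are assumptions chosen to mirror the example in Figure 1 below.</p>
      <preformat>
# Sketch: deriving the set of antagonistic terms (a-terms) for a term t.
# ANTO, SYNO and SUB (hyponym to hypernym) are toy stand-ins for the
# semantic net described in this section.

ANTO = {"animal": {"plant"}, "plant": {"animal"}}
SYNO = {"mammal": {"mammalian"}, "mammalian": {"mammal"}}
SUB = {  # hyponym to set of hypernyms
    "vertebrate": {"animal"},
    "invertebrate": {"animal"},
    "mammal": {"vertebrate"},
    "reptile": {"vertebrate"},
}

def hyponyms(t):
    return {x for x, hypers in SUB.items() if t in hypers}

def a_terms(t):
    """Union of the five a-term categories defined above."""
    at = set(ANTO.get(t, set()))            # at_word: antonyms of t
    for s in SYNO.get(t, set()):            # at_synset: antonyms of synonyms
        at |= ANTO.get(s, set())
    for h in hyponyms(t):                   # at_hypo: antonyms of hyponyms
        at |= ANTO.get(h, set())
    for h in SUB.get(t, set()):             # at_hyper: antonyms of hypernyms
        at |= ANTO.get(h, set())
        at |= hyponyms(h) - {t}             # at_cohypo: cohyponyms of t
    return at

print(a_terms("vertebrate"))  # {'plant', 'invertebrate'}
</preformat>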
      <p>
        Figure 1 shows an excerpt of a semantic net consisting of semantic relations such as synonymy (SYNO),
antonymy (ANTO, including subsumed relations for converseness, contrariness, and complement) and
subordination (SUB). From this semantic net, it can be inferred that animal and plant (antonyms), reptile and
mammalian (antonym of synonym), vertebrate and plant (antonym of hypernym/hyponym), and vertebrate
and invertebrate (cohyponyms) are a-terms. (Note that in a more complete example, plant and animal would
also be cohyponyms of a more general concept, and vertebrate and invertebrate might be considered antonyms.)
We combine different semantic information resources to create the semantic net holding this background
knowledge, including the computer lexicon HaGenLex [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
a mapping of HaGenLex concepts to GermaNet synonym sets [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the GIRT-Thesaurus (for hyponym
relations) and semantic subordination relations semi-automatically extracted from German noun compounds
in text corpora.
      </p>
      <p>[Figure 1: Excerpt of the semantic net. English/German node pairs such as reptile (Kriechtier),
tetrapod (Landwirbeltier), vertebrate (Wirbeltier), mammal (Säugetier), fish (Fisch), animal (Tier),
plant (Pflanze), and invertebrate (wirbelloses Tier) are connected by SUB, SYNO, and ANTO relations.]</p>
      <p>Using queries and relevance assessments from the GIRT task in 2005, we created statistics on the
cooccurrence of query terms and their antagonistic terms in documents assessed as relevant and in other
(non-relevant) documents. Table 1 gives an overview of the difference in the percentage of term cooccurrence
between documents assessed as relevant and other (non-relevant) documents in the GIRT collection. These
statistics serve to determine by what amount the score of a document in the result set should be adjusted.
For example, a document D with a score SD that contains a search term A but does not contain its cohyponym
B will have its score increased. In general, document scores for documents containing neither a term Ai nor
its a-term Bj are decreased (last row of Table 1).</p>
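      <p>These statistics can be computed directly from the relevance assessments. The following sketch is our
illustration, not code from the paper; docs (document id to term set) and qrels (document id to relevance
flag) are assumed data structures.</p>
      <preformat>
# Sketch: cooccurrence statistics in the style of Table 1. For a query term
# and one of its a-terms, compare how often both occur together in relevant
# versus non-relevant documents.

def cooccurrence_rate(term, a_term, docs, doc_ids):
    """Fraction of the given documents containing both term and a_term."""
    if not doc_ids:
        return 0.0
    hits = sum(1 for d in doc_ids if {term, a_term}.issubset(docs[d]))
    return hits / len(doc_ids)

def cooccurrence_delta(term, a_term, docs, qrels):
    """Difference in cooccurrence percentage, relevant minus non-relevant."""
    relevant = [d for d, rel in qrels.items() if rel]
    others = [d for d, rel in qrels.items() if not rel]
    return 100.0 * (cooccurrence_rate(term, a_term, docs, relevant)
                    - cooccurrence_rate(term, a_term, docs, others))
</preformat>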
    </sec>
    <sec id="sec-4">
      <title>The reranking formula</title>
      <p>1. Let D be the initial document set (1000 documents).
2. Let q be the set of query terms Ai.
3. Let SDmax be the highest score of all documents in D.
4. For each Ai ∈ q:
   • Let B be the set of a-terms Bj for Ai.
   • For each Bj ∈ B:
     – For each Dk ∈ D: compute the new score Snew of document Dk according to Formula 1 and assign it to Dk.
5. Normalize all document scores SDk so that all values fall into the interval [0, SDmax].
6. Sort D according to the new values SDk and return the reranked result set.</p>
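      <p>A minimal sketch of this procedure follows. Formula 1 itself is not reproduced in these notes, so the
score update below (multiplying a score by 1 - c for each cooccurrence of a query term with one of its
a-terms in a document) is a hypothetical stand-in; only the loop structure and the normalization step follow
the listing above.</p>
      <preformat>
# Sketch of the reranking procedure (our illustration). a_terms() is the
# a-term lookup from Sect. 2.2, docs maps document ids to term sets, and
# results is the initial ranked list of (doc_id, score) pairs.

def rerank(results, query_terms, docs, a_terms, c=0.025):
    scores = dict(results)
    s_max = max(scores.values())              # SDmax (step 3)
    for a in query_terms:                     # step 4
        for b in a_terms(a):
            for d in scores:
                if {a, b}.issubset(docs[d]):
                    scores[d] *= 1.0 - c      # hypothetical stand-in for Formula 1
    top = max(scores.values())                # step 5: normalize into [0, SDmax]
    scores = {d: s / top * s_max for d, s in scores.items()}
    # step 6: sort by new scores and return the reranked result set
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
</preformat>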
      <sec id="sec-4-1">
        <title>A pre-test: reranking results from CLEF 2004</title>
        <p>In a pre-test of the reranking algorithm, the data consists of queries from GIRT 2004, the corresponding
relevance assessments, and the GIRT document corpus. Experiments with different values for the factor c
were performed. Table 2 shows the mean average precision (MAP) for our official run from 2004, and MAP
for the reranked result set for different values of the factor c. The precision was significantly increased from
0.2446 to 0.2986 MAP for c = 0.01.</p>
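        <p>A parameter sweep of this kind can be scripted around the reranking sketch from Sect. 2.3; here,
evaluate_map (computing MAP against the GIRT 2004 relevance assessments) is an assumed helper, as are the
results, query_terms, docs, and qrels_2004 inputs.</p>
        <preformat>
# Sketch: tuning the dampening factor c on GIRT 2004 data (our illustration).
# rerank() is the sketch from Sect. 2.3; evaluate_map is an assumed helper,
# e.g. wrapping trec_eval output.

for c in (0.005, 0.01, 0.025, 0.05, 0.1):
    reranked = rerank(results, query_terms, docs, a_terms, c=c)
    print(c, evaluate_map(reranked, qrels_2004))
</preformat>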
      </sec>
      <sec id="sec-4-2">
        <title>CLEF 2006: reranking results</title>
        <p>
          For the runs submitted for relevance assessment, we employed the experimental setup from the
domain-specific task at CLEF in 2005: query expansion with semantically related terms and blind feedback
for the topic fields title and description. For the bilingual experiments, queries were translated by the Promt
online machine translation service. Settings for the following parameters were varied:
• LA: obtain search terms by a linguistic analysis (see [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ])
• RS: rerank result set (as described in Sect. 2)
        </p>
        <p>For experiments with reranking, the factor c was set to 0.025. Table 3 and Table 4 show results for
official runs and additional runs.</p>
      </sec>
      <sec id="sec-4-3">
        <title>A post-test: reranking results of the Berkeley group</title>
        <p>
          We performed an additional experiment with an initial result set with an even higher MAP, using results
of an unofficial run from the Berkeley group (thanks to Vivien Petras at UC Berkeley for providing the data).
The experiments of the Berkeley group were based on the setup for their participation at the GIRT task in
2005 (see [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]). Reranking applied to the result set
found by Berkeley, which has a MAP of 0.4343 for the monolingual German task (3212 rel_ret),
significantly lowered performance to 0.3653 MAP.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>Discussion of results</title>
        <p>We performed different sets of experiments with reranking initial result sets for the domain-specific task at
CLEF. In a pre-test based on the data from CLEF 2004 and results submitted in 2004, reranking
increased the MAP from 0.2446 to 0.2986 (a +22.1% change). As a single result set is used as input for the
reranking experiments, recall is not affected.</p>
        <p>Results for the official experiments indicate that reranking does not significantly change the MAP. For
the monolingual run, MAP dropped from 0.3205 to 0.3179 in one pair of experiments and rose from 0.3525
to 0.3539 in another. For a bilingual pair of comparable experiments, MAP dropped from 0.2190 to 0.2180.</p>
        <p>An additional reranking experiment was based on data provided by the Berkeley group with their setup
from 2005. The MAP decreased from 0.4343 to 0.3653 when reranking was applied to this data.</p>
        <p>There are several explanations as to why precision is affected so differently:
• There may be different intrinsic characteristics for the domain-specific query topics in 2004 and
2006 (the GIRT data did not change), i.e. there may have been fewer antagonistic terms found for
the query terms in 2006. We did not have time to test this hypothesis.
• The dampening factor c was not fine-tuned for the retrieval method employed to obtain the initial
result set: for the experiments in 2004, we used a database management system with a tf-idf IR
model, while for GIRT 2006, the Okapi/BM25 IR model was applied. The corresponding result sets
show a different range and distribution of document scores. Thus, the effect of reranking documents
with the proposed method may depend on the retrieval method employed to obtain the initial results.
• Reranking will obviously become harder the better the initial precision already is. The results from
the Berkeley group will be more difficult to improve, as they already have a high precision.
• The dampening factor c should have been initialized with a lower value. Due to time constraints, our
group did not repeat the reranking experiments with different and more conservative values of c.</p>
      </sec>
      <sec id="sec-4-5">
        <title>Conclusion</title>
        <p>In this paper, a novel method to rerank documents was presented. It combines information about
antagonistic relations between terms in queries and documents with statistics on their cooccurrence. Different
evaluations of this method were presented, showing mixed results.</p>
        <p>For a pre-test with CLEF data from 2004, a performance increase in precision was observed. Official
results for CLEF 2006 show no major changes, and an additional experiment based on data from the
Berkeley group even shows a decrease in precision.</p>
        <p>While the pre-test showed that our reranking approach should work in general, the official and
additional experiments indicate that it becomes more difficult to increase precision the higher it already is. We
plan to complete reranking experiments with different settings and analyze differences in query topics for
GIRT 2004–2006.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Harald</surname>
          </string-name>
          <string-name>
            <surname>Baayen</surname>
          </string-name>
          , Richard Piepenbrock, and
          <string-name>
            <given-names>Leon</given-names>
            <surname>Gulikers</surname>
          </string-name>
          .
          <article-title>The CELEX Lexical Database. Release 2 (CD-ROM)</article-title>
          .
          <source>Linguistic Data Consortium</source>
          , University of Pennsylvania, Philadelphia, Pennsylvania,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>James</surname>
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Callan</surname>
            , Zhihong Lu, and
            <given-names>W. Bruce</given-names>
          </string-name>
          <string-name>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Searching distributed collections with inference networks</article-title>
          .
          <source>In Proceedings of the ACM SIGIR</source>
          <year>1995</year>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Fredric</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gey</surname>
          </string-name>
          , Michael Buckland, Aitao Chen, and
          <string-name>
            <surname>Ray</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Entry vocabulary - a technology to enhance digital search</article-title>
          .
          <source>In Proceedings of the First International Conference on Human Language Technology</source>
          ,
          <year>March 2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Fredric</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gey</surname>
          </string-name>
          , Aitao Chen, Jianzhang He,
          <string-name>
            <surname>Liangjie Xu</surname>
            ,
            <given-names>and Jason</given-names>
          </string-name>
          <string-name>
            <surname>Meggs</surname>
          </string-name>
          .
          <article-title>Term importance, Boolean conjunct training, negative terms, and foreign language retrieval: probabilistic algorithms at TREC-5</article-title>
          . In National Institute for Standards and Technology, editor,
          <source>Proceedings of TREC-5, the Fifth NISTDARPA Text REtrieval Conference</source>
          , pages
          <fpage>181</fpage>
          -
          <lpage>190</lpage>
          , Washington, DC,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Fredric</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gey</surname>
            , Hailing Jiang, Vivien Petras, and
            <given-names>Aitao</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Cross-language retrieval for the CLEF collections - comparing multiple methods of retrieval</article-title>
          . In C. Peters, editor,
          <source>Cross-Language Information Retrieval and Evaluation: Workshop of Cross-Language Evaluation Forum</source>
          ,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          <year>2000</year>
          , volume
          <volume>2069</volume>
          <source>of Lecture Notes in Computer Science (LNCS)</source>
          , pages
          <fpage>116</fpage>
          -
          <lpage>128</lpage>
          . Springer, Berlin,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Sven</given-names>
            <surname>Hartrumpf</surname>
          </string-name>
          , Hermann Helbig, and
          <string-name>
            <given-names>Rainer</given-names>
            <surname>Osswald</surname>
          </string-name>
          .
          <article-title>The semantically based computer lexicon HaGenLex - Structure and technological environment</article-title>
          .
          <source>Traitement automatique des langues</source>
          ,
          <volume>44</volume>
          (
          <issue>2</issue>
          ):
          <fpage>81</fpage>
          -
          <lpage>105</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Jaap</given-names>
            <surname>Kamps</surname>
          </string-name>
          .
          <article-title>Improving retrieval effectiveness by reranking documents based on controlled vocabulary</article-title>
          .
          <source>In Sharon McDonald and John Tait</source>
          , editors,
          <source>Advances in Information Retrieval: 26th European Conference on IR Research (ECIR</source>
          <year>2004</year>
          ), volume
          <volume>2997</volume>
          of Lecture Notes in Computer Science (LNCS), pages
          <fpage>283</fpage>
          -
          <lpage>295</lpage>
          . Springer, Heidelberg,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Leveling</surname>
          </string-name>
          .
          <article-title>A baseline for NLP in domains-pecific information retrieval</article-title>
          . In C. Peters,
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Gey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kluck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Magnini</surname>
          </string-name>
          , and M. de Rijke, editors,
          <source>CLEF 2005 Proceedings, Lecture Notes in Computer Science (LNCS)</source>
          . Springer, Berlin,
          <year>2006</year>
          . In print.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Leveling</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sven</given-names>
            <surname>Hartrumpf</surname>
          </string-name>
          . University of Hagen at CLEF 2004:
          <article-title>Indexing and translating concepts for the GIRT task</article-title>
          . In C. Peters,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kluck</surname>
          </string-name>
          , and B. Magnini, editors,
          <source>Multilingual Information Access for Text, Speech and Images: 5th Workshop of the Cross-Language Evaluation Forum</source>
          ,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          <year>2004</year>
          , volume
          <volume>3491</volume>
          of Lecture Notes in Computer Science (LNCS), pages
          <fpage>271</fpage>
          -
          <lpage>282</lpage>
          . Springer, Berlin,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Vivien</given-names>
            <surname>Petras</surname>
          </string-name>
          .
          <article-title>How one word can make all the difference - using subject metadata for automatic query expansion and reformulation</article-title>
          . In Carol Peters, editor,
          <source>Results of the CLEF 2005 Cross-Language System Evaluation Campaign, Working Notes for the CLEF 2005 Workshop</source>
          , Wien, Austria,
          <year>September 2005</year>
          . Centromedia.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Stephen</surname>
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Robertson</surname>
            , Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and
            <given-names>Mike</given-names>
          </string-name>
          <string-name>
            <surname>Gatford</surname>
          </string-name>
          .
          <article-title>Okapi at TREC-3</article-title>
          . In D. Harman, editor,
          <source>Proceedings of the Third Text REtrieval Conference (TREC3)</source>
          , pages
          <fpage>109</fpage>
          -
          <lpage>126</lpage>
          . National Institute of Standards and Technology (NIST),
          <source>Special Publication 500- 226</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>