<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Report on CLEF-2005 Evaluation Campaign: Monolingual, Bilingual, and GIRT Information Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jacques Savoy</string-name>
          <email>Jacques.Savoy@unine.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre-Yves Berger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Natural Language Processing with European Languages, Bilingual Information Retrieval</institution>
          ,
          <addr-line>Digital Libraries, Hungarian Language, Bulgarian Language, Portuguese Language, French Language</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Neuchatel</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>For our fifth participation in the CLEF evaluation campaigns, the first objective was to propose an effective and general stopword list along with a light stemming procedure for the Hungarian, Bulgarian and Portuguese (Brazilian) languages. Our second objective was to obtain a better picture of the relative merit of various search engines when processing documents in those languages. To do so, we evaluated our scheme using two probabilistic models and nine vector-processing approaches. In the bilingual track, we evaluated both the machine translation and bilingual dictionary approaches to automatically translate a query submitted in English into various target languages. This year we explored new freely available translation sources, together with a combined query translation approach in order to obtain a better translation of the user's information need. Finally, using the GIRT corpora (available in English, German and Russian), we investigated variations in retrieval effectiveness when including or excluding manually assigned keywords attached to bibliographic records (mainly comprising a title and an abstract).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Since 2001 our research group has been investigating effective information retrieval (IR) techniques when
handling a variety of natural languages
        <xref ref-type="bibr" rid="ref8 ref9">(Savoy 2004a; 2005a)</xref>
        in order to improve both monolingual and
bilingual searches. Continuing along this same stream, our participation in the CLEF 2005 evaluation
campaign will target various objectives. First, our aim is to propose linguistic tools for less frequently spoken
languages such as Bulgarian and Hungarian, to explore the underlying IR problems with closely related
languages such as Portuguese and Brazilian, and to explore new alternatives when translating a query from one
source language (English in this study) to other target languages (more precisely the French, Portuguese,
Bulgarian and Hungarian languages). The domain-specific GIRT corpus presents other interesting features,
namely questions related to digital libraries with a collection comprising a large number of bibliographic
records.
      </p>
      <p>In addition to these particular objectives, various interesting problems must be analyzed and resolved. Not
all languages are written with the same alphabet; Bulgarian, for example, uses the Cyrillic alphabet. The
presence of diacritics in other languages also raises certain questions that directly affect the effectiveness of IR systems.
Can we simply ignore them? Do they have a real impact on mean average precision? Does the distinction
between uppercase and lowercase letters really influence information retrieval systems, or does this distinction
need only be preserved when high search precision is required?</p>
      <p>In our work we have assumed that the semantic content of documents (or requests) is mainly linked to nouns
and adjectives, and thus an effective search system can be based on the use of an appropriate set of weighted
keywords extracted from corresponding documents (or requests). Based on this assumption, we designed a set
of stopword lists and light stemming procedures for certain European and Asian languages. Following this
approach, these linguistic tools were designed to automatically remove the inflectional suffixes attached to
nouns and adjectives and linked to gender (masculine, feminine, neuter), number (singular or plural), and case
(nominative, dative, ablative, etc.). We were also interested in other linguistic phenomena, such
as compound constructions: does an effective IR system really need to decompound them, and is this linguistic
phenomenon really important for the retrieval of languages other than German?</p>
      <p>The rest of this paper is organized as follows. Section 2 describes the main characteristics of the
CLEF-2005 test collection, and Section 3 outlines the main aspects of our stopword lists and light stemming procedures.
Section 4 analyzes the principal features of different indexing and search strategies, and evaluates their use with
the four corpora. The data fusion approaches adopted in our experiments are explained in Section 5, and
Section 6 depicts our official results. Our bilingual experiments are presented and evaluated in Section 7, while
Section 8 describes our experiments involving the domain-specific GIRT corpus.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Overview of the Test-Collections</title>
      <p>The corpora used in our experiments include newspaper and news agency articles, namely Le Monde
(1994-1995, French), SDA (1994-1995, French), Público (1994-1995, Portuguese), Folha (1994-1995, Brazilian),
Magyar Hirlap (2002, Hungarian), Sega (2002, Bulgarian) and Standart (2002, Bulgarian). As shown in Table 1,
the Portuguese corpus (212.9 indexing terms / document) has a larger mean article size than the French
collection (178). This mean value is relatively similar for the Bulgarian (133.7) and Hungarian (142.1)
languages. It is interesting to note that even though the Hungarian collection is the smallest (105 MB), it
contains the largest number of distinct indexing terms (657,132), computed after stemming.</p>
      <sec id="sec-2-1">
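        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th /><th>French</th><th>Portuguese</th><th>Bulgarian</th></tr>
            </thead>
            <tbody>
              <tr><td>Size (in MB)</td><td>487 MB</td><td>564 MB</td><td>213 MB</td></tr>
              <tr><td># of documents</td><td>177,452</td><td>210,734</td><td>69,195</td></tr>
              <tr><td># of distinct terms</td><td>455,366</td><td>582,117</td><td>414,253</td></tr>
              <tr><td colspan="4">Number of distinct indexing terms / document</td></tr>
              <tr><td>Mean</td><td>127.8</td><td>153.5</td><td /></tr>
              <tr><td>Standard deviation</td><td>106.57</td><td>114.95</td><td /></tr>
              <tr><td>Median</td><td>92</td><td>129</td><td /></tr>
              <tr><td>Maximum</td><td>2,645</td><td>2,655</td><td /></tr>
              <tr><td>Minimum</td><td>1</td><td>1</td><td /></tr>
              <tr><td colspan="4">Number of indexing terms / document</td></tr>
              <tr><td>Mean</td><td>178</td><td>212.9</td><td /></tr>
              <tr><td>Standard deviation</td><td>159.87</td><td>186.4</td><td /></tr>
              <tr><td>Median</td><td>126</td><td>171</td><td /></tr>
              <tr><td>Maximum</td><td>6,720</td><td>7,554</td><td /></tr>
              <tr><td>Minimum</td><td>1</td><td>1</td><td /></tr>
              <tr><td>Number of queries</td><td>50</td><td>50</td><td /></tr>
              <tr><td>Number rel. items</td><td>2,537</td><td>2,904</td><td /></tr>
              <tr><td>Mean rel. / request</td><td>50.74</td><td>58.08</td><td /></tr>
              <tr><td>Standard deviation</td><td>45.349</td><td>50.415</td><td /></tr>
              <tr><td>Median</td><td>35.5</td><td>44</td><td /></tr>
              <tr><td>Maximum</td><td>185 (Q#253)</td><td>239 (Q#286)</td><td /></tr>
              <tr><td>Minimum</td><td>1 (Q#255)</td><td>2 (Q#258)</td><td /></tr>
            </tbody>
          </table>
        </table-wrap>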
        <p>During the indexing process in our automatic runs, we retained only the following logical sections from the
original documents: &lt;TITLE&gt;, &lt;TEXT&gt;, &lt;LEAD&gt;, &lt;LEAD1&gt;, &lt;TX&gt;, &lt;LD&gt;, &lt;TI&gt; and &lt;ST&gt;. Under this restriction
we found 1,854 documents in the Bulgarian collection to have no indexable content (for example, they may
correspond to articles containing only a picture with the tags &lt;PICTURE&gt;, &lt;IMGTEXT&gt; and &lt;IMGAUTHOR&gt;).
From the topic descriptions we automatically removed certain phrases such as “Relevant document report …”,
“Finde Dokumente, die über …”, “Keressünk olyan cikkeket, amelyek …” or “Trouver des documents qui …”.
As shown in the Appendix, the available topics cover various subjects (e.g., “Anti-Smoking Legislation”,
“Football Refereeing Disputes”, or “Lottery Winnings”), with both regional (“Swiss Referendums”) and
international coverage (“Anti-abortion Movements”).</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Stopword Lists and Stemming Procedures</title>
      <p>In order to define general stopword lists, we first created, for each language, a list of the 200 most frequent
words, from which some words were removed (e.g., police, minister, president, Magyar). To
this list of very frequent words, we added articles, pronouns, prepositions, conjunctions and very frequently
occurring verb forms (e.g., to be, is, has, etc.). Based on this scheme, we created new lists for the Bulgarian
and Hungarian languages (these lists are available at www.unine.ch/info/clef/). Our final stopword lists contained
463 words for the French language, 761 for Hungarian, 418 for Bulgarian and 400 for Portuguese-Brazilian. We
added 8 Brazilian words to our Portuguese stopword list; these eight words are usually variants with or
without accents, such as “vezes” in Portuguese and “vêzes” in Brazilian.</p>
      <p>Once high-frequency words were removed, our indexing procedure generally applied a stemming algorithm
in an attempt to conflate word variants into the same stem or root. In developing such a procedure, we first
wanted to remove only inflectional suffixes such as singular and plural word forms, and also feminine and
masculine forms so that they would conflate to the same root.</p>
      <p>Bulgarian involved additional morphological difficulties, given that in this language the definite article is
usually represented by a suffix. For example, “море” (sea) becomes “морето” (the sea) while “морета” (seas)
becomes “моретата” (the seas). The general noun pattern is as follows: &lt;stem&gt; &lt;plural&gt; &lt;article&gt;. Contrary to
other Slavic languages (such as Russian), Bulgarian does not indicate grammatical cases by adding a suffix.</p>
      <p>The Hungarian language shares certain similarities with the Finnish language (although the two languages do
not belong strictly to the same family, they can be viewed as cousins). Like Finnish, Hungarian has a large
number of cases (usually 18) and each case has its own unambiguous form. For example, the noun “house” (“ház”)
may appear as “házat” (accusative case, as in “(I see) the house”), “házakat” (accusative plural case, as in “(I see)
the houses”), “házamat” (“… my house”) or “házaimat” (“… my houses”). In this language, the general
construction used for nouns is as follows: &lt;stem&gt; &lt;plural&gt; &lt;possessive marker&gt; &lt;case&gt;. For example, “házamat” is built as
&lt;ház&gt; a &lt;m&gt; a &lt;t&gt;, in which the letter “a” is introduced to ease pronunciation (“házmt” would be
difficult to pronounce). From the IR point of view, certain linguistic aspects of Hungarian can be viewed as good
news. For example, no gender distinction is attached to nouns (as in English) and adjectives are
invariable, as in “… a szép házat” (“a beautiful house”) or “… a szép házamat” (“my beautiful house”). Our
suggested stemming procedures for these languages can be found at www.unine.ch/info/clef/.</p>
      <p>Diacritic characters are usually not present in English collections (with certain exceptions, such as “résumé”
or “cliché”). For the Hungarian and Portuguese languages, these characters were replaced by their corresponding
unaccented letters. Removing accents may however generate some semantic ambiguity (e.g., between “kor”
(“age”) and “kór” (“illness”), or “ver” (“hurt”) and “vér” (“blood”) in Hungarian).</p>
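      <p>The accent-replacement step described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the implementation used in the experiments: it relies on Unicode canonical decomposition, and the function name strip_diacritics is hypothetical.</p>
      <preformat>
```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Replace each accented letter by its unaccented base letter."""
    # NFD decomposition splits a letter such as "ó" into the base letter "o"
    # followed by a combining accent (Unicode category "Mn").
    decomposed = unicodedata.normalize("NFD", text)
    # Dropping the combining marks leaves only the base letters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Illustrative words taken from the examples quoted in the text.
for word in ["kór", "vér", "vêzes", "Público"]:
    print(word, "->", strip_diacritics(word))
```
      </preformat>
      <p>After this step “kór” and “kor” share the same surface form, which is exactly the kind of ambiguity mentioned above.</p>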
      <p>
        Finally, most European languages manifest other morphological characteristics, with compound word
constructions being only one example (e.g., handgun, worldwide). Recently,
        <xref ref-type="bibr" rid="ref2">Braschler &amp; Ripplinger (2004)</xref>
        showed that decompounding German words could significantly improve retrieval performance, and in some
experiments with Hungarian where we used our decompounding algorithm
        <xref ref-type="bibr" rid="ref2 ref8 ref9">(Savoy 2004b)</xref>
        , both compound
words and their component parts were left in the documents and queries.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4 Indexing and Searching Strategies</title>
      <p>In order to obtain a broader view of the relative merit of various retrieval models, we first adopted a binary
indexing scheme in which each document (or request) was represented by a set of keywords, without any weight.
To measure the similarity between documents and requests, we computed the inner product (retrieval model
denoted “doc=bnn, query=bnn” or “bnn-bnn”). In order to weight the presence of each indexing term in a
document surrogate (or in a query), we took the term occurrence frequency into account (denoted tfij for indexing
term tj in document Di, and the corresponding retrieval model was denoted: “doc=nnn, query=nnn”) or we might
also account for their inverse document frequency (denoted idfj). Moreover, we might normalize each indexing
weight using different weighting schemes, as is described in the Appendix.</p>
      <p>
        In addition to these models based on the vector-space paradigm, we also considered probabilistic models
such as the Okapi model
        <xref ref-type="bibr" rid="ref6">(Robertson et al. 2000)</xref>
        . As a second probabilistic approach, we implemented the
Prosit approach, one member of a family of models suggested by
        <xref ref-type="bibr" rid="ref1">Amati &amp; van Rijsbergen (2002</xref>
        ) and based on
combining two information measures, formulated as follows:
        <disp-formula><tex-math>\mathrm{Prob1}_{ij} = \frac{tfn_{ij}}{tfn_{ij} + 1}, \qquad tfn_{ij} = tf_{ij} \cdot \log_2\left[1 + \frac{C \cdot \mathrm{mean\ dl}}{l_i}\right]</tex-math></disp-formula>
        <disp-formula><tex-math>\mathrm{Prob2}_{ij} = \frac{1}{1 + \lambda_j} \cdot \left[\frac{\lambda_j}{1 + \lambda_j}\right]^{tfn_{ij}}, \qquad \lambda_j = \frac{tc_j}{n}</tex-math></disp-formula>
        where wij indicates the indexing weight attached to term tj in document Di, li the number of indexing terms
included in the representation of Di, tcj the number of occurrences of term tj in the collection, and
n the number of documents in the corpus. In our experiments, the constants b, k1, avdl, pivot, slope, C and
mean dl were fixed according to the values listed in Table 2 (the German, English and Russian languages are
used in the GIRT experiments).</p>
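      <p>As a worked illustration, the two probabilities above can be computed as follows. This is a minimal sketch and not the authors' code. Two points are assumptions on our side: the default values of C and mean dl are placeholders (the actual settings are those of Table 2), and the final combination of the two information measures, wij = (1 - Prob1ij) · (-log2 Prob2ij), is the usual form of this model but is not stated explicitly in the text above.</p>
      <preformat>
```python
import math


def prosit_weight(tf, doc_len, tc, n_docs, c=1.5, mean_dl=200.0):
    """Sketch of a Prosit indexing weight for one term in one document.

    tf      : occurrences of the term in the document
    doc_len : number of indexing terms in the document
    tc      : occurrences of the term in the whole collection (must be positive)
    n_docs  : number of documents in the corpus
    c, mean_dl : placeholder constants (hypothetical defaults)
    """
    # Length-normalized term frequency.
    tfn = tf * math.log2(1.0 + (c * mean_dl) / doc_len)
    prob1 = tfn / (tfn + 1.0)
    # Mean number of occurrences of the term per document.
    lam = tc / n_docs
    prob2 = (1.0 / (1.0 + lam)) * (lam / (1.0 + lam)) ** tfn
    # ASSUMPTION: the two information measures are combined as a product.
    return (1.0 - prob1) * (-math.log2(prob2))


# A rare term should receive a larger weight than a frequent one.
print(prosit_weight(tf=3, doc_len=200, tc=100, n_docs=10000))
print(prosit_weight(tf=3, doc_len=200, tc=5000, n_docs=10000))
```
      </preformat>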
      <p>
        To measure the retrieval performance, we adopted non-interpolated mean average precision (MAP) (computed
on the basis of 1,000 retrieved items per request by the new TREC-EVAL program). To statistically determine
whether or not a given search strategy would be better than another, we applied the bootstrap methodology
        <xref ref-type="bibr" rid="ref7">(Savoy 1997)</xref>
        . Thus, in the tables included in this paper we underlined statistically significant differences, using
a two-sided non-parametric bootstrap test based on the MAP difference, with a significance level fixed at
5%.
      </p>
      <p>
        We indexed the different collections using words as indexing units. The evaluations of our two probabilistic
models and nine vector-space schemes are listed in Table 3 for the French and Portuguese corpora, and in Table 4
for the Bulgarian and Hungarian collections. In these tables, the best performance under given conditions (with
the same indexing scheme and the same collection) is listed in bold type. This best-performing
approach is also used as the baseline for our statistical testing. The underlined results therefore indicate that the
difference in mean average precision can be viewed as statistically significant when compared to the best system
value. As depicted in Table 3, the Okapi model was found to be the best IR model for the French and Portuguese
collections. For these two corpora however, the difference in MAP between the various IR models is usually
statistically significant. As shown in Table 4 (and in Table A.4 in the Appendix), similar conclusions can be
drawn for the Bulgarian and Hungarian collections. In this case the best performing system was the Prosit model
for Bulgarian, and the Okapi probabilistic approach for Hungarian. Moreover, five IR models were shown to
have statistically similar performance levels (Okapi, Prosit, “doc=Lnu, query=ltc”, “doc=dtu, query=dtn”,
“doc=atn, query=ntc”).
      </p>
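      <p>For readers unfamiliar with the measure, non-interpolated average precision with a cutoff of 1,000 retrieved items can be sketched as below. This is an illustrative re-implementation under our own assumptions, not the TREC-EVAL program itself; the function names and the toy data are hypothetical.</p>
      <preformat>
```python
def average_precision(ranked_ids, relevant_ids, cutoff=1000):
    """Non-interpolated average precision for one request."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids[:cutoff], start=1):
        if doc_id in relevant:
            hits += 1
            # Precision measured at the rank of each relevant item retrieved.
            precision_sum += hits / rank
    # Relevant items that are never retrieved contribute a precision of zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels, cutoff=1000):
    """Mean of the average precision values over all requests in qrels."""
    scores = [average_precision(runs.get(q, []), qrels[q], cutoff) for q in qrels]
    return sum(scores) / len(scores)


# Toy request: relevant items found at ranks 2 and 4, one relevant item missed.
ranking = ["d3", "d1", "d7", "d2"]
relevant = ["d1", "d2", "d9"]
print(average_precision(ranking, relevant))  # (1/2 + 2/4) / 3 = 0.333...
```
      </preformat>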
      <p>
        Moreover, the data in these tables show that when the number of search terms increases (from T to TD to
TDN), retrieval effectiveness usually increases as well (except for the “doc=bnn, query=bnn” or “doc=nnn,
query=nnn” IR models). From an analysis of the five best retrieval schemes shown in Tables 3 and 4 (namely,
Prosit, Okapi, “doc=Lnu, query=ltc”, “doc=dtu, query=dtn” and “doc=atn, query=ntc”), the improvement is
around 33.4% when comparing title-only (or T) with TDN queries for the Portuguese collection, 31.3% for
the French corpus, 21% for Hungarian (see Table A.4 in the Appendix), and 6.4% for the Bulgarian
collection.
      </p>
      <p>
        With the Hungarian collection, we automatically decompounded long words (composed of more than 8
characters) using our own algorithm
        <xref ref-type="bibr" rid="ref2 ref8 ref9">(Savoy 2004b)</xref>
        . In this experiment, both the compound words and their
components were left in documents and queries (under the label “TD-decomp” in Table 4). Using the TD
queries and the Okapi model, we achieved a MAP of 0.3391, reflecting a degradation of 3.1% when compared
to an indexing approach that did not use decompounding. Based on the five best retrieval schemes, the mean
degradation is around 1.6%. Using a lighter stemmer (fewer rules) for the Hungarian language (retrieval
performance depicted under the label “TD-light” in Table 4), the mean difference in MAP over the five best
retrieval schemes is around 2%, in favor of the more complex stemming approach.
      </p>
      <p>
        It was observed that pseudo-relevance feedback (PRF or blind-query expansion) seemed to be a useful
technique for enhancing retrieval effectiveness. In this study, we adopted Rocchio's approach
        <xref ref-type="bibr" rid="ref3">(Buckley et al. 1996)</xref>
        with α = 0.75, β = 0.75, whereby the system was allowed to add to the original query m terms extracted
from the k best-ranked documents. To evaluate this proposition, we used the Okapi and the Prosit
probabilistic models and enlarged the query by 10 to 50 terms extracted from the 3 to 10 best-ranked articles.
      </p>
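      <p>The blind query expansion described above can be sketched as follows. This is a simplified illustration and not the system used in the experiments: the two Rocchio constants are set to 0.75 as in the text, but the way candidate terms are ranked (by their mean weight in the k best-ranked documents) and the function name rocchio_expand are assumptions.</p>
      <preformat>
```python
from collections import Counter


def rocchio_expand(query_weights, top_docs, m=10, alpha=0.75, beta=0.75):
    """Expand a query with m terms taken from the k best-ranked documents.

    query_weights : dict mapping each query term to its weight
    top_docs      : list of k dicts (term to weight), one per top-ranked document
    """
    k = len(top_docs)
    centroid = Counter()
    for doc in top_docs:
        for term, weight in doc.items():
            centroid[term] += weight / k
    # ASSUMPTION: new terms are ranked by their mean weight in the top documents.
    candidates = [(t, w) for t, w in centroid.most_common() if t not in query_weights]
    expanded = {}
    for term, weight in query_weights.items():
        # Original terms keep alpha times their weight plus the feedback part.
        expanded[term] = alpha * weight + beta * centroid.get(term, 0.0)
    for term, weight in candidates[:m]:
        # New terms only receive the feedback part.
        expanded[term] = beta * weight
    return expanded


query = {"football": 1.0, "referee": 1.0}
docs = [
    {"football": 2.0, "match": 1.0, "penalty": 3.0},
    {"referee": 1.0, "penalty": 1.0, "league": 0.5},
]
print(rocchio_expand(query, docs, m=1))
```
      </preformat>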
      <p>Table 5 depicts our best results using the pseudo-relevance feedback technique for the Okapi model and
demonstrates that the optimal parameter setting seemed to be collection-dependent. Moreover, performance
improvement also seemed to be collection-dependent (or language-dependent), with the French corpus showing
an increase of +9.2% (from a mean average precision of 0.3754 to 0.4099), +5.2% for the Portuguese collection
(from 0.3477 to 0.3668), +1.3% for the Hungarian collection (from 0.3501 to 0.3545), and +0.8% for the
Bulgarian corpus (from 0.2704 to 0.2726). Table 6 shows that similar conclusions can be drawn using the
Prosit model. In this case however, the blind query expansion produced a greater improvement for all collections
(e.g., for the French corpus, an increase of +14.3%, from a mean average precision of 0.3696 to 0.4225). In
both Tables 5 and 6, the baseline used for our statistical testing was the MAP calculated before the query was
automatically expanded. In this case, it is interesting to note that our statistical testing cannot always detect a
significant difference in MAP before and after blind query expansion, especially for the Bulgarian and Hungarian
collections.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Data Fusion</title>
      <p>
        It is assumed that combining different search models should improve retrieval effectiveness, due to the fact
that different document representations might retrieve different pertinent items and thus increase the overall recall
        <xref ref-type="bibr" rid="ref12">(Vogt &amp; Cottrell 1999)</xref>
        . On the other hand, when combining different search schemes, we might suppose that
these various IR strategies are more likely to rank the same relevant items higher on the list than they would for
non-relevant documents (viewed as outliers). Thus, combining them could improve retrieval effectiveness by
ranking pertinent documents higher and ranking non-relevant items lower. Based on our previous studies
        <xref ref-type="bibr" rid="ref10 ref11 ref2 ref8 ref9">(Savoy 2004b, 2005a)</xref>
        , this expected positive effect does not always materialize.
      </p>
      <p>
        In the current study we combined only the two probabilistic models because they usually achieve the best or
one of the best retrieval performances
        <xref ref-type="bibr" rid="ref10 ref11 ref2 ref8 ref9">(Savoy 2004b, 2005a)</xref>
        . To achieve this we evaluated various fusion
operators (see Table 7 for a list of their precise descriptions). For example, the Sum RSV operator indicates that
the combined document score (or the final retrieval status value) is simply the sum of the retrieval status value
(RSVk) of the corresponding document Dk computed by each single indexing scheme
        <xref ref-type="bibr" rid="ref4">(Fox &amp; Shaw 1994)</xref>
        .
Table 7 also shows that both the Norm Max and Norm RSV operators apply a normalization procedure when
combining document scores. When combining the retrieval status values (RSVk) of various indexing schemes,
and in order to favor the more effective retrieval schemes, we could multiply the document score by a constant
ai (usually equal to 1) reflecting the differences in retrieval performance.
      </p>
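      <table-wrap>
        <label>Table 7</label>
        <table>
          <tbody>
            <tr><td>Sum RSV</td><td><inline-formula><tex-math>\mathrm{SUM}\,(a_i \cdot RSV_k)</tex-math></inline-formula></td></tr>
            <tr><td>Norm Max</td><td><inline-formula><tex-math>\mathrm{SUM}\,(a_i \cdot (RSV_k / \mathrm{Max}_i))</tex-math></inline-formula></td></tr>
            <tr><td>Norm RSV</td><td><inline-formula><tex-math>\mathrm{SUM}\,[a_i \cdot ((RSV_k - \mathrm{Min}_i) / (\mathrm{Max}_i - \mathrm{Min}_i))]</tex-math></inline-formula></td></tr>
            <tr><td>Z-Score</td><td><inline-formula><tex-math>a_i \cdot [((RSV_k - \mathrm{Mean}_i) / \mathrm{Stdev}_i) + d_i]</tex-math></inline-formula> with <inline-formula><tex-math>d_i = (\mathrm{Mean}_i - \mathrm{Min}_i) / \mathrm{Stdev}_i</tex-math></inline-formula></td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-5-2">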
        <p>In addition to using these data fusion operators, we also considered the round-robin approach, wherein we
took one document in turn from all individual lists and removed any duplicates, retaining the most highly
ranked instance. Finally we suggested merging the retrieved documents according to the Z-Score, computed for
each result list. Within this scheme, for the ith result list, we needed to compute the average RSVk value
(denoted Meani) and the standard deviation (denoted Stdevi). Based on these we could then normalize the
retrieval status value for each document Dk provided by the ith result list by computing the deviation of RSVk
with respect to the mean (Meani). In Table 7, Mini (Maxi) denotes the minimal (maximal) RSV value in the ith
result list. Of course, we might also weight the relative contribution of each retrieval scheme by assigning a
different ai value to each retrieval model.</p>
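        <p>The merging strategies just described can be illustrated with the following sketch. It is not the code used for the official runs. The choice of the population standard deviation, the summation of normalized scores across result lists for the Z-Score, and all function names and toy scores are assumptions made for illustration.</p>
        <preformat>
```python
import statistics


def norm_rsv(result_list, a=1.0):
    """Norm RSV: a * (RSV - Min) / (Max - Min) within one result list."""
    low, high = min(result_list.values()), max(result_list.values())
    span = (high - low) or 1.0  # guard against identical scores
    return {doc: a * (rsv - low) / span for doc, rsv in result_list.items()}


def z_score(result_list, a=1.0):
    """Z-Score: a * ((RSV - Mean) / Stdev + d), with d = (Mean - Min) / Stdev."""
    scores = list(result_list.values())
    mean = statistics.mean(scores)
    # ASSUMPTION: population standard deviation; the text does not specify it.
    stdev = statistics.pstdev(scores) or 1.0
    d = (mean - min(scores)) / stdev
    return {doc: a * ((rsv - mean) / stdev + d) for doc, rsv in result_list.items()}


def fuse(result_lists, normalize):
    """Sum the normalized scores of each document over all result lists."""
    fused = {}
    for result_list in result_lists:
        for doc, score in normalize(result_list).items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


def round_robin(ranked_lists):
    """Take one document in turn from each list, skipping duplicates."""
    merged, seen = [], set()
    depth = max((len(ranking) for ranking in ranked_lists), default=0)
    for position in range(depth):
        for ranking in ranked_lists:
            if len(ranking) > position and ranking[position] not in seen:
                seen.add(ranking[position])
                merged.append(ranking[position])
    return merged


# Hypothetical scores produced by two retrieval models for the same request.
okapi = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
prosit = {"d2": 0.8, "d4": 0.6, "d1": 0.4}
print(fuse([okapi, prosit], z_score))
print(round_robin([["d1", "d2", "d3"], ["d2", "d4", "d1"]]))
```
        </preformat>
        <p>Because the raw Okapi and Prosit scores live on different scales, adding them without normalization would let one model dominate; both normalizations map the lowest score of each list to zero before the lists are merged.</p>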
        <sec id="sec-5-2-1">
          <title>Okapi &amp; PRF doc/term</title>
          <p>Prosit &amp; PRF doc/term</p>
        </sec>
        <sec id="sec-5-2-2">
          <title>Round-robin</title>
          <p>Sum RSV
Norm Max
Norm RSV
Z-Score
Z-ScoreW</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Official Results</title>
    </sec>
    <sec id="sec-7">
      <title>7 Bilingual Information Retrieval</title>
      <p>For the bilingual track, we chose English as the language for submitting queries to be automatically
translated into four different languages, using nine different machine translation (MT) systems and four bilingual
dictionaries (“Babylon”, “Ectaco”, “Medios”, and “Kerekes”). The following freely available translation tools
were used in our experiments:</p>
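      <list list-type="simple">
        <list-item><p>SYSTRAN: www.systranlinks.com/</p></list-item>
        <list-item><p>GOOGLE: www.google.com/language_tools</p></list-item>
        <list-item><p>FREETRANSLATION: www.freetranslation.com/web.htm</p></list-item>
        <list-item><p>INTERTRAN: www.tranexp.com/</p></list-item>
        <list-item><p>WORLDLINGO: www.worldlingo.com/</p></list-item>
        <list-item><p>BABELFISH: babelFish.altavista.com/</p></list-item>
        <list-item><p>PROMT: webtranslation.paralink.com/</p></list-item>
        <list-item><p>ALPHAWORKS: www.alphaWorks.ibm.com/</p></list-item>
        <list-item><p>APPLIEDLANGUAGE: www.appliedLanguage.com/</p></list-item>
        <list-item><p>BABYLON: www.babylon.com</p></list-item>
        <list-item><p>ECTACO: www.ectaco.co.uk/free-online-dictionaries/</p></list-item>
        <list-item><p>MEDIOS: consulting.medios.fi/dictionary/ (only for Hungarian language)</p></list-item>
        <list-item><p>KEREKES: www.cab.u-szeged.hu/cgi-bin/szotar (only for Hungarian language)</p></list-item>
      </list>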
      <p>When using the different bilingual dictionaries to translate an English request word-by-word, usually more
than one translation is provided, in an unspecified order. We decided to pick only the first translation available
(labeled “Babylon 1” or “Ectaco 1”), the first two terms (e.g., “Babylon 2” or “Medios 2”) or the first three
available translations (labeled “Babylon 3”).</p>
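      <p>The word-by-word strategy above, keeping only the first one, two or three translations of each term, can be sketched as follows. The toy English-French dictionary, the function name translate_query and the decision to keep untranslated words unchanged are all assumptions made for illustration; they do not reproduce the content or the interface of any of the dictionaries listed above.</p>
      <preformat>
```python
def translate_query(words, dictionary, k=1):
    """Translate a query word by word, keeping the first k candidates per word."""
    translated = []
    for word in words:
        candidates = dictionary.get(word.lower())
        if candidates:
            # Dictionaries return candidates in an unspecified order,
            # so "first k" is a purely positional choice.
            translated.extend(candidates[:k])
        else:
            # ASSUMPTION: words missing from the dictionary are kept as they are.
            translated.append(word)
    return translated


# Hypothetical dictionary entries (not taken from any real resource).
toy_dictionary = {
    "football": ["football", "foot"],
    "referee": ["arbitre", "juge"],
    "dispute": ["dispute", "litige", "querelle"],
}

query = ["football", "referee", "dispute"]
print(translate_query(query, toy_dictionary, k=1))
print(translate_query(query, toy_dictionary, k=2))
```
      </preformat>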
      <p>Moreover, the query terms could be preprocessed in order to obtain their part-of-speech (PoS) information
(using www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/). Using this information, we could find the
corresponding lemma and use it instead of the surface word before searching in the bilingual dictionaries. Once
this lemmatizing procedure was done, we added the term “+ PoS” in the corresponding run label. Table 10
contains an example of this query preprocessing, showing how the plural form was removed (e.g., “disputes”
into “dispute”) and how various verb forms were transformed into their lexical forms (e.g., “made” into “make”
or “refereeing” into “referee”).</p>
      <p>Original topic:</p>
      <preformat>&lt;num&gt; C263 &lt;/num&gt;
&lt;title&gt; Football Refereeing Disputes &lt;/title&gt;
&lt;desc&gt; Find documents in which decisions made by a referee during a football match are criticised. &lt;/desc&gt;
&lt;narr&gt; Relevant documents report on football (soccer) matches in which the referee made some disputable or
disputed decision. &lt;/narr&gt;</preformat>
      <p>Topic after PoS-based lemmatization:</p>
      <preformat>&lt;num&gt; C263 &lt;/num&gt;
&lt;title&gt; Football referee dispute &lt;/title&gt;
&lt;desc&gt; find document in which decision make by a referee during a football match be criticize. &lt;/desc&gt;
&lt;narr&gt; relevant document report on football (soccer) match in which the referee make some disputable or
disputed decision. &lt;/narr&gt;</preformat>
      <p>French, 50 queries: 0.3754; 0.3149 (83.9%); 0.3259 (86.8%); 0.2814 (75.0%); 0.1839 (49.0%); 0.3095 (82.5%);
0.3149 (83.9%); 0.3066 (81.7%); 0.2991 (79.7%); 0.3149 (83.9%)</p>
      <p>From this data, we can see that for the French collection the best translation is obtained by Google and for
the Portuguese corpus by Promt. The FreeTranslation and Promt MT systems usually obtain satisfactory
retrieval performances for these two languages (around 79.3% of the MAP obtained by the corresponding
monolingual search for the Promt system, and 73.6% for FreeTranslation). Other good translation systems
were BabelFish, Systran and AppliedLanguage, which worked well for French. For the Bulgarian and
Hungarian languages, we found only a few translation tools, and unfortunately their overall performance levels
were not very good. As depicted in Table 12, we also found that lemmatizing the English queries (for both the
Bulgarian and Hungarian languages at least) would improve mean average precision.</p>
      <p>Finally, Table 14 lists the parameter settings used for 12 official runs in the bilingual task. Each experiment
uses queries written in English to retrieve documents in the other target languages. Before combining the result
lists we automatically expanded the translated queries using a pseudo-relevance feedback method (Rocchio’s
approach in the present case).</p>
    </sec>
    <sec id="sec-8">
      <title>8 Monolingual Domain-Specific Retrieval: GIRT</title>
      <p>
        In the domain-specific retrieval task (called GIRT), the three available corpora are composed of bibliographic
records extracted from various sources in the social sciences domain, see
        <xref ref-type="bibr" rid="ref5">(Kluck 2004)</xref>
        for a more complete
description of these corpora. A few statistics on these collections are given in Table 15.
      </p>
      <sec id="sec-8-1">
        <table-wrap>
          <label>Table 15</label>
          <table>
            <thead>
              <tr><th /><th>German</th><th>English</th></tr>
            </thead>
            <tbody>
              <tr><td>Size (in MB)</td><td>326 MB</td><td>199 MB</td></tr>
              <tr><td># of documents</td><td>151,319</td><td>151,319</td></tr>
              <tr><td># of distinct terms</td><td>698,638</td><td>151,181</td></tr>
              <tr><td colspan="3">Number of distinct indexing terms / document</td></tr>
              <tr><td>Mean</td><td>70.83</td><td>107.9</td></tr>
              <tr><td>Standard deviation</td><td>32.4</td><td>94.59</td></tr>
              <tr><td>Median</td><td>68</td><td>77</td></tr>
              <tr><td>Maximum</td><td>386</td><td>1,422</td></tr>
              <tr><td>Minimum</td><td>2</td><td>2</td></tr>
              <tr><td colspan="3">Number of indexing terms / document</td></tr>
              <tr><td>Mean</td><td>89.61</td><td>142.1</td></tr>
              <tr><td>Standard deviation</td><td>44.5</td><td>139.84</td></tr>
              <tr><td>Median</td><td>84</td><td>95</td></tr>
              <tr><td>Maximum</td><td>629</td><td>4,984</td></tr>
              <tr><td>Minimum</td><td>4</td><td>2</td></tr>
              <tr><td>Number of queries</td><td>25</td><td>25</td></tr>
              <tr><td>Number rel. items</td><td>2,682</td><td>2,105</td></tr>
              <tr><td>Mean rel. / request</td><td>107.28</td><td>84.2</td></tr>
              <tr><td>Standard deviation</td><td>91.654</td><td>69.109</td></tr>
              <tr><td>Median</td><td>75</td><td>54</td></tr>
              <tr><td>Maximum</td><td>318 (Q#150)</td><td>242 (Q#150)</td></tr>
              <tr><td>Minimum</td><td>8 (Q#129)</td><td>6 (Q#129)</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>In total these collections contain 397,218 documents or about 590 MB, and for the most part are written in
German. A typical record in this collection is composed of a title, an abstract, and a set of manually assigned
keywords (see Table 16 for English examples and Table 17 for their corresponding German records). Additional
information such as authors' names, publication date, or the language in which the bibliographic notice is written
may of course be less important from an IR perspective, but it is made available. As depicted in the
Appendix, the topics in this domain-specific collection cover a variety of themes (e.g., “Electoral Behaviour”,
“New Art”, “Soccer and Society”, or “Churches and Money”).</p>
        <table-wrap>
          <label>Table 16</label>
          <preformat>&lt;DOC&gt; &lt;DOCNO&gt; GIRT-EN19901932
&lt;TITLE-EN&gt; The Socio-Economic Transformation of a Region : the Bergische Land from 1930 to 1960
&lt;AUTHOR&gt; Henne, Franz J.
&lt;AUTHOR&gt; Geyer, Michael
&lt;PUBLICATION-YEAR&gt; 1990
&lt;LANGUAGE-CODE&gt; EN
&lt;CONTROLLED-TERM-EN&gt; Rhenish Prussia
&lt;CONTROLLED-TERM-EN&gt; historical development
&lt;CONTROLLED-TERM-EN&gt; regional development
&lt;CONTROLLED-TERM-EN&gt; socioeconomic factors
&lt;METHOD-TERM-EN&gt; historical
&lt;METHOD-TERM-EN&gt; document analysis
&lt;CLASSIFICATION-TEXT-EN&gt; Social History
&lt;DOC&gt; &lt;DOCNO&gt; GIRT-EN19902732
&lt;TITLE-EN&gt; Ethnic Politicians in Congress: German-American Case Studies on the Interaction of Ethnicity,
Nationality and Democratic Government 1865-1930
&lt;AUTHOR&gt; Adams, Willi Paul
&lt;PUBLICATION-YEAR&gt; 1990
&lt;LANGUAGE-CODE&gt; EN
&lt;CONTROLLED-TERM-EN&gt; ethnic group
&lt;CONTROLLED-TERM-EN&gt; North America …</preformat>
        </table-wrap>
        <table-wrap>
          <label>Table 17</label>
          <preformat>&lt;DOC&gt; &lt;DOCNO&gt; GIRT-DE19909343
&lt;TITLE-DE&gt; Die sozioökonomische Transformation einer Region : Das Bergische Land von 1930 bis 1960
&lt;AUTHOR&gt; Henne, Franz J.
&lt;AUTHOR&gt; Geyer, Michael
&lt;PUBLICATION-YEAR&gt; 1990
&lt;LANGUAGE-CODE&gt; DE
&lt;CONTROLLED-TERM-DE&gt; Rheinland
&lt;CONTROLLED-TERM-DE&gt; historische Entwicklung
&lt;CONTROLLED-TERM-DE&gt; regionale Entwicklung
&lt;CONTROLLED-TERM-DE&gt; sozioökonomische Faktoren
&lt;METHOD-TERM-DE&gt; historisch
&lt;METHOD-TERM-DE&gt; Aktenanalyse
&lt;CLASSIFICATION-TEXT-DE&gt; Sozialgeschichte
&lt;ABSTRACT-DE&gt; Die Arbeit hat das Ziel, anhand einer regionalen Studie die Entstehung des "modernen"
fordistischen Wirtschaftssystems und des sozialen Systems im Zeitraum zwischen 1930 und 1960 zu
beleuchten; dabei geht es auch um das Studium des "Sozial-imaginären", der Veränderung von Bewußtsein
und Selbst-Verständnis von Arbeitern durch das Erlebnis und die Erfahrung der Depression, des
Nationalsozialismus und der Nachkriegszeit, welches sich in den 1950er Jahren gemeinsam mit der
wirtschaftlichen Veränderung zu einem neuen "System" zusammenfügt.
&lt;DOC&gt; &lt;DOCNO&gt; GIRT-DE19909106
&lt;TITLE-DE&gt; Politiker einer ethnischen Gruppe im Kongreß: Deutsch-amerikanische Fallstudien zur
Interaktion von Ethnizität, Nationalität und demokratischer Regierung, 1865-1930
&lt;AUTHOR&gt; Adams, Willi Paul
&lt;PUBLICATION-YEAR&gt; 1990
&lt;LANGUAGE-CODE&gt; DE
&lt;CONTROLLED-TERM-DE&gt; ethnische Gruppe
&lt;CONTROLLED-TERM-DE&gt; Nordamerika …</preformat>
        </table-wrap>
        <p>Query TD, Model \ # of queries (row labels only; the numeric columns of this table were not preserved): Prosit; doc=Okapi, query=npn; doc=Lnu, query=ltc; doc=dtu, query=dtn; doc=atn, query=ntc; doc=ltn, query=ntc; doc=ntc, query=ntc; doc=ltc, query=ltc; doc=lnc, query=ltc; doc=bnn, query=bnn; doc=nnn, query=nnn.</p>
        <p>
          Based on the GIRT corpus, we are therefore able to compare the impact of manually assigned descriptors
with that of an indexing scheme based only on the information contained in the title
and abstract sections of the corresponding article. To tackle this question we evaluated the GIRT collection using all sections (denoted “all”
in Table 18), or using only titles and abstracts from the bibliographic records (under the label “TI &amp; AB”). In
related research using the Amaryllis French corpus, we found that the “TI &amp; AB” indexing scheme results in a
loss of around 45% in mean average precision
          <xref ref-type="bibr" rid="ref10 ref11">(Savoy 2005b)</xref>
          when compared to the “all” approach. In our
experiments, the decrease in mean average precision is around 14.4% for the German corpus and 36.5% for the
English GIRT collection.
        </p>
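        <p>The percentages above are relative changes in mean average precision (MAP) between two indexing schemes, and their size depends on which scheme serves as the baseline. The following minimal sketch illustrates the arithmetic. The helper names relative_change and loss_to_gain are hypothetical, and the MAP values are made up for illustration; they are not results from Table 18.</p>
        <preformat>
```python
def relative_change(map_baseline, map_other):
    """Relative change (in percent) of map_other with respect to map_baseline."""
    if map_baseline == 0:
        raise ValueError("baseline MAP must be non-zero")
    return 100.0 * (map_other - map_baseline) / map_baseline


def loss_to_gain(loss_percent):
    """Convert a loss measured against the richer scheme into the
    equivalent gain measured against the poorer scheme."""
    return 100.0 * loss_percent / (100.0 - loss_percent)


# Made-up MAP values for illustration; they are NOT taken from Table 18.
map_all = 0.50      # indexing all sections
map_ti_ab = 0.40    # indexing titles and abstracts only

print(relative_change(map_all, map_ti_ab))   # change when keywords are ignored
print(relative_change(map_ti_ab, map_all))   # change when keywords are added
print(loss_to_gain(20.0))                    # same conversion, from the loss alone
```
        </preformat>
        <p>Under this convention, a loss of 36.5% relative to the “all” scheme corresponds to a gain of about 57.5% relative to “TI &amp; AB”, so the baseline should be stated whenever such a percentage is reported.</p>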
        <p>Our 12 official runs in the monolingual GIRT task are described in Table 19. For each language, we
submitted the first run using a data fusion operator (“Z-ScoreW” in this case). For all runs, we automatically
expanded the queries using a blind relevance feedback method (Rocchio in our experiments), hoping to improve
retrieval effectiveness.</p>
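        <p>As an illustration of how a Z-Score based fusion operator can merge result lists, the sketch below normalizes each list by its mean and standard deviation, shifts the scores so that the lowest one becomes zero, and sums the weighted scores per document. This is a sketch under stated assumptions rather than the exact operator used for our runs: the function names, the use of the population standard deviation, the weights alpha, and the two toy result lists are all assumptions made for this example.</p>
        <preformat>
```python
from statistics import mean, pstdev


def z_score_normalize(scores, alpha=1.0):
    """Map {doc_id: rsv} to alpha * ((rsv - mean) / stdev + delta)."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # All scores identical: this list carries no ranking information.
        return {doc: 0.0 for doc in scores}
    delta = (mu - min(values)) / sigma  # shifts the lowest score to zero
    return {doc: alpha * ((rsv - mu) / sigma + delta)
            for doc, rsv in scores.items()}


def fuse(result_lists, alphas=None):
    """Sum the normalized scores of each document over all result lists."""
    if alphas is None:
        alphas = [1.0] * len(result_lists)
    fused = {}
    for scores, alpha in zip(result_lists, alphas):
        for doc, value in z_score_normalize(scores, alpha).items():
            fused[doc] = fused.get(doc, 0.0) + value
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Toy result lists (hypothetical scores from two search models).
okapi_run = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
prosit_run = {"d2": 0.9, "d3": 0.5, "d4": 0.1}
ranking = fuse([okapi_run, prosit_run], alphas=[1.0, 1.0])
print([doc for doc, _ in ranking])
```
        </preformat>
        <p>Setting unequal values in alphas gives a weighted variant in which the result list of one model counts more than the other; a document retrieved by both models accumulates both normalized scores.</p>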
      </sec>
      <sec id="sec-8-2">
        <title>Run name</title>
        <p>UniNEgde1, UniNEgde2, UniNEgde3, UniNEgen1, UniNEgen2, UniNEgen3, UniNEgru1, UniNEgru2, UniNEgru3 (run names of Table 19; of the other columns, only the headers “Language” and “Query” and two “German” entries were preserved).</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>9 Conclusion</title>
      <p>In this sixth CLEF evaluation campaign, we proposed a general stopword list and a light stemming procedure
(removing only inflections attached to nouns and adjectives) for the Bulgarian and Hungarian languages (see
Table 4 and Table A.4). To enhance retrieval performance, we suggested a data fusion approach
based on the Z-Score to combine the two probabilistic IR models (see Table 8). The results of this
evaluation campaign seem to indicate that this approach was effective for the French and Portuguese languages
(Table 8). This search strategy did, however, require building two inverted files and
doubled the search time. For both the Bulgarian and Hungarian languages, more experiments are
needed to confirm our first evaluations (especially regarding the design of a light stemming procedure for the Hungarian
language, see Table 4). For all languages, however, the probabilistic models (either Okapi or Prosit) usually
result in better retrieval performance than the vector-processing approaches (see Tables 3 and 4, and Table 18 for the
GIRT corpora), while the data fusion approach did not always improve mean average precision. The automatic
decompounding of Hungarian words and its impact on IR remain an open question: our preliminary
experiments did not provide a clear answer (our decompounding scheme slightly decreased retrieval
performance, as shown in Table 4).</p>
      <p>As in previous evaluation campaigns, we were able to confirm that pseudo-relevance feedback based on
Rocchio’s model usually improved mean average precision for the French and Portuguese languages,
even though this improvement is not always statistically significant. For the other languages (Bulgarian and
Hungarian), this blind query expansion did not improve mean average precision from a statistical point of view
(Tables 5 and 6).</p>
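      <p>For readers unfamiliar with blind query expansion, the following minimal sketch shows the general form of a Rocchio-style update: the original query weights are combined with the centroid of the top-ranked documents, which are assumed to be relevant without any user judgment. The function name rocchio_expand, the values of alpha and beta, and the number of added terms are assumptions made for this example and are not the parameter settings of our runs.</p>
      <preformat>
```python
from collections import Counter


def rocchio_expand(query, top_docs, alpha=0.75, beta=0.75, n_terms=10):
    """Return an expanded query as {term: weight}.

    query    -- {term: weight} of the original query
    top_docs -- list of {term: weight} for the k best-ranked documents,
                all assumed relevant (blind feedback)
    """
    if not top_docs:
        return dict(query)
    # Centroid of the top-ranked documents.
    centroid = Counter()
    for doc in top_docs:
        for term, weight in doc.items():
            centroid[term] += weight / len(top_docs)
    # Keep only the n_terms strongest terms of the centroid.
    best = dict(centroid.most_common(n_terms))
    expanded = {term: alpha * weight for term, weight in query.items()}
    for term, weight in best.items():
        expanded[term] = expanded.get(term, 0.0) + beta * weight
    return expanded


# Toy example with hypothetical term weights.
query = {"soccer": 1.0}
top_docs = [{"soccer": 2.0, "league": 1.0}, {"soccer": 1.0, "club": 3.0}]
print(rocchio_expand(query, top_docs, n_terms=2))
```
      </preformat>
      <p>Because the top-ranked documents are not guaranteed to be relevant, the added terms may drift away from the original information need, which is one plausible reason why the expansion does not help for every language.</p>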
      <p>In the bilingual task, the freely available translation tools performed at a reasonable level for both the French
and Portuguese languages (based on the three best translation tools, the MAP reaches around 85% of the
monolingual MAP for the French language and 72.6% for the Portuguese). For less frequently used
languages such as Bulgarian and Hungarian, the freely available translation tools (either the bilingual dictionary
or the MT system) did not perform well. Compared to a monolingual search, the decrease in mean average precision
ranged from more than 50% (for Hungarian) up to 80% (for Bulgarian).</p>
      <p>In the GIRT task (Table 18), we were able to measure the contribution of manually assigned keywords
to retrieval effectiveness: compared to indexing all sections, indexing only titles and abstracts decreased MAP by around 36.5% for the English corpus and
14.4% for the German collection.</p>
      <p>Acknowledgments</p>
      <p>The authors would also like to thank the CLEF-2005 task organizers for their efforts in developing various
European language test-collections, and C. Buckley from SabIR for giving us the opportunity to use the
SMART system. The first author is not able to thank the computing services at UniNE, because they
consistently made no effort to be cooperative during this project. This research was supported in part by the
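Swiss National Science Foundation under Grant #21-66 742.01.</p>
      <table-wrap>
        <caption>
          <p>Table A.1: Weighting schemes</p>
        </caption>
        <table>
          <tbody>
            <tr><td>bnn</td><td><inline-formula><tex-math>w_{ij} = 1</tex-math></inline-formula></td></tr>
            <tr><td>nnn</td><td><inline-formula><tex-math>w_{ij} = tf_{ij}</tex-math></inline-formula></td></tr>
            <tr><td>ltn</td><td><inline-formula><tex-math>w_{ij} = (\ln(tf_{ij}) + 1) \cdot idf_j</tex-math></inline-formula></td></tr>
            <tr><td>atn</td><td><inline-formula><tex-math>w_{ij} = idf_j \cdot [0.5 + 0.5 \cdot tf_{ij} / \max tf_{i.}]</tex-math></inline-formula></td></tr>
            <tr><td>dtn</td><td><inline-formula><tex-math>w_{ij} = [\ln(\ln(tf_{ij}) + 1) + 1] \cdot idf_j</tex-math></inline-formula></td></tr>
            <tr><td>npn</td><td><inline-formula><tex-math>w_{ij} = tf_{ij} \cdot \ln[(n - df_j) / df_j]</tex-math></inline-formula></td></tr>
            <tr><td>Okapi</td><td><inline-formula><tex-math>w_{ij} = \frac{(k_1 + 1) \cdot tf_{ij}}{K + tf_{ij}}</tex-math></inline-formula></td></tr>
            <tr><td>Lnu</td><td><inline-formula><tex-math>w_{ij} = \frac{(1 + \ln(tf_{ij})) / (\ln(mean\ tf) + 1)}{(1 - slope) \cdot pivot + slope \cdot nt_i}</tex-math></inline-formula></td></tr>
            <tr><td>lnc</td><td><inline-formula><tex-math>w_{ij} = \frac{\ln(tf_{ij}) + 1}{\sqrt{\sum_{k=1}^{t} (\ln(tf_{ik}) + 1)^2}}</tex-math></inline-formula></td></tr>
            <tr><td>ntc</td><td><inline-formula><tex-math>w_{ij} = \frac{tf_{ij} \cdot idf_j}{\sqrt{\sum_{k=1}^{t} (tf_{ik} \cdot idf_k)^2}}</tex-math></inline-formula></td></tr>
            <tr><td>ltc</td><td><inline-formula><tex-math>w_{ij} = \frac{(\ln(tf_{ij}) + 1) \cdot idf_j}{\sqrt{\sum_{k=1}^{t} ((\ln(tf_{ik}) + 1) \cdot idf_k)^2}}</tex-math></inline-formula></td></tr>
            <tr><td>dtu</td><td><inline-formula><tex-math>w_{ij} = \frac{(\ln(\ln(tf_{ij}) + 1) + 1) \cdot idf_j}{(1 - slope) \cdot pivot + slope \cdot nt_i}</tex-math></inline-formula></td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>To assign an indexing weight wij that reflects the importance of each single-term tj in a document Di, we
might use the various approaches shown in Table A.1, where n indicates the number of documents in the
collection, t the number of indexing terms, dfj the number of documents in which the term tj appears, the
document length (the number of indexing terms) of Di is denoted by nti, and avdl, b, k1, pivot and slope are
constants. For the Okapi weighting scheme, K represents the ratio between the length of Di measured by li (sum
of tfij) and the collection mean noted by avdl.</p>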
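      <p>To make the notation of Table A.1 concrete, the following sketch implements three of the weighting schemes (ltn, Okapi and ltc). It rests on two assumptions that are not spelled out in this appendix: idf is taken as ln(n / df), and K is given its usual BM25 form, k1 * ((1 - b) + b * li / avdl). The default values of k1 and b are placeholders and not the settings used in our experiments.</p>
      <preformat>
```python
import math


def idf(n, df):
    """Assumed definition: idf_j = ln(n / df_j)."""
    return math.log(n / df)


def ltn(tf, n, df):
    """ltn: (ln(tf) + 1) * idf, for a term that occurs in the document."""
    return (math.log(tf) + 1.0) * idf(n, df)


def okapi(tf, doc_len, avdl, k1=1.2, b=0.75):
    """Okapi: ((k1 + 1) * tf) / (K + tf), with the assumed BM25 form of K."""
    big_k = k1 * ((1.0 - b) + b * doc_len / avdl)
    return ((k1 + 1.0) * tf) / (big_k + tf)


def ltc(doc_tf, n, dfs):
    """ltc: ltn weights divided by their Euclidean norm (cosine normalization).

    doc_tf -- {term: tf} for one document
    dfs    -- {term: df} document frequencies in the collection
    """
    raw = {term: ltn(tf, n, dfs[term]) for term, tf in doc_tf.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    if norm == 0.0:
        return raw
    return {term: w / norm for term, w in raw.items()}


# Toy collection statistics (hypothetical values).
n_docs = 1000
doc = {"ethnic": 3, "group": 1}
dfs = {"ethnic": 20, "group": 200}
print(ltc(doc, n_docs, dfs))
print(okapi(tf=3, doc_len=120, avdl=100))
```
      </preformat>
      <p>The logarithm in ltn dampens the effect of repeated occurrences, while the cosine normalization in ltc and the length-dependent K in Okapi both keep long documents from being favoured simply because they contain more terms.</p>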
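      <sec id="sec-9-2">
        <table-wrap>
          <table>
            <tbody>
              <tr><td>C276</td><td>EU Agricultural Subsidies</td></tr>
              <tr><td>C277</td><td>Euthanasia by Medics</td></tr>
              <tr><td>C278</td><td>Transport for Disabled</td></tr>
              <tr><td>C279</td><td>Swiss Referendums</td></tr>
              <tr><td>C280</td><td>Crime in New York</td></tr>
              <tr><td>C281</td><td>Radovan Karadzic</td></tr>
              <tr><td>C282</td><td>Prison Abuse</td></tr>
              <tr><td>C283</td><td>James Bond Films</td></tr>
              <tr><td>C284</td><td>Space Shuttle Missions</td></tr>
              <tr><td>C285</td><td>Anti-abortion Movements</td></tr>
              <tr><td>C286</td><td>Football Injuries</td></tr>
              <tr><td>C287</td><td>Hostage / Terrorist Situations</td></tr>
              <tr><td>C288</td><td>US Car Imports</td></tr>
              <tr><td>C289</td><td>Falkland Islands</td></tr>
              <tr><td>C290</td><td>Oil Price Fluctuation</td></tr>
              <tr><td>C291</td><td>EU Illegal Immigrants</td></tr>
              <tr><td>C292</td><td>Rebuilding German Cities</td></tr>
              <tr><td>C293</td><td>China-Taiwan Relations</td></tr>
              <tr><td>C294</td><td>Hurricane Force</td></tr>
              <tr><td>C295</td><td>Money Laundering</td></tr>
              <tr><td>C296</td><td>Public Performances of Liszt</td></tr>
              <tr><td>C297</td><td>Expulsion of Diplomats</td></tr>
              <tr><td>C298</td><td>Nuclear Power Stations</td></tr>
              <tr><td>C299</td><td>UN Peacekeeping Risks</td></tr>
              <tr><td>C300</td><td>Lottery Winnings</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>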
      <sec id="sec-9-3">
        <title>Health Economics</title>
        <p>Oil and Politics
Street Children
Advertising and Ethics
Giving up Smoking
Radio and Internet
Poverty and Wealth
Diabetes Mellitus
Soccer and Society
Russian Germans and their Language
Anti-Semitism in the Soviet Union
Television Behaviour</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Amati</surname>
            ,
            <given-names>G</given-names>
          </string-name>
          . &amp; van
          <string-name>
            <surname>Rijsbergen</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Probabilistic models of information retrieval based on measuring the divergence from randomness</article-title>
          .
          <source>ACM-TOIS</source>
          ,
          <volume>20</volume>
          (
          <issue>4</issue>
          ),
          <fpage>357</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Braschler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Ripplinger</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>How effective is stemming and decompounding for German text retrieval</article-title>
          ?
          <source>IR Journal</source>
          ,
          <volume>7</volume>
          (
          <issue>3-4</issue>
          ),
          <fpage>291</fpage>
          -
          <lpage>316</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Buckley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singhal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>1996</year>
          ).
          <article-title>New retrieval approaches using SMART</article-title>
          .
          <source>In Proceedings of TREC-4</source>
          , (pp.
          <fpage>25</fpage>
          -
          <lpage>48</lpage>
          ). Gaithersburg: NIST Publication #
          <fpage>500</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Shaw</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          (
          <year>1994</year>
          ).
          <article-title>Combination of multiple searches</article-title>
          .
          <source>In Proceedings TREC-2</source>
          , (pp.
          <fpage>243</fpage>
          -
          <lpage>249</lpage>
          ). Gaithersburg: NIST Publication #
          <fpage>500</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Kluck</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>The GIRT data in the evaluation of CLIR systems - from 1997 until 2003</article-title>
          . In C. Peters,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Braschler</surname>
          </string-name>
          , M. Kluck (Eds.),
          <source>Comparative Evaluation of Multilingual Information Access Systems. LNCS #3237</source>
          . Springer-Verlag, Berlin,
          <year>2004</year>
          ,
          <fpage>376</fpage>
          -
          <lpage>390</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Beaulieu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Experimentation as a way of life: Okapi at TREC</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>36</volume>
          (
          <issue>1</issue>
          ),
          <fpage>95</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Savoy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>1997</year>
          ).
          <article-title>Statistical inference in retrieval effectiveness evaluation</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>33</volume>
          (
          <issue>4</issue>
          ),
          <fpage>495</fpage>
          -
          <lpage>512</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Savoy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2004a</year>
          ).
          <article-title>Combining multiple strategies for effective monolingual and cross-lingual retrieval</article-title>
          .
          <source>IR Journal</source>
          ,
          <volume>7</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>121</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Savoy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2004b</year>
          ).
          <article-title>Report on CLEF-2003 monolingual tracks: Fusion of probabilistic models for effective monolingual retrieval</article-title>
          . In C. Peters,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Braschler</surname>
          </string-name>
          , M. Kluck (Eds.),
          <source>Comparative Evaluation of Multilingual Information Access Systems. LNCS #3237</source>
          . Springer-Verlag, Berlin,
          <year>2004</year>
          ,
          <fpage>322</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Savoy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2005a</year>
          ).
          <article-title>Data Fusion for effective European monolingual information retrieval</article-title>
          . In C. Peters, P.D.
          <string-name>
            <surname>Clough</surname>
            ,
            <given-names>G.J.F.</given-names>
          </string-name>
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Kluck</surname>
          </string-name>
          &amp; B.
          <string-name>
            <surname>Magnini</surname>
          </string-name>
          (Eds.),
          <article-title>Multilingual Information Access for Text, Speech and Images</article-title>
          .
          <source>LNCS #3491</source>
          . Springer-Verlag, Berlin,
          <year>2005</year>
          ,
          <fpage>233</fpage>
          -
          <lpage>244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Savoy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2005b</year>
          ).
          <article-title>Bibliographic database access using free-text and controlled vocabulary: An evaluation</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>41</volume>
          (
          <issue>4</issue>
          ),
          <fpage>873</fpage>
          -
          <lpage>890</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Vogt</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Cottrell</surname>
            ,
            <given-names>G.W.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Fusion via a linear combination of scores</article-title>
          .
          <source>IR Journal</source>
          ,
          <volume>1</volume>
          (
          <issue>3</issue>
          ),
          <fpage>151</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>Mean average precision Hungarian TD 50 queries 0.3420 0.3501 0.3301 0.3401 0.3215 0.2853 0.2208 0.2484 0.2395 0.1424 0.0875</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>