Report on CLEF-2003 Monolingual Tracks:
Fusion of Probabilistic Models for Effective Monolingual Retrieval
Jacques Savoy
Institut interfacultaire d'informatique, Université de Neuchâtel, Switzerland
Jacques.Savoy@unine.ch Web site: www.unine.ch/info/clef/
Abstract. For our third participation in the CLEF evaluation campaign, our first objective was to
propose more effective and general stopword lists for the Swedish, Finnish and Russian languages
along with an improved, more efficient and simpler stemming procedure for these three languages.
Our second goal was to suggest a combined search approach based on a data fusion strategy that
would work with various European languages. Included in this combined approach is a
decompounding strategy for the German, Dutch, Swedish and Finnish languages.
Introduction
Based on our experiments from last year [Savoy 2002], we participated in the French, Spanish, German, Italian,
Dutch, Swedish, Finnish and Russian monolingual tasks without relying on a dictionary. This paper presents
the approaches we used in the monolingual tracks and is organized as follows: Section 1 contains an overview of
our nine test-collections while Section 2 describes our general approach to building stopword lists and stemmers
for use with languages other than English. In Section 3, we suggest a simple decompounding algorithm that
could be used to decompound German, Dutch, Swedish and Finnish words. Section 4 evaluates two
probabilistic models and nine vector-space schemes using the nine test-collections. Finally, Section 5 presents
and evaluates various data fusion operators, together with our official runs.
1. Overview of the Test-Collections
The corpora used in our experiments included newspapers such as the Los Angeles Times (1994, English),
Glasgow Herald (1995, English), Le Monde (1994, French), La Stampa (1994, Italian), Der Spiegel (1994/95,
German) and Frankfurter Rundschau (1994, German), NRC Handelsblad (1994/95, Dutch), Algemeen Dagblad
(1994/95, Dutch), Tidningarnas Telegrambyrå (1994/95, Swedish), Aamulehti (1994/95, Finnish), and Izvestia
(1995, Russian). As an additional source of information, we included various articles edited by news agencies
such as EFE (1994/95, Spanish), and the Swiss news agency (1994/95, available in French, German and Italian
but without parallel translation).
As shown in Tables 1a and 1b, these corpora are of various sizes, with the Spanish collection being the
biggest and the German, English and Dutch collections second. Ranking third are the French, Italian and
Swedish corpora, then somewhat smaller is the Finnish collection and finally the Russian collection is clearly
the smallest. Across all the corpora the mean number of distinct indexing terms per document is relatively
similar (around 112), although it is somewhat larger for the English collection (156.9) and smaller for the
Swedish corpus (79.25).
Tables 1a and 1b also compare the number of relevant documents per request, with the mean always being
greater than the median (e.g., for the English collection, the average number of relevant documents per query is
18.63 with the corresponding median being 7). These findings indicate that each collection contains numerous
queries for which only a rather small number of relevant items can be found. For each collection, 60 queries have been
created. However, relevant documents cannot be found for each request and each language. For the English
collection, the Queries #149, #161, #166, #186, #191, and #195 do not have any relevant items; for the French
corpus, these requests are #146, #160, #161, #166, #169, #172, #191, #194; for the German collection (Queries
#144, #146, #170, #191); for the Spanish collection (Queries #169, #188, #195); for the Italian collection
(Queries #144, #146, #158, #160, #169, #170, #172, #175, #191); for the Dutch collection (Queries #160,
#166, #191, #194); for the Swedish collection (Queries #146, #160, #167, #191, #194, #197, #198); for the
Finnish corpus (Queries #141, #144, #145, #146, #160, #167, #169, #175, #182, #186, #188, #189, #191,
#194, #195). Appearing for the first time in a CLEF evaluation campaign is the Russian corpus, for which we
have only 28 requests.
During the indexing process of our automatic runs, we retained only the following logical sections from the
original documents:
, , , , , , , and .
From the topic descriptions we automatically removed certain phrases such as "Relevant document report …",
"Find documents …", "Trouver des documents qui parlent …", "Sono valide le discussioni e le decisioni …",
"Relevante Dokumente berichten …" or "Los documentos relevantes proporcionan información …".
English French German Spanish
Size (in MB) 579 MB 331 MB 668 MB 1,086 MB
# of documents 169,477 129,806 294,809 454,045
# of distinct terms 426,757 355,691 1,666,538 774,263
Number of distinct indexing terms / document
Mean 156.9 118.5 111.9 112.9
Standard deviation 118.77 95.72 100.06 55.75
Median 129 89 84 100
Maximum 1,881 1,621 2,424 642
Minimum 2 3 1 5
Number of queries 54 52 56 57
Number rel. items 1,006 946 1,825 2,368
Mean rel. / request 18.63 18.19 32.59 41.54
Standard deviation 28.61 33.16 36.95 57.37
Median 7 8 24 22
Maximum 139 (#Q:157) 193 (#Q:181) 226 (#Q:181) 303 (#Q:181)
Minimum 1 (#Q:141) 1 (#Q:141) 1 (#Q:160) 1 (#Q:175)
Table 1a: Test-collection statistics
Italian Dutch Swedish Finnish Russian
Size (in MB) 363 MB 540 MB 352 MB 137 MB 68 MB
# of documents 157,558 190,604 142,819 55,344 16,716
# of distinct terms 560,087 883,953 767,504 1,444,232 345,728
Number of distinct indexing terms / document
Mean 116.4 110 79.25 114 124.5
Standard deviation 88.24 107.03 64.00 91.35 124.53
Median 84 77 62 87 41
Maximum 1,395 2,297 1,547 1,946 1,769
Minimum 1 1 1 1 1
Number of queries 51 56 53 45 28
Number rel. items 809 1,577 889 483 151
Mean rel./ request 15.86 28.16 16.77 10.73 5.39
Standard deviation 20.32 43.10 25.09 15.78 7.11
Median 8 14.5 11 5 3
Maximum 110 (#Q:197) 226 (#Q:181) 170 (#Q:181) 82 (#Q:181) 31 (#Q:192)
Minimum 1 (#Q:145) 1 (#Q:195) 1 (#Q:141) 1 (#Q:149) 1 (#Q:147)
Table 1b: Test-collection statistics
2. Stopword Lists and Stemming Procedures
In order to define general stopword lists, we first accounted for the top 200 most frequent words found in the
various languages, together with articles, pronouns, prepositions, conjunctions or very frequently occurring verb
forms (e.g., to be, is, has, etc.). As compared to last year's stopword lists [Savoy 2002], we only modified
those for the Swedish and Finnish languages, and we created a new one for the Russian language (these lists are
available at www.unine.ch/info/clef/). For English we used the list provided by the SMART system (571
words), while for the other European languages, our stopword list contained 430 words for Italian, 463 for
French, 603 for German, 351 for Spanish, 1,315 for Dutch, 747 for Finnish, 386 for Swedish and 420 for
Russian.
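As a minimal sketch, the stopword-removal step can be written as below. The tokenisation rule and the tiny stopword set are illustrative only; the actual system uses the language-specific lists described above.

```python
import re

def index_terms(text, stopwords):
    """Tokenise on word characters, lowercase, and drop stopword forms
    (illustrative sketch; not the actual CLEF indexing pipeline)."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]
```

For example, `index_terms("Le chat dort.", {"le"})` keeps only "chat" and "dort".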
After removing high-frequency words, an indexing procedure generally applies a stemming algorithm in an
attempt to conflate word variants into the same stem or root. In developing this procedure for various European
languages, we first wanted to remove only inflectional suffixes such as singular and plural word forms, and also
feminine and masculine forms, such that they conflate to the same root. Our suggested stemmers also try to
reduce various word declensions into the same stem, such as those used in the German, Finnish and Russian
languages.
More sophisticated schemes have already been proposed for the removal of derivational suffixes (e.g., "-ize",
"-ably", "-ship" in the English language), the stemmer developed by Lovins [1968] (based on a list of over 260
suffixes), or that of Porter [1980] (which looks for about 60 suffixes). For the French language only, our
stemming approach tried to remove some derivational suffixes (e.g., "communicateur" -> "communiquer",
"faiblesse" -> "faible"). For the Dutch language we used the Kraaij & Pohlmann's stemmer [Kraaij 1996]. Our
various stemming procedures can be found at www.unine.ch/info/clef/. Currently, it is not clear whether a
stemming procedure such as ours, which removes only inflectional suffixes from nouns and adjectives, is
sufficient; better retrieval effectiveness might be achieved by a stemming approach that also accounts for verbs
or that removes both inflectional and derivational suffixes.
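A light inflectional stemmer of the kind described above can be sketched as follows. The suffix list shown is a toy illustration, not the actual rules of our stemmers (those are available at the URL above); it simply removes the longest matching suffix while preserving a minimum stem length.

```python
def light_stem(word, suffixes=("ques", "que", "ees", "ee", "es", "e", "s")):
    """Remove the longest matching inflectional suffix, keeping a stem
    of at least three letters. The suffix list is an illustrative toy,
    not the paper's actual stemmer rules."""
    for suf in suffixes:  # ordered longest first
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word
```

For example, the French plural "maisons" conflates to "maison", while a short word such as "roi" is left untouched.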
Finally, diacritic characters are usually not present in English collections (with some exceptions, such as
"résumé"); for the Italian, Dutch, Finnish, Swedish, German, Spanish and Russian languages, these characters
are replaced by the corresponding unaccented letter. For Russian, we also convert and normalize the Cyrillic
Unicode characters into the Latin alphabet (a Perl script is available at www.unine.ch/clef/).
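The diacritic-replacement step can be sketched with Unicode decomposition: NFD splits an accented letter into its base letter plus combining marks, which are then discarded. This is a generic approach; the paper's Perl script may proceed differently.

```python
import unicodedata

def strip_diacritics(text):
    """Replace accented letters by their base letter via Unicode NFD
    decomposition followed by removal of combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

For example, "résumé" becomes "resume" and the Finnish "työviikko" becomes "tyoviikko".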
3. Decompounding Words
Most European languages manifest other morphological characteristics with compound word constructions
being just one example (e.g., handgun, worldwide). In German for example, compound words are widely used
and they may cause more difficulties than do those in English. For example, an insurance company would be
"Versicherungsgesellschaft" ("Versicherung" + "S" + "Gesellschaft"). However the morphological marker ("S")
is not always present (e.g., "Atomtests" built as "Atom" + "Tests"), and sometimes the letter "S" belongs to the
decompounded word (e.g., "Wintersports" for "Winter" + "Sports"). In Finnish, we also encounter similar
constructions, such as "rakkauskirje" ("rakkaus" + "kirje", love letter) or "työviikko" ("työ" + "viikko",
work week). Recently, Braschler & Ripplinger [2003] showed that decompounding German words may
significantly improve retrieval performance.
Our proposed decompounding approach shares some similarity with Chen's algorithm [2002]. Before using
it, we create a word list composed of all words appearing in the given collection (without stemming).
Associated with each word, we also store the number of its occurrences in the collection (some examples are
given in Table 2).
computer 2452        port 1091
computers 79         ports 2
sicherheit 6583      sport 1483
sicher 4522          sports 199
heit 4               winter 1643
bank 9657            winters 148
bund 7032            wintersport 44
bundes 2884          wintersports 2
bundesbank 1453
präsident 24041
Table 2: Examples of German words included in our word list
In order to present an overview of our decompounding approach, we will take as an example the German
word "Computersicherheit," composed of "Computer" + "Sicherheit" (security). This compound word does not
appear in our German word list as depicted in Table 2, so our algorithm starts the decompounding process by
attempting to split the word at its k = 4 last letters (giving the two strings "computersicher" and "heit").
During the entire procedure, we only consider words having a length greater than a given threshold (fixed at 3
for all languages in our experiments). If both components appear in the word list, then we have a candidate for
decompounding; otherwise the k limit is increased by one. Since, in our case, the string "computersiche" does
not appear in the German word list, splitting is rejected. When k = 9, our algorithm will find the word
"computers" in the word list, but will fail to find the word "icherheit". With k = 10, our algorithm will find
both the word "computer" and "sicherheit" in the German word list (see Table 2) and this solution becomes the
top level decompounding suggestion. Recursively, the system now tries to decompound the two parts, namely
the words "computer" and "sicherheit". During this recursive process, the system is allowed to ignore some
short sequences of letters at the end of a word (such as "-s" or "-es" in German, or "-s" for the Swedish language)
because such morphological markers may indicate the genitive form (such as "'s" in the noun phrase "John's
book").
After this generative part, the system returns a tree of the possible ways in which the compound
construction can be broken down, together with the number of occurrences of each component in the
corpus. In our example, the answer is (computer 2452, sicherheit 6583 (sicher 4522, heit 4)). Thus, from
this result, we know that the word "Sicherheit" appears 6583 times in the corpus, and we may consider
decompounding this term into the words "sicher" and "heit". From this we can add to (or replace in) the
document (or the request) all decompounded candidates ("computer" + "sicherheit", and
"computer" + "sicher" + "heit" in our case), or only the decomposition with the minimum number of terms
("computer" + "sicherheit" in our case).
However, when faced with multiple candidates, our algorithm will try to select the single "best" one. To
achieve this, our system will consider the total number of occurrences for the component words and if this value
is greater than the number of occurrences for the compound construction, the decompounded candidate will be
selected. In our example, the system will not decompound the word "Sicherheit" because the number of
occurrences of the words "sicher" (4522) and "heit" (4) will not produce a total (4526) greater than the number of
occurrences of the word "sicherheit" (6583).
If we consider the German word "Bundesbankpräsident" (president of the (German) federal bank), the
generative part of our algorithm would return (bundesbank 1453 (bund 7032, bank 9657), präsident 24041) and
the final decompounding approach would return (bund 7032, bank 9657, präsident 24041). In this case, the
number of occurrences of "bundesbank" (1453) is smaller than the sum of the occurrences of the words "bund"
and "bank". However, our approach does not always generate the appropriate components of a compound
term. For example, given the compound construction "wintersports", the system answers with (winter 1643,
port 1091) instead of (winter 1643, sport 1483). This problem stems from the fact that the first part of our
approach does not backtrack and stops when it encounters the first splitting of the compound into two
parts.
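The generative splitting and the frequency-based selection rule described above can be sketched as follows. This is a simplified, non-recursive version that omits the genitive-marker ("-s", "-es") handling; the vocabulary counts in the usage example follow Table 2.

```python
def decompound(word, vocab, min_len=4, min_k=4):
    """Try to split `word` into two in-vocabulary parts by cutting off
    its last k letters for k = min_k, min_k + 1, ...; each part must
    have at least min_len letters. A split is kept only when the summed
    part frequencies exceed the compound's own frequency, mirroring the
    selection rule above. Simplified sketch: no recursion, no
    genitive-marker handling."""
    compound_freq = vocab.get(word, 0)
    for k in range(min_k, len(word) - min_len + 1):
        head, tail = word[:-k], word[-k:]
        if head in vocab and tail in vocab:
            if vocab[head] + vocab[tail] > compound_freq:
                return [head, tail]
            return [word]  # first split found, but rejected by the rule
    return [word]
```

With the Table 2 counts, "computersicherheit" is split into "computer" + "sicherheit", while "sicherheit" itself is kept whole because 4522 + 4 does not exceed 6583.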
4. Indexing and Searching Strategy
In order to obtain a broader view of the relative merit of various retrieval models, we first adopted a binary
indexing scheme within which each document (or request) is represented by a set of keywords, without any
weight. To measure the similarity between documents and requests, we computed the inner product (retrieval
model denoted "doc=bnn, query=bnn" or "bnn-bnn"). In order to weight the presence of each indexing term in a
document surrogate (or in a query), we could account for the term occurrence frequency (retrieval model notation:
"doc=nnn, query=nnn" or "nnn-nnn") or we might also account for their frequency in the collection (or more
precisely the inverse document frequency, denoted by idfj ). Moreover, a cosine normalization could prove
beneficial and each indexing weight could vary within the range of 0 to 1 (retrieval model notation: "ntc-ntc",
Table 3 depicts the exact weighting formulation).
Other variants might also be created. For example, the tf component may be computed as 0.5 + 0.5 · [tf /
max tf in a document] (retrieval model denoted "doc=atn"). We might also consider that a term's presence in a
shorter document provides stronger evidence than it does in a longer document, leading to more complex IR
models such as those denoted "doc=Lnu" [Buckley 1996] and "doc=dtu" [Singhal 1999].
Besides the previous models based on the vector-space approach, we also considered probabilistic models.
In this vein, we used the Okapi probabilistic model [Robertson 2000], in which
K = k1 · [(1 - b) + b · (li / avdl)]
accounts for the ratio between the length of Di, measured by li (the sum of tfij), and the mean document
length of the collection, denoted avdl. In Table 3, the value of nti indicates the number of distinct indexing
terms included in the representation of Di.
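As a small numerical sketch, the Okapi document-term weight of Table 3 can be computed as below. The values of k1 and b here are placeholder defaults; the paper tunes them per collection (see Table 4).

```python
def okapi_weight(tf, doc_len, avdl, k1=1.2, b=0.75):
    """Okapi term weight ((k1 + 1) * tf) / (K + tf) with the length
    normalisation K = k1 * [(1 - b) + b * (doc_len / avdl)].
    k1 and b are placeholder values, not the tuned settings of Table 4."""
    K = k1 * ((1 - b) + b * (doc_len / avdl))
    return ((k1 + 1) * tf) / (K + tf)
```

Note that for an average-length document (doc_len = avdl), K reduces to k1, so a term occurring once receives weight exactly 1.0.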
As a second probabilistic approach, we implemented the Prosit (PRObabilistic Sift of Information Terms)
approach [Amati 2002a, 2002b] which is based on the following indexing formula:
wij = Inf1ij · Inf2ij = (1 - Prob1ij) · Inf2ij with
Prob1ij = tfnij / (tfnij + 1) with tfnij = tfij · log2[1 + ((C · mean dl) / li)]
Inf2ij = -log2[1 / (1 + λj)] - tfnij · log2[λj / (1 + λj)] with λj = tcj / n
in which tcj indicates the number of occurrences of term tj in the collection and n the number of documents in
the corpus. In our experiments, the constants b, k1, avdl, pivot, slope, C and mean dl are fixed according to
the values listed in Table 4.
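The Prosit indexing formula above can be sketched directly in code. The constant C is a placeholder here (the paper fixes it per collection, Table 4), and the function simply transcribes the three equations given above.

```python
import math

def prosit_weight(tf, doc_len, mean_dl, term_coll_freq, n_docs, C=1.5):
    """Prosit term weight w = (1 - Prob1) * Inf2, transcribing the
    formulas above. C is a placeholder constant; the paper tunes it
    per collection (Table 4)."""
    tfn = tf * math.log2(1 + (C * mean_dl) / doc_len)   # normalised tf
    prob1 = tfn / (tfn + 1)
    lam = term_coll_freq / n_docs                        # lambda_j = tc_j / n
    inf2 = -math.log2(1 / (1 + lam)) - tfn * math.log2(lam / (1 + lam))
    return (1 - prob1) * inf2
```

As expected, the weight grows with the within-document term frequency and decreases as the term becomes more frequent in the collection.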
bnn    wij = 1
nnn    wij = tfij
ltn    wij = (ln(tfij) + 1) · idfj
atn    wij = idfj · [0.5 + 0.5 · tfij / max tfi.]
dtn    wij = [ln(ln(tfij) + 1) + 1] · idfj
npn    wij = tfij · ln[(n - dfj) / dfj]
Okapi  wij = ((k1 + 1) · tfij) / (K + tfij)
Lnu    wij = [(1 + ln(tfij)) / (ln(mean tf) + 1)] / [(1 - slope) · pivot + slope · nti]
lnc    wij = (ln(tfij) + 1) / sqrt(Σk (ln(tfik) + 1)²)
ntc    wij = (tfij · idfj) / sqrt(Σk (tfik · idfk)²)
ltc    wij = ((ln(tfij) + 1) · idfj) / sqrt(Σk ((ln(tfik) + 1) · idfk)²)
dtu    wij = [(ln(ln(tfij) + 1) + 1) · idfj] / [(1 - slope) · pivot + slope · nti]
Table 3: Weighting schemes
Language Index b k1 avdl C mean dl
English word 0.8 2 800 1.5 167
French word 0.75 3 900 1.25 182
Spanish word 0.4 1.2 400 1.75 157
German word 0.5 1.5 600 3 152
German 5-gram 0.3 1 500 2.5 475
Italian word 0.55 1.5 800 1.25 165
Dutch word 0.8 3 600 2.25 110
Dutch 5-gram 0.6 1.2 600 1.75 362
Finnish word 0.75 2 900 1.25 114
Finnish 5-gram 0.6 1.2 800 2 539
Swedish word 0.7 2 500 3 79
Swedish 4-gram 0.75 2 900 1.75 292
Russian word 0.7 2 800 1.5 124
Russian 5-gram 0.75 1.2 750 1.75 451
Russian 4-gram 0.75 1.2 750 1.75 468
Table 4: Parameter setting for the various test-collections
To evaluate our approaches, we used the SMART system as a test bed running on an Intel Pentium III/600
(memory: 1 GB, swap: 2 GB, disk: 6 x 35 GB). To measure the retrieval performance, we adopted the non-
interpolated mean average precision (computed on the basis of 1,000 retrieved items per request by the TREC-
EVAL program). We indexed the English, French, Spanish and Italian collections using words as indexing
units. The evaluation of our two probabilistic models and nine vector-space schemes are given in Table 5a.
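For reference, the non-interpolated average precision of a single query can be computed as follows; MAP is then the mean of this value over all queries having relevant items. This is a generic definition, not the TREC-EVAL implementation itself.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated average precision for one query: the mean of
    the precision values at each rank where a relevant document
    appears, divided by the total number of relevant documents."""
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0
```

For instance, a ranking ["a", "b", "c"] against the relevant set {"a", "c"} yields (1/1 + 2/3) / 2 = 5/6.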
In order to represent German, Dutch, Swedish, Finnish and Russian documents and queries, we considered
the n-gram, decompounded and word-based indexing schemes. The resulting mean average precision for these
various indexing approaches is shown in Table 5b (German and Dutch corpora), in Table 5c (Swedish and
Finnish languages) and in Table 5d (Russian collection).
It was observed that pseudo-relevance feedback (blind-query expansion) seems to be a useful technique for
enhancing retrieval effectiveness. In this study, we adopted Rocchio's approach [Buckley 1996] with α = 0.75,
β = 0.75, whereby the system was allowed to add m terms extracted from the k best-ranked documents retrieved for the
original query. To evaluate this proposition, we used the Okapi and the Prosit probabilistic models and we
enlarged the query by the 10 to 175 terms provided by the 3 or 10 best-retrieved articles.
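The blind-expansion step can be sketched as follows. For simplicity this sketch scores candidate terms by their raw counts in the top-ranked documents, whereas the actual system works on indexing weights; the α and β values follow the Rocchio setting above.

```python
from collections import Counter

def rocchio_expand(query_terms, top_docs, m=10, alpha=0.75, beta=0.75):
    """Blind query expansion: add the m terms with the highest summed
    count in the k top-ranked documents (each doc is a list of terms).
    Raw counts stand in for indexing weights in this sketch.
    Returns a term -> weight dict for the expanded query."""
    new_query = Counter({t: alpha for t in query_terms})
    pool = Counter()
    for doc in top_docs:
        pool.update(doc)
    for term, freq in pool.most_common(m):
        new_query[term] += beta * freq / len(top_docs)
    return dict(new_query)
```

A term appearing in several of the top documents thus enters the query with a weight proportional to its mean frequency in those documents.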
Mean average precision
Query TD English French Spanish Italian
Model 54 queries 52 queries 57 queries 51 queries
Prosit 48.19 52.01 47.23 47.17
doc=Okapi, query=npn 48.83 51.64 48.85 48.80
doc=Lnu, query=ltc 44.51 48.26 45.79 45.32
doc=dtu, query=dtn 43.17 46.58 45.03 45.71
doc=atn, query=ntc 45.55 45.48 44.04 45.77
doc=ltn, query=ntc 34.68 39.01 42.40 42.56
doc=ntc, query=ntc 27.12 32.74 27.08 28.90
doc=ltc, query=ltc 28.14 34.41 29.74 28.63
doc=lnc, query=ltc 33.89 37.98 33.52 32.68
doc=bnn, query=bnn 15.97 24.01 26.48 25.33
doc=nnn, query=nnn 6.50 12.27 19.84 22.36
Table 5a: Mean average precision of various single searching strategies (monolingual)
Mean average precision
Query TD German German German Dutch Dutch Dutch
words decompound 5-gram words decompound 5-gram
Model 56 queries 56 queries 56 queries 56 queries 56 queries 56 queries
Prosit 42.14 45.53 42.88 47.15 48.36 39.41
doc=Okapi, query=npn 44.54 46.93 44.27 46.86 48.73 40.23
doc=Lnu, query=ltc 40.64 45.44 39.63 43.38 45.08 33.63
doc=dtu, query=dtn 42.60 43.95 39.08 42.69 43.78 33.82
doc=atn, query=ntc 40.98 43.67 40.36 41.92 43.52 36.43
doc=ltn, query=ntc 39.07 39.32 38.57 38.45 39.51 32.47
doc=ntc, query=ntc 27.40 32.64 31.59 29.27 30.36 29.42
doc=ltc, query=ltc 28.85 36.02 32.76 30.97 32.41 28.24
doc=lnc, query=ltc 30.16 35.93 32.10 31.39 33.15 28.53
doc=bnn, query=bnn 23.63 23.31 21.07 26.14 26.80 21.16
doc=nnn, query=nnn 15.97 10.85 9.78 11.35 10.64 9.82
Table 5b: Mean average precision of various single searching strategies (German & Dutch collections)
Mean average precision
Query TD Swedish Swedish Swedish Finnish Finnish Finnish
words decompound 4-gram words decompound 5-gram
Model 53 queries 53 queries 53 queries 45 queries 45 queries 45 queries
Prosit 39.26 40.86 40.23 46.35 46.96 49.03
doc=Okapi, query=npn 39.98 41.43 40.05 46.54 46.61 48.97
doc=Lnu, query=ltc 38.03 39.82 37.87 48.73 47.31 46.03
doc=dtu, query=dtn 38.14 40.32 36.40 44.44 44.78 43.54
doc=atn, query=ntc 36.56 37.85 39.95 42.91 43.99 48.56
doc=ltn, query=ntc 33.81 35.49 36.11 42.47 43.11 42.94
doc=ntc, query=ntc 25.08 26.82 26.13 32.73 33.46 35.64
doc=ltc, query=ltc 26.57 28.65 25.46 37.27 38.34 37.72
doc=lnc, query=ltc 26.91 29.17 29.03 36.93 39.18 37.21
doc=bnn, query=bnn 19.75 21.89 25.67 17.95 15.17 20.06
doc=nnn, query=nnn 11.55 11.75 12.47 13.85 13.21 14.83
Table 5c: Mean average precision of various single searching strategies (Swedish & Finnish collections)
The results depicted in Tables 6a to 6d (our best results) indicate that the optimal parameter setting seems
to be collection-dependent. Moreover, performance improvement also seems to be collection-dependent (or
language-dependent), with no improvement for the English corpus yet an increase of 8.55% for the Spanish
corpus (from a mean average precision of 51.71 to 56.13), 9.85% for the French corpus (from 48.41 to 53.18),
12.91% for the Italian language (41.05 to 46.35) and 13.26% for the German collection (from 41.25 to 46.72,
combined model, Table 6b).
Mean average precision
Query TD Russian Russian Russian Russian
words words 5-gram 4-gram
extended stemmer light stemmer
Model 28 queries 28 queries 28 queries 28 queries
Prosit 36.69 34.89 30.44 34.43
doc=Okapi, query=npn 34.26 34.58 30.31 32.51
doc=Lnu, query=ltc 36.34 36.30 27.36 29.75
doc=dtu, query=dtn 32.67 32.95 28.49 30.55
doc=atn, query=ntc 37.06 33.22 31.29 31.41
doc=ltn, query=ntc 29.55 30.89 23.83 22.05
doc=ntc, query=ntc 33.47 30.14 28.69 27.39
doc=ltc, query=ltc 32.34 28.74 26.40 27.52
doc=lnc, query=ltc 32.58 24.47 20.65 21.88
doc=bnn, query=bnn 14.84 15.23 13.13 9.05
doc=nnn, query=nnn 12.27 11.41 7.95 5.83
Table 5d: Mean average precision of various single searching strategies (Russian collection)
Mean average precision
Query TD English French Spanish Italian
Model 54 queries 52 queries 57 queries 51 queries
doc=Okapi, query=npn 48.83 51.64 48.85 48.80
5 docs / 10 best terms 48.79 51.33 52.74 52.97
5 docs / 15 best terms 48.15 51.91 52.87 53.39
5 docs / 20 best terms 47.37 51.30 53.02 52.35
10 docs / 10 best terms 45.70 49.81 52.51 51.33
10 docs / 15 best terms 44.10 48.59 52.55 51.17
10 docs / 20 best terms 45.62 49.68 52.79 51.94
Table 6a: Mean average precision using blind-query expansion
Mean average precision
Query TD German German German Dutch Dutch Dutch
words decompound 5-gram words decompound 5-gram
Model 56 queries 56 queries 56 queries 56 queries 56 queries 56 queries
Okapi 44.54 46.93 44.27 46.86 48.73 40.23
k doc. 46.46 50.32 47.26 52.32 54.60 43.12
/ m terms 47.83 51.40 46.96 53.39 54.79 43.32
48.39 51.64 46.88 54.14 55.56 43.90
45.98 50.32 46.46 51.26 53.07 42.34
46.31 50.20 46.50 51.14 52.81 42.67
46.08 50.33 46.59 51.72 53.77 42.54
Table 6b: Mean average precision using blind-query expansion (German & Dutch collections)
Mean average precision
Query TD Swedish Swedish Swedish Finnish Finnish Finnish
words decompound 4-gram words decompound 5-gram
Model 53 queries 53 queries 53 queries 45 queries 45 queries 45 queries
Prosit 39.26 40.86 40.23 46.35 46.96 49.03
k doc. 45.93 48.01 42.13 52.50 52.03 50.98
/ m terms 44.50 46.23 42.16 52.71 53.37 49.44
42.59 43.58 42.57 50.04 52.93 49.06
43.29 47.15 39.44 49.69 48.82 52.45
43.86 46.66 41.10 47.90 47.85 52.92
43.40 46.29 41.37 49.77 48.85 52.67
Table 6c: Mean average precision using blind-query expansion (Swedish & Finnish collections)
Mean average precision
Query TD Russian Russian Russian Russian
words words 5-gram 4-gram
extended stemmer light stemmer
Model 28 queries 28 queries 28 queries 28 queries
doc=Okapi, query=npn 34.26 34.58 30.31 32.51
5 docs / 20 best terms 34.81 32.68 29.27 30.76
5 docs / 30 best terms 32.46 34.69 29.10 30.45
5 docs / 40 best terms 31.87 34.81 29.64 30.62
10 docs / 20 best terms 30.84 31.30 30.25 29.92
10 docs / 30 best terms 29.24 33.00 30.07 30.17
10 docs / 40 best terms 29.28 30.24 30.03 29.84
10 docs / 50 best terms 27.99 28.88 29.32 29.46
Table 6d: Mean average precision using blind-query expansion (Russian collection)
5. Data Fusion
For the English, French, Spanish, Italian and Russian languages, we assumed that the n-gram indexing and
word-based document representation approaches are distinct and independent sources of evidence regarding the
content of documents. For the German, Dutch, Swedish and Finnish languages, we added the decompounding
indexing approach to our document (and query) representation scheme.
In order to combine these two and three indexing schemes respectively, we evaluated various fusion
operators, as suggested by Fox and Shaw [Fox 1994]. Table 7 shows their precise description. For example,
the combSUM operator indicates that the combined document score (or the final retrieval status value) is simply
the sum of the retrieval status value (RSVk) of the corresponding document Dk computed by each single
indexing scheme. CombNBZ specifies that we multiply the sum of the document scores by the number of
retrieval schemes that are able to retrieve the corresponding document. In Table 7, we can see that both the
combRSV% and combRSVnorm apply a normalization procedure when combining document scores. When
combining the retrieval status value (RSVk) for various indexing schemes, we may multiply the document score
by a constant ai (usually equal to 1) in order to favor the ith, more effective retrieval scheme. In addition to
using these data fusion operators, we also considered the round-robin approach, whereby we take in turn one
document from each individual list and remove duplicates, keeping the most highly ranked instance.
combMAX MAX (ai . RSVk)
combMIN MIN (ai . RSVk)
combSUM SUM (ai . RSVk)
combANZ SUM (ai . RSVk) / # of nonzero (RSVk)
combNBZ SUM (ai . RSVk) * (# of nonzero (RSVk))
combRSV% SUM (ai . (RSVk / MAXRSV))
combRSVnorm SUM [ai . ((RSVk-MINRSV) / (MAXRSV-MINRSV))]
Table 7: Data fusion combination operators
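Two of the Table 7 operators can be sketched as follows: combSUM adds the (optionally ai-weighted) retrieval status values a document obtains from each indexing scheme, and combRSVnorm min-max normalises each list before summing. The dict-based result lists are an illustrative representation.

```python
def comb_sum(result_lists, a=None):
    """combSUM: the fused score of a document is the (optionally
    a_i-weighted) sum of the retrieval status values (RSV) it obtains
    from each single indexing scheme. Each list maps doc_id -> RSV."""
    a = a if a is not None else [1.0] * len(result_lists)
    fused = {}
    for weight, results in zip(a, result_lists):
        for doc_id, rsv in results.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * rsv
    return fused

def comb_rsv_norm(result_lists, a=None):
    """combRSVnorm: min-max normalise each list's scores into [0, 1]
    before applying combSUM, as defined in Table 7."""
    normed = []
    for results in result_lists:
        lo, hi = min(results.values()), max(results.values())
        span = (hi - lo) or 1.0  # guard against a constant-score list
        normed.append({d: (s - lo) / span for d, s in results.items()})
    return comb_sum(normed, a)
```

A document retrieved by several schemes accumulates score from each of them, which is why combSUM and its normalised variants tend to outperform the single best scheme in Tables 8a and 8b.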
Mean average precision
Query TD English French Spanish Italian Russian
Model 54 queries 52 queries 57 queries 51 queries 28 queries
Okapi expand doc/term 0/0 48.83 10/10 49.81 10/10 52.51 10/20 51.94 10/20 31.30
Prosit expand doc/term 3/15 50.99 5/30 52.30 10/10 50.19 10/50 50.82 5/30 35.41
combMAX 48.83 52.27 50.19 50.82 35.41
combMIN 2.88 42.77 8.21 18.62 24.96
combSUM 51.13 53.58 51.89 51.87 35.68
combANZ 37.95 53.25 43.97 50.05 35.60
combNBZ 51.11 53.66 51.89 51.86 35.65
combRSV% 53.60 54.50 53.30 53.58 34.43
combRSVnorm 53.25 54.69 53.49 54.37 34.30
round-robin 50.24 52.61 53.16 54.47 34.11
Table 8a: Mean average precision using different combination operators (ai = 1, with blind-query expansion)
Run name Language Query Index Model Query expansion combined MAP
UniNEfr French TD word Okapi 10 best docs / 10 terms
TD word Prosit 5 best docs / 30 terms round-robin 52.61
UniNEfr2 French TD word Okapi 10 best docs / 10 terms
TD word Prosit 5 best docs / 30 terms RSV% 54.50
UniNEsp Spanish TD word Okapi 10 best docs / 10 terms
TD word Prosit 10 best docs / 10 terms RSVnorm 53.80
UniNEsp2 Spanish TD word Okapi 5 best docs / 10 terms
TD word Prosit 10 best docs / 10 terms RSVnorm 53.69
UniNEde German TD word Prosit 5 best docs / 20 terms
TD decomp. Prosit 10 best docs / 40 terms
TD 5-gram Prosit 5 best docs / 175 terms RSVnorm 54.58
UniNEde2 German TD word Pro+Oka 5 best docs / 20 terms
TD decomp. Pro+Oka 10 best docs / 40 terms
TD 5-gram Pro+Oka 5 best docs / 175 terms sumRSV 56.03
UniNEit Italian TD word Okapi 10 best docs / 20 terms
TD word Prosit 10 best docs / 50 terms RSV% 52.23
UniNEit2 Italian TD word Okapi 10 best docs / 20 terms
TD word Prosit 10 best docs / 50 terms sumRSV 51.56
UniNEnl Dutch TD word Okapi 10 best docs / 20 terms
TD decomp. Okapi 10 best docs / 20 terms
TD 5-gram Prosit 10 best docs / 150 terms round-robin 50.65
UniNEnl2 Dutch TD word Okapi 10 best docs / 20 terms
TD decomp. Okapi 10 best docs / 20 terms
TD 5-gram Prosit 10 best docs / 150 terms sumRSV 50.24
UniNEsv Swedish TD word Pro+Oka 3 best docs / 15 terms
TD decomp. Pro+Oka 3 best docs / 15 terms
TD 4-gram Pro+Oka 3 best docs / 40 terms RSV% 48.19
UniNEsv2 Swedish TD word Pro+Oka 5 best docs / 30 terms
TD decomp. Pro+Oka 5 best docs / 50 terms
TD 4-gram Pro+Oka 5 best docs / 30 terms RSVnorm 48.69
UniNEfi Finnish TD word Prosit 5 best docs / 30 terms
TD decomp. Prosit 5 best docs / 15 terms
TD 5-gram Prosit 3 best docs / 125 terms sumRSV 54.51
UniNEfi2 Finnish TD word Prosit 5 best docs / 30 terms
TD decomp. Prosit 5 best docs / 15 terms
TD 5-gram Prosit 3 best docs / 125 terms sumRSV 53.55
UniNEru Russian TDN word Okapi 10 best docs / 20 terms
TDN word Prosit 5 best docs / 30 terms sumRSV 35.32
UniNEru1 Russian TD word Okapi 10 best docs / 20 terms
TD word Prosit 5 best docs / 30 terms sumRSV 31.83
UniNEru2 Russian TD 5-gram Okapi 10 best docs / 50 terms
TD 5-gram Prosit 5 best docs / 40 terms
TD 4-gram Okapi 10 best docs / 50 terms
TD 4-gram Prosit 5 best docs / 40 terms sumRSV 32.77
UniNEru3 Russian TDN word Okapi 10 best docs / 10 terms
TDN word Prosit 5 best docs / 20 terms sumRSV 42.24
Table 9: Description and mean average precision (MAP) of our official runs
Tables 8a and 8b depict an evaluation of various data fusion operators, comparing them to the single
approach using the Okapi and the Prosit probabilistic models. As shown in these tables, the combRSVnorm or
combRSV% fusion strategies usually improve the retrieval effectiveness over the best single retrieval model.
Mean average precision
Query TD German Dutch Swedish Finnish
Model 56 queries 56 queries 53 queries 45 queries
Prosit word doc/term 5/20 48.40 51.14 3/60 42.59 5/30 47.90
Prosit decomp doc/term 10/40 51.40 51.81 3/40 43.58 5/15 47.85
Prosit n-gram doc/term 5/175 49.46 10/150 44.23 3/40 42.16 3/125 49.06
combMAX 49.97 44.23 42.94 50.22
combMIN 35.54 6.30 33.91 33.36
combSUM 53.71 50.24 47.58 54.51
combANZ 47.85 31.90 41.14 49.25
combNBZ 53.70 50.81 47.29 55.60
combRSV% 54.46 53.99 47.95 54.49
combRSVnorm 54.58 54.30 48.12 54.16
round-robin 50.83 50.65 44.14 48.73
Table 8b: Mean average precision using different combination operators (ai = 1, with blind-query expansion)
Conclusion
In this fourth CLEF evaluation campaign, we proposed general stopword lists and stemming procedures for
eight European languages (excluding English). Currently, it is not clear whether a stemming procedure such as
the one suggested, which only removes inflectional suffixes from nouns and adjectives, could produce better
retrieval effectiveness than a stemming approach that takes both inflectional and derivational suffixes into
account. We also suggested a simple decompounding approach for the German, Dutch, Swedish and Finnish
languages. In order to achieve better retrieval performance, we used a data fusion approach, one requiring that
document (and query) representations be based on two or three indexing schemes.
Acknowledgments
The author would like to thank C. Buckley from SabIR for giving us the opportunity to use the SMART
system. This research was supported by the Swiss National Science Foundation under grant #21-66 742.01.
References
[Amati 2002a] Amati, G., Carpineto, C. & Romano, G. (2002). Italian monolingual information retrieval
with PROSIT. In Proceedings of CLEF-2002, (pp. 145-151). Roma.
[Amati 2002b] Amati, G. & van Rijsbergen, C.J. (2002). Probabilistic models of information retrieval
based on measuring the divergence from randomness. ACM TOIS, 20(4), 357-389.
[Braschler 2003] Braschler, M. & Ripplinger, B. (2003). Stemming and decompounding for German text
retrieval. In Proceedings 25th European Conference in IR (pp. 177-192). Berlin: Springer.
[Buckley 1996] Buckley, C., Singhal, A., Mitra, M. & Salton, G. (1996). New retrieval approaches using
SMART. In Proceedings of TREC'4, (pp. 25-48). Gaithersburg: NIST Publication #500-236.
[Chen 2002] Chen, A. (2002). Cross-language retrieval experiments at CLEF-2002. In Proceedings of
CLEF-2002, (pp. 5-20). Roma.
[Fox 1994] Fox, E.A. & Shaw, J.A. (1994). Combination of multiple searches. In Proceedings
TREC-2, (pp. 243-249). Gaithersburg: NIST Publication #500-215.
[Kraaij 1996] Kraaij, W. & Pohlmann, R. (1996). Viewing stemming as recall enhancement. In
Proceedings of the ACM-SIGIR'96, (pp. 40-48). New York: The ACM Press.
[Lovins 1968] Lovins, J.B. (1968). Development of a stemming algorithm. Mechanical Translation and
Computational Linguistics, 11(1), 22-31.
[Porter 1980] Porter, M.F. (1980). An algorithm for suffix stripping. Program, 14, 130-137.
[Robertson 2000] Robertson, S.E., Walker, S. & Beaulieu, M. (2000). Experimentation as a way of life:
Okapi at TREC. Information Processing & Management, 36(1), 95-108.
[Savoy 2002] Savoy J. (2002). Report on CLEF-2002 experiments: Combining multiple sources of
evidence. In Proceedings of CLEF-2002, (pp. 31-46). Roma.
[Singhal 1999] Singhal, A., Choi, J., Hindle, D., Lewis, D.D. & Pereira, F. (1999). AT&T at TREC-7.
In Proceedings TREC-7, (pp. 239-251). Gaithersburg: NIST Publication #500-242.