<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cross-language Retrieval Experiments at CLEF-2002</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>
            <given-names>Aitao</given-names>
            <surname>Chen</surname>
          </string-name>
        </contrib>
        <aff id="aff0">
          <institution>School of Information Management and Systems, University of California at Berkeley</institution>
          ,
          <addr-line>CA 94720</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes monolingual, cross-language, and multilingual retrieval experiments using the CLEF-2002 test collection. The paper presents a technique for incorporating blind relevance feedback into a document ranking formula based on logistic regression analysis, and a procedure for decomposing German or Dutch compounds into their component words. Multilingual text retrieval is the task of searching a collection of documents in more than one language for relevant documents in response to a query, and presenting a unified ranked list of documents regardless of language. Multilingual retrieval is an extension of bilingual retrieval, in which the collection consists of documents in a single language that is different from the query language. Recent developments in multilingual retrieval were reported at CLEF-2000 [12] and CLEF-2001 [13]. Most multilingual retrieval methods fall into one of three groups. The first approach translates the source topics separately into all the document languages in the document collection. Monolingual retrieval is then carried out separately for each document language, resulting in one ranked list of documents per document language. Finally, the intermediate ranked lists of retrieved documents, one for each language, are merged to yield a combined ranked list of documents regardless of language. The second approach translates the multilingual document collection into the topic language; the topics are then used to search the translated document collection. The third approach also translates the topics into all document languages, as in the first approach. The source topics and the translated topics are concatenated to form a set of multilingual topics, which are then searched directly against the multilingual document collection, directly producing a ranked list of documents in all languages. The latter two approaches do not involve merging two or more ranked lists of documents, one for each document language, into a combined ranked list of documents in all document languages. The merging task is hard and challenging; to the best of our knowledge, no effective technique has yet been developed. It appears that most groups participating in the multilingual retrieval tasks at the TREC or CLEF evaluation conferences applied the first approach. Translating large collections of documents in multiple languages into the topic languages requires the availability of machine translation systems that support the necessary language pairs, which is sometimes problematic. For example, suppose the document collection consists of documents in English, French, German, Italian, and Spanish, and the topics are in English. To perform the multilingual retrieval task using English topics, one would have to translate the French, German, Italian, and Spanish documents into English. In this case, there exist translators, such as Babelfish, that can do the job. However, if the topics are in Chinese or Japanese, it may be more difficult or even impossible to find translators to do the work. The availability of translation resources and the need for extensive computation are factors that limit the applicability of the second approach. The third approach is appealing in that it does not require translating the documents, and it circumvents the difficult merging problem. However, there is some empirical evidence showing that the third approach is less effective than the first one [3]. We believe that three of the core components of the first approach are monolingual retrieval, topic translation, and merging. Performing multilingual retrieval requires many language resources, such as stopwords, stemmers, bilingual dictionaries, machine translation systems, and parallel or comparable corpora. At the same time, more and better language resources are becoming publicly available on the Internet. The end performance of multilingual retrieval can be affected by many factors, such as the monolingual retrieval performance of the document ranking algorithm, the quality and coverage of the translation resources, the availability of language-dependent stemmers and stopwords, and the effectiveness of the merging algorithm. Since merging ranked lists of documents is a challenging task, we seek to improve multilingual retrieval performance by improving monolingual retrieval performance and by exploiting translation resources publicly available on the Internet.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>We seek to improve multilingual retrieval performance by improving monolingual retrieval performance and by exploiting translation resources publicly available on the Internet.</p>
      <p>At CLEF 2002, we participated in the monolingual, cross-language, and multilingual retrieval tasks. For the monolingual task, we submitted retrieval runs for Dutch, French, German, Italian, and Spanish. For the cross-language task, we submitted cross-language retrieval runs from English topics to the document languages Dutch, French, German, Italian, and Spanish, one French-to-German run, and one German-to-French run. For the multilingual task, we submitted two runs using English topics. All of our runs used only the title and desc fields of the topics. The document collection for the multilingual task consists of documents in English, French, German, Italian and Spanish. More details on the document collections are presented below in section 5. Realizing the difficulty of merging multiple disjoint ranked lists of retrieved documents in multilingual retrieval, we put little effort into the merging problem. We mainly worked on improving the performance of monolingual retrieval and cross-language retrieval, since we believe improved performance in monolingual and cross-language retrieval should ultimately lead to better performance in multilingual retrieval. For all of our runs in the cross-language and multilingual tasks, the topics were translated into the document languages. The main translation resources we used are the SYSTRAN-based online machine translation system Babelfish and L&amp;H Power Translator Pro, version 7.0. We also used parallel English/French texts in one of the English-to-French retrieval runs. The Babylon English-Dutch dictionary was used in cross-language retrieval from English to Dutch.</p>
      <p>
        The same document ranking formula developed at Berkeley [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] back in 1993 was used for all retrieval runs
reported in this paper. It was also used in our participation in the previous CLEF workshops. It has been shown
that query expansion via blind relevance feedback can be effective in monolingual and cross-language retrieval.
The Berkeley formula based on logistic regression has been used for years without blind relevance feedback. We
developed a blind relevance feedback procedure for the Berkeley document ranking formula. All of our official
runs were produced with blind relevance feedback. We will present a brief overview of the Berkeley document
ranking formula in section 2. We will describe the blind relevance feedback procedure in section 3.
      </p>
      <p>
        At CLEF 2001, we presented a German decompounding procedure that was hastily developed. The decompounding procedure uses a German base dictionary consisting of words that should not be further decomposed into smaller components. When a compound can be split into component words found in the base dictionary in more than one way, we choose the split with the smallest number of component words. However, if there are two or more decompositions with the smallest number of component words, we choose the decomposition that is most likely. The probability of a decomposition of a compound is computed from the relative frequencies of the component words in a German collection. We reported a slight decrease in German monolingual performance with German decompounding [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] at CLEF 2001. The slight decline in performance may be attributed to the fact that we kept both the original compounds and the component words resulting from decompounding in the topic index. When we re-ran the same German monolingual retrieval retaining only the component words of compounds in the topics, the average precision was 8.88% better with decompounding than without it [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Further improvements in performance brought by German decompounding were reported
in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] when a different method was used to compute the relative frequencies of component words.
      </p>
      <p>
        At CLEF 2002, we used the improved version of the German decompounding procedure first described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. A
 slightly different presentation of the same decompounding procedure is given in section 4. Two small changes were made in performing German retrieval with decompounding. First, in both the topic and document indexes, only the component words resulting from decompounding were kept; when a compound was split into component words, the compound itself was not indexed. Second, additional non-German words were removed from the German base dictionary. Our current base dictionary still has 762,342 words, some of which are non-German words and some German compounds that should be excluded. It would take a major effort to clean up the base dictionary so that it contains only the German words that should not be further decomposed. The decompounding procedure initially developed for splitting up German compounds was also used to decompose Dutch compounds with a Dutch base dictionary.
      </p>
      <p>Of the two official multilingual runs we submitted, one used the unnormalized raw score to re-rank the documents from the intermediate runs to produce the unified ranked list of documents; the other used the normalized score in the same way to produce the final list. To measure the effectiveness of different mergers, we developed an algorithm for computing the best performance that could possibly be achieved by merging multiple ranked lists of documents under the conditions that the relevances of the documents are known, and that the relative ranking of the documents in the individual ranked lists is preserved in the unified ranked list. That is, if document A is ranked higher than document B in some ranked list, then document A should also be ranked higher than document B in the unified ranked list. The simple mergers based on unnormalized raw score, normalized raw score, or rank all preserve the relative ranking order. This procedure cannot be used to predict merging performance; however, it should be useful for measuring the performance of merging algorithms. The procedure for producing the optimal performance given known document relevances is presented in section 6.3.</p>
    </sec>
    <sec id="sec-2">
      <title>Relevance Feedback</title>
    </sec>
    <sec id="sec-3">
      <title>Document Ranking</title>
      <p>
        All of our retrieval runs used the same document ranking formula developed at Berkeley [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to rank documents in response to a query. The log odds of relevance of document D with respect to query Q, denoted log O(R|D,Q), is given by
      </p>
      <preformat>
log O(R|D,Q) = -3.51 + (1/sqrt(n+1)) * (  37.4   * sum_i qtf_i / (ql + 35)
                                        + 0.330  * sum_i log(dtf_i / (dl + 80))
                                        - 0.1937 * sum_i log(ctf_i / cl) )
             + 0.0929 * n
      </preformat>
      <p>
        where n is the number of terms common to the query and the document, qtf_i and dtf_i are the occurrence frequencies of the i-th common term in the query and the document, ctf_i is its occurrence frequency in the collection, and ql, dl, and cl are the query length, document length, and collection length, respectively. The probability of relevance is P(R|D,Q) = 1 / (1 + exp(-log O(R|D,Q))).
      </p>
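      <p>
        To make the computation concrete, the following is a minimal sketch of this scoring function; the coefficients and statistics follow the TREC-2 formula fitted in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and the function and variable names are ours.
      </p>
      <preformat>
import math

def berkeley_score(matching_terms, ql, dl, cl):
    """Probability of relevance for one query-document pair.

    matching_terms: one (qtf, dtf, ctf) triple per term shared by the
    query and the document; ql, dl, cl are the query, document, and
    collection lengths. Coefficients are those fitted in [4].
    """
    n = len(matching_terms)
    norm = 1.0 / math.sqrt(n + 1.0)
    x1 = sum(qtf / (ql + 35.0) for qtf, _, _ in matching_terms)
    x2 = sum(math.log(dtf / (dl + 80.0)) for _, dtf, _ in matching_terms)
    x3 = sum(math.log(ctf / float(cl)) for _, _, ctf in matching_terms)
    log_odds = -3.51 + norm * (37.4 * x1 + 0.330 * x2 - 0.1937 * x3) + 0.0929 * n
    # probability of relevance via the logistic link
    return 1.0 / (1.0 + math.exp(-log_odds))
      </preformat>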
    </sec>
    <sec id="sec-3">
      <title>Relevance Feedback</title>
      <p>
        It is well known that blind (also called pseudo) relevance feedback can substantially improve retrieval effectiveness. It is commonly implemented in research text retrieval systems. For example, see the papers of the groups who participated in the Ad Hoc tasks in TREC-7 [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and TREC-8 [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Blind relevance feedback is typically performed in two stages. First, an initial search using the original queries is performed, after which a number of terms are selected from the top-ranked documents that are presumed relevant. The selected terms are merged with the original query to formulate a new query. Finally, the new query is searched against the document collection to produce the final ranked list of documents. The techniques for deciding the number of terms to be selected, the number of top-ranked documents from which to extract terms, and how to rank the terms vary.
      </p>
      <p>
        The Berkeley document ranking formula has been in use for many years without blind relevance feedback. In
this paper we present a technique for incorporating blind relevance feedback into the logistic regression-based
document ranking framework. Some of the issues involved in implementing blind relevance feedback include
determining the number of top ranked documents that will be presumed relevant and from which new terms will be
extracted, ranking the selected terms and determining the number of terms that should be selected, and assigning
weight to the selected terms. We refer readers to [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for a survey of relevance feedback techniques.
      </p>
      <p>
        Two factors are important in relevance feedback. The first is how to select the terms from the top-ranked documents after the initial search; the second is how to weight the selected terms with respect to the terms in the initial query. For term selection, we assume the top-ranked documents in the initial search are relevant, and the rest of the documents in the collection are irrelevant. For the terms in the documents that are presumed relevant, we compute the odds ratio of seeing a term in the set of relevant documents versus in the set of irrelevant documents. This is the term relevance weighting formula proposed by Robertson and Sparck Jones in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Table 1 presents a contingency table for a word, where N is the number of documents in the collection and m is the number of top-ranked documents presumed relevant; the table cross-classifies the documents as relevant or irrelevant and as indexed or non-indexed with respect to the word. Table 2 summarizes the query expansion, i.e., how the initial query terms and the selected terms are weighted in computing the relevance probability. (The detailed entries of both tables were lost in conversion.) For all the experiments reported below, we selected the top 10 terms, ranked by this relevance weight, from the 10 top-ranked documents in the initial search.
      </p>
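      <p>
        A minimal sketch of the term-selection step follows, assuming the m top-ranked documents are presumed relevant and terms are ranked by the Robertson-Sparck Jones relevance weight [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] computed from the contingency counts above; the 0.5 smoothing of the cells is a common convention and an assumption here.
      </p>
      <preformat>
import math
from collections import Counter

def select_expansion_terms(top_docs, doc_freq, num_docs, k=10):
    """Return the top k expansion terms from the m presumed-relevant
    documents, ranked by the Robertson-Sparck Jones relevance weight.

    top_docs: token lists of the m top-ranked documents;
    doc_freq: term -> number of documents containing the term (n);
    num_docs: N, the number of documents in the collection.
    """
    m = len(top_docs)
    r = Counter()                    # r: presumed-relevant docs containing t
    for doc in top_docs:
        for t in set(doc):
            r[t] += 1

    def rsj_weight(t):
        n = doc_freq[t]
        rt = r[t]
        # odds ratio of seeing t in the relevant set vs the irrelevant set,
        # with 0.5 added to each contingency cell (smoothing assumption)
        return math.log(((rt + 0.5) * (num_docs - n - m + rt + 0.5)) /
                        ((n - rt + 0.5) * (m - rt + 0.5)))

    return sorted(r, key=rsj_weight, reverse=True)[:k]
      </preformat>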
    </sec>
    <sec id="sec-4">
      <title>Decompounding</title>
      <p>
        It appears that most German compounds are formed by directly joining two or more words. Examples are Computerviren (computer viruses), which is the concatenation of Computer and Viren, and Sonnenenergie (solar energy), which is formed by joining Sonnen and Energie. Sometimes a linking element such as s or e is inserted between two words. For example, the compound Schönheitskönigin (beauty queen) is derived from Schönheit and Königin with s inserted between them. There are also cases where compounds are formed with the final letter e of the first word elided. For example, the compound Erdbeben (earthquake) is derived from Erde (earth) and Beben (trembling). When the word Erde is combined with the word Atmosphäre to create a compound, the compound is not Erdeatmosphäre, but Erdatmosphäre: the final letter e of the word Erde is elided. We refer readers to, for example, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for discussions of German compound formation. The example earthquake shows that compounds are also used in English, just not nearly as commonly as in German.
      </p>
      <p>In this section we present a German decompounding procedure that addresses only the cases where compounds are formed by directly joining words and the cases where the linking element s is inserted. The procedure is as follows (a code sketch is given after the probability estimation below):
1. Create a German base dictionary consisting of German words in various forms, but not compounds.
2. Decompose a German compound with respect to the base dictionary; that is, find all possible ways to break up the compound into words found in the base dictionary.
3. Choose the decomposition with the minimum number of component words.
4. If more than one decomposition has the smallest number of component words, choose the one with the highest probability of decomposition. The probability of a decomposition is estimated by the product of the relative frequencies of its component words. More details are presented below.</p>
      <p>For example, when the German base dictionary contains ball, europa, fuss, fussball, meisterschaft and others, the German compound fussballeuropameisterschaft can be decomposed into component words with respect to the base dictionary in two different ways, as shown in Table 3. The last decomposition has the smallest number of component words and is therefore chosen. The compound wintersports has three decompositions with respect to the base dictionary. Because two of the decompositions have the smallest number of component words, the rule of selecting the decomposition with the smallest number of component words cannot be applied; we have to compute the probabilities of the decompositions with the smallest number of component words. The last column in Table 4 shows the log of the decomposition probability for all three decompositions, computed using the relative frequencies of the component words in the German test collection. According to the rule of selecting the decomposition of the highest probability, the second decomposition should be chosen as the decomposition of the compound wintersports. That is, the compound wintersports should be split into winter and sports.</p>
      <p>
        Consider the decomposition of a compound into component words w_1, w_2, ..., w_n. The probability of the decomposition is computed as the product p(w_1) * p(w_2) * ... * p(w_n), where the probability p(w) of a component word w is its relative frequency tf(w) / (tf(v_1) + ... + tf(v_M)); here tf(w) is the number of occurrences of word w in a collection, and M is the number of unique words v_1, ..., v_M, including compounds, in the collection. The occurrence frequency of a word is the number of times the word occurs alone in the collection; the frequency count of a word does not include the cases where the word is a component word of a larger compound. Also, the base dictionary does not contain any words that are three letters long or shorter, except for the letter s. We created a German base dictionary of about 762,000 words by combining a lexicon extracted from Morphy, a German morphological analyzer [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], German wordlists found on the Internet, and German words
in the CLEF-2001 German collection. In our implementation, we considered only the case where a compound is
the concatenation of component words, and the case where the linking element s is present. Note that the number
of possible decompositions of a compound is determined by what is in the base dictionary. For example, when the
word mittagessen (lunch) is not in the base dictionary, the compound mittagessenzeit (lunch time) would be split
into three component words mittag (noon), essen (meal), and zeit (time).
      </p>
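      <p>
        The following is a minimal sketch of the decompounding procedure under the stated restrictions (direct concatenation plus the linking element s, components of four or more letters); the base dictionary, the occurrence counts, and all names are assumptions for illustration.
      </p>
      <preformat>
import math

def decompositions(word, base_dict):
    """All ways to split word into base-dictionary words, allowing an
    optional linking element 's' after each component."""
    if word in base_dict:
        return [[word]]           # words in the base dictionary are not split
    results = []
    def split(rest, parts):
        if not rest:
            results.append(parts)
            return
        for i in range(4, len(rest) + 1):  # components are four or more letters
            head = rest[:i]
            if head in base_dict:
                tail = rest[i:]
                split(tail, parts + [head])
                if tail.startswith("s"):   # linking element s
                    split(tail[1:], parts + [head])
    split(word, [])
    return results

def decompound(word, base_dict, freq, total):
    """Split a compound: fewest components first, with ties broken by
    the highest product of relative frequencies freq[w] / total."""
    cands = decompositions(word, base_dict)
    if not cands:
        return [word]
    fewest = min(len(c) for c in cands)
    cands = [c for c in cands if len(c) == fewest]
    return max(cands, key=lambda parts: sum(math.log(freq.get(w, 1) / total)
                                            for w in parts))
      </preformat>
      <p>
        For the Table 3 example, decompound applied to fussballeuropameisterschaft would return the three-component split fussball, europa, meisterschaft, since the alternative split starting with fuss and ball has more components.
      </p>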
      <p>It is not always desirable to split up German compounds into their component words. Consider again the compound Erdbeben: in this case, it is probably better not to split up the compound. But in other cases, like Gemüseexporteure (vegetable exporters) and Fußballweltmeisterschaft (World Soccer Championship), splitting up the compounds is probably desirable, since the use of the component words might retrieve additional relevant documents that are otherwise likely to be missed if only the compounds are used. In fact, we noticed that the compound Gemüseexporteure does not occur in the CLEF-2001 German document collection.</p>
      <p>In general, it is conceivable that breaking up compounds is helpful. The same phrase may sometimes be spelled out in words, but written as one compound at other times. When a user formulates a German query, the user may not know whether a phrase should appear as a multi-word phrase or as one compound. An example is the German equivalent of the English phrase “European Football Cup”: in the title of topic 113, the German equivalent is spelled as one compound, Fussballeuropameisterschaft, but in the description field it is Europameisterschaft im Fußball, and in the narrative field it is Fußballeuropameisterschaft. This example brings out two points in indexing German texts. First, it should be helpful to split compounds into component words. Second, normalizing the spellings ss and ß should be helpful. Two more such examples are Scheidungsstatistiken and Präsidentschaftskandidaten. The German equivalent of “divorce statistics” is Scheidungsstatistiken in the title field of topic 115, but Statistiken über die Scheidungsraten in the description field. The German equivalent of “presidency candidates” is Präsidentschaftskandidaten in the title field of topic 135, but Kandidat für das Präsidentenamt in the description field of the same topic. The German equivalent of “Nobel prize winner for literature” is Literaturnobelpreisträger; in the “Der Spiegel” German collection, we find the variants Literatur-Nobelpreisträger and Literaturnobelpreis-Trägerin, and Literaturnobelpreis sometimes appears as “Nobelpreis für Literatur”.</p>
    </sec>
    <sec id="sec-5">
      <title>Test Collection</title>
      <p>
        The document collection for the multilingual IR task consists of documents in five languages: English, French,
German, Italian, and Spanish. The collection has about 750,000 documents, which are newspaper articles published in 1994, except that part of the Der Spiegel collection was published in 1995. The distribution of documents among the five
document languages is presented in Table 5. A set of 50 topics was developed and released in more than 10
languages, including Dutch, English, French, German, Italian, and Spanish. A topic has three parts: 1) title, a short
description of information need; 2) description, a sentence-long description of information need; and 3) narrative,
specifying document relevance criteria. More details about the test collection are presented in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The
multilingual IR task at CLEF 2002 was concerned with searching the collection consisting of English, French, German,
Italian, and Spanish documents for relevant documents, and returning a combined, ranked list of documents in any
document language in response to a query.
      </p>
      <sec id="sec-5-1">
        <title>Language</title>
      </sec>
      <sec id="sec-5-2">
        <title>Name</title>
      </sec>
      <sec id="sec-5-3">
        <title>English French</title>
      </sec>
      <sec id="sec-5-4">
        <title>German</title>
      </sec>
      <sec id="sec-5-5">
        <title>Italian</title>
      </sec>
      <sec id="sec-5-6">
        <title>Spanish Dutch</title>
      </sec>
      <sec id="sec-5-7">
        <title>Los Angeles Times</title>
        <p>Le Monde
SDA French
Frankfurter Rundschau
Der Spiegel
SDA German
La Stampa
SDA Italian
EFE
RC Handelsblad
Algemeen Dagblad
All retrieval runs reported in this paper used only the title and description fields in the topics. The ids and average
precision values of the official runs are presented in bold face, other runs are unofficial ones.
In this section we present the results of monolingual retrieval. We created a stopwords list for each document
language. In indexing, the stopwords were removed from both documents and topics. Additional words such as
relevant and document were removed from topics. The words in all six languages were stemmed using Muscat
stemmers downloaded from http://open.muscat.com. For automatic query expansion, the top-ranked 10 terms from
the top-ranked 10 documents after the initial search were combined with the original query to create the expanded
query. For Dutch and German monolingual runs, the compounds were split into their component words, and only
their component words were retained in document and topic indexing. All the monolingual runs included automatic
query expansion via the relevance feedback procedure described in section 3. Table 6 presents the monolingual
retrieval results for six document languages. The last column labeled change shows the improvement of average
precision with blind relevance feedback over without it. As table 6 shows, query expansion increased the average
precision of the monolingual runs for all six languages, the improvement ranging from 6.42% for Spanish to
19.42% for French. There are no relevant Italian documents for topic 120, and no relevant English documents for
run id
bky2moen
bky2monl
bky2mofr
bky2mode
bky2moit
bky2moes
language
English
Dutch
French
German
Italian
Spanish
without expansion
recall precision
765/821 0.5084
1633/1862 0.4446
1277/1383 0.4347
1696/1938 0.4393
994/1072 0.4169
2531/2854 0.5016</p>
        <p>with expansion
recall precision
793/821 0.5602
1734/1862 0.4847
1354/1383 0.5191
1807/1938 0.5234
1024/1072 0.4750
2673/2854 0.5338
change
10.19%
9.02%
19.42%
19.14%
13.94%
6.42%
topics 93, 96, 101, 110, 117, 118, 127 and 132.</p>
      <p>For the German monolingual runs, compounds were decomposed into their component words by applying the decompounding procedure described above. Only the component words of the decomposed compounds were kept in document and topic indexing. Table 7 presents the performance of German monolingual retrieval with three different features: decompounding, stemming, and query expansion. The features are applied in the order decompounding, stemming, query expansion. For example, when decompounding and stemming are both present, the compounds are split into component words first, and then the components are stemmed. The table shows that when any one of the three features is present, the average precision improves by 4.94% to 19.73% over the baseline performance when none of the features is present. (Of Table 7, only one row is recoverable here: row 2, decomp, average precision 0.3859, recall 1577, +11.47%.) When two of the three features are included in retrieval, the improvement in precision ranges from 26.89% to 30.47%. And when all three features are present, the average precision is 51.18% better than the baseline performance. It is interesting to see that the three features are complementary; that is, the improvement brought by each individual feature is not diminished by the presence of the other two features. Without decompounding, stemming alone improved the average precision by 4.94%. However, with decompounding, stemming improved the average precision from 0.3859 to 0.4393, an increase of 13.84%; stemming became more effective because of decompounding. Decompounding alone improved the average precision by 11.47% for German monolingual retrieval.</p>
      <p>Table 8 presents the German words in the title or desc fields of the topics that were split into component words using the decompounding procedure described in section 4. The column labeled component words shows the component words of the decomposed compounds. The German word eurofighter was split into euro and fighter since both component words are in the base dictionary and the word eurofighter is not. Including the word eurofighter in the base dictionary would prevent it from being split into component words. The word geographos was decomposed into geog, rapho, and s for the same reason: the component words are in the base dictionary. Two topic words, lateinamerika and zivilbevölkerung, were not split into component words because both are present in our base dictionary, which is far from perfect. For the same reason, preisträgers was not decomposed into preis and trägers. An ideal base dictionary should contain all and only the words that should not be further split into smaller component words. Our current decompounding procedure does not split words in the base dictionary into smaller component words. When the two compounds, lateinamerika and zivilbevölkerung, are removed from the base dictionary, lateinamerika is split into latein and amerika, and zivilbevölkerung into zivil and bevölkerung. The topic word südjemen was not split into süd and jemen because our base dictionary does not contain words that are three letters long or shorter. The majority of the errors in decompounding are caused by the incompleteness of the base dictionary or by the presence of compound words in the base dictionary.</p>
      <p>We used a Dutch stopword list of 1326 words downloaded from http://clef.iei.pi.cnr.it:2002/ for Dutch monolingual retrieval. After removing stopwords, the Dutch words were stemmed using the Muscat Dutch stemmer. For Dutch decompounding, we used a Dutch wordlist of 223,557 words (downloaded from ftp://archive.cs.ruu.nl/pub/UNIX/ispell/words.dutch.gz). From this wordlist we created a Dutch base dictionary of 210,639 words by manually breaking up the long words that appear to be compounds. It appears that many Dutch compound words remain in the base dictionary. Like the German base dictionary, an ideal Dutch base dictionary should include all and only the words that should not be further decomposed into smaller component words. The Dutch words in the title or desc fields of the topics were split into component words using the same procedure as for German decompounding. As in German decompounding, the words in the Dutch base dictionary are not decomposed. The source wordlist files contain a list of country names, which should have been added to the Dutch base dictionary. The Dutch words frankrijk and duitsland were split into component words because they are not in the base dictionary. For the same reason, the word internationale was decomposed. It appears that compound words are not as common in Dutch as in German. As in German indexing, when a compound was split into component words, only the component words were retained in the index. Table 10 presents the performance of Dutch monolingual retrieval under various conditions. With no stemming and expansion, Dutch decompounding improved the average precision by 4.10%. Together, the three features improved the average precision by 20.54% over the base performance when none of the features is implemented.</p>
      <preformat>
Table 11: Dutch monolingual retrieval performance on the CLEF 2001 test set.
   features             avg prec   change
1  none                 0.3239     baseline
2  decomp               0.3676     +13.49%
3  stem                 0.3587     +10.74%
4  expan                0.3471     +7.16%
5  decomp+stem          0.4165     +28.59%
6  decomp+expan         0.3822     +18.00%
7  stem+expan           0.3887     +20.01%
8  decomp+stem+expan    0.4372     +34.98%
      </preformat>
      <p>For comparison, Table 11 presents the Dutch monolingual performance on the CLEF 2001 test set. Decompounding alone improved the average precision by 13.49%. Topic 88 of CLEF 2001 is about mad cow diseases in Europe. The Dutch equivalent of mad cow diseases, gekkekoeienziekte, occurs in the topic but never occurs in the Dutch collection. Without decompounding, the precision for this topic is 0.1625; with decompounding, the precision increased to 0.3216. The precision for topic 90, which is about vegetable exporters, is 0.0128 without decompounding. This topic contains two compound words, Groentenexporteurs and diepvriesgroenten. The former, which is perhaps the most important term for this topic, never occurs in the Dutch document collection. After decompounding, the precision for this topic increased to 0.3443. Topic 55 contains two important compound words, Alpenverkeersplan and Alpeninitiatief. Both never occur in the Dutch document collection. The precision for this topic is 0.0746 without decompounding, and increased to 0.2137 after decompounding.</p>
    </sec>
    <sec id="sec-5-8">
      <title>Cross-language Retrieval Experiments</title>
      <p>
        A major factor affecting the end performance of cross-language retrieval and multilingual retrieval is the quality of the translation resources. In this section, we evaluate the effectiveness of three different translation resources: automatic machine translation systems, parallel corpora, and bilingual dictionaries. Two of the issues in translating topics are 1) determining the number of translations to retain when multiple candidate translations are available; and 2) assigning weights to the selected translations [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. When machine translation systems are used to translate topics, these two issues are resolved automatically by the machine translation systems, since they provide only one translation for each word. However, when bilingual dictionaries or parallel corpora are used to translate topics, there are often several alternative translations for a source word.
      </p>
      <sec id="sec-5-8-1">
        <title>CLIR Using MT</title>
        <p>In this section, we evaluate two machine translation systems, the online Babelfish translation service (publicly available at http://babelfish.altavista.com/) and L&amp;H Power Translator Pro, version 7.0, for translating topics in CLIR. We used both translators to translate the 50 English topics into French, Italian, German, and Spanish. For each language, both sets of translations were preprocessed in the same way. Table 12 presents the CLIR retrieval performance for all the official runs and additional runs; the run ids and average precision values of the official runs are in bold face. The last column of Table 12 shows the improvement in average precision with query expansion over without it.</p>
        <preformat>
Table 12: Cross-language retrieval runs. The average precision columns
(without / with expansion) survived conversion only for the first two rows.
run id        topic     document   resources
bky2bienfr    English   French     Babelfish + L&amp;H                  0.4118 / 0.4773
bky2bienfr2   English   French     Systran + L&amp;H + parallel texts   0.4223 / 0.4744
bky2bienfr3   English   French     Babelfish
bky2bienfr4   English   French     L&amp;H
bky2bienfr5   English   French     Parallel texts
bky2bidefr    German    French     Babelfish
bky2biende    English   German     Babelfish + L&amp;H
bky2biende1   English   German     Babelfish
bky2biende2   English   German     L&amp;H
bky2bifrde    French    German     Babelfish
bky2bienit    English   Italian    Babelfish + L&amp;H
bky2bienit1   English   Italian    Babelfish
bky2bienit2   English   Italian    L&amp;H
bky2bienes    English   Spanish    Babelfish + L&amp;H
bky2bienes1   English   Spanish    Babelfish
bky2bienes2   English   Spanish    L&amp;H
bky2biennl    English   Dutch      Babylon
        </preformat>
        <p>When both L&amp;H Translator and Babelfish were used in cross-language retrieval from English to French, German, Italian and Spanish, the translation from L&amp;H Translator and the translation from Babelfish were combined by topic. The term frequencies in the combined topics were reduced by half so that the combined topics were comparable in length to the source English topics. Then the combined translations were used to search the document collection for relevant documents, as in monolingual retrieval. For example, for the English-to-Italian run bky2bienit, we first translated the source English topics into Italian using L&amp;H Translator and Babelfish. The Italian translations produced by L&amp;H Translator and the Italian translations produced by Babelfish were combined by topic. Then the combined, translated Italian topics, with term frequencies reduced by half, were used to search the Italian document collection. The bky2bienfr, bky2biende, and bky2bienes CLIR runs from English were all produced in the same way as the bky2bienit run. For the English-to-German and French-to-German cross-language retrieval runs, the words in the title or desc fields of the translated German topics were decompounded. For all cross-language runs, words were stemmed after removing stopwords, as in monolingual retrieval. The English-to-French run bky2bienfr2 was produced by merging the bky2bienfr run and the bky2bienfr5 run, which used parallel corpora as the sole translation resource. More discussion of the use of parallel corpora is presented below.</p>
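        <p>
          A small sketch of the combination step described above follows, assuming each translation is a whitespace-tokenized string; halving rounds up, which is our choice.
        </p>
        <preformat>
from collections import Counter

def combine_translations(trans_a, trans_b):
    """Concatenate two machine translations of the same topic and halve
    the term frequencies, so the combined topic stays comparable in
    length to a single translation of the source topic."""
    tf = Counter(trans_a.split()) + Counter(trans_b.split())
    combined = []
    for term, f in tf.items():
        combined.extend([term] * ((f + 1) // 2))  # halve, rounding up
    return combined
        </preformat>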
        <p>All the cross-language runs applied blind relevance feedback. The top-ranked 10 terms from the top-ranked 10 documents after the initial search were combined with the initial query to formulate an expanded query. The results presented in Table 12 show that query expansion improved the average precision for the official runs by 10.85% to 29.36%. The L&amp;H Translator performed better than Babelfish for cross-language retrieval from English to French, German, Italian and Spanish. Combining the translations from L&amp;H Translator and Babelfish performed slightly better than using only the translations from L&amp;H Translator.</p>
        <p>We noticed a number of errors in translating English to German using Babelfish. For example, the English text Super G was translated into Superg, and U.N. and U.S.-Russian were not translated. While the phrase Southern Yemen in the desc field was correctly translated into Südyemen, the same phrase in the title field became SüdcYemen. Decompounding is helpful in monolingual retrieval; it is also helpful in cross-language retrieval to German from other languages such as English. An English phrase of two words may be translated into a German phrase of two words, or into a compound. For example, in topic 111, the English phrase computer animation in the title became ComputerAnimation, and Computer Animation in the desc. In topic 109, the English phrase Computer Security became Computer-Sicherheit in the title, but the same phrase in lower case in the desc became Computersicherheit. Table 13 shows the performance of three cross-language retrieval runs to German (English to German using L&amp;H Translator, English to German using Babelfish, and French to German using Babelfish) with and without decompounding. The improvement in average precision ranges from 8.4% to 13.78%.</p>
      </sec>
        <sec id="sec-5-9-1">
          <title>English-French CLIR Using Parallel Corpora</title>
          <p>
            We created a French-English bilingual lexicon from the Canadian Hansard (the recordings of the debates of the
House for the period of 1994 to 2001). The texts are in English and French. We first aligned the Hansard corpus at
the sentence level using the length-based algorithm proposed by Gale and Church [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ], resulting in about two million
 aligned French/English sentence pairs. To speed up the training (i.e., estimating word translation probabilities), we extracted and used only the sentence pairs that contain at least one English topic word from the CLEF-2001 topics. A number of preprocessing steps were carried out prior to the training. First, we removed the English stopwords from the English sentences, and the French stopwords from the French sentences. Second, we changed the variants of a word into its base form. For English, we used a morphological analyzer described in [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. For French, we used a French morphological analyzer named DICO. Each of the packages contains a list of words together with their morphological analyses. Third, we discarded the sentence pairs in which one of the sentences has 40 or more words after removing stopwords, and the sentence pairs in which the length ratio of the English sentence over the French sentence is below 0.7 or above 1.5. The average length ratio of English text over French text is approximately 1.0; since sentence alignment is not perfect, some misalignments are unavoidable, and such pairs tend to have length ratios that deviate far from the average. After the preprocessing, only 706,210 pairs of aligned sentences remained. The remaining aligned sentence pairs were fed to GIZA++ for estimating English-to-French word translation probabilities. The GIZA++ toolkit is an extension of the EGYPT toolkit [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] which
was based on the statistical machine translation models described in [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ]. Readers are referred to [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] for more
details on GIZA++. The whole training phase took about 24 hours on a Sun Microsystems Sparc server.
Table 14 shows the first three French translations produced by GIZA++ for some of the words in the English topics.
          </p>
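          <p>
            A sketch of the sentence-pair filter described above, assuming tokenized, stopword-free sentences; the thresholds are those stated in the text.
          </p>
          <preformat>
def keep_pair(en_tokens, fr_tokens):
    """Keep an aligned sentence pair for GIZA++ training only if both
    sides have fewer than 40 words (after stopword removal) and the
    English/French length ratio lies between 0.7 and 1.5."""
    if len(en_tokens) >= 40 or len(fr_tokens) >= 40:
        return False
    if not fr_tokens:
        return False
    ratio = len(en_tokens) / len(fr_tokens)
    return ratio >= 0.7 and 1.5 >= ratio
          </preformat>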
          <p>The French translations are ranked in descending order of the probability of translating the English word into the French word. In translating an English word into French, we selected only one French word, the one with the highest translation probability, as the translation. The English topics were translated into French word by word, and the translated French topics were then used to produce the English-to-French run labeled bky2bienfr5 in Table 12. Without query expansion, the parallel corpus-based English-French CLIR performance was slightly better than that of using Babelfish, but slightly lower than that of using the L&amp;H Translator.</p>
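          <p>
            A minimal sketch of this word-by-word topic translation, assuming a translation table of the kind GIZA++ estimates (an English word mapped to French words with probabilities); only the most probable translation is kept, as described above.
          </p>
          <preformat>
def translate_topic(english_terms, trans_table, stopwords):
    """Translate a topic word by word, keeping for each English word
    only the French translation with the highest probability.

    trans_table: english word -> dict of french word -> probability,
    e.g. as estimated by GIZA++ from the aligned Hansard sentences.
    """
    french = []
    for e in english_terms:
        if e in stopwords or e not in trans_table:
            continue
        candidates = trans_table[e]
        french.append(max(candidates, key=candidates.get))
    return french
          </preformat>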
          <p>The CLEF 2002 English topics contain a number of polysemous words such as cup, fall, interest, lead, race, right, rock, star, and the like. The word fall in the context of fall in sale of cars in topic 106 has the meaning of declining. However, the most likely French translation for fall, as Table 14 shows, is automne, meaning autumn in English. The word race in ski race in topic 102 or in race car in topic 121 has the meaning of a contest or competition in speed. Again, the French word with the highest translation probability is race, meaning human race in English. The correct French translation for the sense of race in ski race or car race should be course. The word star in topic 129 means a planet or celestial body, while in topic 123, in the context of pop star, it means a famous performer. The correct translation for star in topic 129 should be étoile, instead of the most likely translation star, which is the correct French word for the sense of star in pop star. The word rock in topic 130 has the same sense as rock in rock music, not the sense of stone. The correct translation for rock in topic 130 should be rocke. In the same topic, the word lead in lead singer means someone in the leading role, not the metal. These examples show that taking the French word with the highest translation probability as the translation for an English word is an oversimplification. Choosing the right French translations would require word sense disambiguation.</p>
        </sec>
        <sec id="sec-5-8-3">
          <title>CLIR Using a Bilingual Dictionary</title>
          <p>For the only English-to-Dutch run, bky2biennl, the English topics were translated into Dutch by looking up each English topic word, excluding stopwords, in the online English-Dutch dictionary Babylon (available at http://www.babylon.com). All the Dutch words in the dictionary lookup results were retained except for Dutch stopwords. The Dutch compound words were split into component words. If translating an English topic word resulted in several Dutch words, then all the translated Dutch words of the English word received the same weight; i.e., the translated Dutch words were weighted uniformly. The average precision of the English-to-Dutch run is 0.3199, which is much lower than the 0.4847 for Dutch monolingual retrieval.</p>
        </sec>
      </sec>
      <sec id="sec-5-9">
        <title>Multilingual Retrieval Experiments</title>
        <p>In this section, we describe our multilingual retrieval experiments using the English topics (only the title and description fields were indexed). As mentioned in the cross-language experiments section above, we translated the English topics into the other four document languages, French, German, Italian, and Spanish, using Babelfish and L&amp;H Translator. A separate index was created for each of the five document languages. For the multilingual retrieval runs, we merged five ranked lists of documents, one resulting from English monolingual retrieval and four resulting from cross-language retrieval from English to the other four document languages, to produce a unified ranked list of documents regardless of language.</p>
          <p>A fundamental difference between merging in monolingual retrieval or cross-language retrieval and merging
in multilingual retrieval is that in monolingual or cross-language retrieval, documents for individual ranked lists
are from the same collection, while in multilingual retrieval, the documents for individual ranked lists are from
different collections. For monolingual or cross-language retrieval, if we assume that documents appearing on more
than one ranked list are more likely to be relevant than the ones appearing on a single ranked list, then we should
rank the documents appearing on multiple ranked lists at higher positions in the merged ranked list of documents.
A simple way to accomplish this is to sum the probability of relevance for the documents appearing on multiple
ranked lists while the probabilities of relevance for the documents appearing on a single list remain the same.
After summing up the probabilities, the documents are re-ranked in descending order by combined probability of
relevance. In multilingual retrieval merging, since the documents on the individual ranked lists are all different,
we cannot use multiple appearances of a document in the ranked lists as evidence to promote its rank in the final
ranked list. The problem of merging multiple ranked lists of documents in multilingual retrieval is closely linked
to estimating probability of relevance. If the estimates of probability of relevance are accurate and well calibrated,
then one can simply combine the individual ranked lists and then re-rank the combined list by the raw probability
of relevance. In practice, estimating relevance probabilities is a hard problem.</p>
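        <p>
          A sketch of the summing rule described above for runs over the same collection follows: probabilities of relevance are added across lists for documents that appear more than once, and the pool is re-ranked. The representation of a run as a mapping from document id to probability is an assumption.
        </p>
        <preformat>
from collections import defaultdict

def merge_same_collection(runs):
    """Merge ranked lists retrieved from the same collection by summing
    each document's estimated probability of relevance across lists,
    then re-ranking in descending order of the combined probability."""
    combined = defaultdict(float)
    for run in runs:
        for doc_id, prob in run.items():
            combined[doc_id] += prob
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
        </preformat>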
          <p>We looked at the estimated probabilities of relevance produced using the ranking formula described in section 2
for the CLEF 2001 topics to see if there is a linear relationship between the number of relevant documents and the
number of documents whose estimated probabilities of relevance are above some threshold. Figure 1 shows the
scatter plot of the number of retrieved documents whose estimated relevance probabilities are above 0.37 versus the
number of relevant documents for the same topic. Each dot in the figure represents one French topic. The ranked list
of documents was produced using the 50 French topics of CLEF 2001 to search against the French collection with
query expansion. The top-ranked 10 terms from top-ranked 10 documents in the initial search were merged with
initial query to create the expanded query. The threshold of 0.37 was chosen so that the total number of documents
for all 50 topics whose estimated relevance probabilities are above the threshold is close to the total number of
relevant documents for the same set of topics. If the estimated probabilities are good, the dots in the figure would
appear along the diagonal line. The figure shows there is no linear relationship between the number of retrieved
documents whose relevance probabilities are above the threshold and the number of relevant documents for the
same topic. This implies one cannot use the raw relevance probabilities to directly estimate the number of relevant
documents for a topic in a test document collection.</p>
        <p>There are a few simple ways to merge ranked lists of documents from different collections. Here we will evaluate two of them. The first method is to combine all ranked lists, sort the combined list by the raw relevance score, and then take the top 1000 documents per topic. The second method is to first normalize the relevance scores for each topic, dividing all relevance scores by the relevance score of the top-ranked document for the same topic. Table 15 presents the multilingual retrieval performance with the different merging strategies. The multilingual runs were produced by merging five runs: bky2moen (English-English, 0.5602), bky2bienfr (English-French, 0.4773), bky2biende (English-German, 0.4479), bky2bienit (English-Italian, 0.4090), and bky2bienes (English-Spanish, 0.4567). The run bky2muen1 was produced by ranking the documents by the unnormalized relevance probabilities after combining the individual runs. The run bky2muen2 was produced in the same way, except that the relevance probabilities were normalized before merging: for each topic, the relevance probabilities of the documents were divided by the relevance probability of the highest-ranked document for the same topic. The simple direct merging outperformed the score-normalizing merging strategy. We did two things to make the relevance probabilities of documents from different language collections comparable to each other. First, as mentioned in section 6.2.3, after concatenating the topic translations from the two translators, we reduced the term frequencies by half so that the translated topics are close to the source English topics in length. Second, in query expansion, we took the same number of terms (i.e., 10) from the same number of top-ranked documents (i.e., 10) after the initial search for all five individual runs that were used to produce the multilingual runs.</p>
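        <p>
          Minimal sketches of the two merging strategies compared in Table 15 follow, assuming one ranked list of (document id, score) pairs per language for a single topic.
        </p>
        <preformat>
def merge_raw(runs, k=1000):
    """Merge per-language runs by raw relevance score (bky2muen1-style)."""
    pooled = [pair for run in runs for pair in run]
    pooled.sort(key=lambda p: p[1], reverse=True)
    return pooled[:k]

def merge_normalized(runs, k=1000):
    """Divide each run's scores by the score of its top-ranked document
    for the topic, then merge by the normalized score (bky2muen2-style)."""
    normalized = []
    for run in runs:
        top = run[0][1]
        normalized.append([(doc, score / top) for doc, score in run])
    return merge_raw(normalized, k)
        </preformat>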
        <p>In the remainder of this section, we present a procedure for computing the optimal performance that could possibly be achieved under the constraint that the relative ranking of the documents in the individual ranked lists is preserved. This procedure assumes that the relevances of the documents are known; thus it is not useful for predicting the ranks of documents in the final ranked list for multilingual retrieval. However, knowing the upper-bound performance for a set of ranked lists of documents and the related document relevances is useful in measuring the performance of different merging strategies. We will use an example to explain the procedure. Let us assume we are going to merge three runs, labeled A, B, and C, as shown in Table 16. The relevant documents are marked with an '*'. We want to find a combined ranked list such that the average precision is maximized without changing the relative rank order of the documents on the same ranked list. First we transform the individual runs shown in Table 16 into the form shown in Table 17 by grouping consecutive irrelevant and relevant documents: each entry in Table 17 denotes a set of consecutive documents and can be summarized as a pair (i, r), where i is the number of irrelevant documents in the set and r is the number of relevant documents following them. For the example presented in Table 17, the initial active set is (0,1), (2,1), (1,3). The procedure is implemented in four steps.</p>
        <p>Step 1: Let the active set consist of the first set in each individual list that contains at least one relevant document. Sort the active set by the number of irrelevant documents in increasing order as the major order, and by the number of relevant documents in decreasing order as the minor order.</p>
        <p>Step 2: Choose the element in the active set with the smallest number of irrelevant documents. If there are two or more elements with the smallest number of irrelevant documents, then choose the element that also contains the largest number of relevant documents. If there are two or more elements with the same smallest number of irrelevant documents and the same largest number of relevant documents in the current active set, randomly choose one of them. Take the selected element out of the active set and append it to the final ranked list. If the set appearing immediately after the selected element in its original list contains at least one relevant document, then add that set to the current active set.</p>
          <p>Step 3: Repeat step 2 until the current active set is empty.</p>
        <p>Step 4: If the final ranked list has fewer than 1000 documents, append more irrelevant documents drawn from any individual list to the final ranked list.</p>
        <p>The optimal ranking after reordering the sets is presented in Table 18.</p>
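        <p>
          A sketch of steps 1 through 3 of the procedure follows, assuming each run is given in its Table 17 form, a list of (irrelevant count, relevant count) groups in original rank order with trailing relevant-free groups dropped (those matter only for the padding in step 4).
        </p>
        <preformat>
def optimal_merge(grouped_runs):
    """Greedy upper-bound merge of ranked lists with known relevance.

    grouped_runs: one list per run of (num_irrelevant, num_relevant)
    groups, e.g. [[(0, 1)], [(2, 1)], [(1, 3)]] for the example above.
    Returns the merged sequence of groups.
    """
    positions = [0] * len(grouped_runs)
    merged = []
    while True:
        # The active set holds the first unconsumed group of each run.
        active = [(run[pos][0], -run[pos][1], i)
                  for i, (run, pos) in enumerate(zip(grouped_runs, positions))
                  if pos != len(run)]
        if not active:
            break
        # Pick the group with the fewest irrelevant documents, breaking
        # ties by the most relevant documents; then expose the next
        # group of that run and repeat until the active set is empty.
        irrel, neg_rel, i = min(active)
        merged.append((irrel, -neg_rel))
        positions[i] += 1
    return merged
        </preformat>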
        <p>The upper-bound average precision for the set of runs used to produce our official multilingual runs is 0.5177, with an overall recall of 6392/8068. The performances of the direct merging and the score-normalizing merging are far below the upper-bound performance that could possibly be achieved.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>We have presented a technique for incorporating blind relevance feedback into a document ranking formula based on logistic regression analysis. The improvement in average precision brought by query expansion via blind relevance feedback ranges from 6.42% to 19.42% for the monolingual retrieval runs, and from 10.85% to 29.36% for the cross-language retrieval runs. We have also presented a procedure for decomposing German and Dutch compounds. German decompounding improved the average precision of German monolingual retrieval by 11.47%. Decompounding also increased the average precision for cross-language retrieval to German from English or French; the increase ranges from 8.4% to 11.46%. For Dutch monolingual retrieval, decompounding increased the average precision by 4.10%, which is much lower than the improvement of 13.49% on the CLEF 2001 test set. In summary, both blind relevance feedback and decompounding in German or Dutch have been shown to be effective in monolingual and cross-language retrieval. The amount of improvement brought by decompounding varies from one set of topics to another. Three different translation resources, machine translators, parallel corpora, and bilingual dictionaries, were evaluated for cross-language retrieval. We found that the English-French CLIR performance using parallel corpora was competitive with that of using commercial machine translators. Two different merging strategies in multilingual retrieval were evaluated. The simple strategy of merging individual ranked lists of documents by unnormalized relevance score worked better than the one that first normalizes the relevance scores. To make the relevance scores of the documents from different collections as closely comparable as possible, we selected the same number of terms from the same number of top-ranked documents after the initial search for query expansion in all the runs that were combined to produce the unified ranked lists of documents in multiple languages. We used two machine translators to translate the English topics into French, German, Italian and Spanish, and combined the translations from the two translators by topic. We reduced the term frequencies in the combined translated topics by half so that the combined translated topics are close in length to the source English topics. We presented an algorithm for generating the optimal ranked list of documents when the document relevances are known. The optimal performance can then be used to measure the performance of different merging strategies.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We would like to thank Vivien Petras for improving the German base dictionary. This research was supported
by DARPA under research grant N66001-00-1-8911 (PI: Michael Buckland) as part of the DARPA Translingual
Information Detection, Extraction, and Summarization Program (TIDES).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Al-Onaizan</surname>
          </string-name>
          et al.
          <source>Statistical machine translation</source>
          ,
          <source>final report, JHU workshop</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A. D.</given-names>
            <surname>Pietra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. J. D.</given-names>
            <surname>Pietra</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Mercer</surname>
          </string-name>
          .
          <article-title>The mathematics of statistical machine translation: Parameter estimation</article-title>
          .
<source>Computational Linguistics</source>
          ,
          <volume>19</volume>
          :
          <fpage>263</fpage>
          -
          <lpage>312</lpage>
          ,
          <year>June 1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
<article-title>Multilingual information retrieval using English and Chinese queries</article-title>
          . In C. Peters,
          <string-name>
            <given-names>M.</given-names>
            <surname>Braschler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          , and M. Kluck, editors,
<source>Evaluation of Cross-Language Information Retrieval Systems: Second Workshop of the Cross-Language Evaluation Forum</source>
          , CLEF-2001, Darmstadt, Germany,
          <year>September 2001</year>
          , pages
          <fpage>44</fpage>
          -
          <lpage>58</lpage>
. Springer Computer Science Series LNCS 2406,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Gey</surname>
          </string-name>
          .
          <article-title>Full text retrieval based on probabilistic equations with coefficients fitted by logistic regression</article-title>
          . In D. K. Harman, editor,
          <source>The Second Text REtrieval Conference (TREC-2)</source>
          , pages
          <fpage>57</fpage>
          -
          <lpage>66</lpage>
          ,
          <year>March 1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
<string-name>
            <given-names>D.</given-names>
            <surname>Karp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Schabes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zaidel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Egedi</surname>
          </string-name>
          .
<article-title>A freely available wide coverage morphological analyzer for English</article-title>
          .
          <source>In Proceedings of COLING</source>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fox</surname>
          </string-name>
          .
<source>The Structure of German</source>
          . Clarendon Press, Oxford,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W. A.</given-names>
            <surname>Gale</surname>
          </string-name>
          and
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Church</surname>
          </string-name>
          .
          <article-title>A program for aligning sentences in bilingual corpora</article-title>
          .
<source>Computational Linguistics</source>
          ,
          <volume>19</volume>
          :
          <fpage>75</fpage>
          -
          <lpage>102</lpage>
          ,
          <year>March 1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Grefenstette</surname>
          </string-name>
          , editor.
          <source>Cross-language information retrieval</source>
          . Kluwer Academic Publishers, Boston, MA,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Harman</surname>
          </string-name>
          .
          <article-title>Relevance feedback and other query modification techniques</article-title>
          . In W. Frakes and R. Baeza-Yates, editors,
          <source>Information Retrieval: Data Structures &amp; Algorithms</source>
          , pages
          <fpage>241</fpage>
          -
          <lpage>263</lpage>
          . Prentice Hall,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>Lezius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rapp</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wettler</surname>
          </string-name>
          .
<article-title>A freely available morphological analyzer, disambiguator and context sensitive lemmatizer for German</article-title>
          .
          <source>In COLING-ACL'98</source>
          , pages
          <fpage>743</fpage>
          -
          <lpage>748</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Och</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Ney</surname>
          </string-name>
          .
<article-title>A comparison of alignment models for statistical machine translation</article-title>
          .
          <source>In COLING 2000</source>
          , pages
          <fpage>1086</fpage>
          -
          <lpage>1090</lpage>
, Saarbrücken, Germany,
          <year>August 2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Peters</surname>
          </string-name>
          , editor.
          <source>Cross Language Information Retrieval and Evaluation: Proceedings of the CLEF 2000 Workshop</source>
. Springer Computer Science Series LNCS
          <volume>2069</volume>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Peters</surname>
          </string-name>
          , editor.
<source>Working Notes of the CLEF 2001 Workshop</source>
          , Darmstadt, Germany,
          <year>September 2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          and
          <string-name>
<given-names>K.</given-names>
            <surname>Sparck Jones</surname>
          </string-name>
          .
          <article-title>Relevance weighting of search terms</article-title>
          .
          <source>Journal of the American Society for Information Science</source>
          , pages
          <fpage>129</fpage>
          -
          <lpage>146</lpage>
          , May-June
          <year>1976</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>E.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          and D. Harman, editors.
          <source>The Seventh Text Retrieval Conference (TREC-7)</source>
          . NIST,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>E.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          and D. Harman, editors.
          <source>The Eighth Text Retrieval Conference (TREC-8)</source>
          . NIST,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>