<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sisay Fissaha Adafre Willem Robert van Hage Jaap Kamps∗ Gustavo Lacerda de Melo Maarten de Rijke</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Currently at Archives and Information Studies, Faculty of Humanities, University of Amsterdam</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Informatics Institute, University of Amsterdam Kruislaan 403</institution>
          ,
          <addr-line>1098 SJ Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2004</year>
      </pub-date>
      <abstract>
        <p>We describe the official runs of our team for the CLEF 2004 ad hoc tasks. We took part in the monolingual task (for Finnish, French, Portuguese, and Russian), in the bilingual task (for Amharic to English, and English to Portuguese), and, finally, in the multilingual task.</p>
        <p>In the CLEF 2004 evaluation exercise we participated in all three ad hoc retrieval tasks. We took part in the monolingual tasks for four non-English languages: Finnish, French, Portuguese, and Russian. Portuguese was new for CLEF 2004. Our participation in the monolingual task continued our earlier work on monolingual retrieval [11, 5, 6]. Our first aim was to continue our experiments with a number of language-dependent techniques, in particular stemming algorithms for all European languages [14], and compound splitting for the compound-rich Finnish language. A second aim was to continue our experiments with language-independent techniques, in particular the use of character n-grams, where we may also index leading and ending character sequences, and retain the original words. Our third aim was to experiment with combinations of runs.</p>
        <p>We took part in the bilingual task, this year focusing on Amharic to English and on English to Portuguese. Our bilingual runs were motivated by the following aims. Our first aim was to experiment with a language for which resources are few and far between, Amharic, and to see how far we could get by combining the scarce resources that are available. Our second aim was to experiment with the relative effectiveness of a number of translation resources: machine translation [16] versus a parallel corpus [7], and query translation versus collection translation. Our third aim was to evaluate the effectiveness of our monolingual retrieval approaches for imperfectly translated queries, shedding light on the robustness of these approaches.</p>
        <p>Finally, we continued our participation in the multilingual task, where we experimented with straightforward ways of query translation, using machine translation whenever available, and a translation dictionary otherwise. We also experimented with combination methods using runs made on varying types of indexes.</p>
        <p>In Section 2 we describe the FlexIR system as well as the approaches used for each of the tasks in which we participated. Section 3 describes our official retrieval runs for CLEF 2004. In Section 4 we discuss the results we have obtained. Finally, in Section 5, we offer some conclusions regarding our document retrieval efforts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Text normalization. We do some limited text normalization by removing punctuation, applying case-folding,
and mapping diacritics to the unmarked characters. The Cyrillic characters used in Russian can appear in a variety
of font encodings. The collection and topics are encoded using the UTF-8 or Unicode character encoding. We
converted the UTF-8 encoding into KOI8 (Kod Obmena Informatsii), a 1-byte per character encoding. We did
all our processing, such as lower-casing, stopping, stemming, and n-gramming, on documents and queries in this
KOI8 encoding. Finally, to ensure proper indexing of the documents using our standard architecture, we converted
the resulting documents into the Latin alphabet using the Volapuk transliteration. We processed the Russian queries
in the same way as the documents.</p>
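      <p>The normalization steps above can be sketched as follows. This is a minimal illustration for Latin-script text, not the FlexIR implementation: the function name normalize is hypothetical, only ASCII punctuation is handled, and the decomposition-based removal of diacritics is an assumption that would not be appropriate for Cyrillic text, which the paper handles separately via KOI8 and transliteration.</p>
      <preformat><![CDATA[
```python
import string
import unicodedata


def normalize(text: str) -> str:
    """Case-fold, strip diacritics, and replace ASCII punctuation with spaces."""
    # Decompose accented characters so the combining marks can be dropped.
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    unmarked = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace ASCII punctuation with spaces, then collapse runs of whitespace.
    table = str.maketrans({ch: " " for ch in string.punctuation})
    return " ".join(unmarked.translate(table).split())


print(normalize("Hôtel, CAFÉ!"))
```
]]></preformat>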
      <p>
        Morphological Normalization. We carried out extensive experiments with different forms of morphological
normalization for monolingual retrieval [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. These include the following:
      </p>
      <p>
        Stemming — For all languages we used a stemming algorithm to map word forms to their underlying stems.
Stemming is a language-dependent approach to morphological normalization. We used the family of Snowball
stemming algorithms, available for all the languages of the CLEF collections. Snowball is a small string processing
language designed for creating stemming algorithms for use in information retrieval [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>Decompounding — For the compound-rich Finnish language, we also apply a decompounding algorithm. We
treat all the words occurring in the Finnish collection as potential base words for decompounding, and also use
their associated collection frequencies. We ignore words of length less than four as potential compound parts, thus
a compound must have at least length eight. As a safeguard against oversplitting, we only regard compound parts
that have a higher collection frequency than the compound itself. We retain the original compound words, and add
their parts to the documents; the queries are processed similarly.</p>
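      <p>A rough sketch of the frequency-based safeguard described above is given below. It assumes a plain dictionary of collection frequencies and only considers splits into two parts; the function name decompound, the tie-breaking rule (prefer the split whose rarer part is most frequent), and the toy frequencies are illustrative assumptions rather than details taken from our system.</p>
      <preformat><![CDATA[
```python
def decompound(word, freq, min_part=4):
    """Return the word plus two compound parts if a safe split exists."""
    own = freq.get(word, 0)
    best = None
    # Each part needs at least min_part characters, so words shorter
    # than 2 * min_part are never split (the range below is empty).
    for i in range(min_part, len(word) - min_part + 1):
        head, tail = word[:i], word[i:]
        # Both parts must be known words that are more frequent than the compound.
        if freq.get(head, 0) > own and freq.get(tail, 0) > own:
            score = min(freq[head], freq[tail])
            if best is None or score > best[0]:
                best = (score, head, tail)
    if best is None:
        return [word]
    # Retain the original compound and add its parts.
    return [word, best[1], best[2]]


# Toy collection frequencies (made up for illustration).
freq = {"tietokone": 5, "tieto": 50, "kone": 40}
print(decompound("tietokone", freq))
```
]]></preformat>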
      <p>n-Gramming — For all languages, we used character n-gramming to index all character-sequences of a given
length that occur in a word. n-Gramming is a language-independent approach to morphological normalization. We
used three different ways of forming n-grams of length 4. First, we index pure 4-grams. For example, the word
Information will be indexed as the 4-grams info nfor form orma rmat mati atio tion. Second, we index
4-grams with leading and ending 3-grams. For the example this gives inf info nfor form orma rmat mati
atio tion ion. Third, we index 4-grams plus original words. For the example this gives info nfor form
orma rmat mati atio tion information.</p>
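      <p>The three n-gram variants above can be reproduced with a few lines of code. The sketch below is illustrative only: the function name char_ngrams and the mode labels are hypothetical, and returning a word shorter than n unchanged is an assumption, since the text does not say how such words are treated.</p>
      <preformat><![CDATA[
```python
def char_ngrams(word, n=4, mode="pure"):
    """Character n-grams of a word in one of three variants."""
    word = word.lower()
    if len(word) < n:
        # Assumption: words shorter than n are kept as they are.
        return [word]
    grams = [word[i:i + n] for i in range(len(word) - n + 1)]
    if mode == "start_end":
        # Add the leading and ending (n-1)-grams.
        grams = [word[:n - 1]] + grams + [word[-(n - 1):]]
    elif mode == "words":
        # Retain the original word next to its n-grams.
        grams = grams + [word]
    return grams


for mode in ("pure", "start_end", "words"):
    print(mode, " ".join(char_ngrams("Information", mode=mode)))
```
]]></preformat>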
      <p>
        Stopwords. Both topics and documents were stopped using the stopword lists from the Snowball stemming
algorithms [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]; for Finnish we used the Neuchâtel stoplist [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Additionally, we removed topic-specific phrases
such as ‘Find documents that discuss ...’ from the queries. We did not use a stop-stem or stop-n-gram list; instead, we
first applied the stopword list, and then stemmed or n-grammed the topics and documents.
      </p>
      <p>
        Blind Feedback. Blind feedback was applied to expand the original query with related terms. We experimented
with different schemes and settings, depending on the various indexing methods and retrieval models used. For our
Lnu.ltc runs term weights were recomputed by using the standard Rocchio method [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], where we considered the
top 10 documents to be relevant and the bottom 500 documents to be non-relevant. We allowed at most 20 terms
to be added to the original query.
      </p>
      <p>
        Combined Runs. We combined various ‘base’ runs using either a weighted or an unweighted combination method.
The weighted interpolation was produced as follows. First, we normalized the retrieval status values (RSVs),
since different runs may have radically different RSVs. For each run we rescaled these values to [0, 1] using
<inline-formula><tex-math>\mathrm{RSV}'_i = (\mathrm{RSV}_i - \min_i)/(\max_i - \min_i)</tex-math></inline-formula>; this is the Min Max Norm considered in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Next, we assigned new weights
to the documents using a linear interpolation factor λ representing the relative weight of a run: <inline-formula><tex-math>\mathrm{RSV}_{new} = \lambda \cdot \mathrm{RSV}_1 + (1 - \lambda) \cdot \mathrm{RSV}_2</tex-math></inline-formula>.
For λ = 0.5 this is similar to the simple (but effective) combSUM function used by Fox and Shaw [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
The interpolation factors λ were loosely based on experiments on earlier CLEF data sets. When we combine more
than two runs, we give all runs the same relative weight, effectively resulting in the familiar combSUM.
      </p>
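      <p>As an illustration of the two steps above, the following sketch normalizes two runs with the Min Max Norm and interpolates them with a factor λ. It is not our production code: runs are assumed to be non-empty dictionaries from document identifiers to RSVs, the function names are hypothetical, and assigning a normalized score of 0 to documents missing from one of the runs is an assumption.</p>
      <preformat><![CDATA[
```python
def min_max_normalize(run):
    """Rescale the RSVs of one (non-empty) run to the interval [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # Degenerate run: all scores are equal.
        return {doc: 0.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def interpolate(run1, run2, lam=0.5):
    """Combine two runs as lam * RSV1 + (1 - lam) * RSV2 after normalization."""
    n1, n2 = min_max_normalize(run1), min_max_normalize(run2)
    docs = set(n1) | set(n2)
    # Assumption: a document missing from a run contributes a score of 0.
    combined = {d: lam * n1.get(d, 0.0) + (1 - lam) * n2.get(d, 0.0) for d in docs}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


run_a = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
run_b = {"d1": 1.0, "d2": 3.0}
print(interpolate(run_a, run_b, lam=0.5))
```
]]></preformat>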
    </sec>
    <sec id="sec-2">
      <title>Runs</title>
      <p>We submitted a total of 24 retrieval runs: 12 for the monolingual task, 7 for the bilingual task, and 5 for the
multilingual task. Below we discuss these runs in some detail.</p>
    </sec>
    <sec id="sec-3">
      <title>Monolingual Runs</title>
      <p>All our monolingual runs used the title and description fields of the topics. We constructed five different indexes
for each of the languages using Words, Stems, 4-Grams, 4-Grams+start/end, and 4-Grams+words:</p>
      <p>• Words: no morphological normalization is applied, although for Finnish Split indicates that words are
decompounded.</p>
      <p>• Stems: topic and document words are stemmed using the morphological tools described in Section 2. For
Finnish, Split+stem indicates that compounds are split, where we stem the words and compound parts.</p>
      <p>• n-Grams: both topic and document words are n-grammed, using the settings discussed in Section 2.
We have three different indexes: 4-Grams; 4-Grams+words, where the words are also retained; and
4-Grams+start/end, with beginning and ending 3-grams.</p>
      <p>
        On all these indexes we made runs using the Lnu.ltc retrieval model; on the Words and on the Stems index we
also made runs with a language model, resulting in 7 base runs for French, Portuguese, and Russian. In addition,
for the compound-rich Finnish language we also applied a decompounding algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], on words and on stems,
from which we produced base runs with both the Lnu.ltc retrieval model and a language model, leading to a total
of 11 base runs for Finnish.
      </p>
      <p>All our official submissions were combinations of the base runs just described. For each of the four languages
we constructed two combinations of stemmed and n-grammed base runs, as well as a “grand” combination of all
base runs. Table 1 provides an overview of the runs that we submitted for the monolingual task. The third column
in Table 1 indicates the type of run, and for two-way combinations the interpolation factor λ used is given in the
fourth column.</p>
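      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>Overview of the runs submitted for the monolingual task.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Run</th><th>Language</th><th>Type</th><th>Factor</th></tr>
          </thead>
          <tbody>
            <tr><td>UAmsC04FiFi4GiSb</td><td>FI</td><td>4-Grams+words;Split+stem</td><td>0.4</td></tr>
            <tr><td>UAmsC04FiFi4GiWd</td><td>FI</td><td>4-Grams+start/end;Split</td><td>0.4</td></tr>
            <tr><td>UAmsC04FiFiAll</td><td>FI</td><td>Grand combination of 11 runs</td><td></td></tr>
            <tr><td>UAmsC04FrFr4GiSb</td><td>FR</td><td>4-Grams+words;Stems</td><td>0.6</td></tr>
            <tr><td>UAmsC04FrFr4GiWd</td><td>FR</td><td>4-Grams+start/end;Words</td><td>0.6</td></tr>
            <tr><td>UAmsC04FrFrAll</td><td>FR</td><td>Grand combination of 7 runs</td><td></td></tr>
            <tr><td>UAmsC04PoPo4GiSb</td><td>PT</td><td>4-Grams+words;Stems</td><td>0.4</td></tr>
            <tr><td>UAmsC04PoPo4GiWd</td><td>PT</td><td>4-Grams+start/end;Words</td><td>0.4</td></tr>
            <tr><td>UAmsC04PoPoAll</td><td>PT</td><td>Grand combination of 7 runs</td><td></td></tr>
            <tr><td>UAmsC04RuRu4GiSb</td><td>RU</td><td>4-Grams+words;Stems</td><td>0.5</td></tr>
            <tr><td>UAmsC04RuRu4GiWd</td><td>RU</td><td>4-Grams+start/end;Words</td><td>0.5</td></tr>
            <tr><td>UAmsC04RuRuAll</td><td>RU</td><td>Grand combination of 7 runs</td><td></td></tr>
          </tbody>
        </table>
      </table-wrap>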
    </sec>
    <sec id="sec-4">
      <title>Bilingual Runs</title>
      <p>
        For the bilingual task, we focused on Amharic to English, and English to Portuguese. We submitted a total of 7
runs; all of them used the title and description fields of the topics. For our bilingual runs, we experimented with
the WorldLingo machine translation [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] for translations into Portuguese, with a parallel corpus for translations
into Portuguese, and with a variety of techniques for the Amharic topics, as we will now explain.
      </p>
      <sec id="sec-4-1">
        <title>English to Portuguese</title>
        <p>
          Machine Translation. We used the WorldLingo machine translation [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] for translating the English topics into
Portuguese. The translation is actually in Brazilian Portuguese, but the linguistic differences between European
and Brazilian Portuguese are fairly limited.
        </p>
        <p>
          Parallel Corpus. We used the sentence-aligned parallel corpus [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], based on the Official Journal of the European
Union [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. We built a Portuguese to English translation dictionary, based on a word alignment in the parallel
corpus. Since the word order in English and Portuguese is not very different, we only considered potential
alignments with words in the same position, or one or two positions off. We ranked potential translations with a
score based on:
        </p>
        <p>• Cognate matching: rewarding similarity in word forms, by looking at the number of leading characters
that agree in both languages.</p>
        <p>• Length matching: rewarding similarity in word lengths in both languages.</p>
        <p>• Frequency matching: rewarding similarity in word frequency in both languages.</p>
        <p>To further aid the alignment, we constructed a list of 100 most frequent Portuguese words in the corpus, and
manually translated these to English. The alignments of these highly frequent words were resolved before the word
alignment phase. We built a Portuguese to English translation dictionary by choosing the most likely translation,
where we only include words that score above a threshold. The translation dictionary contains 19,554 words.
We use the translation dictionary resulting from the parallel corpus for two different purposes. First, we translate
the English topics into Portuguese. Second, we translate the Portuguese collection into English.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Amharic to English</title>
        <p>
          Amharic, which belongs to the Semitic family of languages, is one of the most widely spoken languages in
Ethiopia. In Amharic, word formation involves affixation, reduplication, and Semitic stem interdigitation, among
others. The most characteristic feature of Amharic morphology is the root-pattern phenomenon. This is especially true
of Amharic verbs, which rely heavily on the arrangement of consonants and vowels in order to code different
morphosyntactic properties (such as perfect, imperfect, jussive etc.). Consonants, which mostly carry the semantic
core of the word, form the root of the verb. Consonants and vowel patterns together constitute the stems, and stems
take different types of affixes (prefixes and suffixes) to form the fully inflected words; see [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>For our bilingual Amharic to English runs, we attempted to show how the (minimal) available resources for
Amharic can be used in (Amharic-English) bilingual information retrieval settings. Since English is used on the
document side, it is interesting to see how the existing retrieval techniques can be optimized in order to make the
best use of the output of the error-prone translation component.</p>
        <p>Resources and Query Translation. Our Amharic to English query translation is based mainly on dictionary
lookup. We used an Amharic-English bilingual dictionary which consists of 15,000 fully inflected words. Due
to the morphological complexity of the language, we expected the dictionary to have limited coverage. In order
to improve on the coverage, two further dictionaries, root-based and stem-based, were derived from the original
dictionary. We also tried to augment the dictionary with a bilingual lexicon extracted from aligned
Amharic-English Bible text. However, most of the words are old English words and are also found in the dictionary. The
word dictionary also contains commonly used Amharic collocations. Multiword collocations were identified and
marked in the topics. For this purpose, we used a list of multiword collocations extracted from an Amharic text
corpus. The dictionaries were searched for a translation of Amharic words in the following order: word-dictionary,
stem dictionary, root dictionary.</p>
        <table-wrap id="tab-2">
          <label>Table 2</label>
          <caption>
            <p>Performance of the Amharic to English translation strategy.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Total no. of words</th><th>Word dictionary</th><th>Root dictionary</th><th>English spell checker</th></tr>
            </thead>
            <tbody>
              <tr><td>1,893</td><td>813</td><td>178</td><td>57</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Leaving aside the ungrammaticality of the output of the above translation, there are a number of problems. One is
the problem of unknown words. These may be Amharic words not included in the dictionary, or foreign words.
Some foreign words and their transliterations have the same spelling or are nearly identical. To take advantage of
this fact, a word is checked using an English spellchecker (Aspell); if found, it is returned as a translation. In some
cases, there may be typographical variations between the English word and its transliteration; to address this, the
first word among the spellchecker's suggestions is checked for string similarity with the unknown word. If the
similarity exceeds a threshold, the suggestion is taken as the translation. Other unknown words are simply passed
over to the English translation. Another problem relates
to the selection of the appropriate translation from among the possible translations found in the dictionary. In the
absence of frequency information that would allow selecting the right translation, the most frequently used English
word is selected as the translation of the corresponding Amharic word. This is achieved by querying the web. The
coverage of the translation is 55%. The number of correct translations is lower still. Table 2 gives some idea of the
performance of the translation strategy.</p>
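        <p>The lookup order and the spellchecker fallback described above might be sketched as follows. Everything here is illustrative: the function translate_word, the key functions that stand in for Amharic stemming and root extraction, the romanized toy entries, and the similarity threshold of 0.8 are assumptions, and the standard-library difflib module is used as a stand-in for Aspell's suggestion list.</p>
        <preformat><![CDATA[
```python
import difflib


def translate_word(word, lookups, english_words, threshold=0.8):
    """Translate one (transliterated) word via a cascade of dictionaries.

    lookups is a list of (key_function, dictionary) pairs, tried in order,
    e.g. word dictionary, then stem dictionary, then root dictionary.
    """
    # 1. Try the dictionaries in priority order.
    for key_function, dictionary in lookups:
        key = key_function(word)
        if key in dictionary:
            return dictionary[key]
    # 2. A transliterated foreign word may already be a valid English word.
    if word in english_words:
        return word
    # 3. Otherwise accept the closest English word if it is similar enough.
    close = difflib.get_close_matches(word, english_words, n=1, cutoff=threshold)
    if close:
        return close[0]
    # 4. Remaining unknown words are passed through untranslated.
    return word


# Toy resources (hypothetical): an identity lookup and a crude 3-letter "stem".
lookups = [(lambda w: w, {"bet": "house"}), (lambda w: w[:3], {"sel": "peace"})]
english = ["internet", "computer"]
for w in ("bet", "selam", "internat", "xyzq"):
    print(w, translate_word(w, lookups, english))
```
]]></preformat>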
        <p>For both English and Portuguese we used a similar set of indexes as for the monolingual runs described earlier
(Words, Stems, 4-Grams, 4-Grams+start/end, 4-Grams+words); for all of these, Lnu.ltc runs were produced, and
for the Words and Stems indexes we also produced a language model run, leading to 7 base runs for the Amharic to
English task. Additionally, for the English to Portuguese task we used three types of translation: query translation
using machine translation (WorldLingo), query translation using a parallel corpus (query EU), and collection
translation using a parallel corpus (collection EU). This gave rise to a total of 21 base runs for the English to Portuguese
task.</p>
        <p>Table 3 provides an overview of the runs that we submitted for the bilingual task. The fourth column in Table 3
indicates the type of run.</p>
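        <table-wrap id="tab-3">
          <label>Table 3</label>
          <caption>
            <p>Overview of the runs submitted for the bilingual task.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Run</th><th>Topics</th><th>Documents</th><th>Type</th></tr>
            </thead>
            <tbody>
              <tr><td>UAmsC04EnPo4GiSb</td><td>EN</td><td>PT</td><td>4-Grams+words;Stems (collection EU)</td></tr>
              <tr><td>UAmsC04EnPo4iSPC</td><td>EN</td><td>PT</td><td>4-Grams+words;Stems (query EU)</td></tr>
              <tr><td>UAmsC04EnPo4iSWL</td><td>EN</td><td>PT</td><td>4-Grams+words;Stems (WorldLingo)</td></tr>
              <tr><td>UAmsC04EnPoAll</td><td>EN</td><td>PT</td><td>Grand combination of 21 runs</td></tr>
              <tr><td>UAmsC04AmEnWrd</td><td>AM</td><td>EN</td><td>Words</td></tr>
              <tr><td>UAmsC04AmEn4GiSb</td><td>AM</td><td>EN</td><td>4-Grams+words;Stems</td></tr>
              <tr><td>UAmsC04AmEnAll</td><td>AM</td><td>EN</td><td>Grand combination of 7 runs</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-4-2-3">
          <title>Multilingual Runs</title>
          <p>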
We submitted a total of 4 multilingual runs, all using the title and description of the English topic set. The
multilingual runs were based on the following mono- and bilingual runs:
          </p>
          <p>• English to English: This is just a monolingual run, processed in the same way as the other monolingual runs above.</p>
          <p>• English to Finnish: We translated the English topics into Finnish using the Mediascape on-line
dictionary [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]. For words present in the dictionary, we included all possible translations available. For words not
present in the dictionary, we simply retained the original English words.</p>
          <p>• English to French: We translated the English topics into French using the WorldLingo machine
translation [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ].</p>
          <p>• English to Russian: Again, we translated the English topics into Russian using the WorldLingo machine
translation [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ].
          </p>
          <p>Results of the mono- and bilingual runs just described were combined using unweighted combSUM. We also
translated topics using another Russian on-line translator. However, the resulting translations were identical to those
provided by WorldLingo. We submitted a fifth multilingual run, UAmsC04EnMuAll2, including English to
Russian results using both translations. This run scored lower due to the overweighting of the Russian documents.</p>
          <p>Table 4 provides an overview of the runs that we submitted for the multilingual task. The fourth column in
Table 4 indicates the document sets used.</p>
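          <table-wrap id="tab-4">
            <label>Table 4</label>
            <caption>
              <p>Overview of the runs submitted for the multilingual task.</p>
            </caption>
            <table>
              <thead>
                <tr><th>Run</th><th>Topics</th><th>Documents</th><th>Type</th></tr>
              </thead>
              <tbody>
                <tr><td>UAmsC04EnMu4Gr</td><td>EN</td><td>EN, FI, FR, RU</td><td>4 × 4-Grams+words</td></tr>
                <tr><td>UAmsC04EnMuWSLM</td><td>EN</td><td>EN, FI, FR, RU</td><td>8 × Words LM, Stems LM</td></tr>
                <tr><td>UAmsC04EnMu3Way</td><td>EN</td><td>EN, FI, FR, RU</td><td>12 × Words, Stems, 4-Grams+start/end</td></tr>
                <tr><td>UAmsC04EnMuAll</td><td>EN</td><td>EN, FI, FR, RU</td><td>Grand combination of 7 runs per language</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
        <sec id="sec-4-2-6">
          <title>Results</title>
          <p>This section summarizes the results of our CLEF 2004 submissions.</p>
        </sec>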
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Monolingual Results</title>
      <p>Finally, Table 7 lists the MAP scores for our official runs. For these, the grand combination of all base runs always
outperforms the combination of a single (non)stemmed run and a single n-grammed run. When comparing with
the best-scoring base runs in Table 5, we see that there is a substantial improvement only for Russian. There is
a moderate improvement for French and Portuguese. The best Finnish n-gram run even outperforms the grand
combination.</p>
      <table-wrap id="tab-7">
        <label>Table 7</label>
        <caption>
          <p>MAP scores for our official monolingual runs.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Run type</th><th>Finnish</th><th>French</th><th>Portuguese</th><th>Russian</th></tr>
          </thead>
          <tbody>
            <tr><td>4-Grams+words;(Split+)stem</td><td>0.4787</td><td>0.4410</td><td>0.4110</td><td>0.4227</td></tr>
            <tr><td>4-Grams+start/end;(Split+)words</td><td>0.5007</td><td>0.4092</td><td>0.4180</td><td>0.4058</td></tr>
            <tr><td>All base runs</td><td>0.5203</td><td>0.4499</td><td>0.4326</td><td>0.4412</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-5-7">
        <title>Conclusions</title>
        <p>
In this paper we documented the University of Amsterdam’s participation in the CLEF 2004 ad hoc retrieval tasks:
monolingual, bilingual, and multilingual retrieval. For the monolingual task, we conducted experiments on the
effectiveness of morphological normalization approaches and combination methods. Our results shed further light
on the effectiveness of language-dependent and language-independent approaches to morphological normalization.
As to the bilingual task, we experimented with bilingual retrieval in a resource-poor language, Amharic, and
examined the relative effectiveness of different translation resources and of query versus collection translation. Our
results indicate interesting differences between the bilingual approaches. The effectiveness of combining different
translation methods was highlighted by the fact that the best bilingual score outperformed the best monolingual
score. Finally, for the multilingual task, we experimented with straightforward query translations and combination
methods, and showed the effectiveness of combining a wide range of base runs.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We want to thank Valentin Jijkoun for help with the Russian collection. Sisay Fissaha Adafre was supported by the
Netherlands Organization for Scientific Research (NWO) under project number 220-80-001. Jaap Kamps was
supported by a grant from NWO under project number 612.066.302. Maarten de Rijke was supported by grants from
NWO, under project numbers 612-13-001, 365-20-005, 612.069.006, 612.000.106, 220-80-001, 612.000.207,
612.066.302, and 264-70-050.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Buckley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singhal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Mitra</surname>
          </string-name>
          .
          <article-title>New retrieval approaches using SMART: TREC 4</article-title>
          . In D.K. Harman, editor,
          <source>The Fourth Text REtrieval Conference (TREC-4)</source>
          , pages
          <fpage>25</fpage>
          -
          <lpage>48</lpage>
          .
          <publisher-name>National Institute for Standards and Technology</publisher-name>
          .
          <source>NIST Special Publication 500-236</source>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.A.</given-names>
            <surname>Fox</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.A.</given-names>
            <surname>Shaw</surname>
          </string-name>
          .
          <article-title>Combination of multiple searches</article-title>
          . In D.K. Harman, editor,
          <source>The Second Text REtrieval Conference (TREC-2)</source>
          , pages
          <fpage>243</fpage>
          -
          <lpage>252</lpage>
          .
          <publisher-name>National Institute for Standards and Technology</publisher-name>
          .
          <source>NIST Special Publication 500-215</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          .
          <article-title>Using Language Models for Information Retrieval</article-title>
          .
          <source>PhD thesis</source>
          , Center for Telematics and Information Technology, University of Twente,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Hollink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Monz</surname>
          </string-name>
          , and M. de Rijke.
          <article-title>Monolingual document retrieval for European languages</article-title>
          .
          <source>Information Retrieval</source>
          ,
          <volume>7</volume>
          (
          <issue>1</issue>
          ):
          <fpage>33</fpage>
          -
          <lpage>52</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Monz</surname>
          </string-name>
          , and M. de Rijke.
          <article-title>Combining evidence for cross-language information retrieval</article-title>
          . In C. Peters,
          <string-name>
            <given-names>M.</given-names>
            <surname>Braschler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          , and M. Kluck, editors,
          <source>Advances in Cross-Language Information Retrieval, CLEF 2002</source>
          , volume
          <volume>2785</volume>
          of
          <series>LNCS</series>
          . Springer,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Monz</surname>
          </string-name>
          , M. de Rijke, and
          <string-name>
            <given-names>B.</given-names>
            <surname>Sigurbjörnsson</surname>
          </string-name>
          .
          <article-title>Language-dependent and language-independent approaches to cross-lingual text retrieval</article-title>
          . In C. Peters,
          <string-name>
            <given-names>M.</given-names>
            <surname>Braschler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          , and M. Kluck, editors,
          <source>Cross-Language Information Retrieval, CLEF 2003, Lecture Notes in Computer Science</source>
          . Springer,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Koehn</surname>
          </string-name>
          .
          <source>European parliament proceedings parallel corpus 1996-2003</source>
          ,
          <year>2004</year>
          . http://people.csail.mit.edu/people/koehn/publications/europarl/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.H.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Combining multiple evidence from different properties of weighting schemes</article-title>
          . In E.A. Fox, P. Ingwersen, and R. Fidel, editors,
          <source>Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>180</fpage>
          -
          <lpage>188</lpage>
          . ACM Press, New York NY, USA,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Mediascape</surname>
          </string-name>
          .
          <article-title>English-Finnish-English on-line dictionary</article-title>
          ,
          <year>2004</year>
          . http://efe.scape.net/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <article-title>Neuchâtel</article-title>
          . CLEF resources at the University of Neuchâtel,
          <year>2004</year>
          . http://www.unine.ch/info/clef.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Monz</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>de Rijke</surname>
          </string-name>
          .
          <article-title>Shallow morphological analysis in monolingual information retrieval for Dutch, German and Italian</article-title>
          . In C. Peters,
          <string-name>
            <given-names>M.</given-names>
            <surname>Braschler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          , and M. Kluck, editors,
          <source>Evaluation of Cross-Language Information Retrieval Systems, CLEF 2001</source>
          , volume
          <volume>2406</volume>
          of
          <series>LNCS</series>
          , pages
          <fpage>262</fpage>
          -
          <lpage>277</lpage>
          . Springer,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nega</surname>
          </string-name>
          .
          <article-title>Development of Stemming Algorithm for Amharic Text Retrieval</article-title>
          .
          <source>PhD thesis</source>
          , University of Sheffield,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.J.</given-names>
            <surname>Rocchio</surname>
          </string-name>
          , Jr.
          <article-title>Relevance feedback in information retrieval</article-title>
          . In G. Salton, editor,
          <source>The SMART Retrieval System: Experiments in Automatic Document Processing</source>
          , Prentice-Hall Series in Automatic Computation, chapter 14
          , pages
          <fpage>313</fpage>
          -
          <lpage>323</lpage>
          . Prentice-Hall, Englewood Cliffs NJ,
          <year>1971</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Snowball</surname>
          </string-name>
          .
          <article-title>Stemming algorithms for use in information retrieval</article-title>
          ,
          <year>2004</year>
          . http://www.snowball.tartarus.org/.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <collab>European Union</collab>
          .
          <source>Official Journal of the European Union</source>
          ,
          <year>2004</year>
          . http://europa.eu.int/eur-lex/.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Worldlingo</surname>
          </string-name>
          . Online translator,
          <year>2004</year>
          . http://www.worldlingo.com/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>