<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Two Stages Refinement of Query Translation for Pivot Language Approach to Cross Lingual Information Retrieval: A Trial at CLEF 2003</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kazuaki KISHIDA</string-name>
          <email>kishida@surugadai.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Noriko KANDO</string-name>
          <email>kando@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute of Informatics</institution>
          ,
          <country country="JP">JAPAN</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Surugadai University / National Institute of Informatics</institution>
          ,
          <country country="JP">JAPAN</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper reports experimental results of cross-lingual information retrieval (CLIR) from German to Italian. We are concerned with CLIR when the available language resources are very limited. We therefore used transitive translation of queries, with English as a pivot language, to search Italian document collections for German queries without any direct bilingual dictionary or MT system between the two languages. To remove irrelevant translations produced by the transitive translation, we propose a disambiguation technique in which query translations are refined in two stages. The refinement is based on the idea of pseudo relevance feedback. In the first stage, for each source query term, we select the translation candidate that appears most frequently in the set of top-ranked documents retrieved for the full set of terms produced by transitive translation of the source query. In the second stage, a standard query expansion based on pseudo relevance feedback is conducted. Our experimental results show that the two stages refinement method significantly improves the search performance of bilingual IR using a pivot language. However, its performance turned out to be inferior to that of the machine translation method.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <sec id="sec-1-1">
        <title>1.1 Purpose</title>
        <p>This paper reports our experiment on
cross-language IR (CLIR) from German to Italian at
CLEF 2003. Our fundamental interest is CLIR
between languages for which translation resources
are very limited, and we therefore explore a new
approach to transitive query translation using English
as a pivot. The reason is that translation resources between
English and each language are often easy to obtain,
not only in the European environment but
also in the East Asian environment. Although East Asian
languages are completely different from English,
the pivot language approach using English is very
important there because resources
for direct translation among them are scarce.</p>
        <p>Thus the basic premise of our
experiment at CLEF 2003 is that only very limited language
resources are available for executing CLIR runs.
Specifically, we supposed that there is
- no bilingual dictionary,
- no machine translation (MT) system, and
- no parallel corpus
directly between German and Italian, and
- no corpus written in the language of the query (i.e.,
no German corpus).</p>
        <p>We decided to employ only two relatively small
dictionaries, German to English (G to E) and English
to Italian (E to I), which are readily available
on the Internet. As mentioned above, our
research purpose is fundamentally to develop an
effective method for enhancing the performance of CLIR
when language resources are very poor.</p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2 Basic idea</title>
        <p>
          Under this presupposition, our method for
CLIR can be characterized as
- a dictionary-based approach (query translation), and
- a pivot language approach (English is the pivot).
As is well known, the dictionary-based approach can
produce extraneous or
irrelevant translations [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In particular, in the
case of the pivot language approach, the two consecutive
steps of translation (e.g., German to English and
English to Italian) often yield many more extraneous
translation candidates because each word is
replaced twice. Therefore, a term
disambiguation technique is indispensable.
        </p>
        <p>Under the presupposition of our study, the only resource
that can be used for translation disambiguation is the
target document collection (i.e., the Italian document sets
included in the CLEF test collection). We
propose a disambiguation technique in which pseudo
relevance feedback is repeated to refine the query
translations. The basic procedure is as follows:
- Initial search: the target document collection is
searched for all translation candidates produced
by dictionary-based replacements via the pivot
language.
- First feedback (disambiguation stage): the set of
translation candidates is reduced by using term
occurrence statistics within a set of
top-ranked documents obtained by the initial
search.
- Second search: the target document collection is
searched for the reduced set of translations.
- Second feedback (query expansion stage): the
reduced set of translations is expanded by using
a standard pseudo relevance feedback
technique.
- Final search: the target document collection is
searched for the expanded set of search terms.
We expect the two stages of refinement of translation
candidates to yield better CLIR performance
when the available language resources
are poor. The purpose of this paper is to verify
experimentally the effectiveness of the two stages refinement
technique using the test collection of CLEF 2003.</p>
        <p>This paper is organized as follows. Section
2 reviews previous work on translation
disambiguation techniques and the pivot language
approach. Section 3 introduces the technique of two stages
refinement of query translation.
Section 4 describes the system used in our
experiment at CLEF 2003. Section 5 reports
the results.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 Previous works</title>
      <sec id="sec-2-1">
        <title>2.1 Translation disambiguation techniques</title>
        <p>
          In the CLIR field, various techniques for
translation disambiguation have been proposed.
Among them, some researchers have explored
methods that employ the target document collection for
identifying extraneous or irrelevant translations. The
typical approach is to use co-occurrence statistics of
translation candidates, under the assumption that
“the correct translations of query terms should
co-occur in target language documents and incorrect
translation should tend not to co-occur” (Ballesteros
and Croft [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]). Many studies have
basically followed this idea [
          <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6 ref7 ref8 ref9">3-9</xref>
          ].
        </p>
        <p>The fundamental procedure is as follows:
- Computing similarity degrees for all pairs of
translation candidates based on co-occurrence
frequencies in the target document collection,
- Selecting ‘correct’ pairs of translations
according to the similarity degrees.</p>
        <p>
          One difficulty in implementing the
procedure is that the computational complexity of selecting
correct translations increases as the number of
translations becomes large. To alleviate the
problem, Gao et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] have proposed an approximate
algorithm for choosing optimal translations.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Pivot language approach</title>
        <p>Many languages are spoken in the world, but
bilingual resources are limited. There is no
guarantee that useful resources are always available for
the particular pair of languages needed in a
real situation. For example, it may be difficult to find
bilingual resources in machine-readable form
between Dutch and Japanese. One solution is to
employ English as an intermediate (pivot) language,
since English is an international language and it is
reasonable to expect that bilingual dictionaries or MT
systems involving English exist for many
languages.</p>
        <p>
          The basic approach is transitive translation of
query by using two bilingual resources (see
Ballesteros [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]). If two MT systems or two bilingual
dictionaries, Dutch to English and English to Japanese,
are available, we can translate a Dutch query into
Japanese without any direct Dutch-Japanese
dictionary. This approach has already been attempted by
several researchers [
          <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14 ref15">11-15</xref>
          ].
        </p>
        <p>When two bilingual
dictionaries are used successively for query translation, it is crucial to resolve
translation ambiguity because many
extraneous or irrelevant search terms may be generated by
the two steps of translation. Suppose that an English
term obtained from the bilingual dictionary from the
source language to English is an irrelevant translation.
Inevitably, all terms listed under that English term in
the bilingual dictionary from English to the target
language are also irrelevant. Therefore, many
more extraneous translations are generated in
the pivot language approach than in a standard single-step
translation process.</p>
        <p>
          For disambiguation, Ballesteros [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] has
attempted to apply a co-occurrence frequency-based
method, query expansion and so on. Meanwhile,
Gollins and Sanderson [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] proposed a technique called
“lexical triangulation”, in which two pivot languages
are used independently and erroneous translations
are removed by keeping only the translations that the
two routes of transitive translation through the two
pivot languages have in common.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Two Stages Refinement of Translations</title>
      <sec id="sec-3-1">
        <title>3.1 Translation disambiguation stage</title>
        <p>
          A translation disambiguation technique based on
term co-occurrence statistics may be useful in the
situation that our study presupposes, since the
technique uses only the target document
collection as the source of disambiguation. However, as
already mentioned, its computational complexity is fairly
high. Also, it should be noted that term co-occurrence
frequencies are macro-level
statistics over the entire document collection. This means
that disambiguation based on these statistics may
lead to false combinations of translation candidates
(see Yamabana et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]). Even if two terms A and B
are statistically associated in general (i.e., in the
entire collection), the association is not always valid in
a given query.
        </p>
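        <p>Therefore, in this study, we decided to use
an alternative disambiguation technique, which is not
based on term co-occurrence statistics. First, we
define the following notation:
<inline-formula><tex-math>s_j</tex-math></inline-formula>: the terms in the source query (<inline-formula><tex-math>j = 1, 2, \ldots, m</tex-math></inline-formula>),
<inline-formula><tex-math>T_j</tex-math></inline-formula>: the set of translations in the pivot language for
the <inline-formula><tex-math>j</tex-math></inline-formula>-th term <inline-formula><tex-math>s_j</tex-math></inline-formula>,
<inline-formula><tex-math>T_j'</tex-math></inline-formula>: the set of translations in the target language for
all terms included in the set <inline-formula><tex-math>T_j</tex-math></inline-formula>.</p>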
        <p>By transitive translation process using two
bilingual dictionaries, it is easy to obtain a set of
translated query terms in the target language with no
disambiguation,</p>
        <p><disp-formula><label>(1)</label><tex-math>T = T_1' \cup T_2' \cup \cdots \cup T_m'.</tex-math></disp-formula>
The disambiguation procedure we propose is to
search the target document collection for the set of
terms <inline-formula><tex-math>T</tex-math></inline-formula>, and then to select, from each
set <inline-formula><tex-math>T_j'</tex-math></inline-formula>, the term that appears most frequently
in the top-ranked documents (see Figure 1). The basic
assumption is that the ‘correct’ translations
of distinct original search terms tend to
occur together in a single document in the target
collection. If so, such documents are expected to be
ranked higher in the result of the search for the set <inline-formula><tex-math>T</tex-math></inline-formula>.</p>
        <p>[Figure 1. Stages shown: source terms; translation into pivot language; translation into target language; selected terms.]</p>
        <sec id="sec-3-1-4">
          <title>Selected terms</title>
          <p>Suppose that we have three sets of translations in
the target language as follows:</p>
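          <p><inline-formula><tex-math>T_1'</tex-math></inline-formula>: term A, term B, term C,
<inline-formula><tex-math>T_2'</tex-math></inline-formula>: term D, term E,
<inline-formula><tex-math>T_3'</tex-math></inline-formula>: term F, term G.</p>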
          <p>Also, assume that the combination of terms A, D
and F is correct and that the other terms are irrelevant. In
such a situation, we can reasonably expect that the
irrelevant terms do not appear together in the same
document, because the probability that such irrelevant
terms are related to each other is low. Meanwhile,
the ‘correct’ combination of terms A, D and F would
tend to appear in documents more often than any
combination including irrelevant translations. Therefore,
documents containing the ‘correct’ combination
are likely to receive higher ranking scores.</p>
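          <p>To detect such a combination from the result of
the initial search for the set <inline-formula><tex-math>T</tex-math></inline-formula>, it would be enough
to use the document frequency of each translation in
the set of top-ranked documents. That is, we can
choose a term <inline-formula><tex-math>\tilde{t}_j</tex-math></inline-formula> for each <inline-formula><tex-math>T_j'</tex-math></inline-formula> (<inline-formula><tex-math>j = 1, 2, \ldots, m</tex-math></inline-formula>) such
that
<disp-formula><label>(2)</label><tex-math>\tilde{t}_j = \arg\max_{t \in T_j'} r_t,</tex-math></disp-formula>
where <inline-formula><tex-math>r_t</tex-math></inline-formula> is the number of top-ranked documents
including the term <inline-formula><tex-math>t</tex-math></inline-formula>. Finally, we obtain a set of
<inline-formula><tex-math>m</tex-math></inline-formula> translations through the disambiguation process,
<disp-formula><label>(3)</label><tex-math>\tilde{T} = \{\tilde{t}_1, \tilde{t}_2, \ldots, \tilde{t}_m\}.</tex-math></disp-formula>
Ideally, we should make use of the co-occurrence
frequencies of all combinations of translation
candidates in the set of top-ranked documents. However,
the computational cost is expected to be fairly high,
since the statistics would have to be compiled dynamically
for each search run. One way to avoid this
cost is to count only simple frequencies
instead of co-occurrences. That is, if the ‘correct’
combination of translations appears often, the
simple frequency of each translation in it would naturally also
be high. Equation (2) is based on this
hypothesis.</p>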
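          <p>The selection rule of Equations (2) and (3) can be illustrated with a minimal Python sketch. It assumes that each top-ranked document is available as a set of normalized terms; the function name disambiguate and the toy data are hypothetical, and ties are resolved in favour of the first candidate, which the text does not specify.</p>
          <preformat>
```python
from collections import Counter


def disambiguate(candidate_sets, top_docs):
    """Pick, for each source term, the candidate with the highest
    document frequency r_t among the top-ranked documents.

    candidate_sets: one list of target-language candidates per source term.
    top_docs: one collection of terms per top-ranked document.
    """
    doc_freq = Counter()
    for doc_terms in top_docs:
        doc_freq.update(set(doc_terms))  # count each term once per document
    selected = []
    for candidates in candidate_sets:
        if not candidates:
            continue  # no candidate to choose for this source term
        # max() keeps the first candidate on ties (an assumption of this sketch)
        selected.append(max(candidates, key=lambda t: doc_freq[t]))
    return selected


# Toy data mirroring the example with terms A to G
candidate_sets = [["A", "B", "C"], ["D", "E"], ["F", "G"]]
top_docs = [{"A", "D", "F"}, {"A", "D", "G"}, {"B", "D", "F"}]
print(disambiguate(candidate_sets, top_docs))
```
</preformat>
          <p>With this toy data the terms A, D and F, which occur most often in the top-ranked documents, are the ones selected.</p>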
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Query expansion stage</title>
        <p>In the previous stage, translation ambiguity was
resolved, and a final set of m search terms in the target
language remains. We can regard that stage as a process
for improving the precision of the search. In the next stage,
enhancement of recall should be attempted, since some
synonyms or related terms may have been removed
in the previous stage.</p>
        <p>
          Following Ballesteros and Croft [
          <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
          ], we
execute a standard post-translation query expansion
using a pseudo relevance feedback (PRF) technique, in
which the new terms to be added to the query are selected
based on their weights <inline-formula><tex-math>w_t</tex-math></inline-formula>, calculated by a formula of
the standard probabilistic model,
        </p>
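        <p><disp-formula><label>(4)</label><tex-math>w_t = r_t \times \log \frac{(r_t + 0.5)(N - R - n_t + r_t + 0.5)}{(N - n_t + 0.5)(R - r_t + 0.5)},</tex-math></disp-formula>
where <inline-formula><tex-math>N</tex-math></inline-formula> is the total number of documents, <inline-formula><tex-math>R</tex-math></inline-formula> is
the number of relevant documents, and <inline-formula><tex-math>n_t</tex-math></inline-formula> is the
number of documents including term <inline-formula><tex-math>t</tex-math></inline-formula>. It should be
noted that, in PRF, the set of relevant documents is
assumed to be the set of top-ranked documents
retrieved by the initial search. Therefore, <inline-formula><tex-math>r_t</tex-math></inline-formula> is defined in the
same way as before (see Equation (2)). We denote the
term set expanded by this method by <inline-formula><tex-math>\tilde{T}'</tex-math></inline-formula>.</p>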
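        <p>As an illustration, the weight of Equation (4) and the selection of the highest-weighted terms can be sketched in Python as follows. The function names and the dictionary-based inputs are hypothetical. The denominator follows the formula as printed here, with the factor (N - n_t + 0.5); the more common Robertson/Sparck Jones relevance weight uses (n_t - r_t + 0.5) at that position, so this sketch should be read as a transcription of the text rather than as a reference implementation.</p>
        <preformat>
```python
import math


def prf_weight(r_t, n_t, R, N):
    """Term weight w_t of Equation (4), transcribed as printed.

    r_t: number of top-ranked (pseudo-relevant) documents containing t
    n_t: number of documents in the collection containing t
    R:   number of top-ranked documents treated as relevant
    N:   total number of documents in the collection
    """
    ratio = ((r_t + 0.5) * (N - R - n_t + r_t + 0.5)) / (
        (N - n_t + 0.5) * (R - r_t + 0.5)
    )
    return r_t * math.log(ratio)


def top_expansion_terms(top_doc_freq, coll_doc_freq, R, N, k=30):
    """Return the k terms with the largest weights, in decreasing order.

    top_doc_freq maps each candidate term to r_t,
    coll_doc_freq maps each candidate term to n_t.
    """
    weights = {
        t: prf_weight(r_t, coll_doc_freq[t], R, N)
        for t, r_t in top_doc_freq.items()
    }
    return sorted(weights, key=weights.get, reverse=True)[:k]
```
</preformat>
        <p>For example, top_expansion_terms(top_doc_freq, coll_doc_freq, R=100, N=157558) would return up to 30 candidate terms for a collection of 157,558 documents with 100 feedback documents.</p>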
        <p>[Figure 2. Elements shown: original query; translation (using dictionaries); translation candidates; disambiguation; selected terms; expansion; final set of search terms; target document collection.]</p>
        <sec id="sec-3-2-1">
          <title>Target document collection</title>
          <p>To sum up, the method we propose for refining the result of
query translation consists of two stages:
(a) translation disambiguation and (b) post-translation
query expansion. The detailed procedure is as follows
(see also Figure 2):
- Obtaining a set of translations <inline-formula><tex-math>T</tex-math></inline-formula> (see
Equation (1)) by transitive translation,
- Searching the target document collection for the
set <inline-formula><tex-math>T</tex-math></inline-formula> (i.e., initial search),
- Selecting a single translation from each <inline-formula><tex-math>T_j'</tex-math></inline-formula>
according to its document
frequency in the documents top-ranked by the
initial search, and obtaining a new set <inline-formula><tex-math>\tilde{T}</tex-math></inline-formula> (see
Equation (3)) (i.e., disambiguation),
- Searching the target document collection for the
set <inline-formula><tex-math>\tilde{T}</tex-math></inline-formula> (i.e., second search),
- Adding terms according to the weight given by
Equation (4) (i.e., query expansion),
- Finally searching the target document collection
for the expanded set of terms <inline-formula><tex-math>\tilde{T}'</tex-math></inline-formula> (i.e., third
search).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. System Description</title>
      <sec id="sec-4-1">
        <title>4.1 Purpose of the system</title>
        <p>The system enables us to search an Italian
document collection for a German query automatically, i.e.,
it is an automatic CLIR system from German to
Italian.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Text processing</title>
        <p>Both German and Italian texts (in documents
and queries) were processed by the
following steps: (1) identifying tokens, (2) removing
stopwords, (3) lemmatization, (4) stemming. In addition,
for German text, decomposition of compound words
was attempted using a longest-matching
algorithm against the headwords of the machine-readable German to
English dictionary.</p>
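        <p>The text does not give the details of the decomposition algorithm. The following Python sketch shows one possible reading of longest matching, under stated assumptions: a greedy left-to-right scan, a hypothetical minimum part length of three characters, no handling of German linking elements, and a fallback that leaves the word unsplit when no complete segmentation is found. The function name and the headword set are illustrative only.</p>
        <preformat>
```python
def decompose(word, headwords, min_len=3):
    """Greedy left-to-right longest-match split of a compound word
    into dictionary headwords (a sketch, not the authors' algorithm)."""
    parts = []
    pos = 0
    while pos != len(word):
        # try the longest remaining prefix first, then shorter ones
        for end in range(len(word), pos + min_len - 1, -1):
            if word[pos:end] in headwords:
                parts.append(word[pos:end])
                pos = end
                break
        else:
            # no headword matches at this position: keep the word unsplit
            return [word]
    return parts


headwords = {"auto", "bahn"}  # toy headword list
print(decompose("autobahn", headwords))
```
</preformat>
        <p>A greedy scan can miss segmentations that require a shorter first part; a fuller implementation might backtrack, but that is beyond what the text specifies.</p>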
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Language resources</title>
        <p>
          We downloaded free dictionaries (German to
English and English to Italian) from the Internet<sup>1</sup>.
Also, stemmers and stopword lists for German and
Italian were available through the Snowball project<sup>2</sup>.
Stemming for English was conducted by the original
Porter’s algorithm [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>Furthermore, in order to evaluate performance of
our two stages refinement method comparatively, we
decided to use commercial MT software produced by
a Japanese company.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4 Transitive translation procedure</title>
        <p><sup>1</sup> http://www.freelang.net/
<sup>2</sup> http://snowball.tartarus.org/</p>
        <p>Before executing transitive translation with the two
bilingual dictionaries, all terms included in the
dictionaries were normalized through the same stemming
and lemmatization steps that were
applied to the texts of documents and queries. The actual
translation process is a simple replacement, i.e., each
normalized German term in a query was replaced
with the set of corresponding normalized English words,
and similarly, each English word was replaced with
the corresponding Italian words. As a result, for each
query, a set of normalized Italian words, i.e., T in
Equation (1), was obtained. If no corresponding
headword was included in a dictionary
(German-English or English-Italian), the unknown word
was passed to the next step without
any change.</p>
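        <p>The replacement procedure above can be sketched in Python as follows. This is a minimal illustration, assuming each dictionary is a mapping from a normalized headword to a list of normalized translations; the function names and the toy entries are hypothetical, and the normalization steps themselves are omitted.</p>
        <preformat>
```python
def translate(terms, dictionary):
    """Replace each term by its listed translations; unknown words are
    passed through unchanged, as described in the text."""
    out = []
    for term in terms:
        out.extend(dictionary.get(term, [term]))
    return out


def transitive_translate(source_terms, src_to_pivot, pivot_to_target):
    """Return one list of target-language candidates per source term."""
    candidate_sets = []
    for s in source_terms:
        pivot_terms = translate([s], src_to_pivot)
        targets = translate(pivot_terms, pivot_to_target)
        # drop duplicate candidates while keeping their order
        candidate_sets.append(list(dict.fromkeys(targets)))
    return candidate_sets


# Toy dictionaries (illustrative entries only)
de_en = {"bank": ["bank", "bench"]}
en_it = {"bank": ["banca", "riva"], "bench": ["panchina"]}
print(transitive_translate(["bank", "euro"], de_en, en_it))
```
</preformat>
        <p>Keeping the candidates grouped by source term, rather than flattening them immediately, makes it straightforward to form both the union of Equation (1) and the per-term selection of the disambiguation stage.</p>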
        <p>Next, refinement of the set T through two
stages described in the previous section was executed.
The number of top-ranked documents was set to 100
in both stages, and in the query expansion stage,
top-ranked 30 terms in the decreasing order of term
weights (Equation (4)) were added.</p>
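        <p>If a top-ranked term is already included in the
set of search terms <inline-formula><tex-math>\tilde{T}</tex-math></inline-formula>, its term frequency in the query
is changed to <inline-formula><tex-math>1.5 \times y_t</tex-math></inline-formula>. If not, its term frequency is
set to 0.5 (i.e., <inline-formula><tex-math>y_t = 0.5</tex-math></inline-formula>).</p>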
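        <p>This adjustment of query term frequencies can be written as a short Python sketch. The function name and the example terms are hypothetical, and the sketch assumes that the list of expansion terms contains no duplicates.</p>
        <preformat>
```python
def expand_query(query_tf, expansion_terms):
    """Merge expansion terms into the query term frequencies y_t.

    Terms already in the query have y_t multiplied by 1.5;
    new terms enter the query with y_t = 0.5.
    """
    expanded = dict(query_tf)  # leave the caller's mapping untouched
    for t in expansion_terms:
        if t in expanded:
            expanded[t] = 1.5 * expanded[t]
        else:
            expanded[t] = 0.5
    return expanded


print(expand_query({"banca": 1, "euro": 2}, ["euro", "moneta"]))
```
</preformat>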
        <p>On the other hand, in the case of using MT
software, the original German query was first input
to the software. The software we used
automatically executes German to English translation and
then English to Italian translation (i.e., a kind of
transitive translation). The resulting Italian text from the
MT system was processed according to the procedure
described in Section 4.2, and finally, a set of
normalized Italian words was obtained for each query. In
the case of MT translation, only post-translation
query expansion was executed, with the same
procedure and parameters as in the case of dictionary-based
translation.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5 Search algorithm</title>
        <p>
          The well-known Okapi formula [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] was used for
computing each document score in all searches of this
study, i.e.,
        </p>
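        <p><disp-formula><tex-math>z = \sum_{t \in \Omega} \left[ \frac{3.0\, x_t}{(0.5 + 1.5\, l / \bar{l}) + x_t} \times y_t \times \log \frac{N - n_t + 0.5}{n_t + 0.5} \right],</tex-math></disp-formula>
where <inline-formula><tex-math>z</tex-math></inline-formula> is the score of a particular document, <inline-formula><tex-math>x_t</tex-math></inline-formula>
is the frequency of occurrence of term <inline-formula><tex-math>t</tex-math></inline-formula> in the
document, <inline-formula><tex-math>l</tex-math></inline-formula> is the document length, <inline-formula><tex-math>\bar{l}</tex-math></inline-formula> is the
average document length over the entire document
collection, and <inline-formula><tex-math>y_t</tex-math></inline-formula> is the frequency of occurrence of
term <inline-formula><tex-math>t</tex-math></inline-formula> in the query. It should be noted that the value
of each <inline-formula><tex-math>y_t</tex-math></inline-formula> is always fixed at the frequency of the
corresponding original search term in the source
query. Also, <inline-formula><tex-math>\Omega</tex-math></inline-formula> is the set of query terms, i.e.,
<inline-formula><tex-math>\Omega = T</tex-math></inline-formula> in the first search, <inline-formula><tex-math>\Omega = \tilde{T}</tex-math></inline-formula> in the second search,
and <inline-formula><tex-math>\Omega = \tilde{T}'</tex-math></inline-formula> in the third search. Finally, the
documents were ranked in decreasing order of the
values of <inline-formula><tex-math>z</tex-math></inline-formula>.</p>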
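        <p>A minimal Python sketch of this scoring function is given below, reading the logarithmic factor as log((N - n_t + 0.5) / (n_t + 0.5)). The function and argument names are hypothetical; documents and queries are assumed to be given as term-frequency mappings.</p>
        <preformat>
```python
import math


def okapi_score(doc_tf, doc_len, avg_len, query_tf, doc_freq, n_docs):
    """Score z of one document for a query.

    doc_tf:   term -> frequency x_t in the document
    doc_len:  document length l
    avg_len:  average document length over the collection
    query_tf: term -> frequency y_t in the query
    doc_freq: term -> number of documents n_t containing the term
    n_docs:   total number of documents N
    """
    z = 0.0
    for t, y_t in query_tf.items():
        x_t = doc_tf.get(t, 0)
        if x_t == 0:
            continue  # a term absent from the document contributes nothing
        n_t = doc_freq[t]
        tf_part = 3.0 * x_t / ((0.5 + 1.5 * doc_len / avg_len) + x_t)
        idf_part = math.log((n_docs - n_t + 0.5) / (n_t + 0.5))
        z += tf_part * y_t * idf_part
    return z
```
</preformat>
        <p>Documents would then be sorted in decreasing order of the returned score, with n_docs and avg_len taken from the collection statistics.</p>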
      </sec>
      <sec id="sec-4-6">
        <title>4.6 Type of runs executed</title>
        <p>In CLEF 2003, we executed three runs (see Table 1),
in which only the &lt;DESCRIPTION&gt; field of each query
was used.</p>
        <p>[Table 1. Runs executed; first column: ID.]</p>
      </sec>
      <sec id="sec-4-7">
        <title>5.1 Basic statistics</title>
        <p>The Italian collections include 157,558
documents in total. The average document length is
181.86.</p>
      </sec>
      <sec id="sec-4-8">
        <title>5.2 System error</title>
        <p>Unfortunately, a non-trivial system error was
detected after submission of the results: because of a bug in our
source code, only the last term within the set of search
terms contributed to the calculation of document
scores. Inevitably, the search performance of all runs
shown in Table 1 was very low.</p>
      </sec>
      <sec id="sec-4-9">
        <title>5.3 Results of runs conducted after submission</title>
        <p>Therefore, we corrected the source
code and performed some search
runs again after submitting the results to the organizers of
CLEF. Six types of run were conducted, as shown in
Table 2, which also gives the mean
average precision of each run, calculated by using the relevance
judgment file. Furthermore, recall-precision curves of
the six runs are presented in Figure 3. It should be
noted that each value presented in Table 2 and
Figure 3 was calculated over the 51 topics for which one or
more relevant documents are included in the Italian collections.</p>
        <p>[Table 2. Mean average precision of the six runs; row groups: MT, Dictionary.]</p>
        <p>As shown in Table 2, MT outperforms
dictionary-based translation significantly. Also, it turns out
that the disambiguation technique based on term
frequency moderately improves the effectiveness of the
dictionary-based translation method: the mean
average precision with disambiguation is .207,
compared with .190 without disambiguation.
In particular, Table 2 indicates that our technique of two
stages refinement has a large effect on
search performance: the mean average
precision of search with no disambiguation and no
expansion by PRF is only .143, which is significantly lower
than the .207 obtained through the two
stages refinement.</p>
        <p>However, there is also a
large difference in performance between MT and the
two stages refinement. This may be attributed to
the difference in quality and coverage between the
commercial MT software and the free dictionaries
downloaded from the Internet. Even if that is true, we
need to modify the two stages refinement method so
that its performance approaches that of the
MT system.</p>
        <p>For example, in Figure 3, at recall levels
over 0.7, searches with no disambiguation are
superior to those with disambiguation. This
may be because our disambiguation method selects
only one translation per source term and consequently may remove
some useful synonyms or related terms. A simple
solution is to choose two or more
translations instead of applying Equation (2) directly. Although
it is difficult to determine the optimal number of
translations to be selected, multiple translations for
each source term may improve the recall of searches.</p>
        <p>[Figure 3. Recall-precision curves of the six runs (x-axis: Recall; y-axis: Precision): MT with expansion; MT with no expansion; dictionary with disambiguation and expansion; dictionary with disambiguation and no expansion; dictionary with expansion and no disambiguation; dictionary with no expansion and no disambiguation.]</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Concluding Remarks</title>
      <p>This paper reported the results of our experiment on
CLIR from German to Italian, in which English was
used as a pivot language. In particular, two stages
refinement of query translation was employed to
remove irrelevant terms in the target language
produced by transitive translation using two
bilingual dictionaries successively.</p>
      <p>As a result, it turned out that
- our two stages refinement method significantly improves
the retrieval performance of bilingual IR
using a pivot language, and
- its performance is inferior to that of MT-based
searches.</p>
      <p>By choosing two or more search terms in the
disambiguation stage, it is possible that our method
becomes more effective.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>W. B.</given-names>
          </string-name>
          (
          <year>1997</year>
          ).
          <article-title>Phrasal translation and query expansion techniques for cross-language information retrieval</article-title>
          .
          <source>In Proceedings of the 20th ACM SIGIR conference on Research and Development in Information Retrieval</source>
          . (pp.
          <fpage>84</fpage>
          -
          <lpage>91</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>W.B.</given-names>
          </string-name>
          (
          <year>1998</year>
          ).
          <article-title>Resolving ambiguity for cross-language retrieval</article-title>
          .
          <source>In Proceedings of the 21st ACM SIGIR conference on Research and Development in Information Retrieval</source>
          (pp.
          <fpage>64</fpage>
          -
          <lpage>71</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Yamabana</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muraki</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Kamei</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>1998</year>
          ).
          <article-title>A language conversion front-end for cross-language information retrieval</article-title>
          . In G. Grefenstette (ed.)
          <article-title>Cross-language Information retrieval</article-title>
          (pp.
          <fpage>93</fpage>
          -
          <lpage>104</lpage>
          ). Boston, MA: Kluwer.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>J. Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xun</surname>
            ,
            <given-names>E X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Improving query translation for cross-language information retrieval using statistical models</article-title>
          .
          <source>In Proceedings of 24th ACM SIGIR conference on Research and Development in Information Retrieval</source>
          (pp.
          <fpage>96</fpage>
          -
          <lpage>104</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>W. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bian</surname>
            ,
            <given-names>G. W.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H. H.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Description of the NTU Japanese-English cross-lingual information retrieval system used for NTCIR workshop</article-title>
          .
          <source>In Proceedings of the First NTCIR Workshop on Research in Japanese Text Retrieval and Term Recognition</source>
          . Tokyo: National Institute of Informatics. http://research.nii.ac.jp/ntcir/workshop/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Sadat</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maeda</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoshikawa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Uemura</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Query expansion techniques for the CLEF Bilingual track</article-title>
          . In C. Peters et al. (Eds.)
          <source>Evaluation of Cross-Language Information Retrieval Systems: LNCS 2406</source>
          (pp.
          <fpage>177</fpage>
          -
          <lpage>184</lpage>
          ). Berlin: Springer.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Adriani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>English-Dutch CLIR using query translation techniques</article-title>
          . In C. Peters et al. (Eds.)
          <source>Evaluation of Cross-Language Information Retrieval Systems: LNCS 2406</source>
          (pp.
          <fpage>219</fpage>
          -
          <lpage>225</lpage>
          ). Berlin: Springer.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grefenstette</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>D. A.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Resolving translation ambiguity using monolingual corpora: a report on Clairvoyance CLEF-2002 experiments</article-title>
          . In
          <source>Working Notes for the CLEF-2002 Workshop</source>
          (pp.
          <fpage>115</fpage>
          -
          <lpage>126</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Seo</surname>
            ,
            <given-names>H. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>B. I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rim</surname>
            ,
            <given-names>H. C.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S. Z.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>KUNLP system for NTCIR-3 English-Korean cross-language information retrieval</article-title>
          .
          <source>In Proceedings of the Third NTCIR Workshop on Research in Information Retrieval, Automatic Text Summarization and Question Answering</source>
          . Tokyo: National Institute of Informatics. http://research.nii.ac.jp/ntcir/workshop/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>L. A.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Cross-language retrieval via transitive translation</article-title>
          . In W. B.
          <string-name>
            <surname>Croft</surname>
          </string-name>
          (Ed.)
          <source>Advances in Information Retrieval: Recent Research from the Center for Intelligent Information Retrieval</source>
          (pp.
          <fpage>203</fpage>
          -
          <lpage>234</lpage>
          ). Boston, MA: Kluwer.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Frantz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCarley</surname>
            ,
            <given-names>J. S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Roukos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Ad hoc and multilingual information retrieval at IBM</article-title>
          .
          <source>In Proceedings of the TREC-7</source>
          , Gaithersburg, MD: National Institute of Standards and Technology. http://trec.nist.gov/pubs/
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Gey</surname>
            ,
            <given-names>F. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Larson</surname>
            ,
            <given-names>R. R.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Manual queries and machine translation in cross-language retrieval at TREC-7</article-title>
          .
          <source>In Proceedings of the TREC-7</source>
          , Gaithersburg, MD: National Institute of Standards and Technology. http://trec.nist.gov/pubs/
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Hiemstra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Kraaij</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Twenty-one at TREC-7: ad-hoc and cross-language track</article-title>
          .
          <source>In Proceedings of the TREC-7</source>
          , Gaithersburg, MD: National Institute of Standards and Technology. http://trec.nist.gov/pubs/
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Gey</surname>
            ,
            <given-names>F. C.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>Experiments on cross-language and patent retrieval at NTCIR-3 workshop</article-title>
          .
          <source>In Proceedings of the Third NTCIR Workshop on Research in Information Retrieval, Automatic Text Summarization and Question Answering</source>
          . Tokyo: National Institute of Informatics. http://research.nii.ac.jp/ntcir/workshop/
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>W. C.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H. H.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>Description of NTU approach to multilingual information retrieval</article-title>
          .
          <source>In Proceedings of the Third NTCIR Workshop on Research in Information Retrieval, Automatic Text Summarization and Question Answering</source>
          . Tokyo: National Institute of Informatics. http://research.nii.ac.jp/ntcir/workshop/
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Gollins</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Sanderson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Improving cross language information retrieval with triangulated translation</article-title>
          .
          <source>In Proceedings of the 24th ACM SIGIR conference on Research and Development in Information Retrieval</source>
          (pp.
          <fpage>90</fpage>
          -
          <lpage>95</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Porter</surname>
            ,
            <given-names>M. F.</given-names>
          </string-name>
          (
          <year>1980</year>
          ).
          <article-title>An algorithm for suffix stripping</article-title>
          .
          <source>Program</source>
          ,
          <volume>14</volume>
          (
          <issue>3</issue>
          ), pp.
          <fpage>130</fpage>
          -
          <lpage>137</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hancock-Beaulieu</surname>
            ,
            <given-names>M. M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Gatford</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>1995</year>
          ).
          <article-title>Okapi at TREC-3</article-title>
          .
          <source>In Proceedings of TREC-3</source>
          , Gaithersburg, MD: National Institute of Standards and Technology. http://trec.nist.gov/pubs/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>