=Paper= {{Paper |id=Vol-1169/CLEF2003wn-adhoc-KishidaEt2003 |storemode=property |title=Two Stages Refinement of Query Translation for Pivot Language Approach to Cross Lingual Information Retrieval: A Trial at CLEF 2003 |pdfUrl=https://ceur-ws.org/Vol-1169/CLEF2003wn-adhoc-KishidaEt2003.pdf |volume=Vol-1169 |dblpUrl=https://dblp.org/rec/conf/clef/KishidaK03a }} ==Two Stages Refinement of Query Translation for Pivot Language Approach to Cross Lingual Information Retrieval: A Trial at CLEF 2003== https://ceur-ws.org/Vol-1169/CLEF2003wn-adhoc-KishidaEt2003.pdf
   Two Stages Refinement of Query Translation for Pivot Language Approach
         to Cross Lingual Information Retrieval: A Trial at CLEF 2003
                    Kazuaki KISHIDA                                    Noriko KANDO
         Surugadai University / National Institute of           National Institute of Informatics,
                   Informatics, JAPAN                                        JAPAN
                 kishida@surugadai.ac.jp                               kando@nii.ac.jp

                                                     Abstract
       This paper reports experimental results of cross-lingual information retrieval from German to
       Italian. The authors are concerned with CLIR in the case that available language resources are
       very limited. Thus transitive translation of queries using English as a pivot language was used to
       search Italian document collections for German queries without any direct bilingual dictionary or
       MT system of these two languages. In order to remove irrelevant translations produced by the
       transitive translation, we propose a disambiguation technique, in which two stages of refinement
       of query translation are executed. Basically, this refinement is based on the idea of pseudo rele-
       vance feedback. In the first stage, for each source query term, we select a translation candidate
       that appears most frequently in the set of top-ranked documents searched for a set of terms pro-
       vided via transitive translation of the source query. Next, in the second stage, a standard query
       expansion based on pseudo relevance feedback is conducted. Our experiment result showed that
       the two stages refinement method is able to improve significantly search performance of bilingual
       IR using a pivot language. However, it also turned out that performance of this method is inferior
       to that of machine translation method.



                                                            - no parallel corpus,
1 Introduction
                                                          between German and Italian directly, and
1.1 Purpose                                                 - no corpus written in the language of query (i.e.,
    This paper aims at reporting our experiment of              no German corpus).
cross language IR (CLIR) from German to Italian at        We decided to employ only two relatively small dic-
CLEF 2003. Our fundamental interest is CLIR be-           tionaries of German to English (G to E) and English
tween languages with very limited translation re-         to Italian (E to I), which are easy to be available
source, and then we attempt to explore a new ap-          through the Internet. As mentioned above, our re-
proach for transitive query translation using English     search purpose is fundamentally to develop an effec-
as a pivot. It is because translation resource between    tive method for enhancing performance of CLIR in
English and each language is often easily obtained,       the situation that language resource is very poor.
and it is true not only in European environment but       1.2 Basic idea
also in East Asian environment. Although East Asian           According to this presupposition, our method for
languages are completely different from English,          CLIR is to be characterized as
pivot language approach using English is very im-           - dictionary-based approach (query translation),
portant because of limited availability of resources        - pivot language approach (English is a pivot).
for direct translation among them.                        As well-known, it is possible that some extraneous or
    Thus the basic premise we made for the experi-        irrelevant translations are unjustly produced by the
ment in CLEF 2003 is that only very limited language      dictionary-based approach [1]. Particularly, in the
resources are available for executing CLIR runs. For      case of pivot language approach, consecutive two
example, it was supposed that there is                    steps of translation (e.g., German to English and Eng-
 - no bilingual dictionary,                               lish to Italian) yield often much more extraneous
 - no machine translation (MT) system, and                translation candidates because of double replace-
ments of each word. Therefore, a term disambigua-         translation candidates according to an assumption that
tion technique is dispensable.                            “the correct translations of query terms should
    In the presupposition of our study, the resource to   co-occur in target language documents and incorrect
be used for translation disambiguation is only the        translation should tend not to co-occur” (Ballestellos
target document collection (i.e., Italian document sets   and Croft [2]). Many works have been attempted ba-
included in the CLEF test collection). We will pro-       sically in line with the idea [3-9].
pose a disambiguation technique in which pseudo               The fundamental procedure is as follows:
relevance feedback is repeated for refining query           - Computing similarity degrees for all pairs of
translations. The basic procedure is as follows:                translation candidates based on co-occurrence
  - Initial search: the target document collection is           frequencies in the target document collection,
      searched for all translation candidates produced      - Selecting ‘correct’ pairs of translations accord-
      by dictionary-based replacements via pivot                ing to the similarity degrees.
      language.                                           One of the difficulties for implementing the proce-
  - First feedback (disambiguation stage): the set of     dure is that computational complexity in selecting
      translation candidates are reduced by using term    correct translations is increasing as the number of
      occurrence statistics within a set of some          translations becomes large. For alleviating the prob-
      top-ranked documents obtained by the initial        lem, Gao, et al.[4] have proposed an approximate
      search.                                             algorithm for choosing optimal translations.
  - Second search: the target document collection is      2.2 Pivot language approach
      searched for the reduced set of translations.           So many languages are spoken in the world, while
  - Second feedback (query expansion stage): the          the bilingual resources are limited. There is no guar-
      reduced set of translations is expanded by using    antee that useful resources are always available for
      a standard pseudo relevance feedback tech-          the combination of two languages that we need in the
      nique.                                              real situation. For example, it may be difficult to find
  - Final search: the target document collection is       bilingual resources in machine-readable form be-
      searched for the extended set of search terms.      tween Dutch and Japanese. One of the solutions is to
The two stages of refinement of translation candi-        employ English as an intermediate (pivot) language,
dates would enable us to obtain better performance of     since English is an international language and it is
CLIR in the situation that available language resource    reasonably expected that bilingual dictionaries or MT
is poor. The purpose of this paper is to verify experi-   systems with English are prepared for many lan-
mentally effectiveness of the two stages refinement       guages.
technique using the test collection of CLEF 2003.             The basic approach is transitive translation of
    This paper is organized as follows. In the section    query by using two bilingual resources (see Balles-
2, we will review some previous works on translation      teros [10]). If two MT systems or two bilingual dic-
disambiguation techniques and pivot language ap-          tionaries of Dutch to English and English to Japanese
proach. In the section 3, the technique of two stages     are available, we can translate Dutch query into
refinement of query translation will be introduced.       Japanese without any direct Dutch-Japanese diction-
The section 4 will describe our system used in the        ary. This approach has already been attempted by
experiment of CLEF 2003. In the section 5, the result     some researchers [11-15].
will be reported.                                             In the case of using successively two bilingual
                                                          dictionaries for query translation, it is crucial to solve
                                                          translation ambiguity because possibly so many ex-
2 Previous works
                                                          traneous or irrelevant search terms are generated by
2.1 Translation disambiguation techniques                 the two steps of translation. Suppose that an English
    In the CLIR field, various ideas or techniques for    term obtained from a bilingual dictionary of from the
translation disambiguation have been proposed.            source language to English was irrelevant translation.
Among them, some researchers have explored meth-          Inevitably, all terms listed under the English term in
ods of employing the target document collection for       the bilingual dictionary from English to the target
identifying extraneous or irrelevant translations. The    language would be also irrelevant. Therefore, much
typical approach is to use co-occurrence statistics of    more extraneous translations are to be generated in
pivot language approach than in standard single-step        sumption is that ‘correct’ combination of each trans-
translation process.                                        lation from distinct original search terms tends to
                                                            occur together in a single document in the target col-
    To the disambiguation, Ballesteros [10] has at-
                                                            lection. If so, such documents are expected to be
tempted to apply co-occurrence frequency-based              ranked higher in the result of search for the set T .
method, query expansion and so on. Meanwhile, Gol-
lins and Sanderson [16] proposed a technique of                                      Source terms
“lexical triangulation”, in which two pivot languages
are used independently and removal of error transla-                       Translation into pivot language
tion is tried by taking only translations in common
from two ways of transitive translation using two
pivot languages.                                                          Translation into target language


3 Two Stages Refinement of Translations
3.1 Translation disambiguation stage                                                                Selected terms
     Translation disambiguation technique based on
term co-occurrence statistics may be useful in the            Figure 1 Outline of translation disambiguation
situation that our study is presupposing, since the
technique makes use of only the target document col-            Suppose that we have three sets of translations in
lection as source of disambiguation. However, as al-        the target language as follows:
ready mentioned, the computational complex is fairly           T1′ : term A, term B, term C,
high. Also, it should be noted that term co-occurrence         T2′ : term D, term E,
frequencies can be considered as macro-level statis-
                                                              T3′ : term F, term G.
tics on the entire document collection. This means
                                                            Also, it is assumed that a combination of term A, D
that the disambiguation based on the statistics may
lead to false combination of translation candidates         and F is correct and the other terms are irrelevant. In
(see Yamabana et al. [3]). Even if two terms A and B        such situation, we can expect reasonably that the ir-
                                                            relevant terms do not appear together in each docu-
are statistically associated in general (i.e., in the en-
tire collection), the association is not always valid in    ment because the probability that such irrelevant
a given query.                                              terms have relations each other is low. Meanwhile,
                                                            the ‘correct’ combination of term A, D and F would
     Therefore, in the study, the authors decided to use
an alternative disambiguation technique, which is not       tend to appear in documents more than any combina-
based on term co-occurrence statistics. First, we de-       tions including irrelevant translation. Therefore, the
                                                            documents containing the ‘correct’ combination pos-
fine some mathematical notations such that
                                                            sibly have higher score for ranking.
    s j : terms in the source query ( j = 1,2,..., m ),
                                                                For detecting such combination from the result of
  T j : a set of translations in the pivot language for     the initial search for the set T , it would be enough
     the j-th term s j ,                                    that we use document frequency of each translation in
                                                            the set of top-ranked documents. That is, we can
  T j′ : a set of translations in the target language for   choose a term ~  t j for each T j′ ( j = 1,2,..., m ) such
     all terms included in the set T j .                    that
                                                                   ~
   By transitive translation process using two bilin-              t j = arg max rt , t ∈ T j′                       (2)
gual dictionaries, it is easy to obtain a set of trans-     where rt is the number of top-ranked documents
lated query terms in the target language with no dis-       including the term t . Finally, we obtain that a set of
ambiguation,                                                m translations through the disambiguation process
    T = T1′ ∪ T2′ ∪ ... ∪ Tm′ .                     (1)        ~ ~~ ~
                                                               T = {t1 , t2 ,..., tm } .                             (3)
    The procedure of disambiguation we propose is to
search the target document collection for the set of            Ideally, we should make use of co-occurrence
terms T , and then to select the most frequently ap-        frequencies of all combinations of translation can-
pearing term in the top-ranked documents, from each
                                                            didates in the set of top-ranked documents. However,
set of T j′ respectively (see Figure 1). The basic as-
the computational cost is expected to be fairly high     (a) translation disambiguation and (b) post-translation
since we need to compile the statistics dynamically      query expansion. The detailed procedure is as follows
for each search run. A solution for avoiding the         (see also Figure 2):
complexity is to count only simple frequencies in-        - Obtaining a set of translations T (see Equa-
stead of co-occurrence. That is, if the ‘correct’ com-          tion (1)) by transitive translation,
bination of translations often appears, naturally the     - Searching the target document collection for the
simple frequency of each translation would also                 set T (i.e., initial search),
become high. Equation (2) is based on this hypothe-       - Selecting a single translation from each T j′
sis.                                                                respectively, according to the document fre-
3.2 Query expansion stage                                           quency in the top-ranked documents by the ini-
                                                                                                            ~
     In the previous stage, translation ambiguity was               tial search, and obtaining a new set T (see
resolved, and final m search terms in the target lan-               Equation (3)) (i.e., disambiguation),
guage remain. We can consider the stage as a process         -      Searching the target document collection for the
                                                                         ~
for improving precision of search. In next stage, en-               set T (i.e., second search),
hancement of recall should be attempted since some           -      Adding terms according to the weight shown as
synonyms or related terms would have been removed                   Equation (4) (i.e., query expansion),
                                                             -      Searching finally the target document collection
in the previous stage.                                                                                 ~
     According to Ballestellos and Croft[1,2], we exe-              for the expanded set of terms T ′ (i.e., third
cute a standard post-translation query expansion us-                search).
ing a pseudo relevance feedback (PRF) technique, in
which new terms to be added to the query is selected
                                                         4. System Description
based on its weight wt , calculated by a formula of
standard probabilistic model,                            4.1 Purpose of the system
             (rt + 0.5)( N − R − nt + rt + 0.5)               The system enables us to search an Italian docu-
  wt = rt × log                                 , (4)
                                                         ment collection for a German query automatically, i.e.,
                ( N − nt + 0.5)( R − rt + 0.5)
                                                         it is an automatic CLIR system from German to Ital-
where N is the total number of documents, R is
                                                         ian.
the number of relevant documents, and nt is the
                                                         4.2 Text processing
number of documents including term t . It should be
                                                              Both of German and Italian texts (in documents
noted that, in PRF, the set of relevant documents is
assumed to be the set of some top-ranked documents       and queries) were basically processed by the follow-
by the initial search. Therefore, rt is defined as the   ing steps: (1) identifying tokens, (2) removing stop-
same before (see Equation (2)). We denote the ex-        words, (3) lemmatization, (4) stemming. In addition,
                                 ~                       for German text, decomposition of compound words
panded term set by the method as T ′ .
                                                         was attempted based on an algorithm of longest
                  Original query                         matching with headwords included in the German to
                                                         English dictionary in machine readable form.
                   Translation
                                        dictionaries     4.3 Language resources
           Translation candidates                             We downloaded free dictionaries (German to
                                                         English and English to Italian) from the Internet1.
                  Disambiguation                         Also, stemmers and stopword lists for German and
                                                         Italian were available through the Snowball project2.
              Selected terms
                                                         Stemming for English was conducted by the original
                    Expansion         Target document    Porter’s algorithm [17].
                                      collection              Furthermore, in order to evaluate performance of
          Final set of search terms                      our two stages refinement method comparatively, we
                                                         decided to use commercial MT software produced by
                                                         a Japanese company.
  Figure 2 Two stages refinement of translation
                                                         4.4 Transitive translation procedure
   To sum up, the method for refining the result of
                                                         1
query translation we propose consists of two stages:             http://www.freelang.net/
                                                         2
                                                                 http://snowball.tartarus.org/
    Before executing transitive translation by two             is the frequency of occurrence of term t in the
bilingual dictionaries, all terms included in the bilin-       document, l is the document length, l is an aver-
gual dictionaries were normalized through stemming             age of the document length over the entire document
and lemmatization steps with the same procedure                collection, and yt is the frequency of occurrence of
applied to texts of documents and queries. Actual              term t in the query. It should be noted that the value
translation process is a simple replacement, i.e., each        of each yt is always fixed at the frequency of the
normalized German term in a query was replaced                 corresponding original search terms in the source
with a set of corresponding normalized English words,          query. Also, Ω is a set of query term, i.e., in the
                                                                                                                 ~
and similarly, each English word was replaced with             first search Ω = T , in the second search Ω = T ,
                                                                                            ~
the corresponding Italian words. As a result, for each         and in the third search Ω = T ′ . Finally, the docu-
query, a set of normalized Italian words, i.e., T in           ments were ranked in the decreasing order of the val-
                                                               ues of z .
Equation (1), was obtained. If no corresponding
                                                               4.6 Type of runs executed
headword was included in the dictionaries (Ger-
                                                               In CLEF 2003, we executed three runs (see Table 1),
man-English or English-Italian), the unknown word
                                                               in which only  filed in each query
was straightforwardly sent to the next step without
                                                               was used.
any change.
    Next, refinement of the set T through two
                                                                       Table 1 Runs submitted in CLEF 2003
stages described in the previous section was executed.
                                                                  ID        Translation               Refinement
The number of top-ranked documents was set to 100
                                                                              method      Disambigua-        Expansion
in both stages, and in the query expansion stage,
                                                                                          tion
top-ranked 30 terms in the decreasing order of term
                                                               NiiMt01      MT                    -                done
weights (Equation (4)) were added.
    If the top-ranked term is already included in the          NiiDic01     Dictionary           done              done
                       ~                                       NiiDic02     Dictionary            -                done
set of search terms, T , term frequency in the query
is changed into 1.5 × y t . If not, the term frequency is
set to 0.5 (i.e., y t = 0.5 ).                                 5. Results
     On the other hand, in the case of using MT soft-
                                                               5.1 Basic statistics
ware, first of all, the original German query was input
                                                                   The Italian collections include 157,558 docu-
to the software. The software we used is automati-
                                                               ments in total. The average document length is
cally executing German to English translation and
                                                               181.86.
then English to Italian translation (i.e., a kind of tran-
                                                               5.2 System error
sitive translation). The resulting Italian text from the
                                                                   Unfortunately, a non-trivial system error was de-
MT system was processed according to the procedure
                                                               tected after submission of results, i.e., by a bug in our
described in the section 4.2, and finally, a set of nor-
                                                               source code, only a last term within the set of search
malized Italian words was obtained for each query. In
                                                               terms has contributed to the calculation of document
the case of MT translation, only post-translation
                                                               scores. Inevitably, search performance of all runs
query expansion was executed with the same proce-
                                                               shown in Table 1 was very low.
dure and parameters in the case of dictionary-based
                                                               5.3 Results of runs conducted after submission
translation.
                                                                   Therefore, the authors have corrected the source
4.5 Search algorithm
                                                               code and attempted to perform again some search
     The well-known Okapi formula [18] was used for
                                                               runs after submission of results to the organizers of
computing each document score in all searches of this
                                                               CLEF. Six types of run were conducted as shown in
study, i.e.,
                                                               Table 2, which also indicates each value of mean av-
                     3.0 xt
   z = ∑                            ×                        erage precision calculated by using the relevance
       t∈Ω  ( 0 .5 + 1.5l / l ) + xt                          judgment file. Furthermore, recall-precision curves of
                                            N − nt + 0.5  ,   the six runs are presented as Figure 3. It should be
                                 yt × log                
                                              nt + 0.5       noted that each value in represented in Table 2 and
                                                               Figure 3 was calculated for 51 topics to which one or
where z is the score of a particular document, xt
                                                               more relevant documents are included in the Italian
collections.                                              sion of search with no disambiguation and no expan-
                                                          sion by PRF is only .143, which is significantly lower
Table 2 Mean average precision of runs executed           than .207 in the case of searches through the two
after submission (51 topics)                              stages refinement.
Translation method           Expansion by PRF                 However, we can also point out that there is a
                             done        none             large difference of performance between MT and the
MT                            .301       .281             two stage refinement. The reason may be attributed to
Diction-    With disam-       .207       .181             difference of quality and coverage between the com-
ary         biguation                                     mercial MT software and free dictionaries
            Without dis-      .190       .143             downloaded from the Internet. Even if it is true, we
            ambiguation                                   need to modify the two stages refinement method so
                                                          that its performance level is approaching to that of
    As shown in Table 2, MT outperforms diction-          MT system.
ary-based translation significantly. Also, it turns out       For example, in Figure 3, at the levels of recall
that the disambiguation technique based on term fre-      over 0.7, searches with no disambiguation is re-
quency moderately improves effectiveness of dic-          versely superior to those with disambiguation. This
tionary-based translation method, i.e., the mean av-      may be due to that our disambiguation method selects
erage precision with disambiguation is .207 in com-       only one translation and consequently may remove
parison with .190 in the case of no disambiguation.       some useful synonyms or related terms. A simple
Especially, Table 2 indicates that our technique of two   solution is possibly to choose two or more transla-
stages refinement has a large effect on enhancement       tions instead of using directly Equation (2). Although
of search performance since the mean average preci-       it is difficult to determine the optimal number of
                                                          translations to be selected, multiple translations for

                0.6
                                                          MT with Expansion
                                                          MT with no Expansion
                0.5                                       Dic. with Disambiguation and Expansion
                                                          Dic. with Disambiguation and no Expansion
                                                          Dic. with Expansion and no Disambiguation
                0.4                                       Dic. with no Expansion and no Disambiguation
    Precision




                0.3



                0.2



                0.1



                 0
                  0.0   0.1   0.2       0.3       0.4      0.5     0.6       0.7       0.8      0.9       1.0
                                                          Recall

                                           Figure 3. Recall-Precision curves
each source term may improve recall of searches.            Retrieval and Term Recognition. Tokyo: National
                                                            Institute                of               Informatics.
                                                            http://research.nii.ac.jp/ntcir/workshop/
6. Concluding Remarks
                                                         [6] Sadat, F., Maeda, A. Yoshikawa, M. & Uemura, S.
    This paper reported results of our experiment on        (2002). Query expansion techniques for the CLEF
CLIR from German to Italian, in which English was           Bilingual track. In C. Peters et al. (Eds.) Evaluation
used as a pivot language. In particular, two stages         of Cross-Language Information Retrieval Systems:
refinement of query translation was employed for            LNCS 2406 (pp.177-184) Berlin: Springer.
removing irrelevant terms in the target language pro-    [7] Adriani, M. (2002). English-Dutch CLIR using
duced by transitive translation using successively two      query translation techniques. In C. Peters et al.
bilingual dictionaries.                                     (Eds.) Evaluation of Cross-Language Information
    As a result, it turned out that                         Retrieval Systems: LNCS 2406 (pp.219-225) Ber-
 - our two stages refinement method improves sig-           lin: Springer.
      nificantly retrieval performance of bilingual IR   [8] Qu, Y., Grefenstette, G. & Evans, D. A. (2002).
      using a pivot language, and                           Resolving translation ambiguity using monolingual
 - the performance is inferior to that by MT-base           corpora: a report on Clairvoyance CLEF-2002 ex-
      searches.                                             periments. In Working Notes for the CLEF-2002
By choosing two or more search terms in the disam-          Workshop (pp.115-126).
biguation stage, it is possible that our method be-      [9] Seo, H. C., Kim, S. B., Kim, B. I., Rim, H. C. &
comes more effective.                                       Lee, S. Z. (2003). KUNLP system for NTCIR-3
                                                            English-Korean cross-language information re-
                                                            trieval. In Proceedings of the Third NTCIR Work-
Reference
                                                            shop on research in information Retrieval, Auto-
[1] Ballesteros, L.A. & Croft, W. B. (1997). Phrasal        matic Text Summarization and Question Answer-
   translation and query expansion techniques for           ing. Tokyo, National Institute of Informatics.
   cross-language information retrieval. In Proceed-        ttp://research.nii.ac.jp/ntcir/workshop/
   ings of the 20st ACM SIGIR conference on Re-          [10] Ballesteros, L. A. (2000). Cross-language re-
   search and Development in Information Retrieval.         trieval via transitive translation. In W.B.Croft (Ed.)
   (pp.84-91).                                              Advances in Information Retrieval: Recent Re-
[2] Ballesteros, L. & Croft, W.B. (1998). Resolving         search from the Center for Intelligent Information
   ambiguity for cross-language retrieval. In Pro-          Retrieval (pp.203-234). Boston, MA: Kluwer.
   ceedings of the 21st ACM SIGIR conference on          [11] Frantz, M., McCarley, J. S., & Roukos, S. (1999).
   Research and Development in Information Re-              Ad hoc and multilingual information retrieval at
   trieval (pp.64-71).                                      IBM. In Proceedings of the TREC-7, Gaithersburg,
[3] Yamabana, K., Muraki, K., Doi, S. & Kamei, S.           MD: National Institute of Standards and Technol-
   (1998). A language conversion front-end for              ogy. http://trec.nist.gov/pubs/
   cross-language information retrieval. In G. Grefen-   [12] Gey, F. C., Jiang, H. Chen, A. & Larson, R. R.
   stette (ed.) Cross-language Information retrieval        (1999). Manual queries and machine translation in
   (pp.93-104). Boston, MA: Kluwer.                         cross-language retrieval at TREC-7. In Proceedings
[4] Gao, J., Nie, J. Y., Xun, E X., Zhang, J. Zhou, M.      of the TREC-7, Gaithersburg: MD, National Insti-
   & Huang, C. (2001b). Improving query translation         tute      of      Standards       and     Technology.
   for cross-language information retrieval using sta-      http://trec.nist.gov/pubs/
   tistical models. In Proceedings of 24th ACM           [13] Hiemstra, D. & Kraaij, W. (1999). Twenty-one at
   SIGIR conference on Research and Development             TREC-7: ad-hoc and cross-language track. In Pro-
   in Information Retrieval (pp.96-104).                    ceedings of the TREC-7, Gaithersburg, MD: Na-
[5] Lin, C. J., Lin, W. C., Bian, G. W. & Chen, H. H.       tional Institute of Standards and Technology.
   (1999). Description of the NTU Japanese-English          http://trec.nist.gov/pubs/
   cross-lingual information retrieval system used for   [14] Chen, A. & Gey, F. C. (2003). Experiments on
   NTCIR workshop. In Proceedings of the First              cross-language and patent retrieval at NTCIR-3
   NTCIR Workshop on Research in Japanese Text              workshop. In Proceedings of the Third NTCIR
  Workshop on research in information Retrieval,         cross language information retrieval with triangu-
  Automatic Text Summarization and Question An-          lated translation. In Proceedings of the 24th ACM
  swering. Tokyo, National Institute of Informatics.     SIGIR conference on Research and Development
  http://research.nii.ac.jp/ntcir/workshop/              in Information Retrieval (pp.90-95).
[15] Lin, W. C. & Chen, H.H. (2003). Description of    [17] Porter, M.F.(1980). An algorithm for suffix strip-
  NTU approach to Multilingual Information re-           ping. Program, 14(3), pp.130-137.
  trieval. In Proceedings of the Third NTCIR Work-     [18] Roberson, S. E., Walker, S., Jones, S., Han-
  shop on research in information Retrieval, Auto-       cock-Beaulieu, M. M. & Gatford, M. (1995).
  matic Text Summarization and Question Answer-          Okapi at TREC-3. In Proceedings of TREC-3,
  ing. Tokyo, National Institute of Informatics          Gaithersburg: MD, National Institute of Standards
  http://research.nii.ac.jp/ntcir/workshop/              and Technology. http://trec.nist.gov/pubs/
[16] Gollins, T. & Sanderson, M. (2001). Improving