<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Cross-Lingual Information Retrieval System for Indian Languages</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jagadeesh</forename><surname>Jagarlamudi</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Microsoft Research India</orgName>
								<address>
									<settlement>Bangalore</settlement>
									<country key="IN">INDIA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">A</forename><forename type="middle">Kumaran</forename><surname>Multilingual</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Microsoft Research India</orgName>
								<address>
									<settlement>Bangalore</settlement>
									<country key="IN">INDIA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Systems</forename><surname>Research</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Microsoft Research India</orgName>
								<address>
									<settlement>Bangalore</settlement>
									<country key="IN">INDIA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Cross-Lingual Information Retrieval System for Indian Languages</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">59D34274A832ED4F257403F1F0E8301C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T07:08+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing</term>
					<term>H.3.3 Information Search and Retrieval</term>
					<term>H.3.4 Systems and Software</term>
					<term>H.3.7 Digital Libraries</term>
					<term>H.2.3 [Database Managment]: Languages-Query Languages Experiments, Performance Information Retrieval, Cross-lingual Information Retrieval, Statistical Machine Translation, CLEF, Bilingual Dictionary, Query Translation</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes our first participation in the Indian language sub-task of the main Adhoc monolingual and bilingual track in CLEF 1 competition. In this track, the task is to retrieve relevant documents from an English corpus in response to a query expressed in different Indian languages including Hindi, Tamil, Telugu, Bengali and Marathi. Groups participating in this track are required to submit a English to English monolingual run and a Hindi to English bilingual run with optional runs in rest of the languages. We had submitted a monolingual English run and a Hindi to English cross-lingual run.</p><p>We used a word alignment table that was learnt by a Statistical Machine Translation (SMT) system trained on aligned parallel sentences, to map a query in source language into an equivalent query in the language of the target document collection. The relevant documents are then retrieved using a Language Modeling based retrieval algorithm. On CLEF 2007 data set, our official cross-lingual performance was 54.4% of the monolingual performance and in the post submission experiments we found that it can be significantly improved up to 73.4%.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The rapidly changing demographics of the internet population <ref type="bibr">[7]</ref> and the plethora of multilingual content on the web <ref type="bibr" target="#b4">[5]</ref> has attracted the attention of Information Retrieval(IR) community to develop methodologies for cross-lingual information accessing. Since the past decade <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b9">11,</ref><ref type="bibr" target="#b12">14]</ref> researchers are looking at ways to retrieve documents in a language in response to a query in another language. This fundamentally assumes that users can read and understand documents written in foreign language but unable to express their information need in that language. There are arguments against this assumption as well: For example, <ref type="bibr" target="#b10">[12]</ref> argues that it is unlikely that the information in another language will be useful unless users are already fluent in that language. However, we argue that in specific cases such methodologies could still be valid. For example, in India students learn more than one language from their childhood and more than 30% of the population can read and understand Hindi apart from their native language <ref type="bibr" target="#b3">[4]</ref>. This situation exhibits great utility for systems with the capability to retrieve relevant documents in languages different from the language in which information need is expressed.</p><p>Lack of resources is still a major reason for relatively less number of efforts in the cross-lingual setting in Indian subcontinent. Research communities working in Indian Languages, especially on Machine Translation (MT) <ref type="bibr" target="#b1">[2]</ref>, have built some necessary resources like morphological analyzer and bilingual dictionaries for some languages. Even though these resources are built mainly for MT, they can still be used as a good starting point to build a Cross-Lingual Information Retrieval (CLIR) system, as we demonstrate in this paper. More specifically, in this paper we will describe our first attempt in building a CLIR system using the bilingual statistical dictionary that was learnt automatically during the training phase of a SMT <ref type="bibr" target="#b11">[13]</ref> system.</p><p>The rest of the paper is organized as follows. We will first define the problem in section 2, followed by the presentation of our approach in section 3. In sections 4.1 &amp; 4.2 we will describe data set along with the resources used and present the performance of our system (section 4.3) in the CLEF competition. Section 4 includes some analysis of the results. Section 5 presents our conclusion and identifies our plans for future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Problem Statement</head><p>Cross Language Evaluation Forum (CLEF) aims at promoting research in the design of multilingual, multi-modal retrieval systems by providing an opportunity for the research communities working in different languages to collaborate and share their experiences. Each year it organizes a series of evaluation tracks to test different aspects of cross-language information retrieval system development.</p><p>We have participated in the CLEF competition, specifically in the Indian Language sub-task of the main CLEF 2007 Ad-hoc monolingual and bilingual track. This track tests the performance of systems in retrieving the relevant documents in response to a query in the same and different language from that of the document set. In the Indian language track, documents are provided in English and queries are specified in different languages including Hindi, Telugu, Bengali, Marathi and Tamil. The system has to retrieve 1000 relevant documents as response to a query in any of the above mentioned languages. All the systems participating in this track are required to submit a English to English monolingual run and a Hindi to English bilingual run. Runs in rest of the languages are optional.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Approach</head><p>Converting the information expressed in different languages to a common representation is inherent to cross-lingual applications to build the language barrier. In CLIR, either the query or the document or both need to be mapped into the common representation to retrieve the relevant documents. Translating all documents into the query language is less desirable due to the enormous resource requirements. Usually the query is translated into the language of the target collection of documents. Typically three types of resources are exploited for translating the queries: bilingual machine readable dictionaries, parallel texts and machine translation systems. MT systems typically produce one candidate translation thus some potential information which could be of use to IR system is lost. Researchers <ref type="bibr" target="#b7">[9]</ref> have also explored considering more than one possible translation to avoid the loss of useful information. Another difficulty in using the MT system comes from the fact that most of the search queries are very short and lack necessary syntactic information required for translation. Hence most approaches use bilingual dictionaries.</p><p>In our work, we have used a statistically aligned Hindi to English word alignments that were learnt during the training phase of machine translation. The query in Hindi language is translated into English using word by word translation. For a given Hindi word, all English words which have translation probability above certain threshold are selected as candidate translations. Only top 'n' of these candidates are selected as final translations to reduce ambiguity in the translation. The aligned bilingual dictionary may not contain some of the query words because either the word is not available in the parallel corpus or the translation probabilities are less than the threshold. In such cases, we attempt to transliterate the query word into English. We have used a noisy channel model based transliteration algorithm <ref type="bibr" target="#b6">[8]</ref>. The phonemic alignments between Hindi characters and corresponding English characters are learnt automatically from a training corpus of parallel names in Hindi and English. These alignments along with their probabilities are used, during viterbi decoding, to transliterate a new Hindi word into English. As reported, this system will output the correct(fuzzy match) English word in top 10 results, with an accuracy of about 30%(80%). Target language vocabulary along with approximate string matching algorithms like soundex and edit distance measure <ref type="bibr" target="#b8">[10]</ref> were used to filter out the correct word from the incorrect ones among the possible candidate transliterations.</p><p>Once the query is translated into the language of the document collection, standard IR algorithms can be used to retrieve the relevant documents. We have used Langauge Modeling <ref type="bibr" target="#b13">[15]</ref> in our experiments. In Language Modeling framework, both query formulation and retrieval of relevant documents are treated as simple probability mechanisms. Essentially, each document is assumed as a language sample and query as a sample from this document. The likelihood of generating a query from the document (p(q|d)) is associated with the relevance of the document to the query. A document which is more likely to generate the user query is considered to be more relevant. Since a document considered as bag of words is very small when compared to whole vocabulary, most of the times the resulting document models are very sparse. Hence smoothing of the document distributions is very crucial. Many techniques have been explored and because of its simplicity and effectiveness we chose the relative frequency of a term in the entire collection to smooth the document distributions.</p><p>In a nutshell, structural query translation <ref type="bibr" target="#b12">[14]</ref> is used to translate query into English. The relevant documents are then retrieved using a Language Modeling based retrieval algorithm. Following section describes our approach applied in the CLEF 2007 participation and some further experiments to calibrate the quality of our system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">CLEF Data set for Adhoc track</head><p>In both the Adhoc bilingual 'X' to English track and Indian language sub track, the target document collection consisted of 135,153 English news articles published in Los Angeles Times, from the year 2002. During the indexing of this document collection, only text portion (embedded in &lt;LD&gt; and &lt;TE&gt; tags) was considered. Note that the results reported in this paper does not make use of other potentially useful information present in the document, such as, the document heading (with in &lt;DH&gt; tag) and the photo caption (in &lt;CP&gt; tag), even though we believe that including such an information would improve the performance of the system. The resulting 85,994 non-empty documents were then processed to remove the stop words and the remaining words were reduced into their base form using Porter stemmer <ref type="bibr" target="#b14">[16]</ref>.</p><p>50 topics originally created in English and translated later into other languages were distributed among the participants. For processing Hindi queries, a list of stop words was formed based on the frequency of word in the monolingual corpus obtained corresponding to the Hindi part of the parallel data. This list was then used to remove any less informative words occurring in the topic statements. The processed query was then translated into English using the word alignment table.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Word Alignment Table as Bilingual Dictionary</head><p>We have used the word alignment table that was learnt by the SMT <ref type="bibr" target="#b11">[13]</ref> system trained on 100K Hindi to English parallel sentences to translate Hindi queries. Since these alignments were learnt primarily for machine translation purpose, the alignments included words that occurs in their inflectional forms. For this reason we have not converted the query words into their base form during the translation. Table <ref type="table" target="#tab_0">1</ref> shows the statistics about the coverage of the alignment table corresponding to different levels of threshold on the translation probability (column 1), note that a threshold value of 0 correspond to having no threshold at all. Columns 2 and 3 indicate the coverage of the dictionary in terms of source and target language words. The last column denote the average number of English translations for a Hindi word. It is very clear and intuitive that as the threshold increases the coverage of the dictionary decreases. It is also worth noting that, as the threshold increases, the average translations per source word decreases indicating that the target language words which are related to the source word but not synonymous are getting filtered. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Results</head><p>Each of the participating systems was required to submit 1000 relevant documents for each topic.</p><p>For each query, a pool of candidate relevant documents is created by combining the documents submitted by all systems. From the pool of such candidate documents assessors filter out actual relevant documents from non relevant ones. These relevance judgements are then used to automatically evaluate the quality of cross-lingual retrieval of participating CLIR systems.</p><p>In this section we discuss the results of our monolingual English run and Hindi to English bilingual run. In our case we specifically participated in only one Indian language -Hindi, though the data was available in 5 Indian languages. For our official submission, with the aim of reducing noise in the translated query, we have used a relatively high threshold of 0.3 for the translation probability. To avoid ambiguity, when there are many possible English translations for a given Hindi word, we allowed only 2 best possible translations according to the statistical alignments learnt by SMT. Table <ref type="table" target="#tab_1">2</ref> shows the official results of our submission. We have submitted different runs using title, description (td) and title, description and narration (tdn) as query. Narration seems to be improving the cross-lingual retrieval performance, in terms of Mean Average Precision (MAP), more than that in monolingual setting. In the rest of the experiments it is assumed that narration is included as a part of the query unless explicitly mentioned. Figure <ref type="figure" target="#fig_0">1</ref> shows a comparison of cross-lingual and mono-lingual submissions in terms of precision at different levels of interpolated recall.  In a second set of experiments, we experimented with various levels of threshold and the number of translations above the threshold and their effect on MAP score. The results obtained by monolingual system and cross-lingual system with varying threshold are shown plotted in Figure <ref type="figure" target="#fig_1">2</ref>. The x-axis represents the number of top words considered when many of the target language words have translation probabilities above the chosen threshold. Note that y-axis represents only a subset (0.15-0.45) of the entire possible range (0-1) in which a MAP score can lie. The right most bar represents the monolingual performance of the system. The figure shows that the performance increases as the threshold decrease but again it drops if you consider all the possible translations (dip when 10 and 15 translations were considered) perhaps due to the shift in query focus with the inclusion of many less synonymous target language words. For the CLEF data set, we found that, considering 4 most possible translations with out any threshold (left most bar) on the translation probability gave us the best results (73.4% of monolingual IR performance). As the threshold decrease potentially two things can happen; words which were not translated previously can get translated or new target language words whose translation probability was below the threshold will now become the candidates of the translated query. Table <ref type="table" target="#tab_2">3</ref> shows the fraction of query words that were translated corresponding to each of these thresholds. Table <ref type="table" target="#tab_2">3</ref> and Figure <ref type="figure" target="#fig_1">2</ref> show that as the threshold on the translation probability decrease, fraction of query words getting translated increases, resulting in an overall increase in system performance. But the performance increase between having no threshold and a threshold of 0.1 compared to the small fraction of new words that got translated suggest that even noisy translations, even though they are not true synonymous, might help CLIR. This perhaps due to the fact that for the purposes of IR, having a list of associated words may be sufficient to identify the context of the query <ref type="bibr" target="#b2">[3]</ref>. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Monolingual</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and Future Work</head><p>This paper describes our first attempts in building a CLIR system with the help of a word alignment table learned, from a parallel corpora, primarily for statistical machine translation. We present our experience and results of our participation in the Indian language sub-task of the Adhoc monolingual and bilingual track of CLEF 2007. In post submission experiments we found that, on CLEF data set, a Hindi to English cross lingual information retrieval system using a simple word by word translation of the query with the help of a word alignment table, was able to achieve ∼ 73% of the performance of the monolingual system. Empirically we found that considering 4 most probable word translations with no threshold on the translation probability gave the best results.</p><p>Since the quality of the dictionary will affect the performance of the system, in future we would like to explore the effect of size and quality of the parallel data on the word alignments, and subsequently on the CLIR performance. We would also like to compare the use of a statistically learned word alignments with respect to a hand crafted dictionary of similar size for CLIR application.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Precision at different levels of interpolated recall.</figDesc><graphic coords="5,137.84,109.04,324.00,216.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Comparison of the CLIR Performance with varying threshold on translation probability.</figDesc><graphic coords="6,137.84,109.05,324.00,216.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Coverage statistics of the word alignment table</figDesc><table><row><cell cols="4">Threshold Hindi words English words Translations per word</cell></row><row><cell>0</cell><cell>57555</cell><cell>59696</cell><cell>8.53</cell></row><row><cell>0.1</cell><cell>45154</cell><cell>54945</cell><cell>4.39</cell></row><row><cell>0.3</cell><cell>14161</cell><cell>17216</cell><cell>1.59</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>monolingual and cross-lingual experiments.</figDesc><table><row><cell></cell><cell></cell><cell></cell><cell cols="2">Crosslingual</cell></row><row><cell></cell><cell cols="4">LM(td) LM(tdn) LM(td) LM(tdn)</cell></row><row><cell cols="2">MAP 0.3916</cell><cell>0.3964</cell><cell>0.1994</cell><cell>0.2156</cell></row><row><cell>p@10</cell><cell>0.456</cell><cell>0.454</cell><cell>0.216</cell><cell>0.294</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Fraction of translated words</figDesc><table><row><cell cols="2">Threshold Translated words</cell></row><row><cell>0</cell><cell>0.803</cell></row><row><cell>0.1</cell><cell>0.7892</cell></row><row><cell>0.2</cell><cell>0.711</cell></row><row><cell>0.3</cell><cell>0.6344</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.clef-campaign.org</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Dictionary methods for cross-lingual information retrieval</title>
		<author>
			<persName><forename type="first">Lisa</forename><surname>Ballesteros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Bruce</forename><surname>Croft</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">DEXA &apos;96: Proceedings of the 7th International Conference on Database and Expert Systems Applications</title>
				<meeting><address><addrLine>London, UK</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="1996">1996</date>
			<biblScope unit="page" from="791" to="801" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Machine Translation activities in India: A survey</title>
		<author>
			<persName><forename type="first">Akshar</forename><surname>Bharati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rajeev</forename><surname>Sangal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dipti</forename><forename type="middle">M</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Amba</forename><forename type="middle">P</forename><surname>Kulakarni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Workshop on survey on Research and Development of Machine Translation in Asian Countries</title>
				<imprint>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A review of ontology based query expansion</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bhogal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Macfarlane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Smith</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Inf. Process. Manage</title>
		<imprint>
			<biblScope unit="volume">43</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="866" to="886" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">The internet in india: better times ahead?</title>
		<author>
			<persName><forename type="first">E</forename><surname>Grey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Seymour</forename><forename type="middle">E</forename><surname>Burkhart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Arun</forename><surname>Goodman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Larry</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><surname>Press</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="21" to="26" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title/>
		<author>
			<persName><surname>Globalreach</surname></persName>
		</author>
		<ptr target="http://www.global-reach.biz/globstats/evol.html" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Querying across languages: a dictionary-based approach to multilingual information retrieval</title>
		<author>
			<persName><forename type="first">David</forename><forename type="middle">A</forename><surname>Hull</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gregory</forename><surname>Grefenstette</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR &apos;96: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="1996">1996</date>
			<biblScope unit="page" from="49" to="57" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A generic framework for machine transliteration</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kumaran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tobias</forename><surname>Kellner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR &apos;07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="721" to="722" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Rich results from poor resources: Ntcir-4 monolingual and cross-lingual retrieval of korean texts using chinese and english</title>
		<author>
			<persName><forename type="first">Kui</forename><surname>Lam Kwok</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sora</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Norbert</forename><surname>Dinstl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Asian Language Information Processing (TALIP)</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="136" to="162" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Binary codes capable of correcting deletions, insertions, and reversals</title>
		<author>
			<persName><forename type="first">I</forename><surname>Vladimir</surname></persName>
		</author>
		<author>
			<persName><surname>Levenshtein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">English translation in Soviet Physics Doklady</title>
		<imprint>
			<biblScope unit="page" from="707" to="710" />
			<date type="published" when="1966">1966</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Comparing cross-language query expansion techniques by degrading translation resources</title>
		<author>
			<persName><forename type="first">Paul</forename><surname>Mcnamee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><surname>Mayfield</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR &apos;02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="159" to="166" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">What is the future of multi-lingual information access?</title>
		<author>
			<persName><forename type="first">Isabelle</forename><surname>Moulinier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Frank</forename><surname>Schilder</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR 2006 Workshop on Multilingual Information Access 2006</title>
				<meeting><address><addrLine>Seattle, Washington, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A systematic comparison of various statistical alignment models</title>
		<author>
			<persName><forename type="first">Josef</forename><surname>Franz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hermann</forename><surname>Och</surname></persName>
		</author>
		<author>
			<persName><surname>Ney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Comput. Linguist</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="19" to="51" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Dictionary-based crosslanguage information retrieval: Problems, methods, and research findings</title>
		<author>
			<persName><forename type="first">Ari</forename><surname>Pirkola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Turid</forename><surname>Hedlund</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Heikki</forename><surname>Keskustalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kalervo</forename><surname>Järvelin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Inf. Retr</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">3-4</biblScope>
			<biblScope unit="page" from="209" to="230" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">A Language Modeling Approach to Information Retrieval</title>
		<author>
			<persName><forename type="first">Jay</forename><forename type="middle">M</forename><surname>Ponte</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Bruce</forename><surname>Croft</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Research and Development in Information Retrieval</title>
				<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="275" to="281" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">An algorithm for suffix stripping</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Porter</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1997">1997</date>
			<biblScope unit="page" from="313" to="316" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
