<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">TPIRS: A System for Document Indexing Reduction on WebCLEF *</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">David</forename><surname>Pinto</surname></persName>
							<email>dpinto@cs.buap.mx</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Computer Science</orgName>
								<orgName type="institution">BUAP</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Héctor</forename><surname>Jiménez-Salazar</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Computer Science</orgName>
								<orgName type="institution">BUAP</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Paolo</forename><surname>Rosso</surname></persName>
							<email>prosso@dsic.upv.es</email>
							<affiliation key="aff1">
								<orgName type="department">Department of Information Systems and Computation</orgName>
								<orgName type="institution">UPV</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Emilio</forename><surname>Sanchis</surname></persName>
							<email>esanchis@dsic.upv.es</email>
							<affiliation key="aff1">
								<orgName type="department">Department of Information Systems and Computation</orgName>
								<orgName type="institution">UPV</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">TPIRS: A System for Document Indexing Reduction on WebCLEF *</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">A86C8FE5D16607A8E50216A975AA6DD3</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T00:36+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing</term>
					<term>H.3.3 Information Search and Retrieval</term>
					<term>H.3.4 Systems and Software</term>
					<term>H.3.7 Digital Libraries</term>
					<term>Measurement, Performance, Experimentation</term>
					<term>Cross-Lingual Information Retrieval, Terms Reduction, Transition Point</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper we present the results of the BUAP/UPV universities in WebCLEF, a particular task of CLEF 2005. In particular, we evaluate our information retrieval system in the bilingual English to Spanish track. Our system uses a term reduction process based on the Transition Point technique. Our results show that it is possible to reduce the number of terms to index while improving the performance of our system. We evaluate different percentages of reduction over a subset of EuroGOV in order to determine the best one. After reducing the corpus by 82.55%, we obtained a Mean Reciprocal Rank of 0.0844, compared with 0.0465 for the evaluation with full documents.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The high volume of information on the Internet has led to the development of novel techniques for managing data, especially when dealing with information in multiple languages. There are many scenarios in which users may be interested in information written in a language other than their native one. A common scenario is one in which a user has some comprehension of a given language but is not proficient enough to confidently formulate a search request in that language. A search system that can deal with this problem should therefore be highly beneficial. The World Wide Web (WWW) is a natural setting for cross-lingual information retrieval; the European Union is a typical example of a multilingual scenario, where multiple users have to deal with information published in at least 20 languages.</p><p>In order to reinforce research in this area, CLEF (Cross-Language Evaluation Forum) has been compiling a set of multilingual corpora and promoting the evaluation of multilingual information retrieval systems for diverse kinds of data <ref type="bibr" target="#b3">[4]</ref>. A particular track for the evaluation of systems that deal with information on the web has been set up this year as part of CLEF. This track is named WebCLEF, and a detailed description of the task can be found in <ref type="bibr" target="#b9">[10]</ref>. In WebCLEF, three subtasks were defined this year: mixed monolingual, multilingual, and bilingual English to Spanish.</p><p>This paper reports results on the evaluation of a Cross-Language Information Retrieval System (CLIRS) for the bilingual English to Spanish subtask of WebCLEF 2005. We propose a document indexing reduction in order to improve the precision of the CLIRS and to reduce the storage space such systems require. 
Our proposal is based on the Transition Point (TP) technique, a method that extracts important terms from a document. We evaluate different neighbourhood percentages around the TP over a subset of the EuroGOV corpus <ref type="bibr" target="#b8">[9]</ref>, and we observed that it is possible to improve precision while reducing the number of terms for a given corpus.</p><p>The next section describes our information retrieval system in detail. Section 3 briefly introduces the corpus used in our experiments and presents the results obtained in the evaluation. Finally, a discussion of our experiments is presented.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Description of TPIRS</head><p>Our CLIRS uses a Boolean model with the Jaccard similarity formula. Our goal was to determine the behaviour of document indexing reduction in an information retrieval environment. In order to reduce the terms of every document treated, we applied a technique named Transition Point, which is described as follows.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Transition Point Technique</head><p>The transition point is a frequency value that splits the vocabulary of a document into two sets of terms (low and high frequency). This technique is based on Zipf's Law of Word Occurrences <ref type="bibr" target="#b13">[14]</ref> and was refined in studies by Booth <ref type="bibr" target="#b0">[1]</ref> and, more recently, Urbizagástegui <ref type="bibr" target="#b12">[13]</ref>. These studies aim to show that terms of medium frequency are closely related to the conceptual content of the document. Thus, it is possible to form the hypothesis that terms close to the TP can be used as indexes of a document. A typical formula used to obtain this value is given in Equation 1.</p><formula xml:id="formula_0">TP = \frac{\sqrt{8 \cdot I_1 + 1} - 1}{2},<label>(1)</label></formula><p>where $I_1$ represents the number of words with frequency equal to 1 <ref type="bibr" target="#b7">[8]</ref> <ref type="bibr" target="#b12">[13]</ref>.</p><p>Alternatively, the TP can be located by identifying the lowest frequency (among the highest frequencies) that is not repeated; this characteristic comes from properties of Booth's law of low frequency words <ref type="bibr" target="#b0">[1]</ref>.</p><p>Let us consider a frequency-sorted vocabulary of a document; i.e.,</p><formula xml:id="formula_1">V_{TP} = [(t_1, f_1), \ldots, (t_n, f_n)], \text{ with } f_i \geq f_{i-1}, \text{ then } TP = f_{i-1}, \text{ iff } f_i = f_{i+1}.</formula><p>The most important words are those whose frequency values are closest to the TP, i.e.,</p><formula xml:id="formula_2">TP_{SET} = \{ t_i \mid (t_i, f_i) \in V_{TP},\ U_1 \leq f_i \leq U_2 \},<label>(2)</label></formula><p>where $U_1$ is a lower threshold obtained from a given neighbourhood percentage of TP (NTP), thus $U_1 = (1 - NTP) \cdot TP$. 
$U_2$ is the upper threshold and is calculated in a similar way ($U_2 = (1 + NTP) \cdot TP$).</p><p>We have used the TP technique in diverse areas of natural language processing (NLP), such as clustering of short texts <ref type="bibr" target="#b4">[5]</ref>, categorization of texts <ref type="bibr" target="#b5">[6]</ref>, keyphrase extraction <ref type="bibr" target="#b6">[7]</ref> <ref type="bibr" target="#b11">[12]</ref>, summarization <ref type="bibr" target="#b1">[2]</ref>, and weighting models for information retrieval systems <ref type="bibr" target="#b2">[3]</ref>. Thus, we believe that there is enough evidence to use this technique as a term reduction process.</p></div>
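<div xmlns="http://www.tei-c.org/ns/1.0"><p>The selection described by Equations 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in TPIRS: the function names transition_point and tp_set, the pre-tokenized input, and the toy vocabulary are hypothetical, and only the formula-based definition of TP is covered (not the alternative based on the first non-repeated frequency).</p>

```python
import math
from collections import Counter


def transition_point(freqs):
    """Equation 1: TP = (sqrt(8 * I1 + 1) - 1) / 2.

    freqs maps each term to its frequency in one document;
    I1 is the number of terms that occur exactly once.
    """
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (math.sqrt(8 * i1 + 1) - 1) / 2


def tp_set(tokens, ntp):
    """Equation 2: keep the terms whose frequency lies in [U1, U2].

    ntp is the neighbourhood percentage expressed as a fraction
    (0.4 for the "TP40" setting). Names are hypothetical.
    """
    freqs = Counter(tokens)
    tp = transition_point(freqs)
    u1 = (1 - ntp) * tp  # lower threshold
    u2 = (1 + ntp) * tp  # upper threshold
    return {term for term, f in freqs.items() if u2 >= f >= u1}


# Toy document: six terms of frequency 1, plus x (3), y (4) and z (10).
tokens = list("abcdef") + ["x"] * 3 + ["y"] * 4 + ["z"] * 10
print(transition_point(Counter(tokens)))
print(sorted(tp_set(tokens, 0.1)))
print(sorted(tp_set(tokens, 0.4)))
```

<p>In this toy vocabulary six terms have frequency 1, so TP = (sqrt(49) - 1) / 2 = 3. A 10% neighbourhood keeps only the term of frequency 3, while a 40% neighbourhood (thresholds 1.8 and 4.2) also keeps the term of frequency 4; both the singletons and the very frequent term are discarded.</p></div>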
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Information Retrieval Model</head><p>Our information retrieval system is based on the Boolean model and, in order to rank the documents retrieved, we used the Jaccard similarity function, applied to the query and every document of the corpus. Beforehand, each document was preprocessed and its index terms were selected (the preprocessing phase is described in Section 3.1). For this purpose, several values of a neighbourhood of TP were used as thresholds, as Equation 2 indicates.</p></div>
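<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of this retrieval model, the following Python sketch ranks documents by the Jaccard coefficient between the query terms and each document's index terms. It is a hedged example rather than the TPIRS code: the names jaccard and rank, the in-memory index, and the sample terms are hypothetical, and treating the Boolean match as "shares at least one term with the query" is an assumption, since the text does not state which Boolean operator was used.</p>

```python
def jaccard(query_terms, doc_terms):
    """Jaccard coefficient: size of the intersection over size of the union."""
    q, d = set(query_terms), set(doc_terms)
    union = q | d
    if not union:
        return 0.0
    return len(q.intersection(d)) / len(union)


def rank(query_terms, index):
    """Return document ids sorted by decreasing Jaccard similarity.

    index maps a document id to its set of index terms (for example,
    the TP neighbourhood of the document). Documents sharing no term
    with the query are dropped (assumed Boolean OR match).
    """
    scored = [(jaccard(query_terms, terms), doc_id)
              for doc_id, terms in index.items()]
    hits = [(score, doc_id) for score, doc_id in scored if score > 0]
    # Sort by score (descending), then by id so that ties are deterministic.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits]


# Hypothetical three-document index.
index = {
    "d1": {"ministerio", "educacion", "ciencia"},
    "d2": {"ministerio", "hacienda"},
    "d3": {"turismo"},
}
print(rank(["ministerio", "educacion"], index))
```

<p>Here d1 shares two of three distinct terms with the query (similarity 2/3), d2 shares one of three (1/3), and d3 shares none, so only d1 and d2 are returned, in that order.</p></div>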
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Evaluation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Corpus</head><p>We used a subset of the EuroGOV corpus for our evaluation. This subset was composed of a set of Spanish Internet pages, originally obtained from European government-related sites. We named this corpus BiEnEs.</p><p>In order to construct this corpus, we determined the language of every page compiled in the EuroGOV corpus by using TextCat <ref type="bibr" target="#b10">[11]</ref>, a widely used language guesser. We constructed our evaluation corpus from those documents identified as Spanish.</p><p>The preprocessing of the BiEnEs corpus consisted of the elimination of punctuation symbols, Spanish stopwords, numbers, HTML tags, script code and cascading style sheet code.</p><p>For the evaluation of BiEnEs, a set of 134 queries was composed and refined, in order to provide grammatically correct "English" queries. Queries and assessments were created by the participants in the WebCLEF track, and the queries were later reviewed and, in some cases, corrected in their English translation by the NLP Group at UNED. Queries were distributed in the following way: 67 homepage findings and 67 named page findings.</p><p>We applied a preprocessing phase to this set of queries. First, we used an online translation system<ref type="foot" target="#foot_0">1</ref> in order to translate every query from English to Spanish. After that, punctuation symbols, Spanish stopwords and numbers were eliminated.</p><p>We did not apply a rigorous method of translation, because our main goal in our first participation in WebCLEF was to determine the quality of term reduction in our CLIRS.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Indexing reduction</head><p>In order to determine the behaviour of document indexing reduction in a CLIRS, we submitted a set of five runs to the contest, which are described as follows.</p><p>First Run: This run used "Full documents" as evaluation corpus, and served as the baseline for our experiments. We named it the "Full" evaluation.</p><p>Second Run: This run used an evaluation corpus composed of the reduction of every document, using the TP technique with a neighbourhood of 10% around TP. We named it the "TP10" evaluation.</p><p>Third Run: This run used an evaluation corpus composed of the reduction of every document, using the TP technique with a neighbourhood of 20% around TP. We named it the "TP20" evaluation.</p><p>Fourth Run: This run used an evaluation corpus composed of the reduction of every document, using the TP technique with a neighbourhood of 40% around TP. We named it the "TP40" evaluation.</p><p>Fifth Run: This run used an evaluation corpus composed of the reduction of every document, using the TP technique with a neighbourhood of 60% around TP. We named it the "TP60" evaluation.</p><p>Table <ref type="table" target="#tab_0">1</ref> shows the size of every evaluation corpus used, as well as the percentage of reduction obtained for each one. As can be seen, the TP technique obtained a large reduction (between 75% and 89%), which also implies a reduction in the time needed for the indexing process in a CLIRS. </p></div>
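<div xmlns="http://www.tei-c.org/ns/1.0"><p>The reduction percentages reported in Table 1 follow directly from the corpus sizes: each one is the share of the full corpus size that was removed. The short Python check below reproduces them from the sizes given in the table; the function name reduction_percentage and the dictionary layout are ours and purely illustrative.</p>

```python
# Corpus sizes in Kb, copied from Table 1.
SIZES_KB = {
    "Full": 117345,
    "TP10": 12616,
    "TP20": 19660,
    "TP40": 20477,
    "TP60": 28903,
}


def reduction_percentage(reduced_kb, full_kb):
    """Percentage of the full corpus size removed by the reduction."""
    return 100.0 * (1.0 - reduced_kb / full_kb)


for name, size in SIZES_KB.items():
    print(name, round(reduction_percentage(size, SIZES_KB["Full"]), 2))
```

<p>For example, the TP40 corpus keeps 20,477 of 117,345 Kb, that is, about 17.45% of the original size, which corresponds to the 82.55% reduction reported.</p></div>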
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Results</head><p>Table <ref type="table" target="#tab_1">2</ref> shows the results for every run submitted. The first column indicates the name of each run.</p><p>The last column shows the Mean Reciprocal Rank (MRR) obtained for each run. Additionally, the average success at different numbers of documents retrieved is shown; for instance, the second column indicates the average success of the CLIRS at the first answer. The "TP20" approach obtained a total of 49 answers and therefore has no average success at 50. As can be seen, an important improvement was achieved by using an evaluation corpus obtained with a neighbourhood of 40% of TP. We were hoping to obtain results comparable to the "Full" run but, as can be seen, the "TP40" approach nearly doubled the "Full" MRR (0.0844 versus 0.0465). </p></div>
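<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers unfamiliar with the two measures in Table 2, the following Python sketch shows how they are typically computed. It assumes that every query has a single target page, which matches homepage and named page finding topics but is our simplification; the function names and the toy runs are hypothetical and do not reproduce the official WebCLEF evaluation script.</p>

```python
def reciprocal_rank(ranking, target):
    """1 / position of the target page in the ranking, or 0 if it is absent."""
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id == target:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over a list of (ranking, target) pairs."""
    return sum(reciprocal_rank(r, t) for r, t in runs) / len(runs)


def success_at(runs, k):
    """Fraction of queries whose target appears among the first k answers."""
    found = sum(1 for ranking, target in runs if target in ranking[:k])
    return found / len(runs)


# Four hypothetical queries: target at rank 1, rank 3, missing, and rank 2.
runs = [
    (["a", "b", "c"], "a"),
    (["a", "b", "c"], "c"),
    (["a", "b"], "z"),
    (["x", "y"], "y"),
]
print(mean_reciprocal_rank(runs))
print(success_at(runs, 1), success_at(runs, 2), success_at(runs, 3))
```

<p>In this toy example the reciprocal ranks are 1, 1/3, 0 and 1/2, giving an MRR of 11/24 (about 0.458), with success at 1, 2 and 3 of 0.25, 0.5 and 0.75. Read the same way, and given the 134 queries of our evaluation, a success at 1 of 0.0224 is consistent with three queries answered at the first position (3/134).</p></div>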
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Discussion</head><p>We proposed an index reduction method for a cross-lingual information retrieval system. Our proposal is based on the transition point technique.</p><p>After submitting five runs on the bilingual English to Spanish subtask of WebCLEF, we observed that it is possible to reduce the terms in the documents that make up the corpus of a CLIRS, which not only reduces the time needed for indexing but also improves the precision of the results obtained by the CLIRS.</p><p>Our method is linear in computational time, and therefore it can be used in practical tasks. So far, the results obtained in terms of MRR are very low, but our findings suggest that applying better techniques for the English to Spanish translation of queries could improve them dramatically.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Evaluation corpora</figDesc><table><row><cell>Corpus</cell><cell>Size (Kb)</cell><cell>% of Reduction</cell></row><row><cell>Full</cell><cell>117,345</cell><cell>0%</cell></row><row><cell>TP10</cell><cell>12,616</cell><cell>89.25%</cell></row><row><cell>TP20</cell><cell>19,660</cell><cell>83.25%</cell></row><row><cell>TP40</cell><cell>20,477</cell><cell>82.55%</cell></row><row><cell>TP60</cell><cell>28,903</cell><cell>75.37%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Evaluation results</figDesc><table><row><cell></cell><cell cols="5">Average Success at</cell><cell></cell></row><row><cell>Corpus</cell><cell>1</cell><cell>5</cell><cell>10</cell><cell>20</cell><cell>50</cell><cell>Mean Reciprocal Rank</cell></row><row><cell>Full</cell><cell>0.0224</cell><cell>0.0672</cell><cell>0.1119</cell><cell>0.1418</cell><cell>0.1866</cell><cell>0.0465</cell></row><row><cell>TP10</cell><cell>0.0224</cell><cell>0.0373</cell><cell>0.0672</cell><cell>0.0821</cell><cell>0.1119</cell><cell>0.0331</cell></row><row><cell>TP20</cell><cell>0.0299</cell><cell>0.0448</cell><cell>0.0672</cell><cell>0.1045</cell><cell>-</cell><cell>0.0446</cell></row><row><cell>TP40</cell><cell>0.0597</cell><cell>0.0970</cell><cell>0.1119</cell><cell>0.1418</cell><cell>0.2164</cell><cell>0.0844</cell></row><row><cell>TP60</cell><cell>0.0522</cell><cell>0.1045</cell><cell>0.1269</cell><cell>0.1642</cell><cell>0.2090</cell><cell>0.0771</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.freetranslation.com</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We were concerned with the impact of indexing reduction on CLIRS; in the future we hope to improve other components of our CLIRS, for instance by using the vector space model, in order to improve the MRR.</p><p>The TP technique has proven effective in diverse areas of NLP. Its two main advantages for NLP are a high content of semantic information and the sparseness that can be obtained in the vectors used for document representation in models based on the vector space model. In addition, its language independence allows this technique to be used in CLIRS, which is the subject of WebCLEF.</p></div>
			</div>


			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>* This work was partially supported by the BUAP-VIEP 3/G/ING/05, R2D2 (CICYT TIC2003-07158-C04-03), and ICT EU-India (ALA/95/23/2003/077-054) research projects.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">A Law of Occurrences for Words of Low Frequency</title>
		<author>
			<persName><forename type="first">A</forename><surname>Booth</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1967">1967</date>
			<publisher>Information and Control</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">El párrafo virtual en la generación de extractos</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bueno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pinto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jimenez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Research on Computing Science Journal</title>
		<idno type="ISSN">1665-9899</idno>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Una nueva ponderación para el modelo de espacio vectorial de recuperación de información</title>
		<author>
			<persName><forename type="first">R</forename><surname>Cabrera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pinto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jimenez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Vilariño</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Research on Computing Science Journal</title>
		<idno type="ISSN">1665-9899</idno>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<ptr target="http://www.clef-campaign.org/" />
		<title level="m">Cross-Language Evaluation Forum</title>
				<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
	<note>CLEF</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Selección de Términos No Supervisada para Agrupamiento de Resúmenes</title>
		<author>
			<persName><forename type="first">H</forename><surname>Jimenez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pinto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Workshop on Human Language</title>
				<meeting>Workshop on Human Language<address><addrLine>ENC05</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">An Analysis on Frequency of Terms for Text Categorization</title>
		<author>
			<persName><forename type="first">E</forename><surname>Moyotl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jimenez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of XX Conference of Spanish Natural Language Processing Society</title>
				<meeting>XX Conference of Spanish Natural Language Processing Society<address><addrLine>SEPLN-04</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Pinto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Pérez</surname></persName>
		</author>
		<title level="m">Una Técnica para la Identificación de Términos Multipalabra. In Proceedings of 2nd</title>
				<meeting><address><addrLine>Mexico</addrLine></address></meeting>
		<imprint>
			<publisher>National Conference on Computer Science</publisher>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Reducción de Términos Indice Usando el Punto de Transición</title>
		<author>
			<persName><forename type="first">B</forename><surname>Reyes-Aguirre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Moyotl-Hernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jiménez-Salazar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Facultad de Ciencias de Computación XX Anniversary Conferences</title>
				<meeting>Facultad de Ciencias de Computación XX Anniversary Conferences<address><addrLine>BUAP</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">EuroGOV: Engineering a Multilingual Web Corpus</title>
		<author>
			<persName><forename type="first">B</forename><surname>Sigurbjörnsson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kamps</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Rijke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of CLEF</title>
				<meeting>CLEF</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">WebCLEF 2005: Cross-Lingual Web Retrieval</title>
		<author>
			<persName><forename type="first">B</forename><surname>Sigurbjörnsson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kamps</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Rijke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of CLEF 2005</title>
				<meeting>CLEF 2005</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<ptr target="http://odur.let.rug.nl/vannord/TextCat/" />
		<title level="m">TextCat: Language identification tool</title>
				<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Combining Keyword Identification Techniques</title>
		<author>
			<persName><forename type="first">M</forename><surname>Tovar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Carrillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pinto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jimenez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Research on Computing Science Journal</title>
		<idno type="ISSN">1665-9899</idno>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Urbizagástegui</surname></persName>
		</author>
		<title level="m">Las posibilidades de la Ley de Zipf en la indización automática</title>
				<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
		<respStmt>
			<orgName>Research report of the California Riverside University</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Human Behavior and the Principle of Least-Effort</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Zipf</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1949">1949</date>
			<publisher>Addison-Wesley</publisher>
			<pubPlace>Cambridge MA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
