<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">The University of Alicante at CL-SR track</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Rafael</forename><forename type="middle">M</forename><surname>Terol</surname></persName>
							<email>rafael@dlsi.ua.es</email>
							<affiliation key="aff0">
								<orgName type="department">Departamento de Lenguajes y Sistemas Informáticos</orgName>
								<orgName type="institution">Universidad de Alicante</orgName>
								<address>
									<addrLine>Carretera de San Vicente del Raspeig</addrLine>
									<settlement>Alicante</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Manuel</forename><surname>Palomar</surname></persName>
							<email>mpalomar@dlsi.ua.es</email>
							<affiliation key="aff0">
								<orgName type="department">Departamento de Lenguajes y Sistemas Informáticos</orgName>
								<orgName type="institution">Universidad de Alicante</orgName>
								<address>
									<addrLine>Carretera de San Vicente del Raspeig</addrLine>
									<settlement>Alicante</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Patricio</forename><surname>Martinez-Barco</surname></persName>
							<email>patricio@dlsi.ua.es</email>
							<affiliation key="aff0">
								<orgName type="department">Departamento de Lenguajes y Sistemas Informáticos</orgName>
								<orgName type="institution">Universidad de Alicante</orgName>
								<address>
									<addrLine>Carretera de San Vicente del Raspeig</addrLine>
									<settlement>Alicante</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Fernando</forename><surname>Llopis</surname></persName>
							<email>llopis@dlsi.ua.es</email>
							<affiliation key="aff0">
								<orgName type="department">Departamento de Lenguajes y Sistemas Informáticos</orgName>
								<orgName type="institution">Universidad de Alicante</orgName>
								<address>
									<addrLine>Carretera de San Vicente del Raspeig</addrLine>
									<settlement>Alicante</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Rafael</forename><surname>Muñoz</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Departamento de Lenguajes y Sistemas Informáticos</orgName>
								<orgName type="institution">Universidad de Alicante</orgName>
								<address>
									<addrLine>Carretera de San Vicente del Raspeig</addrLine>
									<settlement>Alicante</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Elisa</forename><surname>Noguera</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Departamento de Lenguajes y Sistemas Informáticos</orgName>
								<orgName type="institution">Universidad de Alicante</orgName>
								<address>
									<addrLine>Carretera de San Vicente del Raspeig</addrLine>
									<settlement>Alicante</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">The University of Alicante at CL-SR track</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">1C7992988FB583ED731D0B976B7A0CF7</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T00:39+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Speech Retrieval</term>
					<term>Information Retrieval</term>
					<term>Logic Forms</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes the participation of the University of Alicante in the new CL-SR track at the CLEF conference. For this track we introduce a set of features into the topic processing performed by our IR-n system. These features are based on applying logic forms to the topics and on increasing the weights of topic terms according to a set of syntactic rules.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Continuing the active participation of the University of Alicante in previous CLEF conferences, the IR-n system takes part in the new Cross-Language Speech Retrieval (CL-SR) track at CLEF 2005. As a novelty, the IR-n system includes a new module that increases the weights of topic terms by applying a set of rules to the logic form representation of the topics <ref type="bibr" target="#b6">[7]</ref>.</p><p>The next section presents the main features of this new release of the IR-n system. The logic form derivation module and the rules applied to the logic forms are presented in the sections after it. Finally, we describe each submitted run, the scores obtained by the IR-n system in these runs, our conclusions, and future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">IR-n as Passage Retrieval System</head><p>IR-n is a passage retrieval (PR) system. PR systems <ref type="bibr" target="#b1">[2]</ref> study the appearance of query terms in contiguous fragments of documents (also called passages). One of the main advantages of these systems is that they determine not only whether a document is relevant but also which part of the document is relevant. Passages are usually composed of a fixed number of sentences, but the format of the document collection of this CL-SR track does not allow this. The documents consist of a contiguous sequence of words without punctuation marks, so we cannot locate the boundaries between sentences. As a result, we have chosen a fixed number of words to compose the passages. Furthermore, the IR-n system uses overlapping passages so that a document is not judged non-relevant merely because the words of the question are split across adjacent passages. The IR-n system supports several similarity measures (e.g. Okapi <ref type="bibr" target="#b5">[6]</ref>) to calculate the weights of the topic words with respect to the document collection.</p><p>Once the word weights have been calculated, the IR-n system runs a new module that increases the weights of certain words by applying a set of heuristics to the logic form representation of the topics.</p><p>Like other IR systems, the IR-n system uses different query expansion techniques. Previous research <ref type="bibr" target="#b0">[1]</ref> has shown that these approaches obtain better results when they are based on passage retrieval rather than on full-document retrieval.</p><p>In addition, at this conference and for the ad-hoc track, a new technique called variable passages <ref type="bibr" target="#b2">[3]</ref> has been implemented. It applies fusion methods, of the kind used in multilingual tracks, to combine results obtained with different passage sizes.</p><p>The next section describes in detail how the IR-n system processes topics as logic forms, including the process that automatically derives the logic form by applying a set of inference rules to the dependency analysis of the words of the topic.</p></div>
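<div xmlns="http://www.tei-c.org/ns/1.0"><p>The fixed-length, overlapping passages described above can be illustrated with a minimal Python sketch. It is not the IR-n implementation: the function names, the passage size of 6 words and the step of 3 words are assumptions chosen only for the example, since the paper does not state the values used.</p><p><code lang="python">
```python
from typing import List


def split_into_passages(text: str, size: int = 6, step: int = 3) -> List[List[str]]:
    """Split unpunctuated text into passages of `size` words.

    Consecutive passages start `step` words apart, so a step smaller
    than the size makes them overlap. Both values are illustrative.
    """
    if not (size >= 1 and step >= 1):
        raise ValueError("size and step must be positive")
    words = text.split()
    if not words:
        return []
    passages = []
    start = 0
    while True:
        passages.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last passage already reaches the end of the text
        start += step
    return passages


def best_passage_hits(passages: List[List[str]], query: List[str]) -> int:
    """Return the largest number of distinct query words found in one passage."""
    wanted = set(query)
    return max((len(wanted.intersection(p)) for p in passages), default=0)


if __name__ == "__main__":
    transcript = "a b c d e f g h i j k l"  # stand-in for an ASR transcript
    query = ["f", "g"]
    disjoint = split_into_passages(transcript, size=6, step=6)
    overlapping = split_into_passages(transcript, size=6, step=3)
    print(best_passage_hits(disjoint, query))
    print(best_passage_hits(overlapping, query))
```
</code></p><p>With disjoint passages the two query words fall into different passages, so the best passage contains only one of them; with a step of half the passage size, one passage contains both. This is the situation that overlapping is meant to handle.</p></div>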
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Logic Form Derivation</head><p>To enhance the performance of our IR-n system we use the logic form of the topics. The weight of each topic term can be modified according to the type of the assert that represents the term in the logic form and the relationships between the asserts of the topic. The logic form of a topic (or sentence) is computed from the analysis of dependency relationships between the words of the sentence, which is obtained with the MINIPAR <ref type="bibr" target="#b3">[4]</ref> toolkit. The following subsections describe the logic form derivation process, using one topic as an example.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Analysis of dependency relationships between words</head><p>This task obtains the dependency relationships between the words of the sentence, as computed by MINIPAR <ref type="bibr" target="#b3">[4]</ref>. Figure <ref type="figure" target="#fig_0">1</ref> shows the dependency relationships between the words of the topic "The story of Varian Fry and the Emergency Rescue Committee who saved thousands in Marseille".</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Logic Form Inference</head><p>The logic form of the sentence is computed from this analysis of dependency relationships between the words of the sentence. Our approach employs a set of rules that infer the asserts, the type of each assert, the identifier of each assert and the relationships between the different asserts in the logic form. This technique improves on the technique of Moldovan <ref type="bibr" target="#b4">[5]</ref>, which constructs the logic form from the syntactic tree produced by a syntactic parser. Our logic form, like that of Moldovan, is based on the logic form format defined by eXtended WordNet <ref type="bibr" target="#b7">[8]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Applying rules to logic form to increment the topic terms weights</head><p>When an assert of type preposition (IN) has a second argument that instantiates an assert of type noun (NN), or that derives into asserts of type noun, the term weight associated with each such noun assert is modified. This rule generally covers utterances that play a circumstantial role in the sentence (e.g. in Marseille, in concentration camps, in Sweden, of Holocaust experience, and so on), and we consider the nouns they contain (predicate type NN) to be very relevant words in the topic. For this reason we increase the weights of these terms by 15%. Table <ref type="table" target="#tab_0">1</ref> shows the term weights that the IR-n system assigns to the topic "The story of Varian Fry and the Emergency Rescue Committee who saved thousands in Marseille". The terms are given as stems.</p><p>According to the rule described in this section, the logic form inferred for this topic (  </p></div>
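<div xmlns="http://www.tei-c.org/ns/1.0"><p>The 15% rule can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the IR-n module: the parsing regular expression, the function names and the hand-written word-to-stem map are all hypothetical, and the treatment of CC and NNC asserts as the way an identifier "derives into" nouns is our reading of the example. The logic form string and the weights are copied from the paper.</p><p><code lang="python">
```python
import re
from typing import Dict, List, Set, Tuple

Assert = Tuple[str, str, List[str]]  # (word, type, arguments)

# Logic form of the example topic, as given in the paper.
LOGIC_FORM = (
    "story:NN(x14) of:IN(x14, x13) varian:NN(x10) NNC(x11, x10, x12) "
    "fry:NN(x12) and:CC(x13, x11, x6) emergency:NN(x5) NNC(x6, x5, x7) "
    "rescue:NN(x8) NNC(x7, x8, x9) committee:NN(x9) who:NN(x1) "
    "save:VB(e1, x1, x2) thousand:NN(x2) in:IN(e1, x3) marseille:NN(x3)"
)

# Term weights from Table 1, keyed by stem.
WEIGHTS = {
    "stori": 1.84449, "fry": 6.19484, "emerg": 6.47296, "rescu": 6.19484,
    "committe": 4.08194, "save": 3.06725, "thousand": 2.33944, "marseil": 5.13363,
}

# Hand-written word-to-stem map (assumption: a real system would use a stemmer).
STEMS = {
    "story": "stori", "fry": "fry", "emergency": "emerg", "rescue": "rescu",
    "committee": "committe", "save": "save", "thousand": "thousand",
    "marseille": "marseil",
}

ASSERT_RE = re.compile(r"(?:(\w+):)?(\w+)\(([^)]*)\)")


def parse(logic_form: str) -> List[Assert]:
    """Parse 'word:TYPE(args)' and 'TYPE(args)' items into tuples."""
    return [
        (word, kind, [a.strip() for a in args.split(",")])
        for word, kind, args in ASSERT_RE.findall(logic_form)
    ]


def nouns_reachable(ident: str, asserts: List[Assert]) -> Set[str]:
    """Collect the nouns that an identifier is, or derives into via CC/NNC."""
    found: Set[str] = set()
    for word, kind, args in asserts:
        if args[0] != ident:
            continue
        if kind == "NN":
            found.add(word)
        elif kind in ("NNC", "CC"):
            for member in args[1:]:
                found |= nouns_reachable(member, asserts)
    return found


def boost(weights: Dict[str, float], asserts: List[Assert],
          stems: Dict[str, str], factor: float = 1.15) -> Dict[str, float]:
    """Multiply by `factor` the weight of every noun under an IN assert."""
    targets: Set[str] = set()
    for _word, kind, args in asserts:
        if kind == "IN" and len(args) >= 2:
            targets |= nouns_reachable(args[1], asserts)
    boosted = dict(weights)
    for noun in targets:
        stem = stems.get(noun)
        if stem in boosted:  # nouns without a listed weight are skipped
            boosted[stem] = weights[stem] * factor
    return boosted


if __name__ == "__main__":
    for stem, weight in boost(WEIGHTS, parse(LOGIC_FORM), STEMS).items():
        print(stem, round(weight, 7))
```
</code></p><p>For the example topic the two IN asserts select x13 and x3, so the weights of fry, emerg, rescu, committe and marseil are multiplied by 1.15 while stori, save and thousand are left unchanged, which agrees with the values listed for Table 2 up to floating-point rounding.</p></div>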
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Submitted Runs</head><p>This section describes the runs submitted with our IR-n system. The five runs differ mainly in the treatment of the topics and in the combination of segment fields of the document collection that is indexed. All submitted runs use the indexing and searching processes of our IR-n system with English as the query language. No thesaurus terms are used as keywords in the indexing or searching processes. The following subsections describe the five submitted runs in judgment pool priority order.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">UATDASR04LF Run</head><p>In this run the IR-n system indexes the transcript created automatically by the best presently available ASR system (ASRTEXT2004A field of the segments in the document collection). The English title and description fields of the topics are used in the construction of the queries. This is the only submitted run in which we apply the logic form rules for query processing described in the previous section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">UATDASR04 Run</head><p>In this run, as in the previous one, our IR-n system indexes the ASRTEXT2004A field of the segments in the document collection. The English description field of the topics is used in the construction of the queries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">UATDASR04AUTOA1 Run</head><p>In this run we index the ASRTEXT2004A field and a set of thesaurus keywords that were assigned automatically using a k-Nearest Neighbor (kNN) classifier using only words from the ASRTEXT2004A field of the segment (AUTOKEYWORD2004A1 field of the segments in the document collection). The English description field of the topic is used in the construction of the queries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">UATDASR04AUTOA2 Run</head><p>In this run the IR-n system indexes the ASRTEXT2004A field and a set of thesaurus keywords that were assigned using a different kNN classifier that was trained (fairly) on different data (AUTOKEYWORD2004A2 field of the segments in the document collection). The English description field of the topic is used in the construction of the queries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.5">UATDASR04AUTOS Run</head><p>In this run our IR-n system indexes the ASRTEXT2004A, AUTOKEYWORD2004A1 and AUTOKEYWORD2004A2 fields of the segments in the document collection. The English description field of the topics is used in the construction of the queries.</p></div>
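<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Results</head><p>Table <ref type="table" target="#tab_2">3</ref> shows the results obtained by our system for each submitted run. UATDASR04AUTOA2 obtained the best mean average precision (0.0769) of our submitted runs. In this run the IR-n system indexes the ASRTEXT2004A and AUTOKEYWORD2004A2 fields of the segments in the document collection.</p><table><row><cell>run</cell><cell>map</cell><cell>rprec</cell><cell>bpref</cell><cell>rr</cell><cell>p5</cell><cell>p20</cell><cell>p100</cell><cell>p1000</cell></row><row><cell>UATDASR04LF</cell><cell>0.0768</cell><cell>0.1230</cell><cell>0.0949</cell><cell>0.4622</cell><cell>0.2160</cell><cell>0.1740</cell><cell>0.1088</cell><cell>0.0324</cell></row><row><cell>UATDASR04</cell><cell>0.0724</cell><cell>0.1246</cell><cell>0.0899</cell><cell>0.4377</cell><cell>0.1840</cell><cell>0.1660</cell><cell>0.1036</cell><cell>0.0313</cell></row><row><cell>UATDASR04AUTOA1</cell><cell>0.0727</cell><cell>0.1206</cell><cell>0.1018</cell><cell>0.4509</cell><cell>0.2800</cell><cell>0.1740</cell><cell>0.0916</cell><cell>0.0277</cell></row><row><cell>UATDASR04AUTOA2</cell><cell>0.0769</cell><cell>0.1181</cell><cell>0.0980</cell><cell>0.4744</cell><cell>0.2640</cell><cell>0.1920</cell><cell>0.0928</cell><cell>0.0290</cell></row><row><cell>UATDASR04AUTOS</cell><cell>0.0739</cell><cell>0.1274</cell><cell>0.1056</cell><cell>0.4354</cell><cell>0.2640</cell><cell>0.1880</cell><cell>0.0920</cell><cell>0.0260</cell></row></table></div>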
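<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a small worked example of how the reported scores can be compared, the following Python sketch computes the relative difference in mean average precision between the logic form run and the run without it, and finds the run with the highest MAP. The MAP values are copied from the results reported above; the function name and the choice of relative gain as the comparison are assumptions made for illustration, and the sketch says nothing about statistical significance.</p><p><code lang="python">
```python
from typing import Dict

# MAP values copied from the evaluation results (Table 3).
MAP_SCORES: Dict[str, float] = {
    "UATDASR04LF": 0.0768,
    "UATDASR04": 0.0724,
    "UATDASR04AUTOA1": 0.0727,
    "UATDASR04AUTOA2": 0.0769,
    "UATDASR04AUTOS": 0.0739,
}


def relative_gain(new: float, base: float) -> float:
    """Relative change of `new` with respect to `base`."""
    return (new - base) / base


best_run = max(MAP_SCORES, key=MAP_SCORES.get)
lf_gain = relative_gain(MAP_SCORES["UATDASR04LF"], MAP_SCORES["UATDASR04"])

if __name__ == "__main__":
    print(best_run)
    print(f"{lf_gain:.1%}")
```
</code></p><p>Under these figures the logic form run is about 6% above the run without the module in relative MAP, and UATDASR04AUTOA2 has the highest MAP of the five runs.</p></div>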
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusions</head><p>In this new edition of the CL-SR track at the CLEF 2005 conference we have participated by applying our IR-n system to the English language. Our main aim is to evaluate the new Logic Form Module of the IR-n system. As we expected, the scores obtained with this module (UATDASR04LF) are higher than the scores obtained without it (UATDASR04) on every reported measure except R-precision.</p><p>Given the format of the document collection, each document is treated as a single sentence (a continuous sequence of words). As a consequence, the IR-n system behaves as a document retrieval system and not as a passage retrieval system. In summary, to show its full power the new logic form module must be combined with the passage overlapping technique, in document collections whose documents are composed of many passages (see our paper in the bilingual IR track at this conference). We expect that the combination of these two techniques would obtain better scores.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Analysis of dependency relationships between words of the topic</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>"story:NN(x14) of:IN(x14, x13) varian:NN(x10) NNC(x11, x10, x12) fry:NN(x12) and:CC(x13, x11, x6) emergency:NN(x5) NNC(x6, x5, x7) rescue:NN(x8) NNC(x7, x8, x9) committee:NN(x9) who:NN(x1) save:VB(e1, x1, x2) thousand:NN(x2) in:IN(e1, x3) marseille:NN(x3)") has two asserts of type IN. The second arguments of these asserts are instantiated to x13 and x3 respectively. x13 derives into the asserts x10, x12, x5, x8 and x9, whose type is NN, while the type of x3 is directly NN. According to this rule, the term weights associated with all these asserts are increased by 15%.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Terms weights assigned by IR-n system. Table 2 shows the term weights once this rule has been applied.</figDesc><table><row><cell>Term (stem)</cell><cell>Weight</cell></row><row><cell>stori</cell><cell>1.84449</cell></row><row><cell>fry</cell><cell>6.19484</cell></row><row><cell>emerg</cell><cell>6.47296</cell></row><row><cell>rescu</cell><cell>6.19484</cell></row><row><cell>committe</cell><cell>4.08194</cell></row><row><cell>save</cell><cell>3.06725</cell></row><row><cell>thousand</cell><cell>2.33944</cell></row><row><cell>marseil</cell><cell>5.13363</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Terms weights according to logic form rules</figDesc><table><row><cell>Term (stem)</cell><cell>Weight</cell></row><row><cell>stori</cell><cell>1.84449</cell></row><row><cell>fry</cell><cell>7.124066</cell></row><row><cell>emerg</cell><cell>7.443904</cell></row><row><cell>rescu</cell><cell>7.124066</cell></row><row><cell>committe</cell><cell>4.694231</cell></row><row><cell>save</cell><cell>3.06725</cell></row><row><cell>thousand</cell><cell>2.33944</cell></row><row><cell>marseil</cell><cell>5.9036745</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Evaluation Results</figDesc><table /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgment</head><p>This research work has been partially funded by the Spanish Government under project CICyT number TIC2000-0664-C02-02 and PROFIT number FIT-340100-2004-14 and by the Valencia Government under project numbers GV04B-276 and GV04B-268.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Combining Query Translation and Document Translation in Cross-Language Retrieval</title>
		<author>
			<persName><forename type="first">Aitao</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fredric</forename><forename type="middle">C</forename><surname>Gey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">4th Workshop of the Cross-Language Evaluation Forum, CLEF 2003</title>
				<meeting><address><addrLine>Trondheim, Norway</addrLine></address></meeting>
		<imprint>
			<biblScope unit="page" from="108" to="121" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Passage Retrieval Revisited</title>
		<author>
			<persName><forename type="first">Marcin</forename><surname>Kaszkiel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Justin</forename><surname>Zobel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th Annual International ACM SIGIR Conference</title>
				<meeting>the 20th Annual International ACM SIGIR Conference<address><addrLine>Philadelphia</addrLine></address></meeting>
		<imprint>
			<biblScope unit="page" from="178" to="185" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Combining Passages in Monolingual Experiments with IR-n system</title>
		<author>
			<persName><forename type="first">Fernando</forename><surname>Llopis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Elisa</forename><surname>Noguera</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Workshop of Cross-Language Evaluation Forum (CLEF 2005)</title>
				<meeting><address><addrLine>Vienna, Austria</addrLine></address></meeting>
		<imprint/>
	</monogr>
	<note>in this volume</note>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<ptr target="http://www.cs.ualberta.ca/lindek/minipar.htm" />
		<title level="m">MINIPAR parser</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Logic Form Transformation of WordNet and its Applicability to Question-Answering</title>
		<author>
			<persName><forename type="first">Dan</forename><surname>Moldovan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vasile</forename><surname>Rus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of 39th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>39th Annual Meeting of the Association for Computational Linguistics<address><addrLine>Toulouse, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2001-07">July 2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Fusion of Probabilistic Models for Effective Monolingual Retrieval</title>
		<author>
			<persName><forename type="first">Jacques</forename><surname>Savoy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">4th Workshop of the Cross-Language Evaluation Forum, CLEF 2003</title>
				<meeting><address><addrLine>Trondheim, Norway</addrLine></address></meeting>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Applying Logic Forms to Biomedical Q-A</title>
		<author>
			<persName><forename type="first">Rafael</forename><forename type="middle">M</forename><surname>Terol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Patricio</forename><surname>Martínez-Barco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Manuel</forename><surname>Palomar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Symposium on Innovations in Intelligent Systems and Applications (INISTA 2005)</title>
				<meeting><address><addrLine>Istanbul, Turkey</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">June 2004</date>
			<biblScope unit="page" from="29" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<ptr target="http://xwn.hlt.utdallas.edu/" />
		<title level="m">eXtended WordNet</title>
				<imprint/>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
