<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Mono-and Bilingual Retrieval Experiments with a Social Science Document Corpus</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">René</forename><surname>Hackl</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Hildesheim</orgName>
								<address>
									<addrLine>Information Science Marienburger Platz 22</addrLine>
									<postCode>D-31141</postCode>
									<settlement>Hildesheim</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Thomas</forename><surname>Mandl</surname></persName>
							<email>mandl@uni-hildesheim.de</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Hildesheim</orgName>
								<address>
									<addrLine>Information Science Marienburger Platz 22</addrLine>
									<postCode>D-31141</postCode>
									<settlement>Hildesheim</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Mono-and Bilingual Retrieval Experiments with a Social Science Document Corpus</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">B6BE3D430E52F679C14F01489C1A6F19</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T00:36+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing</term>
					<term>H.3.3 Information Search and Retrieval</term>
					<term>H.3.4 Systems and Software Measurement, Performance, Experimentation Domain specific, Social Science, Bilingual retrieval, Thesaurus</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper reports on our participation in CLEF 2005's domain-specific retrieval track. The experiments were based on previous experiences with the GIRT document corpus and were run in parallel to the multi-lingual experiments for CLEF 2005. We optimized the parameters of the system with one corpus from 2004 and applied these settings to the domain specific task. In that manner, the robustness of our approach over different document collection was assessed.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In previous CLEF campaigns, we tested an adaptive fusion system based on the MIMOR model <ref type="bibr" target="#b7">(Mandl &amp; Womser-Hacker 2004)</ref> within the domain specific GIRT track <ref type="bibr">(Hackl et al. 2003)</ref>. For CLEF 2005, the parameter optimization was based on a French document collection. The parameter settings were applied to the four language document collection of the multilingual task of CLEF 2005 <ref type="bibr">(Hackl et al. 2005)</ref>. In addition, we applied almost the same settings to the domain specific track in order to test the robustness of our system over different collections.</p><p>Robustness has become an issue in information retrieval research recently. It has been noted often, that the variance between queries is worse than the variance between systems. There are often very difficult queries which few systems solve well and which lead to very bad results for most systems <ref type="bibr">(Harman &amp; Buckley 2004</ref>). Thorough failure analysis can lead to substantial improvement. For example, the absence of named entities are a factor which can make queries more difficult overall <ref type="bibr" target="#b7">(Mandl &amp; Womser-Hacker 2004)</ref>. As a consequence, a new evaluation track for robust retrieval has been established at the Text Retrieval Conference (TREC). This track does not only measure the average precision over all queries but also emphasizes the performance of the systems for difficult queries. To perform well in this track is more important for the systems to retrieve at least a few documents for difficult queries than to improve the performance in average <ref type="bibr" target="#b9">(Voorhees 2005)</ref>. In order to allow a system evaluation based on robustness more queries than for a normal ad-hoc track are necessary. The concept of robustness is extended in TREC 2005. Systems need to perform well over different tracks and tasks <ref type="bibr" target="#b9">(Voorhees 2005)</ref>.</p><p>For multilingual retrieval, robustness would also be an interesting evaluation concept because the performance between queries differs greatly <ref type="bibr" target="#b7">(Mandl &amp; Womser-Hacker 2004)</ref>. Robustness in multilingual retrieval could be interpreted in three ways:</p><p>• Stable performance over all topics instead of high average performance (like at TREC) • Stable performance over different tasks (like at TREC) • Stable performance over different languages (focus of CLEF)</p><p>For the participation in the domain specific track in 2005, we tested the stability of our ad-hoc system for the domain specific track.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Domain Specific Mono-and Cross-lingual Retrieval Experiments</head><p>Our system was optimized with the French collection of CLEF 2004. The optimization procedure is described in detail in <ref type="bibr">Hackl et al. 2005</ref>. The GIRT runs were produced with only slightly different settings.</p><p>Previous experiences with the GIRT corpus showed that blind relevance feedback does not lead to good results <ref type="bibr" target="#b6">(Kluck 2004)</ref>. Our test runs confirmed that fact and blind relevance feedback was not applied for the submitted runs. Instead, term expansion was based the thesaurus available for the GIRT data. This thesaurus was developed by the Social Science Information Centre <ref type="bibr" target="#b6">(Kluck 2004)</ref>. For the query terms, the fields Broader, Narrower and Related term were extracted from the thesaurus and added to the query for the second run. The topic title weights were set to ten, topic description weights to three and the thesaurus terms were weighted with one. This weighting scheme was adopted from the ad-hoc task.</p><p>For the second mono-lingual run UHIGIRT2, we added terms from the multilingual European terminology database Eurodicautom<ref type="foot" target="#foot_0">1</ref> which was also used for the ad-hoc experiments. However, Eurodicautom contributed terms for very few queries. Most often, it returned "out of vocabulary".</p><p>As bilingual GIRT run, we submitted one English-to-German run. The query and the thesaurus terms were translated by ImTranslator<ref type="foot" target="#foot_1">2</ref> . In addition, the document field "english-translation" was indexed. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Conclusion and Outlook</head><p>For next year, we intend to implement for multi-lingual runs for the domain specific task. The thesaurus use led to a drop in performance. For the future, we intend to develop a more sophisticated strategy to apply thesaurus terms.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Results from the CLEF 2005 Workshop. EDA = Euradicautom Although, our system has been tested with Russian data at earlier CLEF campaigns and at the ad-hoc task this year, the Russian social science RSSC collection could not be used because it was provided later than the rest of the data.</figDesc><table><row><cell>RunID</cell><cell>Languages</cell><cell>Run Type</cell><cell>Fields</cell><cell cols="2">Retrieved Relevant</cell><cell>Avg.</cell></row><row><cell></cell><cell></cell><cell></cell><cell>used</cell><cell></cell><cell>docs.</cell><cell>Prec.</cell></row><row><cell cols="3">UHIGIRT1 Monolingual German Lucene stemmer</cell><cell>TD</cell><cell>1400</cell><cell>2682</cell><cell>0.220</cell></row><row><cell cols="3">UHIGIRT2 Monolingual German Lucene stemmer,</cell><cell>TD</cell><cell>1335</cell><cell>2682</cell><cell>0.193</cell></row><row><cell></cell><cell></cell><cell>IZ thesaurus, EDA</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>UHIGIRT3</cell><cell>English-German</cell><cell>Lucene stemmer,</cell><cell>TD</cell><cell>1159</cell><cell>2682</cell><cell>0.178</cell></row><row><cell></cell><cell></cell><cell>IZ thesaurus, EDA</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell></cell><cell>ImTranslator</cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://europa.eu.int/eurodicautom/Controller</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://freetranslation.paralink.com/</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">René</forename><forename type="middle">;</forename><surname>Hackl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ralph</forename><surname>Kölle</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Domain Specific Retrieval Experiments at the University of Hildesheim with the MIMOR System</title>
		<author>
			<persName><forename type="first">Thomas</forename><forename type="middle">;</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christa</forename><surname>Womser-Hacker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Cross-Language Information Retrieval: Third Workshop of the Cross-Language Evaluation Forum, CLEF 2002</title>
				<editor>et al.</editor>
		<meeting><address><addrLine>Rome, Italy; Berlin</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2002">2003. September 2002</date>
			<biblScope unit="volume">2785</biblScope>
			<biblScope unit="page" from="343" to="348" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">René</forename><surname>Hackl</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Ad-hoc Mono-and Multilingual Retrieval Experiments at the University of Hildesheim</title>
		<author>
			<persName><forename type="first">Thomas</forename><forename type="middle">;</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christa</forename><surname>Womser-Hacker</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
	<note>In this volume</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">Donna</forename><surname>Harman</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">The NRRC reliable information access (RIA) workshop</title>
		<author>
			<persName><forename type="first">Chris</forename><surname>Buckley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27 th annual international conference on Research and development in information retrieval (SIGIR)</title>
				<meeting>the 27 th annual international conference on Research and development in information retrieval (SIGIR)</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="528" to="529" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The GIRT Data in the Evaluation of CLIR Systems -from 1997 until</title>
		<author>
			<persName><forename type="first">Michael</forename><surname>Kluck</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Comparative Evaluation of Multilingual Information Access Systems: 4 th Workshop of the Cross-Language Evaluation Forum, CLEF 2003</title>
				<meeting><address><addrLine>Trondheim, Norway</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003-08-21">2004. 2003. August 21-22, 2003</date>
			<biblScope unit="page" from="376" to="390" />
		</imprint>
	</monogr>
	<note>Revised Selected Papers</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A Framework for long-term Learning of Topical User Preferences in Information Retrieval</title>
		<author>
			<persName><forename type="first">Thomas</forename><forename type="middle">;</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christa</forename><surname>Womser-Hacker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">New Library World</title>
		<imprint>
			<biblScope unit="volume">105</biblScope>
			<biblScope unit="issue">5/6</biblScope>
			<biblScope unit="page" from="184" to="195" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">The Effect of Named Entities on Effectiveness in Cross-Language Information Retrieval Evaluation</title>
		<author>
			<persName><forename type="first">Thomas</forename><forename type="middle">;</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christa</forename><surname>Womser-Hacker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc ACM SAC Symposium on Applied Computing (SAC). Information Access and Retrieval (IAR) Track</title>
				<meeting>ACM SAC Symposium on Applied Computing (SAC). Information Access and Retrieval (IAR) Track<address><addrLine>Santa Fe, New Mexico, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2005-03-13">2005. March 13.-17</date>
			<biblScope unit="page" from="1059" to="1064" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">The TREC robust retrieval track</title>
		<author>
			<persName><forename type="first">Ellen</forename><surname>Voorhees</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM SIGIR Forum</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="11" to="20" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
