<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Ad-hoc Mono-and Bilingual Retrieval Experiments at the University of Hildesheim</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">René</forename><surname>Hackl</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Hildesheim</orgName>
								<address>
									<addrLine>Information Science Marienburger Platz 22</addrLine>
									<postCode>D-31141</postCode>
									<settlement>Hildesheim</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Thomas</forename><surname>Mandl</surname></persName>
							<email>mandl@uni-hildesheim.de</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Hildesheim</orgName>
								<address>
									<addrLine>Information Science Marienburger Platz 22</addrLine>
									<postCode>D-31141</postCode>
									<settlement>Hildesheim</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Christa</forename><surname>Womser-Hacker</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Hildesheim</orgName>
								<address>
									<addrLine>Information Science Marienburger Platz 22</addrLine>
									<postCode>D-31141</postCode>
									<settlement>Hildesheim</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Ad-hoc Mono-and Bilingual Retrieval Experiments at the University of Hildesheim</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">FF0D1E3EF6E3DB4FE8E9C550AE3E21EC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T00:37+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing</term>
					<term>H.3.3 Information Search and Retrieval</term>
					<term>H.3.4 Systems and Software Measurement, Performance, Experimentation Multilingual Retrieval, Fusion</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper reports on our participation in CLEF 2005's ad-hoc multi-lingual retrieval track. The ad-hoc task introduced Bulgarian and Hungarian as new languages. Our experiments focus on the two new languages. Naturally, no relevance assessments are available for these collections yet. Optimization was mainly based on French data from last year. Based on experience from last year, one of our main objectives was to improve and refine the n-gram-based indexing and retrieval algorithms within our system.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In the CLEF 2004 campaign, we tested an adaptive fusion system based on the MIMOR model <ref type="bibr" target="#b7">(Womser-Hacker 1997)</ref> in the multi-lingual ad-hoc track <ref type="bibr">(Hackl et al. 2005)</ref>. In 2005, we applied our system based on Lucene<ref type="foot" target="#foot_0">1</ref> to the new multi-lingual collection: We focused on Bulgarian, French and Hungarian.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">CLEF Retrieval Experiments with MIMOR</head><p>The optimization of the retrieval system parameters was based on the French corpus of last year. The tools employed this year include Lucene and Java TM -based snowball<ref type="foot" target="#foot_1">2</ref> analyzers as well as the Egothor<ref type="foot" target="#foot_2">3</ref> stemmer. In previous CLEF results it has been pointed out, that a tri-gram index does not produce good results for French <ref type="bibr">(McNamee &amp; Mayfield 2004)</ref>. A 4-gram or 5-gram indexing approach seems more promising. Consequently, we conducted some test runs experimenting with the following parameters:</p><p>• Document fields: only officially permitted document fields were indexed. These were indexed as they were as well as in an extra Field FULLTEXT enclosing all the contents from the other fields. • Origin of query terms: query terms could come from either title or description fields or both.</p><p>• Phrase queries of ngram terms: length of phrases, Boolean operators for concatenating terms The search field FULLTEXT provided best performance overall. Searching on the other fields by themselves or in combination and with weighting did not yield as good results as the simple full-text approach. The document field employed for BRF mattered much more. Here, best results were obtained with the TEXT field. For all runs, we used the stopword lists made available by the University of Neuchatel and made a few minor changes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Query Construction for N-Gram Indexing</head><p>For phrase queries, the approach that worked best was one that constructed queries as follows: Given a query with 3 grams NG1, NG2, NG3 build the query so that q = "NG1" OR "NG2" OR "NG3" OR "NG1 NG2" OR "NG2 NG3" OR "NG1 NG2 NG3". Of course, such a query is likely to retrieve a lot of documents. Effectively, in almost all cases the system retrieved between 80% and 95% of all documents in the database. As Table <ref type="table" target="#tab_1">1</ref> shows, these results can greatly be improved by applying a more sophisticated configuration on top of the depicted query construction. One means is to allow phrases to be non-exact match phrases, i.e. allow WITHIN or NEAR-like operations, denoted by slop in the table. Here, the best setting was five, values started getting visibly worse from 10 up.  The single most important issue though are short terms. Phrase queries with only one term are of course just plain term queries. If, however, such a term query contains a term that has a smaller word length than the gram size, and taking into account that stopwords are eliminated, there is strong evidence that that term is highly important. In fact, most of these terms were acronyms or foreign words, e.g. in 2004 topics "g7", "sida" (French acronym for AIDS), "mir" (Russian space station), "lady" (Diana). Blind relevance feedback had little impact on n-gram retrieval performance. For some queries, good short terms like those mentioned above were added to the query. However, terms selected by the algorithm received no special weight, i.e. they received a weight of one. Higher weights worsened the retrieval results. Furthermore, considering more than the top five documents for blind relevance feedback did not improve performance. Table <ref type="table" target="#tab_6">4</ref> summarizes the results the best configurations achieved. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Boosting Document Fields for Stemming Approaches</head><p>Subsequently, stemming replaced the n-gram indexing procedure in another test series. Three different stemmers were used: Egothor, Lucene, and Snowball. Table <ref type="table" target="#tab_7">5</ref> shows the results of the base runs. Queries that contained terms from both title and description fields from the topic files performed better than those that were based on only one source. The weighting of these terms, however, was a major impact factor. Several experiments with different boost values and blind relevance feedback parameters were carried out for each stemmer. The following tables 6, 7 and 8 show the results for the three stemmers. Yet again, searching on structured document parts instead of the full text was worse. More importantly, even the baseline run with an Egothor-based stemmer was better than any n-gram run. Table <ref type="table" target="#tab_12">9</ref> summarizes the settings for the best runs. Boost values were applied to title, description and terms from blind relevance feedback in this order. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Results of Submitted Runs</head><p>The parameters settings optimized with the French collection of CLEF 2004 were applied to the multi-lingual collection in 2005. We submitted monolingual runs for Bulgarian, French, Hungarian and domain specific (GIRT), bilingual runs for French and GIRT. For Bulgarian and Hungarian we employed the setting outlined above for two runs each -4-gram and 5-gram: searching on full text representations, boosting single terms which were shorter than the grams length, using BRF (5 docs, 30 terms), and a slop of 5.</p><p>For French, we used the Lucene-stemmer and the settings derived above. Additionally, we carried out a 5gram based run as a comparison to Bulgarian and Hungarian. Both of these monolingual were then reshaped by adding terms tentatively derived from the multilingual European terminology database Eurodicautom<ref type="foot" target="#foot_3">4</ref> . We extracted additional terms from the top three hits from the database, if they were available. At least one of the query terms had to be present in the resulting term list, no special subject domain was chosen. These terms were assigned a weight of one.</p><p>In the ad-hoc task, we submitted two English-to-French runs, one of which was enhanced by additional Eurodicautom terms, and one Russian-to-French run, all translated by ImTranslator<ref type="foot" target="#foot_4">5</ref> . The settings were the same as for the monolingual runs. Considering the lack of experience with the new languages, the results are satisfying. However, more work with n-gram as well as stemming approaches are necessary for these languages.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion</head><p>For the participation in CLEF 2005, we could stabilize the n-gram indexing and search. The performance remains worse than for stemming based runs. We compared three stemmers with different parameter settings.</p><p>For future participations in ad-hoc tasks, we intend to apply the RECOIN (REtrieval COmponent INtegrator)<ref type="foot" target="#foot_5">6</ref> framework <ref type="bibr" target="#b6">(Scheufen 2005)</ref>. RECOIN is an object oriented JAVA framework for information retrieval experiments. It allows the integration of heterogeneous components into an experimentation system where many experiments may be carried out.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>•</head><label></label><figDesc>Rigidity of phrase queries: non-exact phrase queries • Blind relevance feedback (BRF): relevant documents/expansion term configuration (with Robertson Selection Value as term weighting scheme) and origin of expansion terms • Weighting for all parameters mentioned above</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 .</head><label>1</label><figDesc>Effect of NEAR n term operation for boosting singles</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>4-gram 5-gram NEAR n terms (slop) Recall, max=915 Avg. prec.</head><label></label><figDesc></figDesc><table><row><cell></cell><cell></cell><cell></cell><cell>Recall,</cell><cell>Avg.</cell></row><row><cell></cell><cell></cell><cell></cell><cell>max=915</cell><cell>prec.</cell></row><row><cell>1</cell><cell>688</cell><cell>0.277</cell><cell>698</cell><cell>0.272</cell></row><row><cell>2</cell><cell>687</cell><cell>0.278</cell><cell>698</cell><cell>0.272</cell></row><row><cell>3</cell><cell>684</cell><cell>0.277</cell><cell>698</cell><cell>0.276</cell></row><row><cell>4</cell><cell>687</cell><cell>0.277</cell><cell>701</cell><cell>0.277</cell></row><row><cell>5</cell><cell>691</cell><cell>0.272</cell><cell>703</cell><cell>0.276</cell></row><row><cell>6</cell><cell>691</cell><cell>0.271</cell><cell>702</cell><cell>0.274</cell></row><row><cell>7</cell><cell>689</cell><cell>0.271</cell><cell>699</cell><cell>0.273</cell></row><row><cell>8</cell><cell>689</cell><cell>0.274</cell><cell>697</cell><cell>0.272</cell></row><row><cell>9</cell><cell>688</cell><cell>0.274</cell><cell>696</cell><cell>0.270</cell></row><row><cell>10</cell><cell>686</cell><cell>0.278</cell><cell>694</cell><cell>0.270</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2 .</head><label>2</label><figDesc>Result overview boosting singles, slop 5</figDesc><table><row><cell>BRF</cell><cell></cell><cell cols="2">4-gram</cell><cell cols="2">5-gram</cell></row><row><cell cols="2">Documents Terms</cell><cell>Recall,</cell><cell>Avg.</cell><cell>Recall,</cell><cell>Avg.</cell></row><row><cell></cell><cell></cell><cell>max=915</cell><cell>prec.</cell><cell>max=915</cell><cell>prec.</cell></row><row><cell>5</cell><cell>10</cell><cell>685</cell><cell>0.264</cell><cell>706</cell><cell>0.275</cell></row><row><cell>5</cell><cell>20</cell><cell>689</cell><cell>0.269</cell><cell>707</cell><cell>0.280</cell></row><row><cell>5</cell><cell>30</cell><cell>694</cell><cell>0.270</cell><cell>711</cell><cell>0.282</cell></row><row><cell>5</cell><cell>40</cell><cell>691</cell><cell>0.264</cell><cell>710</cell><cell>0.283</cell></row><row><cell>10</cell><cell>10</cell><cell>645</cell><cell>0.218</cell><cell>670</cell><cell>0.222</cell></row><row><cell>10</cell><cell>20</cell><cell>649</cell><cell>0.221</cell><cell>675</cell><cell>0.230</cell></row><row><cell>10</cell><cell>30</cell><cell>646</cell><cell>0.227</cell><cell>677</cell><cell>0.234</cell></row><row><cell>10</cell><cell>40</cell><cell>641</cell><cell>0.233</cell><cell>676</cell><cell>0.232</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3</head><label>3</label><figDesc></figDesc><table /><note>gives optimized boost values for the n-gram retrieval experiments. The ratio of these figures has been determined experimentally. It can be seen that title terms are more important than description terms. Moreover, longer phrases are better than short ones, limited by the fact that starting with phrases of length 4, performance began to drop.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 3 .</head><label>3</label><figDesc>Boost values for n-gram-based retrieval experiments</figDesc><table><row><cell></cell><cell cols="2">Boosts according to origin</cell></row><row><cell cols="2"># of terms in phrase Origin: title</cell><cell>Origin: description</cell></row><row><cell>1</cell><cell>3, if short: 10</cell><cell>1, if short: 8</cell></row><row><cell>2</cell><cell>4</cell><cell>2</cell></row><row><cell>3</cell><cell>5</cell><cell>2</cell></row><row><cell>4</cell><cell>5</cell><cell>2</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 4 .</head><label>4</label><figDesc>Recall and average precision figures for ngram-based retrieval experiments.The table lists the best performing runs for all instances and combinations.</figDesc><table><row><cell>Indexing-</cell><cell>Optimization</cell><cell>Blind relevance</cell><cell>Recall,</cell><cell>Avg.</cell></row><row><cell>Method</cell><cell></cell><cell>feedback</cell><cell>max=915</cell><cell>prec.</cell></row><row><cell>4-gram</cell><cell>base run</cell><cell>none</cell><cell>507</cell><cell>0.126</cell></row><row><cell>4-gram</cell><cell>with single term phrases</cell><cell>none</cell><cell>551</cell><cell>0.178</cell></row><row><cell>4-gram</cell><cell>boosting single term phrases</cell><cell>none</cell><cell>684</cell><cell>0.26</cell></row><row><cell>4-gram</cell><cell>boosting singles, slop 5</cell><cell>none</cell><cell>691</cell><cell>0.272</cell></row><row><cell>4-gram</cell><cell>boosting, slop 5</cell><cell>5 docs, 30 terms</cell><cell>694</cell><cell>0.27</cell></row><row><cell>5-gram</cell><cell>boosting, slop 5</cell><cell>5 docs, 30 terms</cell><cell>711</cell><cell>0.282</cell></row><row><cell>5-gram</cell><cell>boosting</cell><cell>5 docs, 30 terms</cell><cell>707</cell><cell>0.275</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>Table 5 .</head><label>5</label><figDesc>Base runs with stemming algorithms</figDesc><table><row><cell></cell><cell cols="2">Recall, max=915 Avg. prec.</cell></row><row><cell>Lucene Stemmer, base run</cell><cell>817</cell><cell>0.356</cell></row><row><cell>Snowball Stemmer, base run</cell><cell>821</cell><cell>0.344</cell></row><row><cell>Egothor Stemmer, base run</cell><cell>817</cell><cell>0.346</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_8"><head>Table 6 .</head><label>6</label><figDesc>Results with Lucene stemmer</figDesc><table><row><cell></cell><cell>Boost Values</cell><cell></cell><cell>BRF</cell><cell></cell><cell>Results</cell><cell></cell></row><row><cell cols="7">Title Description BRF Docs. Terms Recall, max=915 Avg. prec.</cell></row><row><cell>9</cell><cell>3</cell><cell>1</cell><cell>5</cell><cell>10</cell><cell>856</cell><cell>0.379</cell></row><row><cell>9</cell><cell>3</cell><cell>1</cell><cell>5</cell><cell>20</cell><cell>857</cell><cell>0.388</cell></row><row><cell>9</cell><cell>3</cell><cell>1</cell><cell>5</cell><cell>30</cell><cell>863</cell><cell>0.405</cell></row><row><cell>9</cell><cell>3</cell><cell>1</cell><cell>5</cell><cell>40</cell><cell>857</cell><cell>0.402</cell></row><row><cell>9</cell><cell>3</cell><cell>2</cell><cell>5</cell><cell>10</cell><cell>855</cell><cell>0.379</cell></row><row><cell>9</cell><cell>3</cell><cell>2</cell><cell>5</cell><cell>20</cell><cell>854</cell><cell>0.390</cell></row><row><cell>9</cell><cell>3</cell><cell>2</cell><cell>5</cell><cell>30</cell><cell>857</cell><cell>0.403</cell></row><row><cell>9</cell><cell>3</cell><cell>2</cell><cell>5</cell><cell>40</cell><cell>855</cell><cell>0.392</cell></row><row><cell>9</cell><cell>3</cell><cell>3</cell><cell>5</cell><cell>10</cell><cell>855</cell><cell>0.379</cell></row><row><cell>9</cell><cell>3</cell><cell>3</cell><cell>5</cell><cell>20</cell><cell>857</cell><cell>0.385</cell></row><row><cell>9</cell><cell>3</cell><cell>3</cell><cell>5</cell><cell>30</cell><cell>861</cell><cell>0.394</cell></row><row><cell>9</cell><cell>3</cell><cell>3</cell><cell>5</cell><cell>40</cell><cell>858</cell><cell>0.388</cell></row><row><cell></cell><cell cols="2">base run</cell><cell></cell><cell></cell><cell>817</cell><cell>0.356</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_9"><head>Table 7 .</head><label>7</label><figDesc>Results with Snowball stemmer</figDesc><table><row><cell></cell><cell>Boost Values</cell><cell></cell><cell>BRF</cell><cell></cell><cell>Results</cell><cell></cell></row><row><cell cols="7">Title Description BRF Docs. Terms Recall, max=915 Avg. prec.</cell></row><row><cell>9</cell><cell>3</cell><cell>1</cell><cell>5</cell><cell>10</cell><cell>850</cell><cell>0.362</cell></row><row><cell>9</cell><cell>3</cell><cell>1</cell><cell>5</cell><cell>20</cell><cell>855</cell><cell>0.387</cell></row><row><cell>9</cell><cell>3</cell><cell>1</cell><cell>5</cell><cell>30</cell><cell>856</cell><cell>0.400</cell></row><row><cell>9</cell><cell>3</cell><cell>1</cell><cell>5</cell><cell>40</cell><cell>854</cell><cell>0.396</cell></row><row><cell>9</cell><cell>3</cell><cell>2</cell><cell>5</cell><cell>10</cell><cell>851</cell><cell>0.359</cell></row><row><cell>9</cell><cell>3</cell><cell>2</cell><cell>5</cell><cell>20</cell><cell>853</cell><cell>0.376</cell></row><row><cell>9</cell><cell>3</cell><cell>2</cell><cell>5</cell><cell>30</cell><cell>855</cell><cell>0.391</cell></row><row><cell>9</cell><cell>3</cell><cell>2</cell><cell>5</cell><cell>40</cell><cell>854</cell><cell>0.385</cell></row><row><cell>9</cell><cell>3</cell><cell>3</cell><cell>5</cell><cell>10</cell><cell>851</cell><cell>0.362</cell></row><row><cell>9</cell><cell>3</cell><cell>3</cell><cell>5</cell><cell>20</cell><cell>852</cell><cell>0.377</cell></row><row><cell>9</cell><cell>3</cell><cell>3</cell><cell>5</cell><cell>30</cell><cell>856</cell><cell>0.385</cell></row><row><cell>9</cell><cell>3</cell><cell>3</cell><cell>5</cell><cell>40</cell><cell>853</cell><cell>0.382</cell></row><row><cell></cell><cell cols="2">base run</cell><cell></cell><cell></cell><cell>821</cell><cell>0.344</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_10"><head>Table 8 .</head><label>8</label><figDesc>Results with Egothor stemmer</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_11"><head>Boost Values BRF Results Title Description BRF Docs. Terms Recall, max=915 Avg. prec.</head><label></label><figDesc></figDesc><table><row><cell>9</cell><cell>3</cell><cell>1</cell><cell>5</cell><cell>10</cell><cell>849</cell><cell>0.359</cell></row><row><cell>9</cell><cell>3</cell><cell>1</cell><cell>5</cell><cell>20</cell><cell>850</cell><cell>0.376</cell></row><row><cell>9</cell><cell>3</cell><cell>1</cell><cell>5</cell><cell>30</cell><cell>852</cell><cell>0.389</cell></row><row><cell>9</cell><cell>3</cell><cell>1</cell><cell>5</cell><cell>40</cell><cell>848</cell><cell>0.388</cell></row><row><cell>9</cell><cell>3</cell><cell>2</cell><cell>5</cell><cell>10</cell><cell>852</cell><cell>0.354</cell></row><row><cell>9</cell><cell>3</cell><cell>2</cell><cell>5</cell><cell>20</cell><cell>850</cell><cell>0.385</cell></row><row><cell>9</cell><cell>3</cell><cell>2</cell><cell>5</cell><cell>30</cell><cell>851</cell><cell>0.390</cell></row><row><cell>9</cell><cell>3</cell><cell>2</cell><cell>5</cell><cell>40</cell><cell>837</cell><cell>0.389</cell></row><row><cell>9</cell><cell>3</cell><cell>3</cell><cell>5</cell><cell>10</cell><cell>855</cell><cell>0.351</cell></row><row><cell>9</cell><cell>3</cell><cell>3</cell><cell>5</cell><cell>20</cell><cell>849</cell><cell>0.382</cell></row><row><cell>9</cell><cell>3</cell><cell>3</cell><cell>5</cell><cell>30</cell><cell>843</cell><cell>0.389</cell></row><row><cell>9</cell><cell>3</cell><cell>3</cell><cell>5</cell><cell>40</cell><cell>831</cell><cell>0.386</cell></row><row><cell></cell><cell></cell><cell>base run</cell><cell></cell><cell></cell><cell>817</cell><cell>0.346</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_12"><head>Table 9 .</head><label>9</label><figDesc>Best runs of stemmer-based retrieval experiments</figDesc><table><row><cell>Stemmer</cell><cell>Run Type</cell><cell cols="2">Recall, max = 915 Avg. Prec.</cell></row><row><cell>Egothor</cell><cell>brf 5 10, boost 9 3 3</cell><cell>855</cell><cell>0.351</cell></row><row><cell>Egothor</cell><cell>brf 5 60, boost 9 3 1</cell><cell>843</cell><cell>0.394</cell></row><row><cell>Lucene</cell><cell>brf 5 30, boost 9 3 1</cell><cell>863</cell><cell>0.405</cell></row><row><cell cols="2">Snowball brf 5 30, boost 9 3 1</cell><cell>856</cell><cell>0.400</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_13"><head>Table 9 .</head><label>9</label><figDesc>Results from the CLEF 2005 Workshop. EDA = Euradicautom</figDesc><table><row><cell></cell><cell>RunID</cell><cell>Languages</cell><cell>Run Type</cell><cell cols="2">retrieved Relevant</cell><cell>Avg.</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>docs.</cell><cell>Prec.</cell></row><row><cell></cell><cell>UHIBG1</cell><cell>Bulgarian</cell><cell>5-gram</cell><cell>587</cell><cell>778</cell><cell>0.189</cell></row><row><cell></cell><cell>UHIBG2</cell><cell>Bulgarian</cell><cell>4-gram</cell><cell>597</cell><cell>778</cell><cell>0.195</cell></row><row><cell>monolingual</cell><cell>UHIHU1 UHIHU2 UHIFR1 UHIFR2</cell><cell>Hungarian Hungarian French French</cell><cell>5-gram 4-gram Lucene stemmer Lucene stemmer, EDA</cell><cell>733 776 2346 2364</cell><cell>939 939 2537 2537</cell><cell>0.310 0.326 0.385 0.382</cell></row><row><cell></cell><cell>UHIFR3</cell><cell>French</cell><cell>5-gram</cell><cell>1816</cell><cell>2537</cell><cell>0.340</cell></row><row><cell></cell><cell>UHIFR4</cell><cell>French</cell><cell>5-gram, EDA</cell><cell>1851</cell><cell>2537</cell><cell>0.274</cell></row><row><cell></cell><cell cols="3">UHIENFR1 English -&gt; French Lucene stemmer,</cell><cell>2269</cell><cell>2537</cell><cell>0.337</cell></row><row><cell>bi-ling-ual</cell><cell cols="3">ImTranslator UHIENFR2 English -&gt; French Lucene stemmer, EDA ImTranslator UHIRUFR1 Russsian -&gt; French Lucene stemmer,</cell><cell>2307 1974</cell><cell>2537 2537</cell><cell>0.347 0.269</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Lucene: http://lucene.apache.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Snowball: http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">Egothor: http://www.egothor.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">http://europa.eu.int/eurodicautom/Controller</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">http://freetranslation.paralink.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">http://recoin.sourceforge.net</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>We would like to thank Nina Kummer and Sarah Risse for including the Egothor stemmer into our system. We also acknowledge the work of Viola Barth and Joachim Pfister who ported the trec_eval tool to Java.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Combination Approaches for Multilingual Text Retrieval, Information Retrieval</title>
		<author>
			<persName><forename type="first">Martin</forename><surname>Braschler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Information Retrieval</title>
				<imprint>
			<publisher>Kluwer</publisher>
			<date type="published" when="2004">2004</date>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="183" to="204" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">An Information-Theoretic Approach to Automatic Query Expansion</title>
		<author>
			<persName><forename type="first">C</forename><surname>Carpineto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>De Mori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Romano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bigi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Information Systems</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="27" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">René</forename><surname>Hackl</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Mono-and Cross-lingual Retrieval Experiments at the University of Hildesheim</title>
		<author>
			<persName><forename type="first">Thomas</forename><forename type="middle">;</forename><surname>Mandl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christa</forename><surname>Womser-Hacker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Multilingual Information Access for Text, Speech and Images: Results of the Fifth CLEF Evaluation Campaign</title>
				<editor>et al</editor>
		<meeting><address><addrLine>Berlin</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="volume">3491</biblScope>
			<biblScope unit="page" from="165" to="169" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">Paul</forename><surname>Mcnamee</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Character N-Gram Tokenization for European Language Text Retrieval</title>
		<author>
			<persName><forename type="first">James</forename><surname>Mayfield</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Retrieval</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">1/2</biblScope>
			<biblScope unit="page" from="73" to="98" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Das RECOIN Framework für Information Retrieval Experimente</title>
		<author>
			<persName><forename type="first">Jan-Hendrik</forename><surname>Scheufen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Effektive Information Retrieval Verfahren in der Praxis: Proceedings Vierter Hildesheimer Information Retrieval und Evaluierungsworkshop (HIER 2005</title>
				<editor>
			<persName><forename type="first">Thomas</forename><forename type="middle">;</forename><surname>Mandl</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Christa</forename><surname>Womser-Hacker</surname></persName>
		</editor>
		<meeting><address><addrLine>Hildesheim; Konstanz</addrLine></address></meeting>
		<imprint>
			<publisher>Universitätsverlag</publisher>
			<date type="published" when="2005">2005. 20. 2005</date>
		</imprint>
	</monogr>
	<note>Schriften zur Informationswissenschaft. to appear</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Womser-Hacker</surname></persName>
		</author>
		<title level="m">Das MIMOR-Modell. Mehrfachindexierung zur dynamischen Methoden-Objekt-Relationierung im Information Retrieval</title>
				<imprint>
			<date type="published" when="1997">1997</date>
		</imprint>
		<respStmt>
			<orgName>Habilitationsschrift ; Universität Regensburg, Informationswissenschaft</orgName>
		</respStmt>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
