<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Dublin City University at CLEF 2005: Multilingual Merging Experiments</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Adenike</forename><forename type="middle">M</forename><surname>Lam-Adesina</surname></persName>
							<email>adenike@computing.dcu.ie</email>
							<affiliation key="aff0">
								<orgName type="department">School of Computing</orgName>
								<orgName type="institution">Dublin City University</orgName>
								<address>
									<settlement>Dublin 9</settlement>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Gareth</forename><forename type="middle">J F</forename><surname>Jones</surname></persName>
							<email>gjones@computing.dcu.ie</email>
							<affiliation key="aff0">
								<orgName type="department">School of Computing</orgName>
								<orgName type="institution">Dublin City University</orgName>
								<address>
									<settlement>Dublin 9</settlement>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Dublin City University at CLEF 2005: Multilingual Merging Experiments</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">9B810E33A606BEEE3EF0F07E1B33184C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T00:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>H.3 [Information Storage and Retrieval]: H.3.3 Information Search and Retrieval</term>
					<term>H.3.4 Systems and Software - Distributed systems</term>
					<term>H.3.7 Digital Libraries</term>
					<term>Measurement, Performance, Experimentation</term>
					<term>Multilingual information retrieval, Retrieved list merging</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This year the Dublin City University group participated in the CLEF 2005 Multilingual merging task. We tested a range of standard merging techniques for merging the provided ranked result lists and show that the success of these techniques can depend on the retrieval system used.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Multilingual information retrieval (MIR) refers to the process of retrieving relevant documents from collections in different languages in response to a user request in a single language. Standard approaches to MIR involve translating either the topics into the document languages or the document collections into the expected topic language. In CLEF 2003 we showed that translating the document collections into the query language using a standard machine translation system and then merging them to form a single collection can result in better retrieval performance than translating the topics <ref type="bibr" target="#b0">[1]</ref>. However, this method is not always practical, particularly if the collection is very large or the translation resources are limited. In the second method, the translated topics are used to retrieve ranked lists of potentially relevant documents from the separate collections. These result lists then need to be merged to form a single ranked list for the system output. The different statistics of the individual collections and the varied topic translations mean that the scores of documents in the separate lists will generally be incompatible, and thus that merging is a non-trivial problem. The CLEF 2005 Multilingual merging task aims to encourage researchers to focus directly on the merging problem, since merging strategies explored previously for multilingual retrieval tasks at CLEF and elsewhere have generally produced disappointing results. 
Previously, work on multilingual merging has been combined with the document retrieval stage; the idea of the CLEF merging task is to explore the merging of provided precomputed ranked lists, enabling direct comparison of the behaviour of merging strategies across different retrieval systems.</p><p>Many different techniques for merging separate result lists to form a single list have been proposed and tested in recent years. All of these techniques rest on the observation that the distribution of relevant documents in the result sets retrieved from individual collections cannot be assumed to be similar <ref type="bibr" target="#b1">[2]</ref>. Hence, straight merging of the retrieved documents from the sources will result in a poor combined list. However, none of the more complex merging techniques proposed so far has been demonstrated to be consistently effective.</p><p>For our participation in the merging track at CLEF 2005 we applied a range of standard merging strategies to the two provided sets of ranked lists. Our aim was to compare the behaviour of these methods on the two sets of ranked documents in order to learn which concepts might be consistently useful or consistently poor when merging ranked lists.</p><p>This paper is organized as follows: Section 2 overviews the merging techniques explored in this paper, Section 3 gives our experimental results, and Section 4 draws conclusions and considers strategies for further experimentation.</p><p>The aim of a merging strategy is typically to place as many relevant documents as possible at the highest ranks in the merged list. This section overviews the merging strategies used in our experiments. The basic idea is to modify the score of each document to take account of the maximum and minimum matching scores, or of the distribution of scores in the lists, so that scores from different lists become more compatible and yield a more effective merged ranked list. 
The schemes used in our experiments were as follows:</p><formula xml:id="formula_0">p = \mathit{doc\_wgt}<label>(1)</label></formula><formula xml:id="formula_1">b = \frac{\mathit{doc\_wgt} - \mathit{min\_wt}}{\mathit{max\_wt} - \mathit{min\_wt}} \times \mathit{rank}<label>(6)</label></formula><formula xml:id="formula_2">m1 = \frac{\mathit{doc\_wgt} - \mathit{gmean\_wt}}{\mathit{gstd\_wt}} + \frac{\mathit{gmean\_wt} - \mathit{gmin\_wt}}{\mathit{gstd\_wt}}<label>(7)</label></formula><formula xml:id="formula_3">m2 = m1 \times \mathit{rank}<label>(8)</label></formula><p>where p, d, r, q, b, m1 and m2 are the new document weights for all documents in all collections, and the corresponding results are labelled * where * can be p, d, r, q, b, m1 or m2 depending on the merging scheme used; doc_wgt = the initial document weight; gmax_wt = the global maximum weight, i.e. the highest document weight from all collections for a given query; gmin_wt = the global minimum weight, i.e. the lowest document weight from all collections for a given query; gmean_wt = the global mean weight, i.e. the mean document weight from all collections for a given query.</p><p>Method p is used as a baseline: it uses the raw document scores from the retrieved lists without modification. A useful merging scheme should be expected to improve on the performance of the p scheme. The rank parameter was tuned using the 20 training topics provided for the merging task.</p><p>Results for our experiments using these merging schemes are shown in Tables <ref type="table" target="#tab_1">1 and 2</ref>.</p><p>Table <ref type="table">2</ref>: Merging results using the provided Prosit ranked lists from the University of Neuchatel.</p><p>Tables <ref type="table" target="#tab_1">1 and 2</ref> show merging results using runs provided by Hummingbird and the University of Neuchatel respectively. Results are shown for precision at cutoffs of 10 and 30 documents (P10 and P30), Mean Average Precision (MAP) and the total number of relevant documents retrieved. 
The raw score merging scheme p is taken as the baseline, and the change relative to it is shown for each scheme, data set and reported metric.</p><p>The most obvious result is that the more complex merging schemes generally improve performance by a small amount for the Prosit data (Table <ref type="table">2</ref>), but in all cases reduce performance for the Hummingbird data (Table <ref type="table" target="#tab_1">1</ref>) with respect to both the precision measures and the number of relevant documents retrieved. This suggests a negative answer to one of the questions associated with the CLEF merging task, namely whether the same merging techniques will always be found to be effective for different sets of ranked lists generated for a common merging task using alternative information retrieval systems. The reasons for this difference in behaviour need to be investigated. Such an analysis will hopefully provide insights into the selection of appropriate merging strategies, or into the development of merging strategies which operate more consistently when merging different sets of ranked lists. Some further observations can be made. There is no consistent relationship between the variation in the precision measures and the number of relevant documents retrieved for the different merging schemes. Schemes with better precision can be accompanied by a lower number of relevant documents retrieved, and vice versa. This is most notable for the b results, where a good number of relevant documents retrieved (in relative terms) is accompanied by a large reduction in MAP for both data sets.</p></div>
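<div xmlns="http://www.tei-c.org/ns/1.0"><p>The measures reported in the results can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions of precision at a cutoff, average precision and number of relevant documents retrieved; the function names and the toy data are hypothetical and are not taken from the evaluation software used for the task.</p>
<p><code lang="python"><![CDATA[
```python
def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top-k ranked documents that are relevant (divides by k)."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant) / k


def average_precision(ranked_ids, relevant):
    """Mean of the precision values at each rank where a relevant document occurs,
    divided by the total number of relevant documents for the topic."""
    hits, total = 0, 0.0
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels):
    """runs: {topic: ranked doc ids}; qrels: {topic: set of relevant doc ids}."""
    return sum(average_precision(runs[t], qrels[t]) for t in qrels) / len(qrels)


def relevant_retrieved(ranked_ids, relevant):
    """Number of relevant documents that appear anywhere in the ranked list."""
    return sum(1 for doc_id in ranked_ids if doc_id in relevant)


# Toy example (hypothetical data): two of three relevant documents are retrieved.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "x"}
print(precision_at_k(ranked, relevant, 2))   # 0.5
print(relevant_retrieved(ranked, relevant))  # 2
```
]]></code></p>
<p>Because average precision divides by the total number of relevant documents, a scheme can retrieve more relevant documents overall and still score a lower MAP if those documents are placed at low ranks.</p></div>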
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions</head><p>Results of our merging experiments for CLEF 2005 indicate that the behaviour of merging schemes varies for different sets of ranked lists. The reasons for this variation are not obvious, and further analysis is planned to understand it better, as a basis for extending these merging techniques or proposing new ones.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>max_wt = the individual collection maximum weight for a given query; min_wt = the individual collection minimum weight for a given query; rank = a parameter to control the effect of the size of a collection: a collection with more documents gets a higher rank (values range between 1 and 1.5).</figDesc></figure>
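<div xmlns="http://www.tei-c.org/ns/1.0"><p>The schemes p, b, m1 and m2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code used in the experiments: the text does not say how gstd_wt is computed, so it is assumed here to be the population standard deviation of all document weights retrieved for the query; the rank values in the usage example are hypothetical; and the helper names normalise and merge are introduced only for illustration.</p>
<p><code lang="python"><![CDATA[
```python
from statistics import mean, pstdev


def normalise(lists, scheme, rank=None):
    """Re-weight the ranked lists retrieved for one query.

    lists  -- {collection: [(doc_id, doc_wgt), ...]}
    scheme -- "p", "b", "m1" or "m2"
    rank   -- {collection: rank parameter}; used by "b" and "m2" (default 1.0)
    """
    rank = rank or {}
    all_wts = [w for docs in lists.values() for _, w in docs]
    if not all_wts:
        return {coll: [] for coll in lists}
    gmean_wt, gmin_wt = mean(all_wts), min(all_wts)
    gstd_wt = pstdev(all_wts) or 1.0  # assumption: population std; avoid dividing by 0
    out = {}
    for coll, docs in lists.items():
        if not docs:
            out[coll] = []
            continue
        wts = [w for _, w in docs]
        min_wt, max_wt = min(wts), max(wts)
        span = (max_wt - min_wt) or 1.0  # avoid dividing by 0 for constant scores
        r = rank.get(coll, 1.0)
        new_docs = []
        for doc_id, doc_wgt in docs:
            if scheme == "p":      # raw score baseline, equation (1)
                w = doc_wgt
            elif scheme == "b":    # per-collection min-max scaling times rank, equation (6)
                w = (doc_wgt - min_wt) / span * r
            elif scheme in ("m1", "m2"):  # global z-score with shift, equation (7)
                w = (doc_wgt - gmean_wt) / gstd_wt + (gmean_wt - gmin_wt) / gstd_wt
                if scheme == "m2":        # m1 times rank, equation (8)
                    w *= r
            else:
                raise ValueError("unsupported scheme: " + scheme)
            new_docs.append((doc_id, w))
        out[coll] = new_docs
    return out


def merge(lists, scheme="p", rank=None, limit=1000):
    """Pool the re-weighted documents and sort them by decreasing weight."""
    pooled = [pair for docs in normalise(lists, scheme, rank).values() for pair in docs]
    return sorted(pooled, key=lambda pair: pair[1], reverse=True)[:limit]


# Usage with toy scores (hypothetical data and rank values).
lists = {"en": [("e1", 10.0), ("e2", 6.0), ("e3", 2.0)],
         "fr": [("f1", 1.0), ("f2", 0.5)]}
rank = {"en": 1.5, "fr": 1.0}
print([doc_id for doc_id, _ in merge(lists, "b", rank)[:3]])  # ['e1', 'f1', 'e2']
```
]]></code></p>
<p>In the toy example, raw score merging (p) would place all three "en" documents above the best "fr" document, whereas scheme b lifts the top "fr" document to second place because scores are first rescaled within each collection.</p></div>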
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1:</head><label>1</label><figDesc>Merging results using the provided Hummingbird ranked lists. Our official submissions to CLEF 2005 are marked *.</figDesc><table><row><cell>Run-id</cell><cell>P10</cell><cell>% chg.</cell><cell>P30</cell><cell>% chg.</cell><cell>MAP</cell><cell>% chg.</cell><cell>Rel. Ret.</cell><cell>chg.</cell></row><row><cell>dcu.hump*</cell><cell>0.5175</cell><cell>-</cell><cell>0.3958</cell><cell>-</cell><cell>0.2086</cell><cell>-</cell><cell>2982</cell><cell>-</cell></row><row><cell>dcu.humd</cell><cell>0.3725</cell><cell>-28.0</cell><cell>0.3467</cell><cell>-12.4</cell><cell>0.1775</cell><cell>-14.9</cell><cell>2965</cell><cell>-17</cell></row><row><cell>dcu.humr</cell><cell>0.4550</cell><cell>-12.1</cell><cell>0.3642</cell><cell>-8.0</cell><cell>0.1932</cell><cell>-7.4</cell><cell>2964</cell><cell>-18</cell></row><row><cell>dcu.humq</cell><cell>0.4575</cell><cell>-11.6</cell><cell>0.3633</cell><cell>-8.2</cell><cell>0.2005</cell><cell>-3.9</cell><cell>2752</cell><cell>-230</cell></row><row><cell>dcu.humb</cell><cell>0.3200</cell><cell>-32.2</cell><cell>0.2925</cell><cell>-26.1</cell><cell>0.1596</cell><cell>-23.5</cell><cell>2950</cell><cell>-32</cell></row><row><cell>dcu.humt*</cell><cell>0.4075</cell><cell>-21.3</cell><cell>0.3275</cell><cell>-17.3</cell><cell>0.1734</cell><cell>-16.9</cell><cell>2442</cell><cell>-540</cell></row><row><cell>dcu.humm1*</cell><cell>0.4800</cell><cell>-7.2</cell><cell>0.3817</cell><cell>-3.6</cell><cell>0.1988</cell><cell>-4.7</cell><cell>2873</cell><cell>-109</cell></row><row><cell>dcu.humm2*</cell><cell>0.4650</cell><cell>-10.1</cell><cell>0.3625</cell><cell>-8.4</cell><cell>0.1846</cell><cell>-11.5</cell><cell>2846</cell><cell>-136</cell></row><row><cell>Run-id</cell><cell>P10</cell><cell>% chg.</cell><cell>P30</cell><cell>% chg.</cell><cell>MAP</cell><cell>% chg.</cell><cell>Rel. Ret.</cell><cell>chg.</cell></row><row><cell>dcu.Prositqgp*</cell><cell>0.4500</cell><cell>-</cell><cell>0.4458</cell><cell>-</cell><cell>0.3103</cell><cell>-</cell><cell>4404</cell><cell>-</cell></row><row><cell>dcu.Prositqgd</cell><cell>0.4850</cell><cell>+7.7</cell><cell>0.4442</cell><cell>-0.4</cell><cell>0.2931</cell><cell>-5.5</cell><cell>4552</cell><cell>+148</cell></row><row><cell>dcu.Prositqgr</cell><cell>0.4950</cell><cell>+10.0</cell><cell>0.4458</cell><cell>0.0</cell><cell>0.3011</cell><cell>-3.0</cell><cell>4544</cell><cell>+140</cell></row><row><cell>dcu.Prositqgq</cell><cell>0.4650</cell><cell>+3.3</cell><cell>0.4462</cell><cell>+0.1</cell><cell>0.3192</cell><cell>+2.9</cell><cell>4469</cell><cell>+65</cell></row><row><cell>dcu.Prositqgb</cell><cell>0.4725</cell><cell>+5.0</cell><cell>0.4408</cell><cell>-1.1</cell><cell>0.2834</cell><cell>-8.7</cell><cell>4538</cell><cell>+134</cell></row><row><cell>dcu.Prositqgt*</cell><cell>0.4600</cell><cell>+2.2</cell><cell>0.4458</cell><cell>0.0</cell><cell>0.3201</cell><cell>+3.2</cell><cell>4477</cell><cell>+73</cell></row><row><cell>dcu.Prositqgm1*</cell><cell>0.4750</cell><cell>+5.6</cell><cell>0.4592</cell><cell>+3.0</cell><cell>0.3241</cell><cell>+4.5</cell><cell>4486</cell><cell>+82</cell></row><row><cell>dcu.Prositqgm2*</cell><cell>0.4700</cell><cell>+4.4</cell><cell>0.4608</cell><cell>+3.4</cell><cell>0.3286</cell><cell>+5.9</cell><cell>4512</cell><cell>+108</cell></row></table></figure>
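<div xmlns="http://www.tei-c.org/ns/1.0"><p>The "% chg." and "chg." columns appear to be the relative and absolute differences from the baseline run p. The short Python sketch below shows this reading on a few values copied from the tables; the function names are hypothetical, and the assumption that percentages are rounded to one decimal place is inferred from the table layout rather than stated in the text.</p>
<p><code lang="python"><![CDATA[
```python
def pct_change(baseline, value):
    """Relative change of value with respect to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def abs_change(baseline, value):
    """Absolute change, used for the number of relevant documents retrieved."""
    return value - baseline


# Hummingbird lists: dcu.humd against the baseline dcu.hump.
print(f"{pct_change(0.5175, 0.3725):+.1f}")  # P10: -28.0
print(abs_change(2982, 2965))                # Rel. Ret.: -17

# Prosit lists: dcu.Prositqgm2 against the baseline dcu.Prositqgp.
print(f"{pct_change(0.3103, 0.3286):+.1f}")  # MAP: +5.9
print(abs_change(4404, 4512))                # Rel. Ret.: 108
```
]]></code></p></div>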
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Exeter at CLEF 2003: Experiments with Machine Translation for Monolingual, Bilingual and Multilingual Retrieval</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Lam-Adesina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J F</forename><surname>Jones</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the CLEF 2003 Workshop on Cross-Language Information Retrieval and Evaluation</title>
				<meeting>the CLEF 2003 Workshop on Cross-Language Information Retrieval and Evaluation<address><addrLine>Trondheim, Norway</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Report on CLEF-2003 Multilingual Tracks</title>
		<author>
			<persName><forename type="first">Jacques</forename><surname>Savoy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the CLEF 2003 Workshop on Cross-Language Information Retrieval and Evaluation</title>
				<meeting>the CLEF 2003 Workshop on Cross-Language Information Retrieval and Evaluation<address><addrLine>Trondheim, Norway</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
