<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ub-botswana participation to CLEF eHealth IR challenge 2017: Task 3 (IRTask1 : ad-hoc search)</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Edwin</forename><surname>Thuma</surname></persName>
							<email>thumae@mopipi.ub.bw</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Botswana</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nkwebi</forename><surname>Motlogelwa</surname></persName>
							<email>motlogel@mopipi.ub.bw</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Botswana</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tebo</forename><surname>Leburu-Dingalo</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Botswana</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">ub-botswana participation to CLEF eHealth IR challenge 2017: Task 3 (IRTask1 : ad-hoc search)</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">68A2AF0E74870408FDB7CE147116E34F</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:29+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Explicit relevance Feedback</term>
					<term>Proximity Search</term>
					<term>Pseudo Relevance Feedback</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we describe the methods deployed in the different runs submitted for our participation to the CLEF eHealth 2017 Task 3: Patient-Centered Information Retrieval, IRTask 1: ad-hoc search. Specifically, we deploy DPH term weighting model with explicit relevance feedback, where the expansion terms are selected from documents which were previously identified as relevant by assessors for each query. As improvement we deployed proximity search using both Full Dependence (FD) and Sequential Dependence (SD) variants of the Markov Random Fields and the Divergence From Randomness (DFR) based dependence models to re-rank documents, which have query terms in close proximity. In another approach, we deploy pseudo relevance feedback, where the expansion terms are selected from the top 3 ranked documents after a first pass retrieval. In addition, we deploy proximity search using the SD variant of the DFR based dependence model.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In this paper, we describe the methods used for our participation to the CLEF eHealth 2017 Task 3: Patient-Centered Information Retrieval, IRTask 1: adhoc search. Detailed task description is available in the overview paper of Task 3 <ref type="bibr" target="#b6">[7]</ref>. This task is a continuation of the previous CLEF eHealth Information Retrieval (IR) task that ran in 2013 <ref type="bibr" target="#b2">[3]</ref>, 2014 <ref type="bibr" target="#b3">[4]</ref>, 2015 <ref type="bibr" target="#b5">[6]</ref> and 2016 <ref type="bibr" target="#b4">[5]</ref>. The CLEF eHealth task aims to evaluate the effectiveness of information retrieval systems when searching for health related content on the web, with the objective to foster research and development of search engines tailored to health information seeking <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b4">5]</ref>. The CLEF eHealth Information Retrieval task was motivated by the problem of users of information retrieval systems formulating circumlocutory queries, using colloquial language instead of medical terms as studied by Zuccon et al. <ref type="bibr" target="#b8">[9]</ref> and Stanton et al. <ref type="bibr" target="#b7">[8]</ref>. In their studies, they found that modern search engines are ill-equipped to handle such queries; only 3 out of the to 10 results were highly useful for self diagnosis. In this paper, we attempt to tackle this problem by using explicit relevance feedback in order to improve the retrieval effectiveness. In addition, we deploy proximity search to further improve the retrieval effectiveness of our system. Moreover, we investigate whether pseudo relevance feedback, where the expansion terms are selected from the top 3 ranked documents after a first pass retrieval can improve the retrieval effectiveness. This paper is structured as follows. Section 2 contains a background on algorithms used. Section 3 describes the experimental environment. In Section 4, we describe the experimental the 5 runs submitted by team ub-botswana. Section 5 presents and discusses results on training data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Background</head><p>In this section, we begin by presenting a brief but essential background on the different algorithms used in our experimental investigation and evaluation. We start describing the DPH term weighting model in Section 2.1. We then describe the Bose-Einstein 1 (Bo1) model for query expansion in Section 2.2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">DPH Term Weighting Model</head><p>For all our experimental investigation and evaluation we used the parameterfree DPH term weighting model from the Divergence from Randomness (DFR) framework <ref type="bibr" target="#b1">[2]</ref>. The DPH term weighting model calculates the score of a document d for a given query Q as follows:</p><formula xml:id="formula_0">score DP H (d, Q) = t∈Q qtf • norm • tf • log((tf • avg l l ) • ( N tf c )) + 0.5 • log(2 • π • tf • (1 − t M LE ))<label>(1)</label></formula><p>where qtf , tf and tf c are the frequencies of the term t in the query Q , in the document d and in the collection C respectively. N is number of documents in the collection C, avg l is the average length of documents in the collection C and l is the length of the document d. t</p><formula xml:id="formula_1">M LE = tf l and norm = (1−t M LE ) 2 tf +1</formula><p>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Bose-Einstein 1 (Bo1) Model for Query Expansion</head><p>In our experimental investiagtion and evaluation, we used the Terrier-4.0 Divergence from Randomness (DFR) Bose-Einstein 1 (Bo1) model to select the most informative terms from the topmost documents after a first pass document ranking. The DFR Bo1 model calculates the information content of a term t in the top-ranked documents as follows <ref type="bibr" target="#b0">[1]</ref>:</p><formula xml:id="formula_2">w(t) = tf x • log 2 1 + P n (t) P n (t) + log 2 (1 + P n (t))<label>(2)</label></formula><formula xml:id="formula_3">P n (t) = tf c N<label>(3)</label></formula><p>where P n (t) is the probability of t in the whole collection, tf x is the frequency of the query term in the top x ranked documents, tf c is the frequency of the term t in the collection, and N is the number of documents in the collection.</p><p>FAQ Retrieval Platform: For all our experimental evaluation, we used Terrier-4.</p><p>2, an open source Information Retrieval (IR) platform. All the documents (ClueWeb 12 B13) used in this study were first pre-processed before indexing and this involved tokenising the text and stemming each token using the full Porter stemming algorithm. Stopword removal was enabled and we used Terrier stopword list. The index was created using blocks to save positional information with each term. For pseudo relevance feedback, we used Terrier-4.2 DFR Bose-Einstein 1 (Bo1) model for query expansion to select the 10 most informative terms from the top 3 ranked documents.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Description of the Different Runs</head><p>Term Weighting Model: For all our runs, we used the parameter-free DPH Divergence From Randomness term weighting model in Terrier-4.2 IR platform to score and rank the documents in the ClueWeb 12 B13 document collection.</p><p>ub-botswana IRTask1 run1: We ranked the documents using DPH DFR term weighting. As improvement, we deployed explicit relevance feedback, where we selected expansion terms from the top 3 documents that were explicitly marked relevant by assessors for each query. We used the Terrier-4.2 DFR Bose-Einstein 1 (Bo1) model for query expansion to select the 10 most informative terms from these documents. In addition, we deployed the Full Dependence (FD) variant of the Markov Random Fields for terms dependence. Full Dependence assumes all query terms are in some way dependent on each other. In this work, we experimentally selected a window size of 15, which yielded the highest retrieval performance on the training data.</p><p>ub-botswana IRTask1 run2: We performed a first pass retrieval using DPH DFR term weighting model. As improvement, we deployed explicit relevance feedback, where we deployed DFR Bo1 model for query expansion to select the expansion terms.</p><p>ub-botswana IRTask1 run3: We produced an initial ranking using DPH DFR term weighting. As improvement, we deployed explicit relevance feedback and used the DFR Bo1 model for query expansion to select the expansion terms.</p><p>In addition, we deployed the Sequential Dependence (SD) variant of the Divergence from Randomness based dependence model. Sequential Dependence only assumes a dependence between neighbouring query terms. In this work, we experimentally selected a window size of 15, which yielded the highest retrieval performance on the training data.</p><p>ub-botswana IRTask1 run4: We used the parameter-free DPH DFR term weighting model to produce and initial ranking. As improvement, we deployed a simple pseudo-relevance feedback on the local collection. We used the Bo1 model for query expansion to select the expansion terms. We then performed a second pass retrieval on the local collection with the new expanded query.</p><p>ub-botswana IRTask1 run5: We used ub-botswana IRTask1 run4 as the baseline system. As improvement, we deployed the Sequential Dependence (SD) variant of the Divergence from Randomness based term dependence model. Sequential Dependence only assumes a dependence between neighbouring query terms. In this work, we experimentally selected a window size of 15, which yielded the highest retrieval performance on the training data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results and Discussion</head><p>These working notes were compiled and submitted before the relevance judgments were released. Below we present the results of our runs using the 2016 query relevance judgments. Please note that the official results to be released will be different because new query relevance judgments will be released  <ref type="table" target="#tab_0">1</ref> presents our results on the training data. From this table, we see a degradation in performance when we incorporate term dependence only in our ranking (ub-botswana IRTask1 run5 ). However, when we deploy pseudo relevance feedback (ub-botswana IRTask1 run4 ), we see an improvement in the retrieval performance in terms of precision at 5 (P@5), precision at 10 (P@10) and recall (rel ret). Moreover significant improvement in the recall is obtained when explicit relevance feedback is deployed ((ub-botswana IRTask1 run1 ), (ubbotswana IRTask1 run2 ) and (ub-botswana IRTask1 run3 )). In addition, we obtain mixed results when we incorporate proximity search after deploying explicit relevance feedback. For example, there was an improvement in the retrieval performance in terms of P@5 and P@10 when we deploy the FD variant of the Markov Random Fields for term dependence using a window size of 15 (ubbotswana IRTask1 run1 ). In contrast, we obtain a degradation in the retrieval performance in terms of P@5 and P@10 when we deploy the SD variant of the Divergence from Randomness based term dependence model using a window size of 15 (ub-botswana IRTask1 run5 ).</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Retrieval Results for all 5 Runs using 2016 qrel</figDesc><table><row><cell>Run ID</cell><cell>P@5 P@10 rel ret</cell></row><row><cell>DPH Baseline</cell><cell>0.2973 0.2710 10104</cell></row><row><cell cols="2">ub-botswana IRTask1 run1 0.5093 0.4423 13661</cell></row><row><cell cols="2">ub-botswana IRTask1 run2 0.4513 0.4097 13661</cell></row><row><cell cols="2">ub-botswana IRTask1 run3 0.4433 0.4073 13661</cell></row><row><cell cols="2">ub-botswana IRTask1 run4 0.3160 0.2903 11129</cell></row><row><cell cols="2">ub-botswana IRTask1 run5 0.2873 0.2617 10104</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table</head><label></label><figDesc></figDesc><table /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Probabilistic Models for Information Retrieval based on Divergence from Randomness</title>
		<author>
			<persName><forename type="first">G</forename><surname>Amati</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2003-06">June 2003</date>
			<biblScope unit="page" from="1" to="198" />
		</imprint>
		<respStmt>
			<orgName>University of Glasgow,UK</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">PhD Thesis</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">FUB, IASI-CNR and University of Tor Vergata at TREC 2007 Blog Track</title>
		<author>
			<persName><forename type="first">G</forename><surname>Amati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ambrosi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bianchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gaibisso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Gambosi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th Text REtrieval Conference (TREC-2007)</title>
				<meeting>the 16th Text REtrieval Conference (TREC-2007)<address><addrLine>Gaithersburg, Md., USA</addrLine></address></meeting>
		<imprint>
			<publisher>TREC</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
	<note>Text REtrieval Conference</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">ShARe/CLEF eHealth Evaluation Lab 2013, Task 3: Information Retrieval to Address Patients&apos; Questions when Reading Clinical Reports</title>
		<author>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leveling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hanbury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Salantera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zuccon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2013 Online Working Notes</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">8138</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Share/clef ehealth Evaluation Lab 2014, Task 3: User-Centred Health Information Retrieval</title>
		<author>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Palotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pecina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zuccon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hanbury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mueller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2014 Online Working Notes</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Overview of the CLEF eHealth Evaluation Lab</title>
		<author>
			<persName><forename type="first">Liadh</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lorraine</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hanna</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aurélie</forename><surname>Névéol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">João</forename><surname>Palotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Guido</forename><surname>Zuccon</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016. 2016</date>
			<publisher>Springer International Publishing</publisher>
			<biblScope unit="page" from="255" to="266" />
			<pubPlace>Cham</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">CLEF eHealth Evaluation Lab 2015 task 2: Retrieving Information about Medical Symptoms</title>
		<author>
			<persName><forename type="first">J</forename><surname>Palotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zuccon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hanbury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J F</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lupu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pecina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2015 Online Working Notes</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">CLEF 2017 Task Overview: The IR Task at the eHealth Evaluation Lab</title>
		<author>
			<persName><forename type="first">J</forename><surname>Palotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zuccon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Jimmy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Pecina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lupu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><surname>Hanbury</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Circumlocution in Diagnostic Medical Queries</title>
		<author>
			<persName><forename type="first">I</forename><surname>Stanton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ieong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Mishra</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 37th international ACM SIGIR conference on Research &amp; development in information retrieval</title>
				<meeting>the 37th international ACM SIGIR conference on Research &amp; development in information retrieval</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="133" to="142" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Diagnose This If You Can: On the Effectiveness of Search Engines in Finding Medical Self-Diagnosis Information</title>
		<author>
			<persName><forename type="first">G</forename><surname>Zuccon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Koopman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Palotti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval (ECIR 2015)</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="562" to="567" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
