<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Study on Query Expansion with MeSH Terms and Elasticsearch. IMS Unipd at CLEF eHealth Task 3</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Giorgio</forename><forename type="middle">Maria</forename><surname>Di</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dept. of Information Engineering</orgName>
								<orgName type="institution">University of Padua</orgName>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Alexandru</forename><surname>Moldovan</surname></persName>
							<email>alexandru.moldovan@studenti.unipd.it</email>
							<affiliation key="aff0">
								<orgName type="department">Dept. of Information Engineering</orgName>
								<orgName type="institution">University of Padua</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">A Study on Query Expansion with MeSH Terms and Elasticsearch. IMS Unipd at CLEF eHealth Task 3</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">B954875AF296AF9C9BA5C72A554B355D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T02:34+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we describe the first participation of the Information Management Systems (IMS) group at CLEF eHealth 2018 Task 3, Consumer Health Search Task. In particular, we participated in the subtask IRTask 1: Ad-hoc Search which is a standard ad-hoc search task, aiming at retrieving information relevant to people seeking health advice on the web. The goal of our work is to evaluate 1) different query expansion strategies based on the recognition of Medical Subject Headings (MeSH) terms present in the original query; 2) different approaches to combine multiple ranking lists given the query expansions. We used Elasticsearch as search engine and the indexes provided by the organizers of this task.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In this paper, we report the experimental results of our first participation to the CLEF eHealth Lab Task 3 <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b2">3]</ref>: "Consumer Health Search". This task investigates the problem of retrieving documents to support information needs of health consumers that are confronted with a health problem.</p><p>This work is part of a Master Degree thesis in Computer Engineering where the main goal is to test the effectiveness of some variants of query expansion approaches based on the recognition of MeSH terms present in the original query.</p><p>The contribution of our experiments to this task can be summarized as follows:</p><p>-A study of several query expansion approaches that takes into account the relationships between MeSH terms <ref type="bibr" target="#b5">[6]</ref>; -An evaluation of different document scoring strategies given the multiple ranking list produced by the query expansions <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b5">6]</ref>.</p><p>The remainder of the paper will introduce the methodology and a brief summary of the experimental settings that we used in order to create the runs that we submitted for the task.</p><p>In this section, we describe the query expansion approaches <ref type="bibr" target="#b5">[6]</ref> as well as the document scoring strategies <ref type="bibr" target="#b0">[1]</ref> that we used to create the expanded queries and the ranked lists.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Query expansion approaches</head><p>For the experiments of the query expansion approach, we used the English version of the information need. Before running the automatic expansion algorithms, we performed a manual check in order to correct spelling errors. For example, for topic 189001, the term gonorrhea is misspelt as gonhrrea.</p><p>Identification of MeSH terms After the manual cleaning process, we use the MeshOnDemand<ref type="foot" target="#foot_0">1</ref> API to identify the MeSH terms present in the query.</p><p>For example, for topic 188001 "caffeine high blood pressure" we obtain the following MeSH terms -Caffeine<ref type="foot" target="#foot_1">2</ref> -Hypertension<ref type="foot" target="#foot_2">3</ref> plus one additional term -Blood pressure <ref type="foot" target="#foot_3">4</ref>Finding related MeSH terms For each MeSH term found in the previous step, we use the MeSHRDF<ref type="foot" target="#foot_4">5</ref> database to look for semantically related (MeSH) terms. See for example Figure <ref type="figure" target="#fig_0">1</ref> that shows a part of the structure of the tree of relations among concepts related to the MeSH term Caffeine.</p><p>We choose a subset of all the possible relations (predicates) between terms in the MeSHRDF database <ref type="foot" target="#foot_5">6</ref> , and we use this subset of predicates for query expansion in the following way:</p><p>-Baseline: the original query is used without any additional term.</p><p>-Simple Expansion (SE): given a MeSH term identified in the first step, all the MesH entries related to that term are kept, except for the predicates 'meshv:Qualifier', 'meshv:seeAlso', 'meshv:broader' e 'meshv:broaderDescriptor'.</p><p>Then we re-apply just once (not recursively) a SE for each 'child' node. -SE broader: like SE, in addition, only for the original MeSH terms (those that appear in the query) we expand the MesH terms by adding the predicates 'meshv:broader' e 'meshv:broaderDescriptor'. -SE also: like SE broader, instead of adding the predicates 'broader', we add the MesH terms related with the predicate 'meshv:seeAlso'. -SE broader also: a combination of SE broader and SE also.</p><p>-SE child broader: first, apply SE, then for each 'child' node search for 'parents' different from the original MeSH terms. -SE recursive down: like SE, for each 'child' we recursively apply SE until leaf nodes are found (no more recursions). -All in one: all the approaches at once.</p><p>At the end of a query expansion process, we have three main data objects:</p><p>the original query q; a vector m = (m 1 , m 2 , . . . , m n ) of MeSH terms associated with the original query; a list t of expanded terms of n elements where each element t i is another vector of terms resulting from the iteration of the expansion approach, t i = (t i1 , t i2 , . . . , t ij ).</p><p>Given these three objects, we create a set of expanded queries that will be used to create a number of ranked lists, as explained in the following subsections.</p><p>Building expanded query Give the vector of MesH terms m and the list of expanded terms t, we create a set of expanded queries by means of the following procedure:</p><p>-For each term term m i of the vector of MeSH terms associated with the original query, we substitute m i with one of the terms in t i , for example t i1 , then, we build the expanded query by joining the original query with the new vector m * i = (m 1 , . . . , t i1 , . . . , m n ), q * = q m * i . Therefore, at the end of the process we generate a set V of vectors of expanded queries where the cardinality |V | is the sum of all the elements in the vectors of the list t. For each vector of v k ∈ V we obtain a list l k of ranked documents.</p><p>Merging ranked lists We use different approaches to merge the |V | ranked list into a single list based on the combination of scores of the documents.</p><p>-Average: given a document present in one or more lists, the scores associated to the document are averaged. Then the documents are ordered in decreasing order on the basis of this new score. -Sum: given a document present in one or more lists, we sum the scores associated to the documents. -Normalized sum: like Sum but the sum of the scores are normalized by the highest score in the ranked lists. -Round robin: for each rank r, we take the document of each list l k at r and add it to the new ranked list if it has not already been seen.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experiments</head><p>For the experiments, we used the Elasticsearch search engine<ref type="foot" target="#foot_6">7</ref> and the indexes provided by the organizers of the task. We used the BM25 ranking function with default parameters. Given the constraints on the number of runs, four in total, that could be submitted to the task, we submitted one baseline run and three query expansion variant with the same scoring approach. We will evaluate all the other combinations as soon as the qrels will be made available.</p><p>In particular, we submitted the following runs that use the Sum (of the scores) as the document scoring approach:</p><p>baseline.exp, a baseline run (plain BM25 with no expansion), sum score simple.exp, simple expansion, sum score broader also, simple expansion plus the two predicates broader and also, sum score recursive.exp, a recursive down approach.</p><p>In Table ??, we show the number of query variants that are generated per topic by three approaches.</p><p>The aim of our first participation to the CLEF eHealth Task 3 was to test the effectiveness of different query expansion approaches that use the MeSH term RDF graph as well as different merging approaches of the ranking lists produced by the query expansion approach. As future work, we will study the combination of the MesH term expansion with the help of medical terminological records <ref type="bibr" target="#b4">[5]</ref> for technologically assisted systematic reviews <ref type="bibr" target="#b1">[2]</ref>. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Partial view of the tree structure of the MeSH term Caffeine.</figDesc><graphic coords="3,134.77,115.83,345.83,259.46" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Number of query variants per approach</figDesc><table><row><cell>topic</cell><cell cols="3">simple broader also recursive</cell></row><row><cell>151001</cell><cell>290</cell><cell>412</cell><cell>524</cell></row><row><cell>152001</cell><cell>310</cell><cell>330</cell><cell>1402</cell></row><row><cell>153001</cell><cell>71</cell><cell>117</cell><cell>127</cell></row><row><cell>154001</cell><cell>90</cell><cell>128</cell><cell>139</cell></row><row><cell>155001</cell><cell>74</cell><cell>123</cell><cell>184</cell></row><row><cell>156001</cell><cell>281</cell><cell>406</cell><cell>547</cell></row><row><cell>157001</cell><cell>222</cell><cell>226</cell><cell>671</cell></row><row><cell>158001</cell><cell>85</cell><cell>152</cell><cell>85</cell></row><row><cell>159001</cell><cell>298</cell><cell>404</cell><cell>1260</cell></row><row><cell>160001</cell><cell>84</cell><cell>191</cell><cell>119</cell></row><row><cell>161001</cell><cell>104</cell><cell>134</cell><cell>129</cell></row><row><cell>162001</cell><cell>42</cell><cell>118</cell><cell>42</cell></row><row><cell>163001</cell><cell>168</cell><cell>286</cell><cell>216</cell></row><row><cell>164001</cell><cell>38</cell><cell>60</cell><cell>231</cell></row><row><cell>165001</cell><cell>181</cell><cell>296</cell><cell>439</cell></row><row><cell>166001</cell><cell>253</cell><cell>323</cell><cell>436</cell></row><row><cell>167001</cell><cell>120</cell><cell>126</cell><cell>312</cell></row><row><cell>168001</cell><cell>241</cell><cell>339</cell><cell>241</cell></row><row><cell>169001</cell><cell>16</cell><cell>80</cell><cell>16</cell></row><row><cell>170001</cell><cell>144</cell><cell>169</cell><cell>169</cell></row><row><cell>171001</cell><cell>22</cell><cell>61</cell><cell>22</cell></row><row><cell>172001</cell><cell>96</cell><cell>135</cell><cell>96</cell></row><row><cell>173001</cell><cell>15</cell><cell>20</cell><cell>15</cell></row><row><cell>174001</cell><cell>112</cell><cell>186</cell><cell>166</cell></row><row><cell>175001</cell><cell>82</cell><cell>124</cell><cell>102</cell></row><row><cell>176001</cell><cell>42</cell><cell>101</cell><cell>42</cell></row><row><cell>177001</cell><cell>76</cell><cell>169</cell><cell>92</cell></row><row><cell>178001</cell><cell>236</cell><cell>336</cell><cell>388</cell></row><row><cell>179001</cell><cell>100</cell><cell>179</cell><cell>146</cell></row><row><cell>180001</cell><cell>83</cell><cell>105</cell><cell>89</cell></row><row><cell>181001</cell><cell>28</cell><cell>75</cell><cell>28</cell></row><row><cell>182001</cell><cell>257</cell><cell>279</cell><cell>423</cell></row><row><cell>183001</cell><cell>85</cell><cell>157</cell><cell>116</cell></row><row><cell>184001</cell><cell>117</cell><cell>166</cell><cell>117</cell></row><row><cell>185001</cell><cell>96</cell><cell>175</cell><cell>96</cell></row><row><cell>186001</cell><cell>245</cell><cell>308</cell><cell>562</cell></row><row><cell>187001</cell><cell>85</cell><cell>204</cell><cell>169</cell></row><row><cell>188001</cell><cell>125</cell><cell>164</cell><cell>239</cell></row><row><cell>189001</cell><cell>18</cell><cell>38</cell><cell>18</cell></row><row><cell>190001</cell><cell>377</cell><cell>428</cell><cell>678</cell></row><row><cell>191001</cell><cell>6</cell><cell>19</cell><cell>6</cell></row><row><cell>192001</cell><cell>169</cell><cell>245</cell><cell>270</cell></row><row><cell>193001</cell><cell>53</cell><cell>88</cell><cell>104</cell></row><row><cell>194001</cell><cell>95</cell><cell>147</cell><cell>95</cell></row><row><cell>195001</cell><cell>47</cell><cell>62</cell><cell>47</cell></row><row><cell>196001</cell><cell>13</cell><cell>104</cell><cell>13</cell></row><row><cell>197001</cell><cell>109</cell><cell>201</cell><cell>185</cell></row><row><cell>198001</cell><cell>51</cell><cell>78</cell><cell>51</cell></row><row><cell>199001</cell><cell>131</cell><cell>209</cell><cell>131</cell></row><row><cell>200001</cell><cell>129</cell><cell>185</cell><cell>202</cell></row><row><cell cols="2">average 124.24</cell><cell>183.36</cell><cell>239.94</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://meshb.nlm.nih.gov/MeSHonDemand</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://meshb.nlm.nih.gov/record/ui?name=Caffeine</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://meshb.nlm.nih.gov/record/ui?name=Hypertension</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://meshb.nlm.nih.gov/record/ui?name=Blood&amp;#37;Pressure</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">https://id.nlm.nih.gov/mesh/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">https://hhs.github.io/meshrdf/predicates</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">https://www.elastic.co/products/elasticsearch</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Combining the evidence of multiple query representations for information retrieval</title>
		<author>
			<persName><forename type="first">Nick</forename><forename type="middle">J</forename><surname>Belkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Paul</forename><surname>Kantor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Edward</forename><forename type="middle">A</forename><surname>Fox</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Joseph</forename><forename type="middle">A</forename><surname>Shaw</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Second Text Retrieval Conference</title>
				<imprint>
			<date type="published" when="1995">1995</date>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="page" from="431" to="448" />
		</imprint>
	</monogr>
	<note>TREC-2</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A study of an automatic stopping strategy for technologically assisted medical reviews</title>
		<author>
			<persName><forename type="first">Giorgio</forename><surname>Maria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Di</forename><surname>Nunzio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval -40th European Conference on IR Research, ECIR 2018</title>
				<meeting><address><addrLine>Grenoble, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">March 26-29, 2018. 2018</date>
			<biblScope unit="page" from="672" to="677" />
		</imprint>
	</monogr>
	<note>Proceedings</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Overview of the CLEF 2018 Consumer Health Search Task</title>
		<author>
			<persName><surname>Kelly</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2018 Evaluation Labs and Workshop: Online Working Notes</title>
				<editor>
			<persName><forename type="first">Guido</forename><surname>Jimmy</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Joao</forename><surname>Zuccon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Lorraine</forename><surname>Palotti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Liadh</forename><surname>Goeuriot</surname></persName>
		</editor>
		<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2018-09">September 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Overview of the CLEF eHealth Evaluation Lab 2018</title>
		<author>
			<persName><forename type="first">Hanna</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Liadh</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lorraine</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Evangelos</forename><surname>Kanoulas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Leif</forename><surname>Azzopardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rene</forename><surname>Spijker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dan</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aurélie</forename><surname>Névéol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lionel</forename><surname>Ramadier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aude</forename><surname>Robert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Joao</forename><surname>Palotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jimmy</forename></persName>
		</author>
		<author>
			<persName><forename type="first">Guido</forename><surname>Zuccon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2018 -8th Conference and Labs of the Evaluation Forum</title>
		<title level="s">Lecture Notes in Computer Science (LNCS</title>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018-09">September 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Trimed: A multilingual terminological database</title>
		<author>
			<persName><forename type="first">Federica</forename><surname>Vezzani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Giorgio</forename><forename type="middle">Maria</forename><surname>Di</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nunzio</forename></persName>
		</author>
		<author>
			<persName><forename type="first">Geneviève</forename><surname>Henrot</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018</title>
				<meeting>the Eleventh International Conference on Language Resources and Evaluation, LREC 2018<address><addrLine>Miyazaki, Japan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">May 7-12, 2018. 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Query expansion using mesh terms for dataset retrieval: Ohsu at the biocaddie 2016 dataset retrieval challenge</title>
		<author>
			<persName><forename type="first">David</forename><surname>Theodore B Wright</surname></persName>
		</author>
		<author>
			<persName><forename type="first">William</forename><surname>Ball</surname></persName>
		</author>
		<author>
			<persName><surname>Hersh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Database</title>
		<imprint>
			<biblScope unit="page">65</biblScope>
			<date type="published" when="2017">2017. 2017</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
