<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">SINAI at CLEF eHealth 2017 Task 3</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Manuel</forename><forename type="middle">Carlos</forename><surname>Díaz-Galiano</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Universidad de Jaén</orgName>
								<address>
									<addrLine>Campus Las Lagunillas</addrLine>
									<postCode>E-23071</postCode>
									<settlement>Jaén</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">M</forename><forename type="middle">Teresa</forename><surname>Martín-Valdivia</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Universidad de Jaén</orgName>
								<address>
									<addrLine>Campus Las Lagunillas</addrLine>
									<postCode>E-23071</postCode>
									<settlement>Jaén</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Salud</forename><forename type="middle">María</forename><surname>Jiménez-Zafra</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Universidad de Jaén</orgName>
								<address>
									<addrLine>Campus Las Lagunillas</addrLine>
									<postCode>E-23071</postCode>
									<settlement>Jaén</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Alberto</forename><surname>Andreu</surname></persName>
							<email>aandreu@ujaen.es</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Universidad de Jaén</orgName>
								<address>
									<addrLine>Campus Las Lagunillas</addrLine>
									<postCode>E-23071</postCode>
									<settlement>Jaén</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">L</forename><forename type="middle">Alfonso</forename><surname>Ureña López</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">Universidad de Jaén</orgName>
								<address>
									<addrLine>Campus Las Lagunillas</addrLine>
									<postCode>E-23071</postCode>
									<settlement>Jaén</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">SINAI at CLEF eHealth 2017 Task 3</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">1E599AEBB4699011B6547D167DA5AFE3</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:31+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper we present the participation of the SINAI research group from the University of Jaén in CLEF eHealth 2017 Task 3, Patient-Centred Information Retrieval. Although only two runs could be submitted, we tried several strategies with different models and parameters in order to check the effectiveness of our system. Our three main approaches apply query feedback using MeSH expansion, the Google search engine, and a Word2Vec model built over Wikipedia. We finally submitted two runs to the ad-hoc task: the first uses Google, and the second applies Word2Vec to the health-related pages extracted from Wikipedia.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The Internet is an important source of health information, not only for medical professionals but also for patients and ordinary users. More and more users search for medical information every day. However, the terminology and the level of understanding of professional and non-professional users are very different. In this paper, we describe our participation in CLEF eHealth 2017 Task 3: Patient-Centred Information Retrieval <ref type="bibr" target="#b5">[6]</ref>. The CLEF eHealth lab aims to evaluate the effectiveness of information retrieval systems when searching for health content on the web. Since 2013, the eHealth shared task organized by CLEF <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b3">4]</ref> has focused on studying medical information retrieval from the patient's point of view, assuming that this kind of user has more difficulty understanding documents in the health domain. In 2017, CLEF eHealth Task 3, Patient-Centred Information Retrieval, continues to focus on evaluating the effectiveness of information retrieval systems on the web <ref type="bibr" target="#b3">[4]</ref>. The topics provided by the organizers are the same as those of 2016, with the aim of improving the relevance assessment pool and the reusability of the collection.</p><p>The 2016 topics were developed by mining health web forums where users were seeking advice about specific symptoms, diagnoses, conditions or treatments. For each forum post, a set of six query variants was generated, representing different ways to express the same information need.</p><p>In this section, we present the different strategies that we have followed in our participation in CLEF eHealth 2017 Task 3 Patient-Centred Information Retrieval: IRTask 1, Ad-hoc search.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">System description</head><p>Although our research group, SINAI, has extensive experience participating in several tasks in other editions of CLEF, mainly in ImageCLEFmed <ref type="bibr" target="#b1">[2]</ref>, this is the first time that we have participated in CLEF eHealth. We tried three main approaches, all of them focused on integrating external knowledge in order to enrich the query:</p><p>- Including terms extracted from MeSH.</p><p>- Including information retrieved from Google.</p><p>- Including terms extracted from Wikipedia.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Preprocessing and indexing</head><p>We have used the ClueWeb12 B13 corpus<ref type="foot" target="#foot_0">1</ref> and the Lemur IR system<ref type="foot" target="#foot_1">2</ref>. Specifically, we have used the Indri search engine for indexing with several default parameters: preprocessing consists of removing stopwords and stemming words with the Krovetz algorithm. In addition, we have used the Dirichlet prior retrieval method with µ = 2500.</p><p>For the queries, we have also applied stopword removal and the Krovetz stemmer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">MeSH approach</head><p>Our first approach was to apply the query expansion strategy using MeSH that we have used in other CLEF tasks in previous years <ref type="bibr" target="#b0">[1]</ref>. The main goal is to integrate medical knowledge in order to semantically enrich the query. However, when we tested this approach with the 2016 assessments, the results were very poor, even worse than the baseline. We think that the main reason is that the collection and the queries are written by people who are not medical professionals. Thus, we need to integrate other kinds of information with more informal writing instead of the technical terms extracted from MeSH.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Google approach</head><p>Since the collection and queries are designed to simulate a typical web search, we have tried to integrate knowledge from the most popular web search engine, i.e., Google. Thus, we first launch the query on Google and then carry out experiments with different parameters:</p><p>- Replace the query by the titles of the top X retrieved documents, with X ∈ {1, 2, 3, 4, 5, 10}.</p><p>- Replace the query by the snippets of the top X retrieved documents, with X ∈ {1, 2, 3, 4, 5, 10}.</p><p>- Replace the query by the titles and snippets of the top X retrieved documents, with X ∈ {1, 2, 3, 4, 5, 10}.</p><p>- Include in the query the titles of the top X retrieved documents, with X ∈ {1, 2, 3, 4, 5, 10}, plus the original query.</p><p>- Include in the query the snippets of the top X retrieved documents, with X ∈ {1, 2, 3, 4, 5, 10}, plus the original query.</p><p>- Include in the query the titles and snippets of the top X retrieved documents, with X ∈ {1, 2, 3, 4, 5, 10}, plus the original query.</p><p>We have evaluated the experiments using the 2016 relevance assessments. The results are somewhat better than those of the MeSH approach, although only the experiments that include information from 5 and 10 documents outperform the baseline. It is worth mentioning that including the original query always improves the results. As expected, run time grows with the number of documents, the experiment with 10 documents being the slowest. Nevertheless, we have selected the experiment that includes the titles and snippets of the top 10 retrieved documents plus the original query, because it is the one with the highest precision.</p></div>
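<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration only, the six query variants listed above could be assembled along the following lines. This is a minimal sketch and not the code used in the experiments: the function name build_expanded_query, its parameters and the sample results are hypothetical, and the step that retrieves titles and snippets from Google is not shown.</p>
<p>
```python
def build_expanded_query(original, results, top_x, use_titles=True,
                         use_snippets=True, keep_original=True):
    """Build a query from the top_x results, optionally keeping the original query."""
    parts = [original] if keep_original else []
    for result in results[:top_x]:
        if use_titles:
            parts.append(result["title"])
        if use_snippets:
            parts.append(result["snippet"])
    # Collapse internal whitespace so the query is a single clean line.
    return " ".join(" ".join(parts).split())


# Hypothetical search results; a real run would take these from the search engine.
sample_results = [
    {"title": "Headache causes", "snippet": "Common causes of headaches include stress."},
    {"title": "Migraine symptoms", "snippet": "Migraines often cause throbbing pain."},
]

# Titles and snippets plus the original query (the variant selected in this section).
selected = build_expanded_query("head pain every morning", sample_results, top_x=2)
# Titles only, replacing the original query.
titles_only = build_expanded_query("head pain every morning", sample_results, top_x=1,
                                   use_snippets=False, keep_original=False)
print(selected)
print(titles_only)
```
</p>
<p>The resulting string would still have to go through the same stopword removal and stemming as any other query before being sent to the retrieval engine.</p></div>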
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Wikipedia approach</head><p>Finally, we have integrated information by including word vectors obtained from Word2Vec. We have applied two different approaches to create our model, one using the whole of Wikipedia (Wikipedia-All) and another using only pages in health-related categories of Wikipedia (Wikipedia-Health). To create Wikipedia-Health we obtained all the pages included in the Health category and its subcategories, descending four levels of subcategories. In total, we downloaded 80,765 pages from 13,279 categories.</p><p>To expand the original query, we calculate a vector for each word in the query. Next, we find the centroid of these vectors by calculating their average. Finally, we obtain the words whose vectors are nearest to the centroid, and we use the proximity value as the weight of each such word in the expansion.</p><p>Although Word2Vec models usually work better as more documents are included, in this case Wikipedia-Health seems to be more effective than the whole of Wikipedia. Evaluated with the 2016 assessments, both approaches slightly outperform the baseline, although Wikipedia-Health works a bit better. For these reasons, we have selected this last experiment to be submitted to CLEF eHealth 2017.</p></div>
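<div xmlns="http://www.tei-c.org/ns/1.0"><p>The centroid-based expansion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the two-dimensional toy vectors stand in for a trained Word2Vec model, cosine similarity is assumed as the proximity value (the measure is not named in the text), and the names TOY_VECTORS, cosine and expand_query are hypothetical.</p>
<p>
```python
import math

# Toy 2-d embeddings standing in for a trained Word2Vec model (illustrative values only).
TOY_VECTORS = {
    "headache": [1.0, 0.0],
    "pain": [0.8, 0.6],
    "migraine": [0.9, 0.1],
    "aspirin": [0.6, 0.8],
    "football": [-1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(query_terms, vectors, top_k=1):
    """Return (term, weight) pairs for the top_k words nearest the query centroid."""
    known = [vectors[t] for t in query_terms if t in vectors]
    if not known:
        return []
    # Centroid = component-wise average of the query word vectors.
    centroid = [sum(dim) / len(known) for dim in zip(*known)]
    candidates = [(term, cosine(vec, centroid))
                  for term, vec in vectors.items() if term not in query_terms]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]


expansion = expand_query(["headache", "pain"], TOY_VECTORS, top_k=2)
for term, weight in expansion:
    print(f"{term}\t{weight:.3f}")
```
</p>
<p>With the default top_k=1 the sketch adds a single expansion word; the returned similarity can then serve as the weight of that word in the expanded query.</p></div>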
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Results</head><p>After running all the experiments described in the previous sections, we selected only two of them for submission to CLEF eHealth 2017: the best one from the Google approach and the best one from the Wikipedia approach.</p><p>- SINAI-Run1: experiment including the titles and snippets of the top 10 retrieved documents plus the original query.</p><p>- SINAI-Run2: experiment including one word obtained with the Word2Vec model generated from Wikipedia-Health.</p><p>Unfortunately, the assessments and official results will be released before the conference but are not yet available, so we cannot include the official evaluation of our system.</p><p>Table <ref type="table" target="#tab_0">1</ref> shows the results obtained with the 2016 relevance judgments.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1.</head><label>1</label><figDesc>MAP and R-Prec with 2016 relevance judgments.</figDesc><table><row><cell>Runs</cell><cell>MAP</cell><cell>R-Prec</cell></row><row><cell>Baseline</cell><cell>0.0862</cell><cell>0.1292</cell></row><row><cell>SINAI-Run1</cell><cell>0.1330</cell><cell>0.1786</cell></row><row><cell>SINAI-Run2</cell><cell>0.0892</cell><cell>0.1311</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://lemurproject.org/clueweb12/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://lemurproject.org/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work has been partially supported by a grant from the Ministerio de Educación, Cultura y Deporte (MECD, scholarship FPU014/00983), the Fondo Europeo de Desarrollo Regional (FEDER) and the REDES project (TIN2015-65136-C2-1-R) from the Spanish Government.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Integrating MeSH ontology to improve medical information retrieval</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Díaz-Galiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>García-Cumbreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Martín-Valdivia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Montejo-Ráez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ureña-López</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Workshop of the Cross-Language Evaluation Forum for European Languages</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="601" to="606" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">SINAI at ImageCLEFmed</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">C</forename><surname>Díaz-Galiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>García-Cumbreras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Martín-Valdivia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ureña-López</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Montejo-Ráez</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">CLEF (Working Notes)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">C</forename><surname>Peters</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="volume">1174</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Overview of the CLEF eHealth evaluation lab</title>
		<author>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hanlen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Névéol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Grouin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Palotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zuccon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference of the Cross-Language Evaluation Forum for European Languages</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="429" to="443" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">CLEF 2017 eHealth evaluation lab overview</title>
		<author>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Névéol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Robert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kanoulas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Spijker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Palotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zuccon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2017 -8th Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Overview of the ShARe/CLEF eHealth evaluation lab</title>
		<author>
			<persName><forename type="first">L</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Schreck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Leroy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Mowery</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Velupillai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">W</forename><surname>Chapman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Martinez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zuccon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference of the Cross-Language Evaluation Forum for European Languages</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="172" to="191" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">CLEF 2017 task overview: The IR task at the eHealth evaluation lab</title>
		<author>
			<persName><forename type="first">J</forename><surname>Palotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zuccon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jimmy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pecina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lupu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hanbury</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Overview of the ShARe/CLEF eHealth evaluation lab</title>
		<author>
			<persName><forename type="first">H</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Salanterä</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Velupillai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">W</forename><surname>Chapman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Savova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Elhadad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pradhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>South</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Mowery</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J</forename><surname>Jones</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference of the Cross-Language Evaluation Forum for European Languages</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="212" to="231" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The IR task at the CLEF eHealth evaluation lab 2016: user-centred health information retrieval</title>
		<author>
			<persName><forename type="first">G</forename><surname>Zuccon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Palotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lupu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pecina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mueller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Budaher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Deacon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2016-Conference and Labs of the Evaluation Forum</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="volume">1609</biblScope>
			<biblScope unit="page" from="15" to="27" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
