<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the CLEF 2018 Consumer Health Search Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jimmy</string-name>
          <email>jimmy@Staff.ubaya.ac.id</email>
          <email>jimmy@qut.edu.au</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guido Zuccon</string-name>
          <email>g.zuccon@qut.edu.au</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joao Palotti</string-name>
          <email>jpalotti@hbku.edu.qa</email>
          <email>palotti@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorraine Goeuriot</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liadh Kelly</string-name>
          <email>liadh.kelly@mu.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Maynooth University</institution>
          ,
          <addr-line>Maynooth</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Qatar Computing Research Institute</institution>
          ,
          <addr-line>Doha</addr-line>
          ,
          <country country="QA">Qatar</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Queensland University of Technology</institution>
          ,
          <addr-line>Brisbane</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Université Grenoble Alpes</institution>
          ,
          <addr-line>Grenoble</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Surabaya</institution>
          ,
          <addr-line>Surabaya</addr-line>
          ,
          <country country="ID">Indonesia</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Vienna University of Technology</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper details the collection, systems and evaluation methods used in the CLEF 2018 eHealth Evaluation Lab, Consumer Health Search (CHS) task (Task 3). This task investigates the effectiveness of search engines in providing access to medical information on the Web for people who have little or no medical knowledge. The task aims to foster advances in the development of search technologies for Consumer Health Search by providing resources and evaluation methods to test and validate search systems. Building upon the 2013-17 series of CLEF eHealth Information Retrieval tasks, the 2018 task considers both mono- and multilingual retrieval, embracing the Text REtrieval Conference (TREC)-style evaluation process with a shared collection of documents and queries, the contribution of runs from participants, and the subsequent formation of relevance assessments and evaluation of the participants’ submissions. This year, the CHS task uses a new Web corpus and a new set of queries compared to previous years. The new corpus consists of Web pages acquired from the CommonCrawl, and the new query set consists of 50 queries issued by the general public to the Health on the Net (HON) search services. We then manually translated the 50 queries into French, German, and Czech, and obtained English query variations of the 50 original queries. A total of 7 teams from 7 different countries participated in the 2018 CHS task: CUNI (Czech Republic), IMS Unipd (Italy), MIRACL (Tunisia), QUT (Australia), SINAI (Spain), UB-Botswana (Botswana), and UEvora (Portugal).</p>
      </abstract>
      <kwd-group>
        <kwd>Evaluation</kwd>
        <kwd>Consumer Health Search</kwd>
        <kwd>Medical Information Retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The use of the Web as a source of health-related information is a widespread
practice among health consumers [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and search engines are commonly used
as a means to access health information available online [
        <xref ref-type="bibr" rid="ref7">7</xref>
]. However, there is an
ongoing need for the development of retrieval approaches and resources to support
progress in this domain, as highlighted, for example, in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        This document reports on the CLEF 2018 eHealth Evaluation Lab
information retrieval (IR) task (Task 3). The task investigated the problem of retrieving
Web pages to support the information needs of health consumers (including their
next-of-kin) who are confronted with a health problem or a medical condition
and who use a search engine to seek a better understanding of their health.
This task has been developed within the CLEF 2018 eHealth Evaluation Lab,
which aims to support patients, their next of kin, clinical staff, and health
scientists in understanding, accessing, and authoring eHealth information in a
multilingual setting [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ].
      </p>
      <p>
The 2018 Consumer Health Search (CHS) task continued the previous iterations of
this task (i.e., the 2013, 2014, 2015, 2016, and 2017 CLEF eHealth Lab information
retrieval tasks [
        <xref ref-type="bibr" rid="ref10 ref12 ref27 ref29 ref39 ref9">9,12,27,39,29,10</xref>
]) that aimed at evaluating the effectiveness of
search engines in supporting people searching for information about their
medical conditions, e.g., in answering queries like “antiandrogen therapy for prostate
cancer” with correct, trustworthy and understandable search results.
      </p>
      <p>
The 2013 and 2014 tasks focused on helping patients or their next-of-kin
understand information in their medical discharge summaries. The 2015 task
focused on supporting consumers searching for self-diagnosis information, an
important type of health information seeking activity [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The 2016 task expanded
the 2015 task by considering not only self-diagnosis information but also needs
related to treatment and management of health conditions. Finally, the 2017
task used the corpus and topics of the 2016 task, with the focus of expanding
the assessment pool and the number of relevance assessments.
      </p>
      <p>
        The 2018 CHS task considered similar subtasks as in 2017: ad hoc search,
query variation, methods to personalize health search, and multilingual search.
A new subtask was also introduced: this required participants to classify queries
with respect to their underlying query intent as detailed in [
        <xref ref-type="bibr" rid="ref8">8</xref>
]. For this task,
a new query set was introduced, together with a new document corpus obtained from a
subset of the CommonCrawl 7 data.
      </p>
      <p>The remainder of this paper is structured as follows: Section 2 details the
subtasks we considered this year; Section 3 describes the query set and the
methodology used to create it; Section 4 describes the corpus used; Section 5
details the baselines created by the organisers as a benchmark for participants;
Section 6 describes participants’ submissions; Section 7 details the methods used
to create the assessment pools and relevance criteria; finally, Section 8 concludes
this overview paper.
7 http://commoncrawl.org/</p>
    </sec>
    <sec id="sec-2">
      <title>The Subtasks of CLEF CHS</title>
      <sec id="sec-2-1">
        <title>Subtask 1: Ad-hoc Search</title>
        <p>This was a standard ad-hoc search task, aiming at retrieving information relevant
to people seeking health advice on the Web. Queries for this task were generated
by mining 50 queries issued by the general public to the HON search services, as
detailed in Section 3. Every query was treated as independent and participants
were asked to generate retrieval runs in answer to each query, as in a common
ad-hoc search task.</p>
        <p>
          Participants submitted a TREC result file containing a ranking of results in
answer to each query. Each participant was allowed to submit up to 4 submissions,
which were evaluated using normalized discounted cumulative gain [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] at 10
(NDCG@10), binary preference (BPref) and rank biased precision (RBP) [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ],
with a persistence parameter p = 0.80 (see [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]).
        </p>
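<p>As an illustration, the RBP formula with persistence p = 0.80 can be sketched in a few lines; the function name and gain values below are illustrative only, not the task's official evaluation scripts:

```python
def rbp(gains, p=0.80):
    """Rank-biased precision (sketch): expected gain for a user who moves
    from one result to the next with persistence probability p.
    `gains` are per-rank relevance gains in [0, 1], topmost first."""
    return (1 - p) * sum(g * p ** i for i, g in enumerate(gains))

# A run placing relevant documents earlier scores higher:
print(round(rbp([1, 1, 0, 0]), 4))  # 0.36
print(round(rbp([0, 0, 1, 1]), 4))  # 0.2304
```
</p>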
      </sec>
      <sec id="sec-2-2">
        <title>Subtask 2: Personalized Search</title>
        <p>
This subtask was built on top of Subtask 1 and followed a similar task
introduced in 2017. It aimed to personalize the retrieved list of search
results so as to match user expertise, measured by how likely the person is to
understand the content of a document (with respect to the health information).
To this end, submissions (in standard TREC run format, with up to 4
submissions per participant) were evaluated using the graded version of the uRBP
evaluation measure [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ], which uses both topical relevance and other dimensions
of relevance, such as understandability and trustworthiness.
        </p>
        <p>The parameters of this evaluation measure were further varied to evaluate
personalization to different users. In addition to the 50 base queries as used in
Subtask 1, each topic was released with 6 query variations issued by 6 research
students at QUT; students had no medical knowledge. When evaluating results
for a query variation, a parameter α was used to capture user expertise. The
parameter determines the shape of the gain curve, so that documents
at the right understandability level obtain the highest gains, with decaying gains
assigned to documents that do not suit the understandability level of the
modelled user. The α values used for each query variation are: α = 0.0
for query variation 1 (the base query), α = 0.2 for query variation 2, α = 0.4 for
query variation 3, α = 0.5 for query variation 4, α = 0.6 for query variation 5,
α = 0.8 for query variation 6 and, finally, α = 1.0 for query variation 7. This
models an increasing level of expertise across query variations for one
topic. The intuition in such evaluation is that a person with no specific health
knowledge (represented by query variant 1) would not understand complex and
technical health material, while an expert (represented by query variant 7) would
have little or no interest in reading introductory/basic material.
</p>
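<p>The role of the alpha parameter can be illustrated with a toy gain function. This is only a sketch of the intuition (matching document difficulty to modelled user expertise), not the actual graded uRBP gain function defined in the cited work, and all names and values below are hypothetical:

```python
def gain(rel, difficulty, alpha):
    """Toy sketch: scale topical relevance `rel` (0..1) by how closely the
    document difficulty (0 = lay, 1 = expert) matches the modelled user
    expertise alpha (0 = no medical knowledge, 1 = expert)."""
    return rel * (1.0 - abs(difficulty - alpha))

# A technical page (difficulty 0.9) is worth more to the expert persona
# (alpha = 1.0) than to the lay persona (alpha = 0.0):
print(round(gain(1.0, 0.9, 1.0), 2))  # 0.9
print(round(gain(1.0, 0.9, 0.0), 2))  # 0.1
```
</p>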
      </sec>
      <sec id="sec-2-3">
        <title>Subtask 3: Query Variations</title>
        <p>Subtask 3 aimed to foster research into building search systems that are robust
to query variations. The task used the same set of 7 query variations as in Subtask
2.</p>
        <p>
          For this subtask, participants were asked to submit a single set of results
for each topic in standard TREC run format. Each topic had 7 variations to
be considered. Each participant was allowed to submit up to 4 submissions and
submissions were evaluated using the same measures as for Subtask 1 but using
the mean-variance evaluation framework (MVE) [
          <xref ref-type="bibr" rid="ref40">40</xref>
]. In this framework,
the evaluation results for each query variation of a topic were averaged, and their variance
was also accounted for to compute a final system performance estimate.
        </p>
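<p>The mean-variance idea can be sketched as follows; the risk weight b below is an illustrative assumption, not a value prescribed by the task:

```python
from statistics import mean, pvariance

def mve_score(variation_scores, b=1.0):
    """Mean-variance sketch: average the per-variation evaluation scores of
    a topic and penalise their variance, so that systems that are unstable
    across query variations are rewarded less."""
    return mean(variation_scores) - b * pvariance(variation_scores)

# Same mean effectiveness, but the consistent system scores higher:
print(mve_score([0.5, 0.5, 0.5]) > mve_score([0.9, 0.1, 0.5]))  # True
```
</p>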
      </sec>
      <sec id="sec-2-4">
        <title>Subtask 4: Multilingual Ad-hoc Search</title>
<p>The multilingual task extended the ad-hoc subtask by providing translations
of the queries from English into Czech, French, and German. The goal of this
subtask was to support research in multilingual information retrieval, developing
techniques to support users who can express their information need well in their
native language and can read the results in English.</p>
        <p>The queries for this subtask were manual translations of the Subtask 1
queries. Participants’ submissions in standard TREC run format (up to 4
submissions per participant) were evaluated using the Subtask 1 evaluation metrics.
</p>
      </sec>
      <sec id="sec-2-5">
        <title>Subtask 5: Query Intent Identification</title>
        <p>This task, introduced this year, required participants to classify queries with
respect to the underlying health intent. Health intents in this sub-task were
clustered into 8 high-level intents: (1) Disease/illness/syndrome/pathological
condition, (2) Drugs and medicinal substances, (3) Healthcare, (4) Test &amp;
procedures, (5) First aid, (6) Healthy lifestyle, (7) Human anatomy, (8) Organ
systems. For each high-level intent, there was a maximum of 13 low-level intents.
The health intent taxonomy is provided at https://github.com/CLEFeHealth/
CLEFeHealth2018IRtask/blob/master/clef2018_health_intent_taxonomy.csv.</p>
        <p>Given a query, participants needed to predict the correct intent underlying
the query. Note that a query may have multiple intents. For each query,
participants were asked to submit the top 3 intent predictions, in the form of the
taxonomy ID corresponding to the intent. Each participant could submit up
to 2 submissions in TREC format, where instead of a document ID,
participants should list a low-level taxonomy ID, e.g., 1.1, 1.4, 2.1. Submissions were
evaluated using mean reciprocal rank and NDCG@1,2,3. Matches at high-level
intents are differentiated from matches at low-level intents.</p>
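<p>The reciprocal rank computation over intent predictions can be sketched as follows; the predictions and gold intent labels below are hypothetical:

```python
def reciprocal_rank(predictions, gold_intents):
    """Reciprocal rank of the first predicted taxonomy ID that matches one
    of the gold intents (a query may have multiple intents)."""
    for rank, intent in enumerate(predictions, start=1):
        if intent in gold_intents:
            return 1.0 / rank
    return 0.0

# Hypothetical top-3 predictions against hypothetical gold labels:
runs = [(["1.1", "1.4", "2.1"], {"1.4"}),  # first match at rank 2 -> 0.5
        (["2.1", "1.1", "1.2"], {"2.1"})]  # first match at rank 1 -> 1.0
mrr = sum(reciprocal_rank(p, g) for p, g in runs) / len(runs)
print(mrr)  # 0.75
```
</p>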
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Queries</title>
      <p>
This year’s CHS task used a new set of 50 queries, generated by the Khresmoi
project8 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which were issued by the general public to the HON9 search
service. These queries were manually selected by a domain expert from a sample of
raw queries from the HON search engine collected over a period of 6 months to
be representative of the type of queries posed to the search engine. Only
non-capitalized queries were taken into account, to remove possible influence by web
crawlers using predetermined queries. Queries which seemed too
"specialized" (for example, containing complex medical terms) and queries in languages other than
English were excluded. Query intent was manually added to queries by the domain
expert using the taxonomy provided at https://github.com/CLEFeHealth/
CLEFeHealth2018IRtask/blob/master/clef2018_health_intent_taxonomy.csv.
On generation of the query set, it was found that all 50 queries contained
at most two query terms, as detailed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
]. As such, a new set of 50
representative queries was generated using the same procedure, this time with the
extra stipulation that queries must contain more than two query terms. These
50 queries were considered the base queries and used in Subtask 1 of the CLEF
eHealth CHS task. Queries were not preprocessed; for example, any spelling
mistakes that may be present were not removed, and systems submitted by challenge
participants should have taken this into account.
      </p>
<p>For Subtasks 2 and 3, each base query was augmented with 6 query
variations issued by 6 research students at QUT who had no medical knowledge. Each
student was asked to formulate a query for each of the 50 topic narratives.
No post-processing was applied to the formulated query variations: duplicates
and spelling errors were kept. Subtask 4 used parallel queries in the following
languages: French, German, and Czech. These queries are manual translations
of Subtask 1’s 50 base queries. Finally, Subtask 5 used the 50 base queries as in
Subtask 1.</p>
<p>Queries were numbered using a 6-digit number with the following convention:
the first 3 digits of a query ID identified the topic number (information need),
ranging from 151 to 200. The last 3 digits of a query ID identified the individual
query creator: the base queries used the identifier 1, and the research students’ query
variations used the identifiers 2, 3, 4, 5, 6, and 7. Figure 1 shows the base query
and the 6 query variations for topic 152.</p>
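<p>The query ID convention can be made concrete with a small helper; the function name is ours, used here purely for illustration:

```python
def parse_query_id(qid):
    """Split a 6-digit query ID into its topic number (first 3 digits,
    151-200) and query-creator identifier (last 3 digits: 1 for the base
    query, 2-7 for the student query variations)."""
    s = str(qid)
    return int(s[:3]), int(s[3:])

print(parse_query_id("152001"))  # (152, 1): base query of topic 152
print(parse_query_id("152004"))  # (152, 4): variation by creator 4
```
</p>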
<p>For the multilingual queries, queries were placed within their respective
language tags: cz (Czech), de (German), en (English), and fr (French). Figure 2
shows topic 152’s queries in all 4 languages.
8 http://khresmoi.eu/
9 https://hon.ch/en/
&lt;queries&gt;
...
&lt;query&gt;
&lt;id&gt;152001&lt;/id&gt;
&lt;en&gt;emotional and mental disorders&lt;/en&gt;
&lt;/query&gt;
&lt;query&gt;
&lt;id&gt;152002&lt;/id&gt;
&lt;en&gt;work colleague depression&lt;/en&gt;
&lt;/query&gt;
&lt;query&gt;
&lt;id&gt;152003&lt;/id&gt;
&lt;en&gt;mental health problems, change in mood and withdrawn&lt;/en&gt;
&lt;/query&gt;
&lt;query&gt;
&lt;id&gt;152004&lt;/id&gt;
&lt;en&gt;mental health cause withdrawn mood changes&lt;/en&gt;
&lt;/query&gt;
&lt;query&gt;
&lt;id&gt;152005&lt;/id&gt;
&lt;en&gt;disease cause mental health behaviour change&lt;/en&gt;
&lt;/query&gt;
&lt;query&gt;
&lt;id&gt;152006&lt;/id&gt;
&lt;en&gt;mood alterations causes&lt;/en&gt;
&lt;/query&gt;
&lt;query&gt;
&lt;id&gt;152007&lt;/id&gt;
&lt;en&gt;uncommon mood change &lt;/en&gt;
&lt;/query&gt;
...
&lt;/queries&gt;</p>
    </sec>
    <sec id="sec-4">
      <title>Dataset</title>
      <p>The CHS task in 2016 and 2017 used the ClueWeb12-B1310 collection, a corpus of
more than 52 million Web pages. This year we introduced the clefehealth2018 corpus.
This was created by compiling Web pages of selected domains acquired from the
CommonCrawl 11. An initial list of Websites was identified for acquisition. The
list was built by submitting the CLEF 2018 base queries to the Microsoft Bing
APIs (through the Azure Cognitive Services) repeatedly over a period of a few
weeks 12, and acquiring the URLs of the retrieved results. The domains of the
URLs were then included in the list, except some domains that were excluded for
decency reasons (e.g. pornhub.com). The list was further augmented by including
a number of known reliable health Websites and other known unreliable health
Websites, from lists previously compiled by health institutions and agencies.</p>
      <p>The corpus was divided into folders, by domain name. Each folder
contained a file for each Webpage from the domain available in the
CommonCrawl dump. In total, 2,021 domains were requested from the CommonCrawl
dump of 2018-09 13. Of the 2,021 domains in total, 1,903 were successfully
acquired; the remaining domains were discarded due to errors or to corrupted or
incomplete data returned by the CommonCrawl API (a total of ten retries were
attempted for each domain before giving up on it). Of the 1,903 crawled
domains, 84 were not available in the CommonCrawl dump; for these, an empty
folder exists in the corpus, representing the domain that was requested.
Note that .pdf documents were excluded from the data acquired from
CommonCrawl. A complete list of domains and size of the crawl data for each domain
is available at https://github.com/CLEFeHealth/CLEFeHealth2018IRtask/
blob/master/clef2018collection_listofdomains.txt.</p>
<p>The full collection, clefehealth2018 14, contains 5,535,120 Web pages and
its uncompressed size is about 480GB. In addition to the full collection, an
alternative corpus named clefehealth2018_B 15 was created by manually removing a
number of domains that were not strictly health-related (e.g., news Websites).
This subset contains 1,653 domains and its size is about 294GB, uncompressed.
</p>
    </sec>
    <sec id="sec-5">
      <title>Baselines</title>
      <p>
We generated 21 runs, of which 19 were for Subtask 1, 3 for Subtask 2, and
2 for Subtask 5, based on common baseline models and on simple approaches for
fusing query variations. In this section we describe the baseline runs.
10 http://lemurproject.org/clueweb12/
11 http://commoncrawl.org/
12 repeated submissions over time were performed because previous work has shown
that Bing’s API results vary considerably over time, both in terms of results and
effectiveness [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
13 http://commoncrawl.org/2018/03/february-2018-crawl-archive-now-available/
14 clefehealth2018 is available at https://goo.gl/uBJaNi
15 clefehealth2018_B is available at https://goo.gl/uBJaNi
      </p>
<p>For these systems, we created runs with and without the default
pseudo-relevance feedback (PRF) of each toolkit. When using PRF, we added to the
original query the top 10 terms of the top 3 documents. All these baseline runs
were created using the Terrier, Indri and ElasticSearch instances made available
to participants in the Azure platform.</p>
      <p>Additionally, we created a set of baseline runs that took into account the
reliability and understandability of information.</p>
      <p>
Also based on the BM25 baseline run of Terrier, two understandability
baselines were created using readability formulae. We created runs based on the CLI
(Coleman-Liau Index) and GFI (Gunning Fog Index) scores [
        <xref ref-type="bibr" rid="ref13 ref4">4,13</xref>
], which are
a proxy for the number of years of schooling required to read the text being
evaluated. These two readability formulae were chosen because they were shown to
be robust across different methods for HTML preprocessing [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. We followed
one of the methods suggested in [
        <xref ref-type="bibr" rid="ref28">28</xref>
], in which the HTML documents are
preprocessed using Justext 16 [
        <xref ref-type="bibr" rid="ref31">31</xref>
], the main text was extracted, periods were added at the end
of sentences whenever necessary (e.g., in the presence of line
breaks), and then readability scores were calculated. Given the initial score S
for a document and its readability score R, the final score for each document
is the combination S × 1.0/R.</p>
<p>We also created baseline runs that fused the ranked lists retrieved for the
query variations of each topic using reciprocal rank fusion, scoring each
document d ∈ D as the sum over r ∈ R of 1/(k + r(d)),
where D is the set of documents to be ranked, R is the set of document rankings
retrieved for each query variation by the same retrieval model, r(d) is the
rank of document d, and k is a constant set to 60, as in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
16 https://pypi.python.org/pypi/jusText
We generated two baselines for Subtask 5, with MetaMap [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and QuickUMLS [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ],
both tools were used to map the text of the queries to Unified Medical
Language System (UMLS) concepts, from which pre-defined semantic types can be
extracted. A full list of UMLS semantic types can be found at https://metamap.
nlm.nih.gov/SemanticTypesAndGroups.shtml. Similar to previous work in the
literature [
        <xref ref-type="bibr" rid="ref16 ref24 ref25">16,25,24</xref>
        ], a mapping was established between the UMLS semantic
types and the search intents. This mapping is provided in the Github repository
of this year’s task.
      </p>
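<p>The reciprocal rank fusion used for combining query-variation rankings can be sketched as follows, with k = 60 as in the cited work; the document IDs are hypothetical:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion of the ranked lists retrieved for a topic's
    query variations: each document scores sum(1 / (k + rank)) over the
    lists that retrieved it, and documents are sorted by fused score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# d1 tops both variation rankings and is therefore fused first:
print(rrf_fuse([["d1", "d2", "d3"], ["d1", "d4", "d2"]])[0])  # d1
```
</p>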
    </sec>
    <sec id="sec-6">
      <title>Participant Submissions</title>
<p>This year, 7 participants from 7 countries submitted at least one run for at
least one of the subtasks, as shown in Table 1. Each participant could submit up to 4 runs
for Subtasks 1, 2, and 3, 4 runs for each language of Subtask 4, and 2 runs for
Subtask 5.</p>
      <p>We include below a self-described summary of the approach of each team
(with minor editing by the task coordinators).</p>
      <p>
        CUNI [
        <xref ref-type="bibr" rid="ref32">32</xref>
]: They submitted runs for IRTask1 and IRTask4. For IRTask1, Run
1 used the Terrier index provided by the organisers, without
applying any data preprocessing; Terrier’s implementation of the Dirichlet-smoothing
language model was used as the retrieval model with its default parameters. Run
2 used the same retrieval model as Run 1, but with the
Terrier index built using the Porter stemmer and an English stop-word list.
Run 3 used Terrier’s implementation of the TF-IDF model, for the purpose of
comparing a vector-space model with the LM model used in
Run 1; it used the same index as Run 1. Run 4 used Terrier’s
implementation of Kullback-Leibler divergence (KLD) for query expansion, with the
number of top documents set to 10 and the number of expansion terms set
to 3.
      </p>
<p>For IRTask 4, Run 1, they translated the queries from the source languages into
English and took the 1-best translations. Retrieval was conducted using the Dirichlet
model and a non-stemmed index; the same retrieval settings were used in the
following runs. In Run 2, they used a hypothesis-reranking approach, in which each
query was translated into English and, from the 15-best list of translations, the
1-best translation (in terms of IR quality) was selected for retrieval. In Run 3,
they first translated the queries into English and the 1-best translation produced by
the SMT system was chosen as the base query; this query was then expanded by one
term using a term-reranking approach. Run 4 was similar to Run 1, the
only difference being that Google Translate was used to translate the queries into
English.</p>
      <p>
        IELAB [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]: The IELAB team participated in Subtask 1. The team addressed
the challenge by extending the Entity Query Feature Expansion model, a
knowledge base (KB) query expansion method. To obtain the query expansion terms,
first, we mapped entity mentions to KB entities by performing exact matching.
After mapping, we used the Title of the mapped KB entities as the source for
expansion terms. For our first three expanded query sets, we expanded the
original queries sourcing expansion terms from each of Wikipedia, the UMLS, and
the CHV. For our fourth expanded query set, we combined expansion terms
from Wikipedia and CHV.
      </p>
      <p>
        IMS [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]: They studied a query expansion approach that takes into account the
relationships between the terms of the query and the Medical Subject Headings
(MeSH) terms [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]; in addition, they evaluated different document scoring
strategies given the multiple ranking lists produced by the query expansions [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
]. The
methodology used for query expansion followed this sequence of steps:
1) they identified the MeSH terms present in the query by means of the
MeshOnDemand1 API; 2) for each MeSH term found in the previous step, they
used the MeSHRDF2 database to look for semantically related (MeSH) terms;
3) they chose a subset of all the possible relations (predicates) between terms
in the MeSHRDF database and then used this subset of predicates for query
expansion in different ways (changing the width and depth in the graph of related
terms). Given the list of MeSH terms, they combined these terms with the
original query and created many variants with small differences. For each variant,
they computed a ranked list of documents using the Elasticsearch engine. In
order to merge the ranked lists, they compared many weighting strategies
(average, sum of scores, normalized scores, etc.) to calculate a single score for each
document. Then, they ordered the documents and produced the final ranked list.
Miracl [
        <xref ref-type="bibr" rid="ref36">36</xref>
]: The Miracl team submitted 4 runs to Subtask 1. The team
submitted 2 baseline runs with different weighting models (TF-IDF and BM25); the
other 2 runs included query expansion in two different ways using the
MeSH ontology (via scopeNotes and via related terms), combined with the BM25
weighting model.
      </p>
      <p>
        SINAI [
        <xref ref-type="bibr" rid="ref6">6</xref>
]: The SINAI team participated in Subtask 1. They applied a
query expansion technique using the most popular search engine at the
moment: Google. Using the search engine, they added additional information not
previously included in the query. They identified the medical concepts using
cTAKES. This recognizer provides them with UMLS concepts in the expanded
query. This way, they avoided introducing noise with words that are not related
to the user query.
      </p>
      <p>
        UB-Botswana [
        <xref ref-type="bibr" rid="ref21">21</xref>
]: They described the methods deployed in the different runs
submitted for their participation in the CLEF eHealth 2018 Task 3: Consumer
Health Search Task, IRTask 3: Query Variations. In particular, they deployed
data fusion techniques to merge search results generated by multiple query
variants. As an improvement, they attempted to alleviate the term mismatch between
the queries and the relevant documents by deploying query expansion before
merging the results. For their baseline system, they concatenated the multiple
query variants for retrieval and then deployed query expansion.
      </p>
      <p>
        UEvora [
        <xref ref-type="bibr" rid="ref35">35</xref>
]: The work of UEvora explored the use of learning-to-rank
techniques as well as query expansion approaches. A number of field-based features
were used in training a learning-to-rank model. A medical concept model
proposed in their previous work was re-employed for this year’s new task. A
word-vector model and the UMLS were used as the query expansion sources.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Assessments</title>
      <p>
        To assess submissions for Subtasks 1, 2, 3, and 4, first, we collected all unique
topic-document pairs from all submitted runs. Then, using the RBP-based Method
A (Summing contributions) by Moffat et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
], we weighted each document
according to its overall contribution to the effectiveness evaluation as provided
by the RBP formula (with p=0.8, following Park and Zhang [
        <xref ref-type="bibr" rid="ref30">30</xref>
]). This strategy,
named RBPA, was chosen because it has been shown to be preferable to
traditional fixed-depth or stratified pooling when evaluating systems under
fixed assessment budget
constraints [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], as is the case for this task. For each topic, we selected the top 500
weighted documents to form an assessment pool of 25,000 topic-document pairs.
      </p>
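The pooling strategy described above can be sketched as follows. This is an illustrative sketch, not the organisers' implementation; the run representation (ranked lists of document identifiers per topic) and function names are assumptions.

```python
def rbp_weight(rank, p=0.8):
    """RBP contribution of a document at a 1-based rank: (1 - p) * p^(rank - 1)."""
    return (1 - p) * p ** (rank - 1)

def pool_topic(runs_for_topic, p=0.8, depth=500):
    """Method A (summing contributions): sum each document's RBP weight
    across all submitted runs for one topic, then keep the `depth`
    highest-weighted documents as the assessment pool for that topic."""
    weights = {}
    for ranking in runs_for_topic:
        for rank, doc in enumerate(ranking, start=1):
            weights[doc] = weights.get(doc, 0.0) + rbp_weight(rank, p)
    # Highest aggregate weight first; truncate to the per-topic pool depth.
    return sorted(weights, key=weights.get, reverse=True)[:depth]
```

With 50 topics and a per-topic depth of 500, such a procedure yields the 25,000 topic-document pairs reported above.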
      <p>
        After the assessment pool was formed, we developed a number of relevance
assessment tasks to be issued as HITs on Amazon Mechanical Turk.
Workers, selected among those with a 90% acceptance rate and at least 1,000 tasks
completed, were presented with the base query and the narrative of a topic
with a link to the archived web page to be assessed. These
assessments are currently in progress. Relevance assessments will be provided with respect to
the grades Highly relevant, Somewhat relevant, and Not relevant.
Readability/understandability and reliability/trustworthiness assessments will also be
collected for the documents in the assessment pool. These assessments are
collected as an integer value between 0 and 100 (lower values mean a
harder-to-understand document / lower reliability) provided by assessors through a slider
tool and will be used to evaluate systems across different dimensions of
relevance [
        <xref ref-type="bibr" rid="ref37 ref38">37,38</xref>
        ].
      </p>
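As a hedged sketch of how such multidimensional assessments can be combined, in the style of the understandability-biased RBP (uRBP) of the cited work: each rank's relevance gain is scaled by the probability that the user understands the document. Treating the 0-100 slider value divided by 100 as that probability is an assumption here, not the task's published formula.

```python
def urbp(rel_gains, und_probs, p=0.8):
    """uRBP-style score: RBP over a ranking where the gain at each rank is
    the relevance gain multiplied by the estimated probability that the
    user understands the document at that rank (e.g. slider value / 100)."""
    return (1 - p) * sum(
        r * u * p ** i                       # discount p^i for 0-based rank i
        for i, (r, u) in enumerate(zip(rel_gains, und_probs))
    )
```

For example, a relevant but only half-understandable document at rank 2 contributes half the gain it would contribute if fully understandable.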
      <p>As a pre-work task, workers were asked to explicitly state what they
understood of the topic and the narrative; workers who provided incorrect answers
were not assigned HITs.</p>
      <p>For Subtask 5, the 50 base queries were manually labelled with search
intent(s) according to the search intents taxonomy (see Section 2.5). These manual
labels were used for assessment.</p>
    </sec>
    <sec id="sec-8">
      <title>Conclusions</title>
      <p>This paper described the methods and analysis of the CLEF 2018 eHealth Evaluation
Lab Consumer Health Search Task. The task considered the problem of retrieving
Web pages for people seeking health information regarding medical conditions,
treatments and suggestions. The task was divided into five subtasks, including
ad-hoc search, query variations, and multilingual ad-hoc search. Seven teams
participated in the task; relevance assessment is underway, and the assessments along
with the participants’ results will be released at the CLEF 2018 conference (and
will be available at the task’s GitHub repository).</p>
      <p>
        As a by-product of this evaluation exercise, the task makes available to the
research community a collection with associated assessments and evaluation
framework (including readability and reliability evaluation) that can be used
to evaluate the effectiveness of retrieval methods for health information seeking
on the web (e.g. [
        <xref ref-type="bibr" rid="ref23 ref26">23,26</xref>
        ]).
      </p>
      <p>Baseline runs, participant runs and results, assessments, topics and query
variations are available online at the GitHub repository for this task:
https://github.com/CLEFeHealth/CLEFeHealth2018IRtask.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>Jimmy is sponsored by the Indonesia Endowment Fund for Education (Lembaga
Pengelola Dana Pendidikan / LPDP).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Aronson</surname>
          </string-name>
          .
          <article-title>Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program</article-title>
          .
          <source>In Proceedings of the AMIA Symposium, page 17</source>
          . American Medical Informatics Association,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>N.</given-names>
            <surname>Aswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Beckers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Birngruber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Boyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Burner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bystron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Choukri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cruchet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dedek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dolamic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Donner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dungs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Eggel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Foncubierta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Fuhr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Funk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcia Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gaudinat</surname>
          </string-name>
          , G. Georgiev,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gobeill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Greenwood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gschwandtner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hajic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hlavácová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kaderk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kainberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kriewel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kritz</surname>
          </string-name>
          , G. Langs,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lawson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Markonis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Momtchev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Masselot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mazo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Pentchev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Peychev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pletneva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pottecher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ruch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Samwald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schneller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stefanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Tinte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Urešová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vargas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Vishnyakova</surname>
          </string-name>
          .
          <article-title>Khresmoi - Multimodal Multilingual Medical Information Search</article-title>
          .
          <source>In Proceedings of Medical Informatics Europe</source>
          <year>2012</year>
          (
          <article-title>MIE 2012), Village of the Future</article-title>
          ,
          <year>August 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>N.</given-names>
            <surname>Aswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Greenwood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Samwald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Pletneva</surname>
          </string-name>
          , G. Jones, and
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          .
          <article-title>D1.3 - Report on results of the WP1 first evaluation phase</article-title>
          .
          <source>Technical report, Khresmoi Project</source>
          ,
          <year>August 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Coleman</surname>
          </string-name>
          and
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Liau</surname>
          </string-name>
          .
          <article-title>A computer readability formula designed for machine scoring</article-title>
          .
          <source>Journal of Applied Psychology</source>
          ,
          <volume>60</volume>
          :
          <fpage>283</fpage>
          -
          <lpage>284</lpage>
          ,
          <year>1975</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>G. V.</given-names>
            <surname>Cormack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L. A.</given-names>
            <surname>Clarke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Buettcher</surname>
          </string-name>
          .
          <article-title>Reciprocal rank fusion outperforms condorcet and individual rank learning methods</article-title>
          .
          <source>In Proceedings of the 32Nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '09</source>
          , pages
          <fpage>758</fpage>
          -
          <lpage>759</lpage>
          , New York, NY, USA,
          <year>2009</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Díaz-Galiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>López-Úbeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña-López</surname>
          </string-name>
          .
          <article-title>SINAI at CLEF eHealth 2018 Task 3. Using cTAKES to remove noise from expanding queries with Google</article-title>
          .
          <source>In CEUR Workshop Proceedings: Working Notes of CLEF</source>
          <year>2018</year>
          :
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>S.</given-names>
            <surname>Fox</surname>
          </string-name>
          .
          <article-title>Health topics: 80% of internet users look for health information online</article-title>
          .
          <source>Pew Internet &amp; American Life Project</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hegarty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hodmon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kriewel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Markonis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Schneller</surname>
          </string-name>
          .
          <article-title>D7.3 Meta-analysis of the second phase of empirical and user-centered evaluations</article-title>
          .
          <source>Technical report, Khresmoi Project</source>
          ,
          <year>August 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leveling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Salantera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Suominen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon.</surname>
          </string-name>
          <article-title>ShARe/CLEF eHealth Evaluation Lab 2013, Task 3: Information retrieval to address patients' questions when reading clinical reports</article-title>
          .
          <source>CLEF 2013 Online Working Notes</source>
          ,
          <volume>8138</volume>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leveling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          .
          <article-title>An Analysis of Evaluation Campaigns in ad-hoc Medical Information Retrieval: CLEF eHealth 2013 and 2014</article-title>
          . Springer Information Retrieval Journal,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mueller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Zobel</surname>
          </string-name>
          .
          <source>Report on the SIGIR 2014 Workshop on Medical Information Retrieval (MedIR)</source>
          .
          <source>SIGIR Forum</source>
          ,
          <volume>48</volume>
          (
          <issue>2</issue>
          ),
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gareth J.F.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <source>ShARe/CLEF eHealth Evaluation Lab</source>
          <year>2014</year>
          ,
          <article-title>Task 3: User-centred health information retrieval</article-title>
          .
          <source>In CLEF 2014 Evaluation Labs and Workshop:</source>
          Online Working Notes, Sheffield, UK,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>R.</given-names>
            <surname>Gunning</surname>
          </string-name>
          .
          <article-title>The Technique of Clear Writing</article-title>
          .
          McGraw-Hill
          ,
          <year>1952</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>K.</given-names>
            <surname>Järvelin</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          .
          <article-title>Cumulated gain-based evaluation of IR techniques</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          ,
          <volume>20</volume>
          (
          <issue>4</issue>
          ):
          <fpage>422</fpage>
          -
          <lpage>446</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Jimmy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Demartini</surname>
          </string-name>
          .
          <article-title>On the volatility of commercial search engines and its impact on information retrieval research</article-title>
          .
          <source>In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '18</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Jimmy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Koopman</surname>
          </string-name>
          .
          <article-title>Choices in knowledge-base retrieval for consumer health search</article-title>
          .
          <source>In European Conference on Information Retrieval</source>
          , pages
          <fpage>72</fpage>
          -
          <lpage>85</lpage>
          . Springer,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Jimmy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Koopman</surname>
          </string-name>
          .
          <article-title>QUT ielab at CLEF 2018 Consumer Health Search Task: Knowledge base retrieval for consumer health search</article-title>
          .
          <source>In CEUR Workshop Proceedings: Working Notes of CLEF</source>
          <year>2018</year>
          :
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>A.</given-names>
            <surname>Lipani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>Fixed-cost pooling strategies based on IR evaluation measures</article-title>
          .
          <source>In European Conference on Information Retrieval</source>
          , pages
          <fpage>357</fpage>
          -
          <lpage>368</lpage>
          . Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>D.</given-names>
            <surname>McDaid</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Park</surname>
          </string-name>
          .
          <article-title>Online health: Untangling the web. Evidence from the Bupa Health Pulse 2010 international healthcare survey</article-title>
          .
          <source>Technical report</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>A.</given-names>
            <surname>Moffat</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Zobel</surname>
          </string-name>
          .
          <article-title>Rank-biased precision for measurement of retrieval effectiveness</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          ,
          <volume>27</volume>
          (
          <issue>1</issue>
          ):
          <fpage>2:1</fpage>
          -
          <lpage>2:27</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>N.</given-names>
            <surname>Motlogelwa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Thuma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Leburu-Dingalo</surname>
          </string-name>
          .
          <article-title>Merging search results generated by multiple query variants using data fusion</article-title>
          .
          <source>In CEUR Workshop Proceedings: Working Notes of CLEF</source>
          <year>2018</year>
          :
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Moldovan</surname>
          </string-name>
          .
          <article-title>A study on query expansion with MeSH terms and Elasticsearch. IMS Unipd at CLEF eHealth Task 3</article-title>
          .
          <source>In CEUR Workshop Proceedings: Working Notes of CLEF</source>
          <year>2018</year>
          :
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>Ranking health web pages with relevance and understandability</article-title>
          .
          <source>In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>TUW @ TREC Clinical Decision Support Track 2015</article-title>
          .
          <source>Technical report</source>
          , Vienna University of Technology, Vienna, Austria,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>Exploiting health related features to infer user expertise in the medical domain</article-title>
          .
          <source>In Web Search Click Data Workshop at WSDM</source>
          , New York City, NY, USA,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bernhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          .
          <article-title>Assessors Agreement: A Case Study across Assessor Type, Payment Levels, Query Variations and Relevance Dimensions</article-title>
          .
          <source>In Experimental IR Meets Multilinguality, Multimodality, and Interaction: 7th International Conference of the CLEF Association, CLEF'16 Proceedings</source>
          . Springer International Publishing,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          .
          <article-title>CLEF eHealth Evaluation Lab 2015, Task 2: Retrieving Information about Medical Symptoms</article-title>
          .
          <source>In CLEF 2015 Online Working Notes. CEUR-WS</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>The influence of pre-processing on the estimation of readability of web documents</article-title>
          .
          <source>In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</source>
          ,
          <source>CIKM '15</source>
          , pages
          <fpage>1763</fpage>
          -
          <lpage>1766</lpage>
          , New York, NY, USA,
          <year>2015</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>Jimmy</string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>CLEF 2017 Task Overview: The IR Task at the eHealth Evaluation Lab</article-title>
          .
          <source>In Working Notes of Conference and Labs of the Evaluation Forum (CLEF), CEUR Workshop Proceedings</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <given-names>L.</given-names>
            <surname>Park</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>On the distribution of user persistence for rank-biased precision</article-title>
          .
          <source>In Proceedings of the 12th Australasian Document Computing Symposium</source>
          , pages
          <fpage>17</fpage>
          -
          <lpage>24</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <given-names>J.</given-names>
            <surname>Pomikálek</surname>
          </string-name>
          .
          <article-title>Removing boilerplate and duplicate content from web corpora</article-title>
          .
          <source>PhD thesis</source>
          , Masaryk University, Faculty of Informatics, Brno, Czech Republic,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <given-names>S.</given-names>
            <surname>Saleh</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          .
          <article-title>CUNI team: CLEF eHealth Consumer Health Search Task 2018</article-title>
          .
          <source>In CEUR Workshop Proceedings: Working Notes of CLEF 2018: Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <given-names>L.</given-names>
            <surname>Soldaini</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Goharian</surname>
          </string-name>
          .
          <article-title>QuickUMLS: a fast, unsupervised approach for medical concept extraction</article-title>
          .
          <source>In SIGIR MedIR Workshop</source>
          , Pisa, Italy,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <given-names>H.</given-names>
            <surname>Suominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Spijker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Névéol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ramadier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Robert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>Jimmy</string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          .
          <article-title>Overview of the CLEF eHealth Evaluation Lab 2018</article-title>
          .
          <source>In CLEF 2018 - 8th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science (LNCS)</source>
          . Springer, September
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Gonçalves</surname>
          </string-name>
          .
          <article-title>Improving personalized consumer health search: notebook for eHealth at CLEF 2018</article-title>
          .
          <source>In CEUR Workshop Proceedings: Working Notes of CLEF 2018: Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <given-names>S.</given-names>
            <surname>Zayani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ksentini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tmar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Gargouri</surname>
          </string-name>
          .
          <article-title>MIRACL at CLEF 2018: Consumer Health Search Task</article-title>
          .
          <source>In CEUR Workshop Proceedings: Working Notes of CLEF 2018: Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          .
          <article-title>Understandability biased evaluation for information retrieval</article-title>
          .
          <source>In Advances in Information Retrieval</source>
          , pages
          <fpage>280</fpage>
          -
          <lpage>292</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Koopman</surname>
          </string-name>
          .
          <article-title>Integrating understandability in the evaluation of consumer health search engines</article-title>
          .
          <source>In Medical Information Retrieval Workshop at SIGIR 2014</source>
          , page
          <fpage>32</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Budaher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Deacon</surname>
          </string-name>
          .
          <article-title>The IR Task at the CLEF eHealth Evaluation Lab 2016: User-centred Health Information Retrieval</article-title>
          .
          <source>In CLEF 2016 Evaluation Labs and Workshop: Online Working Notes, CEUR-WS</source>
          , September
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>Query variations and their effect on comparing information retrieval systems</article-title>
          .
          <source>In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management</source>
          ,
          <source>CIKM '16</source>
          , pages
          <fpage>691</fpage>
          -
          <lpage>700</lpage>
          , New York, NY, USA,
          <year>2016</year>
          . ACM.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>