<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CLEF 2017 Task Overview: The IR Task at the eHealth Evaluation Lab</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Joao Palotti</string-name>
          <email>palotti@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guido Zuccon</string-name>
          <email>g.zuccon@qut.edu.au</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jimmy</string-name>
          <email>jimmy@qut.edu.au</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Pecina</string-name>
          <email>pecina@ufal.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mihai Lupu</string-name>
          <email>lupu@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorraine Goeuriot</string-name>
          <email>lorraine.goeuriot@imag.fr</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liadh Kelly</string-name>
          <email>liadh.kelly@dcu.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Allan Hanbury</string-name>
          <email>hanbury@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ADAPT Centre, Dublin City University</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Charles University</institution>
          ,
          <addr-line>Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Queensland University of Technology</institution>
          ,
          <addr-line>Brisbane</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Univ. Grenoble Alpes</institution>
          ,
          <addr-line>CNRS, Grenoble INP, LIG, F-38000 Grenoble</addr-line>
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Vienna University of Technology</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper provides an overview of the information retrieval (IR) Task of the CLEF 2017 eHealth Evaluation Lab. This task investigates the effectiveness of web search engines in providing access to medical information for people who have no or little medical knowledge (health consumers). The task aims to foster advances in the development of search technologies for consumer health search by providing resources and evaluation methods to test and validate search systems. The problem considered in this year's task was to retrieve web pages to support the information needs of health consumers who are faced with a medical condition and want to seek relevant health information online through a search engine. The task re-used the 2016 topics to deepen the assessment pool and create a more comprehensive and reusable collection. The task had four sub-tasks: ad-hoc search, personalized search, query variations, and multilingual ad-hoc search. Seven teams participated in the task; relevance assessment is underway, and the assessments, along with the participants' results, will be released at the CLEF 2017 conference. Resources for this task, including topics, assessments, evaluation scripts and participant runs, are available at the task's GitHub repository: https://github.com/CLEFeHealth/CLEFeHealth2017IRtask/</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The IR Task follows a standard IR evaluation
process, with a shared collection of documents and queries, the contribution of
runs from participants and the subsequent formation of relevance assessments
and evaluation of the participants' submissions.</p>
      <p>
        The task investigated the problem of retrieving web pages to support the
information needs of health consumers (including their next-of-kin) who are
confronted with a health problem or medical condition and who use a search engine
to seek better understanding about their health. This task has been developed
within the CLEF 2017 eHealth Evaluation Lab, which aims to foster the
development of approaches to support patients, their next-of-kin, and clinical staff in
understanding, accessing and authoring health information [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        The use of the Web as a source of health-related information is a widespread
practice among health consumers [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and search engines are commonly used as
a means to access health information available online [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Previous iterations of this task (i.e. the 2013 and 2014 CLEF eHealth Lab
Task 3 [
        <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
        ]) aimed at evaluating the effectiveness of search engines to support
people when searching for information about their conditions, e.g., to answer
queries like “thrombocytopenia treatment corticosteroids length”. These two
evaluation exercises have provided valuable resources and an evaluation
framework for developing and testing new and existing techniques. The fundamental
contribution of these tasks to the improvement of search engine technology aimed
at answering this type of health information need is demonstrated by the
improvements in retrieval effectiveness provided by the best 2014 system [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] over
the best 2013 system [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] (using different, but comparable, topic sets).
      </p>
      <p>
        In 2015 the task took a different focus, concentrating on supporting
consumers searching for self-diagnosis information [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], an important type of
health information seeking activity [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Last year’s task expanded on the 2015
task, by considering not only self-diagnosis information needs, but also needs
related to treatment and management of health conditions [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. Previous research
has shown that exposing people with no or scarce medical knowledge to
complex medical language may lead to erroneous self-diagnosis and self-treatment
and that access to medical information on the Web can lead to the escalation
of concerns about common symptoms (e.g., cyberchondria) [
        <xref ref-type="bibr" rid="ref1 ref22">1,22</xref>
        ]. Research has
also shown that current commercial search engines are still far from being
effective in answering such unclear and underspecified queries [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. This year’s task
continues the growth path identified in past years and focuses on conducting
assessments on deeper pooled sets than was possible in previous years of the task.
The subtasks within this year’s IR challenge are similar to 2016’s: ad hoc search,
query variation, and multilingual search. A new subtask is also introduced, aimed
at exploring methods to personalize health search.
      </p>
      <p>This paper is structured as follows: Section 2 details the four sub-tasks we
considered this year; Section 3 describes the data collection, while Section 4
describes the query set and the methodology used to create it; Section 5 lists
the participants and their submissions; Section 6 details the methods used to
create the assessment pools and relevance criteria; Section 7 lists the evaluation
metrics used for this Task; finally, Section 8 concludes this overview paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Tasks</title>
      <sec id="sec-2-1">
        <title>IRTask1: Ad-hoc Search</title>
        <p>This is a standard ad-hoc search task, aiming at retrieving information relevant
to people seeking health advice on the web. In this year’s task, we re-used the
2016 topics, with the aim of improving the relevance assessment pool and the
collection reusability (i.e. increasing the pool depth). Because we re-used last year's
topics, we asked participants to explicitly exclude from their search results for
each query the documents that had already been assessed in 2016 (a list of these
documents was provided to participants, along with a script for checking
submissions). Participants were strongly encouraged to devise methods that explicitly
exploit relevance feedback, i.e. using the already assessed documents to improve
their submissions.
</p>
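<p>As an illustration only (not the checking script distributed by the organisers), the exclusion step amounts to a simple filter over a TREC-format run; the function below and its file-format assumptions are hypothetical.</p>

```python
# Hypothetical sketch of the exclusion step: remove documents that were
# already assessed in 2016 from a TREC-format run
# (lines of the form "topic Q0 docid rank score tag").
def filter_previously_assessed(run_lines, assessed_pairs):
    """Keep only run lines whose (topic, docid) pair was not assessed in 2016."""
    kept = []
    for line in run_lines:
        topic, _, docid = line.split()[:3]
        if (topic, docid) not in assessed_pairs:
            kept.append(line)
    return kept
```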
      </sec>
      <sec id="sec-2-2">
        <title>IRTask2: Personalized Search</title>
        <p>This task builds on top of IRTask1. Here, we aimed to personalize the
retrieved list of search results so as to match user expertise, measured by how
likely the person is to be satisfied with the content of a document with respect
to the expertise level of the health information.</p>
        <p>Each topic in the collection has 6 query variations: the first 3 have been
issued by people with no medical knowledge, while the second 3 have been issued
by medical experts. When evaluating results for a query variation, we use a
parameter alpha to capture user expertise. The parameter determines the shape
of the gain curve, so that documents at the right understandability level obtain
the highest gains, with decaying gains being assigned to documents that do not
suit the understandability level of the modelled user. We use α=0.0 for query
variation 1, α=0.2 for query variation 2, α=0.4 for query variation 3, α=0.6
for query variation 4, α=0.8 for query variation 5 and, finally, α=1.0 for query
variation 6. This models increasing levels of expertise across query variations for
one topic. The intuition in such evaluation is that a person with no specific health
knowledge (represented by query variant 1) would not understand complex and
technical health material, while an expert (represented by query variant 6) would
have little or no interest in reading introductory/basic material. For more details
about this evaluation measure, we refer the reader to Section 7.</p>
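<p>The α values listed above follow a simple linear scheme; the one-line helper below (our own illustration, not part of the official evaluation scripts) reproduces that mapping.</p>

```python
# Illustrative helper: the alpha assigned to query variations 1 to 6 above
# follows the linear scheme alpha = (variation - 1) / 5.
def alpha_for_variation(variation):
    return round((variation - 1) / 5, 1)
```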
        <p>Note that the 2016 collection includes assessments for understandability (for
the same documents for which relevance was assessed), thus they could be used
by teams for training. These understandability assessments are contained in the
qunder files (similar to qrels, but for understandability) available at https:
//github.com/CLEFeHealth/CLEFeHealth2016Task3.</p>
        <p>As for IRTask1, we asked participants to explicitly exclude from their search
results for each query the documents that had already been assessed in 2016.
</p>
      </sec>
      <sec id="sec-2-3">
        <title>IRTask3: Query Variations</title>
        <p>IRTasks 1 and 2 treated query variations for a topic independently. IRTask3
instead explicitly explores the dependencies among query variations for the same
information need. The task aims to foster research into building search systems
that are robust to query variations. Different query variations were generated
for the same forum entry (i.e. topic/information need), thus capturing the
variability intrinsic in how people formulate queries when searching to answer the
same information need.</p>
        <p>For IRTask3 we asked participants to submit a single set of results for each
topic (each topic has 6 query variations). Participants were informed of which
query variations relate to the same topic, and were expected to take these variations
into account when building their systems.
</p>
      </sec>
      <sec id="sec-2-4">
        <title>IRTask4: Multilingual Search</title>
        <p>The goal of this sub-task is to foster research in multilingual information
retrieval, developing techniques to support users who can express their
information need well in their native language and can read the results in English. This
task, similar to the corresponding one in 2016, offers parallel queries in several
languages (Czech, French, Hungarian, German, Polish, Spanish and Swedish).
</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Dataset</title>
      <p>
        The 2013, 2014 and 2015 IR tasks in the CLEF eHealth Lab used the Khresmoi
collection [
        <xref ref-type="bibr" rid="ref18 ref4 ref5 ref7 ref8">8,7,4,5,18</xref>
        ], a collection of about 1 million health web pages. Since last
year we have set a new challenge for the participants by using
ClueWeb12-B13 (http://lemurproject.org/clueweb12/), a collection of more than 52 million web pages. Unlike the Khresmoi
collection, the crawl in ClueWeb12-B13 is not limited to certified Health On the
Net websites and known health portals, but is a higher-fidelity representation
of a common Internet crawl, making the dataset more in line with the content
that current web search engines index and retrieve.
      </p>
      <p>
        For participants who did not have access to the ClueWeb dataset, Carnegie
Mellon University granted the organisers permission to make the dataset
available through cloud computing instances provided by Microsoft Azure. The
Azure instances that were made available to participants for the IR challenge
included (1) the ClueWeb12-B13 dataset, (2) standard indexes built with the
Terrier (http://terrier.org/) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], Indri (http://www.lemurproject.org/indri.php) [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and Elasticsearch (https://www.elastic.co/) toolkits, and (3) additional resources,
such as a spam list [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], PageRank scores, anchor texts [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], URLs, etc., made available
through the ClueWeb12 website. The organisers are thankful to Carnegie Mellon
University, and in particular to Jamie Callan and Christina Melucci, for their
support in obtaining the permission to redistribute ClueWeb12. The organisers
are also thankful to Microsoft Azure, which provided the cloud computing
infrastructure made available to participants through the Microsoft Azure for
Research Award CRM:0518649.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Query Set</title>
      <p>Although a considerable number of documents (25,000) were assessed in the
2016 task, we decided to further deepen the assessment pools this year. Thus,
the same topics developed in 2016 were kept for the 2017 task. For crafting the
topics, we considered real health information needs expressed by the general
public through posts published in public health web forums. Forum posts were
extracted from the AskDocs section of Reddit (https://www.reddit.com/r/AskDocs/). This section allows users to
post a description of a medical case or ask a medical question, seeking medical
information such as a diagnosis or details regarding treatments. Users can also
interact through comments. We selected posts that were descriptive, clear and
understandable. Posts with information regarding the author or patient (in case
the post author sought help for another person), such as demographics (age,
gender), medical history and current medical condition, were preferred.</p>
      <p>The posts were manually selected by a student, and a total of 50 posts were
used for query creation. Each of the selected forum posts was presented to 6
query creators with different levels of medical expertise: these included 3 medical experts
(final year medical students undertaking rotations in hospitals) and 3 lay users
with no prior medical knowledge.</p>
      <p>A total of 300 queries were created. Queries were numbered using the
following convention: the first 3 digits of a query id identify a post number (information
need), while the last 3 digits of a query id identify each individual query creator.
Expert query creators used the identifier 1, 2 and 3 and laypeople query creators
used the identifiers 4, 5 and 6. Queries and the posts used to generate them can
be accessed at https://github.com/CLEFeHealth/CLEFeHealth2017IRtask/
tree/master/queries.</p>
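<p>A hypothetical parser for this numbering convention (the helper name and output format are our own illustration) could look as follows.</p>

```python
# Hypothetical parser for the query-id convention described above: the first
# three digits identify the post (information need), the last three digits the
# query creator (1-3 experts, 4-6 laypeople, per the convention above).
def parse_query_id(qid):
    post, creator = divmod(int(qid), 1000)
    expertise = "expert" if creator in (1, 2, 3) else "layperson"
    return post, creator, expertise
```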
      <p>For the query variations element of the task (sub-task 3), participants were
told which queries were related to the same information need, to allow them to
produce one set of results to be used as answer for all query variations of an
information need.</p>
      <p>For the multilingual element of the challenge (sub-task 4), Czech, French,
Hungarian, German, Polish, Spanish and Swedish translations of the queries
were provided. Queries were translated by medical experts hired through a
professional translation company.</p>
    </sec>
    <sec id="sec-5">
      <title>Participants' Submissions</title>
      <p>Out of the 43 registered participating teams, 7 submitted runs. Table 1 lists the
participating teams and the number of submitted runs for each one of the tasks.
Teams were allowed to submit up to 7 runs and priority was sequentially given
for assessment depending on the run number. Thus runs were sampled according
to their priority: the priority of a run is expressed by the number that is assigned
to the run by the participant, i.e., run 2 has a higher priority (and thus a higher
likelihood of inclusion in the assessment pool) than run 3. Run 1 (the baseline)
has the highest priority; run 7 the lowest.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Participating teams and number of runs submitted per sub-task.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Team Name</th><th>University</th><th>Country</th><th>Sub-Task 1</th><th>Sub-Task 2</th><th>Sub-Task 3</th><th>Sub-Task 4</th></tr>
          </thead>
          <tbody>
            <tr><td>CUNI</td><td>Charles University in Prague</td><td>Czech Republic</td><td>3</td><td>-</td><td>-</td><td>28</td></tr>
            <tr><td>IELAB</td><td>Queensland University of Technology</td><td>Australia</td><td>7</td><td>-</td><td>-</td><td>-</td></tr>
            <tr><td>KISTI</td><td>Korea Institute of Science and Technology Information</td><td>Korea</td><td>3</td><td>-</td><td>-</td><td>-</td></tr>
            <tr><td>SINAI</td><td>Universidad de Jaén</td><td>Spain</td><td>3</td><td>-</td><td>-</td><td>-</td></tr>
            <tr><td>TUW</td><td>Vienna University of Technology</td><td>Austria</td><td>7</td><td>7</td><td>-</td><td>-</td></tr>
            <tr><td>ub-botswana</td><td>University of Botswana</td><td>Botswana</td><td>5</td><td>-</td><td>-</td><td>-</td></tr>
            <tr><td>UEvora</td><td>Universidade de Évora</td><td>Portugal</td><td>5</td><td>-</td><td>-</td><td>-</td></tr>
            <tr><td>7 Teams</td><td>7 Institutions</td><td>7 Countries</td><td>33</td><td>7</td><td>0</td><td>28</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-6">
      <title>Assessments</title>
      <p>
        Assessments are currently in progress. Similar to the 2016 pool, this year the
pool was created using the RBP-based Method A (Summing contributions) by
Moffat et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], in which documents are weighted according to their overall
contribution to the effectiveness evaluation as provided by the RBP formula
(with p=0.8, following Park and Zhang [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]). This strategy was chosen because
it has been shown to be preferable to traditional fixed-depth or stratified
pooling when evaluating systems
under fixed assessment budget constraints [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], as is the case for this task.
      </p>
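<p>Method A can be sketched as follows: each run contributes the RBP weight (1-p)·p^(rank-1) for every document it retrieves, contributions are summed across runs, and documents enter the pool in decreasing order of total weight. The sketch below is illustrative under those assumptions, not the organisers' pooling code.</p>

```python
# Sketch of RBP-based pooling (Moffat et al., "Method A"): a document's
# pooling priority is the sum, over all runs, of its RBP weight (1-p)*p**(rank-1).
def rbp_pool(runs, p=0.8, budget=10):
    """runs: list of ranked docid lists; returns the top `budget` docids by weight."""
    weight = {}
    for run in runs:
        for rank, docid in enumerate(run, start=1):
            weight[docid] = weight.get(docid, 0.0) + (1 - p) * p ** (rank - 1)
    ranked = sorted(weight, key=lambda d: -weight[d])
    return ranked[:budget]
```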
      <p>
        Following the suggestions of Palotti et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], we adopted a two-stage
approach to gather multi-dimensional relevance assessments. In the first stage of
such a method, assessor time from highly-paid expert assessors is focused on
assessing topical relevance and document trustworthiness (for relevant documents).
In the second stage, understandability assessments are acquired employing less
expert or less expensive assessors. The use of such a two-stage approach for
collecting assessments has the potential to reduce the overall cost of evaluation,
allowing more documents to be assessed.
      </p>
      <p>
        The relevance criteria created in 2016 were re-used this year. They were
drafted considering the entirety of the forum posts used to create the queries; a
link to the forum posts was also provided to the assessors. Relevance assessments
were provided with respect to the grades Highly relevant, Somewhat relevant
and Not relevant. Readability/understandability and reliability/trustworthiness
judgements were also collected for the documents in the assessment pool. These
judgements were collected as an integer value between 0 and 100 (lower values
indicate a harder to understand document / lower reliability) provided by judges
through a slider tool; these judgements were used to evaluate systems across
different dimensions of relevance [
        <xref ref-type="bibr" rid="ref24 ref25">25,24</xref>
        ]. All assessments were collected through
a purposely customised version of the Relevation toolkit [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>Evaluation Metrics</title>
      <p>In this task, system evaluation is conducted with both topical relevance centred
metrics and understandability biased metrics. Multiple evaluation metrics
are used depending on the sub-task.</p>
      <p>
        For IRTasks 1 and 4, evaluation is conducted with standard topical relevance
centred metrics: Precision at 10 (P@10), Normalized Discounted Cumulative
Gain at depth 10 (NDCG@10) and Rank Biased Precision with μ parameter set
to 0.8 (RBP(0.8)), as done in previous years [
        <xref ref-type="bibr" rid="ref18 ref27 ref5">27,18,5</xref>
        ].
      </p>
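<p>For reference, RBP accumulates (binary or graded) relevance gains with geometrically decaying weights; a minimal sketch of the measure, written by us for illustration:</p>

```python
# Minimal Rank Biased Precision: RBP = (1 - p) * sum_i p**(i-1) * r_i,
# where r_i is the relevance gain of the document at rank i and p is the
# persistence parameter (0.8 in this task).
def rbp(relevances, p=0.8):
    return (1 - p) * sum(r * p ** (i - 1) for i, r in enumerate(relevances, start=1))
```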
      <p>
        Submissions to IRTask3 are evaluated using the same measures as for
IRTask1 but using the mean-variance evaluation framework (MVE) [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. In this
framework, evaluation results for each query variation of a topic are averaged
and their variance is also accounted for to compute a final system performance
estimate. A script that implements the mean-variance evaluation framework is
available at https://github.com/CLEFeHealth/CLEFeHealth2017IRtask.
      </p>
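<p>The aggregation in the mean-variance framework can be sketched as a mean discounted by the variance across a topic's query variations; the trade-off parameter b below is an assumption for illustration, not a value prescribed by the task.</p>

```python
# Sketch of mean-variance aggregation over the per-variation scores of one
# topic: the mean score is discounted by the (population) variance, so systems
# that are unstable across query variations are penalised. b is assumed here.
def mean_variance_score(scores, b=1.0):
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    return mean - b * var
```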
      <p>
        Submissions to IRTask2 are evaluated by understandability biased metrics [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
We consider the understandability-biased Rank Biased Precision, also with μ
parameter set to 0.8 (uRBP(0.8)), and propose three new metrics for this subtask.
      </p>
      <p>The first personalization-aware metric is a specialization of uRBP, α-uRBP,
which uses an α parameter to model the kind of documents a user wants to read.
We assume that α is a parameter that represents the understandability profile of
an entity. A low α is assigned to items/documents/users that are experts, while
a high α means the opposite. We assume that a user with a low α is interested in
reading specialized documents as opposed to easy and introductory documents,
while a high α represents users preferring the opposite, i.e. easy and introductory
documents over specialized ones. We model in α-uRBP a penalty for the case
in which a low α document is retrieved for a user that wants high α documents
and vice versa. While we are still investigating which function is best to model
this penalty, for now we assume a penalty score drawn from a normal distribution.
Figure 1 shows an example in which a user is seeking to read documents with
α=20 and other values of α would have a penalty associated with them according
to a Gaussian curve centered at 20 and with a standard deviation of 30. We use
a standard deviation of 30 in this evaluation campaign – methods to better
estimate these parameters are left for future work.</p>
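<p>The Gaussian penalty in this example can be written directly; the function below is a sketch of the curve just described (target α=20, standard deviation 30), not the official scoring code.</p>

```python
import math

# Gaussian penalty from the example above: a user targets understandability
# alpha = 20 (on the 0-100 scale), and a document at level x keeps a fraction
# of its gain given by a Gaussian centered at 20 with standard deviation 30.
def gaussian_penalty(x, target=20.0, sd=30.0):
    return math.exp(-((x - target) ** 2) / (2 * sd ** 2))
```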
      <p>The second and third personalization-aware metrics are simple modifications
of Precision at depth X. For the relevant documents found up to rank X, we
inspect how far the understandability label of each document is from the expected
value required by a user. We can penalize the absolute difference linearly
(LinUndP@X) or using the same Gaussian curve as in α-uRBP (GaussianUndP@X).
Note that lower values are better for LinUndP@X, meaning that the distance
from the required understandability value is small, while higher values are
better for GaussianUndP@X, as a value of 100 is the best value one could reach.
Scripts that implement α-uRBP, LinUndP@X and GaussianUndP@X are
also available at https://github.com/CLEFeHealth/CLEFeHealth2017IRtask.
</p>
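<p>Under the same assumptions, the two precision variants can be sketched as follows: LinUndP averages the absolute distances between understandability labels and the user's target (lower is better), while GaussianUndP maps each distance onto a 0-100 Gaussian curve (higher is better). The standard deviation of 30 mirrors the α-uRBP example; both functions are our own illustration.</p>

```python
import math

# Illustrative sketches of the two precision variants described above, applied
# to the understandability labels (0-100 scale) of the relevant documents in
# the top X.
def lin_und_p(und_labels, target):
    # Mean absolute distance from the target understandability; lower is better.
    return sum(abs(u - target) for u in und_labels) / len(und_labels)

def gaussian_und_p(und_labels, target, sd=30.0):
    # Mean Gaussian score on a 0-100 scale; higher is better, 100 is a perfect match.
    return sum(100 * math.exp(-((u - target) ** 2) / (2 * sd ** 2))
               for u in und_labels) / len(und_labels)
```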
    </sec>
    <sec id="sec-8">
      <title>Conclusion</title>
      <p>This document describes the settings and evaluation methods used in the IR
Task of CLEF 2017 eHealth Evaluation Lab. The task considers the problem
of retrieving web pages for people seeking health information regarding medical
conditions, treatments and suggestions. The task was divided into 4 sub-tasks:
ad-hoc search, personalized search, query variations, and multilingual ad-hoc
search. Seven teams participated in the task; relevance assessment is underway
and assessments, along with the participants' results, will be released at the CLEF
2017 conference (and will be available at the task’s GitHub repository).</p>
      <p>
        The further development of the assessments made this year makes the collection
stronger, providing the research community with a rich collection that goes beyond
topical judgements. The understandability and trustworthiness assessments can
be used to foster the development of retrieval methods for health information
seeking on the web (e.g. [
        <xref ref-type="bibr" rid="ref15 ref16">16,15</xref>
        ]).
      </p>
      <p>Baseline runs, participant runs and results, assessments, topics and query
variations are available online at the GitHub repository for this Task: https:
//github.com/CLEFeHealth/CLEFeHealth2017IRtask/.</p>
      <p>Acknowledgements We would like to thank the Microsoft Azure grant (CRM:0518649),
the ESF for its financial support for relevance assessments, and the many assessors
for their hard work. This work is supported in part by a Google Faculty Award.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Benigeri</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Pluye</surname>
          </string-name>
          .
          <article-title>Shortcomings of health information on the internet</article-title>
          .
          <source>Health promotion international</source>
          ,
          <volume>18</volume>
          (
          <issue>4</issue>
          ):
          <fpage>381</fpage>
          -
          <lpage>386</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>G. V.</given-names>
            <surname>Cormack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Smucker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Clarke</surname>
          </string-name>
          .
          <article-title>Efficient and effective spam filtering and re-ranking for large web datasets</article-title>
          .
          <source>Information retrieval</source>
          ,
          <volume>14</volume>
          (
          <issue>5</issue>
          ):
          <fpage>441</fpage>
          -
          <lpage>465</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.</given-names>
            <surname>Fox</surname>
          </string-name>
          .
          <article-title>Health topics: 80% of internet users look for health information online</article-title>
          .
          <source>Pew Internet &amp; American Life Project</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leveling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , H. Müller, S. Salanterä,
          <string-name>
            <given-names>H.</given-names>
            <surname>Suominen</surname>
          </string-name>
          , and G. Zuccon.
          <source>ShARe/CLEF eHealth Evaluation Lab</source>
          <year>2013</year>
          ,
          <article-title>Task 3: Information retrieval to address patients' questions when reading clinical reports</article-title>
          .
          <source>CLEF 2013 Online Working Notes</source>
          ,
          <volume>8138</volume>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H. M. Gareth J.F.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <source>ShARe/CLEF eHealth Evaluation Lab</source>
          <year>2014</year>
          ,
          <article-title>Task 3: User-centred health information retrieval</article-title>
          .
          <source>In CLEF 2014 Evaluation Labs and Workshop: Online Working Notes</source>
          , Sheffield, UK,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Suominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Névéol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Robert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Spijker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          .
          <article-title>CLEF 2017 eHealth Evaluation Lab overview</article-title>
          .
          <source>In Proceedings of CLEF 2017 - 8th Conference and Labs of the Evaluation Forum. Lecture Notes in Computer Science (LNCS)</source>
          , Springer,
          <year>September 2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          .
          <article-title>Building evaluation datasets for consumer-oriented information retrieval</article-title>
          .
          <source>In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC</source>
          <year>2016</year>
          ), Paris, France, May
          <year>2016</year>
          . European Language Resources Association (ELRA).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>Medical information retrieval: an instance of domain-specific search</article-title>
          .
          <source>In Proceedings of SIGIR 2012</source>
          , pages
          <fpage>1191</fpage>
          -
          <lpage>1192</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>D.</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          .
          <article-title>MIREX: MapReduce information retrieval experiments</article-title>
          .
          <source>arXiv preprint arXiv:1004.4489</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>B.</given-names>
            <surname>Koopman</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          .
          <article-title>Relevation! An open source system for information retrieval relevance assessment</article-title>
          .
          <source>arXiv preprint</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A.</given-names>
            <surname>Lipani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Koopman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>The impact of fixed-cost pooling strategies on test collection bias</article-title>
          .
          <source>In Proceedings of the 2016 International Conference on The Theory of Information Retrieval</source>
          , ICTIR '16, New York, NY, USA,
          <year>2016</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>McCreadie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Santos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <article-title>From puppy to maturity: Experiences in developing Terrier</article-title>
          .
          <source>Proc. of OSIR at SIGIR</source>
          , pages
          <fpage>60</fpage>
          -
          <lpage>63</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>D.</given-names>
            <surname>McDaid</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Park</surname>
          </string-name>
          .
          <article-title>Online health: Untangling the web. Evidence from the Bupa Health Pulse 2010 international healthcare survey</article-title>
          .
          <source>Technical report</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>A.</given-names>
            <surname>Moffat</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Zobel</surname>
          </string-name>
          .
          <article-title>Rank-biased precision for measurement of retrieval effectiveness</article-title>
          .
          <source>ACM Trans. Inf. Syst.</source>
          ,
          <volume>27</volume>
          (
          <issue>1</issue>
          ):
          <fpage>2:1</fpage>
          -
          <lpage>2:27</lpage>
          , Dec.
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          .
          <article-title>Beyond topical relevance: Studying understandability and reliability in consumer health search</article-title>
          .
          <source>In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval</source>
          , pages
          <fpage>1167</fpage>
          -
          <lpage>1167</lpage>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>Ranking health web pages with relevance and understandability</article-title>
          .
          <source>In Proceedings of the 39th international ACM SIGIR conference on Research and development in information retrieval</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bernhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          .
          <article-title>Assessors Agreement: A Case Study across Assessor Type, Payment Levels, Query Variations and Relevance Dimensions</article-title>
          .
          <source>In Experimental IR Meets Multilinguality, Multimodality, and Interaction: 7th International Conference of the CLEF Association, CLEF'16 Proceedings</source>
          . Springer International Publishing,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          .
          <source>CLEF eHealth Evaluation Lab</source>
          <year>2015</year>
          ,
          <article-title>Task 2: Retrieving Information about Medical Symptoms</article-title>
          .
          <source>In CLEF 2015 Online Working Notes. CEUR-WS</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>L.</given-names>
            <surname>Park</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>On the distribution of user persistence for rank-biased precision</article-title>
          .
          <source>In Proceedings of the 12th Australasian document computing symposium</source>
          , pages
          <fpage>17</fpage>
          -
          <lpage>24</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>W.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>An investigation of the effectiveness of concept-based approach in medical information retrieval GRIUM@CLEF 2014 eHealth Task 3</article-title>
          .
          <source>In Proceedings of the CLEF eHealth Evaluation Lab</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>T.</given-names>
            <surname>Strohman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Turtle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Indri: A language model-based search engine for complex queries</article-title>
          .
          <source>In Proceedings of the International Conference on Intelligent Analysis</source>
          , volume
          <volume>2</volume>
          , pages
          <fpage>2</fpage>
          -
          <lpage>6</lpage>
          . Amherst, MA, USA,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>R. W.</given-names>
            <surname>White</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Horvitz</surname>
          </string-name>
          .
          <article-title>Cyberchondria: studies of the escalation of medical concerns in web search</article-title>
          .
          <source>ACM TOIS</source>
          ,
          <volume>27</volume>
          (
          <issue>4</issue>
          ):
          <fpage>23</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.-I.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Masanz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Carterette</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Using discharge summaries to improve information retrieval in clinical domain</article-title>
          .
          <source>In Proceedings of the CLEF eHealth Evaluation Lab</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          .
          <article-title>Understandability biased evaluation for information retrieval</article-title>
          .
          <source>In Proc. of ECIR</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Koopman</surname>
          </string-name>
          .
          <article-title>Integrating understandability in the evaluation of consumer health search engines</article-title>
          .
          <source>In Medical Information Retrieval Workshop at SIGIR</source>
          <year>2014</year>
          , page
          <fpage>32</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Koopman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          .
          <article-title>Diagnose this if you can: On the effectiveness of search engines in finding medical self-diagnosis information</article-title>
          .
          <source>In Advances in Information Retrieval</source>
          , pages
          <fpage>562</fpage>
          -
          <lpage>567</lpage>
          . Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Budaher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Deacon</surname>
          </string-name>
          .
          <source>The IR Task at the CLEF eHealth Evaluation Lab</source>
          <year>2016</year>
          :
          <article-title>User-centred Health Information Retrieval</article-title>
          .
          <source>In CLEF 2016 Evaluation Labs and Workshop: Online Working Notes</source>
          , CEUR-WS,
          <year>September 2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>Query variations and their effect on comparing information retrieval systems</article-title>
          .
          <source>In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management</source>
          , pages
          <fpage>691</fpage>
          -
          <lpage>700</lpage>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>