<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the CLEF eHealth 2020 Task 2: Consumer Health Search with ad-hoc and Spoken Queries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhengyang Liu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriella Pasi</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Viviani</string-name>
          <email>marco.viviani@unimib.it</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chenchen Xu</string-name>
          <email>chenchen.xu@anu.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data61/Commonwealth Scientific and Industrial Research Organisation</institution>
          ,
          <addr-line>Canberra, ACT</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Maynooth University</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>The Australian National University</institution>
          ,
          <addr-line>Canberra, ACT</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Univ. Grenoble Alpes</institution>
          ,
          <addr-line>CNRS, Grenoble INP, LIG, F-38000 Grenoble</addr-line>
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Milano-Bicocca, Dept. of Informatics, Systems, and Communication</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>University of Turku</institution>
          ,
          <addr-line>Turku</addr-line>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>In this paper, we provide an overview of the CLEF eHealth Task 2 on Information Retrieval (IR), organized as part of the eighth annual edition of the CLEF eHealth evaluation lab by the Conference and Labs of the Evaluation Forum. Its aim was to address laypeople's difficulties in retrieving and digesting valid and relevant information, in their preferred language, to make health-centred decisions. The task was a novel extension of the most popular and established task in CLEF eHealth, Consumer Health Search (CHS), extending it to respond to spoken ad-hoc queries. In total, five submissions were made to its two subtasks; three addressed the ad-hoc IR task on text data and two considered the spoken queries. Herein, we describe the resources created for the task and the evaluation methodology adopted. We also summarize lab submissions and results. As in previous years, organizers have made data, methods, and tools associated with the lab tasks available for future research and development.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>In recent years, electronic health (eHealth) content has become available in a
variety of forms, ranging from patient records and medical dossiers, scientific
publications, and health-related websites to medical-related topics shared across
social networks. Laypeople, clinicians, and policy-makers need to easily retrieve
and make sense of such medical content to support their decision making. The
increasing difficulties experienced by these stakeholders in retrieving and
digesting valid and relevant information in their preferred language to make
health-centred decisions have motivated CLEF eHealth to organise yearly shared
challenges since 2013.</p>
      <p>More specifically, CLEF eHealth<sup>7</sup> was established as a lab workshop in 2012
as part of the Conference and Labs of the Evaluation Forum (CLEF). Since 2013,
it has offered evaluation labs in the fields of layperson and professional health
information extraction, management, and retrieval with the aims of bringing
together researchers working on related information access topics and providing
them with datasets to work with and validate their outcomes. These labs and
their subsequent workshops target:
1. developing processing methods and resources (e.g., dictionaries,
abbreviation mappings, and data with model solutions for method development and
evaluation) in a multilingual setting to enrich difficult-to-understand eHealth
texts, support personalized reliable access to medical information, and
provide valuable documentation;
2. developing an evaluation setting and releasing evaluation results for these
methods and resources;
3. contributing to participants' and organizers' professional networks and
interaction with all interdisciplinary actors of the ecosystem for producing,
processing, and consuming eHealth information.</p>
      <p>The vision for the Lab is two-fold:
1. to develop tasks that potentially impact laypeople's understanding of
medical information, and
2. to provide the community with an increasingly sophisticated dataset of
clinical narrative, enriched with links to standard knowledge bases,
evidence-based care guidelines, systematic reviews, and other further information, to
advance the state-of-the-art in multilingual information extraction (IE) and
information retrieval (IR) in healthcare.
7 https://clefehealth.imag.fr/</p>
      <p>
        The eighth annual CLEF eHealth lab, CLEF eHealth 2020 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], aiming to
build upon the resource development and evaluation approaches offered in the
previous years of the lab [
        <xref ref-type="bibr" rid="ref11 ref12 ref19 ref20 ref21">53, 20, 11, 19, 12, 51, 21</xref>
        ], offered the following two tasks:
– Task 1. Multilingual IE [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] and
– Task 2. Consumer Health Search (CHS).
      </p>
      <p>
        The CHS task was a continuation of the previous CLEF eHealth IR tasks
that ran in 2013, 2014, 2015, 2016, 2017 and 2018 [
        <xref ref-type="bibr" rid="ref10 ref17 ref33 ref34 ref8 ref9">8, 10, 33, 62, 34, 17, 9</xref>
        ], and
embraced the Text REtrieval Conference (TREC)-style evaluation process,
with a shared collection of documents and queries, the contribution of runs
from participants, and the subsequent formation of relevance assessments and
evaluation of participants' submissions. The 2020 task used the representative
web corpus developed in the 2018 challenge. This year we offered spoken queries,
as well as textual transcripts of these queries. The task was structured into two
optional subtasks, covering (1) ad-hoc search and (2) query variation using the
spoken queries, textual transcripts of the spoken queries, or provided automatic
speech-to-text conversions of the spoken queries.
      </p>
      <p>
        The multilingual IE task focused on Spanish. Further details on this challenge
are available in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ].
      </p>
      <p>The remainder of this paper is structured as follows: First, in Section 2,
we detail the task, evaluation, and datasets created. Second, in Section 3, we
describe the task submissions and results. Finally, in Section 4, we discuss the
study and provide conclusions.</p>
    </sec>
    <sec id="sec-3">
      <title>Materials and Methods</title>
      <p>In this section, we describe the materials and methods used in the two subtasks.
After specifying our document collection, we address the spoken, transcribed,
and speech-recognized queries. Then, we describe our evaluation methods.
Finally, we introduce our human relevance assessments for information topicality,
understandability, and credibility.</p>
      <sec id="sec-3-1">
        <title>Documents</title>
        <p>
          The 2018 CLEF eHealth Consumer Health Search document collection was used
in this year's IR challenge. As detailed in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], this collection consists of web
pages acquired from the CommonCrawl.
        </p>
        <p>An initial list of websites was identified for acquisition. The list was built by
submitting queries on the 2018/2020 topics to the Microsoft Bing Application
Programming Interfaces (APIs), through the Azure Cognitive Services,
repeatedly over a period of a few weeks, and acquiring the uniform resource locators
(URLs) of the retrieved results. The domains of the URLs were then included
in the list, except for some domains that were excluded for decency reasons.</p>
        <p>The list was further augmented by including a number of known reliable
health websites and other known unreliable health websites. This augmentation
was based on lists previously compiled by health institutions and agencies.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Queries</title>
        <p>Historically, the CLEF eHealth IR task has released text queries representative of
layperson medical information needs in various scenarios. In recent years, query
variations issued by multiple laypeople for the same information need have been
offered. In this year's task, we extended this to also include spoken queries.</p>
        <p>
          These spoken queries were generated in English by six laypeople, all native
English speakers. Efforts were made to include a diverse set of accents. Narratives
for query generation were those used in the 2018 challenge. These narratives
relate to real medical queries compiled by the Khresmoi project [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] which were
issued to a search engine by laypeople; full details are available in the CLEF
eHealth 2018 IR task overview paper [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Spoken transcripts of these narratives
were generated for use in query generation in this year's challenge.
        </p>
        <p>To create a spoken query, the layperson listened to the narrative and
generated their spoken query associated with the narrative. The layperson then
listened to their generated spoken query and created a textual transcript of the
query. To ensure accuracy in transcription, they were required to repeat this
process of listening to their narrative and textually transcribing it. This allowed
us to generate accurate transcripts of the spoken queries. We did not preprocess
the textual transcripts of queries; for example, any spelling mistakes that may
be present were not removed. The final generated query set consisted of 50 topics,
with 6 query variations for each topic.</p>
        <p>Ethical approval was obtained to generate the spoken queries, and informed
consent was obtained from study participants. Spoken queries were downloadable
from a secured server for the purpose of participating in this year's CLEF eHealth
IR challenge, on completion of a signed use agreement by the participating team.</p>
        <p>We also provided participants with the textual transcripts of these
spoken queries and automatic speech-to-text conversions. These automatic transcriptions of
the audio files were generated using the End-to-End Speech Processing Toolkit
(ESPNET), Librispeech, CommonVoice, and the Google API (with three models).
Speech recognition is assessed using Kaldi [40], an open-source speech recognition
toolkit distributed under a free license. We use mel-frequency cepstral coefficient
(MFCC) acoustic features (13 coefficients expanded with delta and double-delta
features and energy: 40 features) with various feature transformations, including
linear discriminant analysis (LDA), maximum likelihood linear transformation
(MLLT), and feature space maximum likelihood linear regression (fMLLR) with
speaker adaptive training (SAT).</p>
        <p>The speech transcription process is carried out in two passes. In the first pass, an automatic
transcript is generated with a GMM-HMM model of 12,000 states and 200,000
Gaussians. The second pass is performed using a DNN acoustic model (nnet3 recipe in the Kaldi
toolkit) trained on acoustic features normalized with the
fMLLR matrix.</p>
        <p>The TEDLIUM dataset [42] was used for training the acoustic models. It was
developed for large vocabulary continuous speech recognition (LVCSR). The training
part of the dataset comprises 118 hours of speech.</p>
        <p>
          The English language model is trained with the MIT language model toolkit
using the following corpora: News Commentary 2007-2012 [55], Gigaword version
5 [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], and TDT 2-4 [57]. The vocabulary size is 150K, based on the most frequent words.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Evaluation Methods</title>
        <p>
          Similar to the 2016, 2017, and 2018 pools, we created the pool using the
RBP-based Method A (summing contributions) by Moffat et al. [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], in which
documents are weighted according to their overall contribution to the effectiveness
evaluation as provided by the rank-biased precision (RBP) formula (with p = 0.8, following Park and
Zhang [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ]). This strategy, named RBPA, was chosen because it has been shown
to be preferable to traditional fixed-depth or stratified pooling when
evaluating systems under fixed
assessment budget constraints [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], as is the case for this task.
        </p>
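        <p>The idea behind summing RBP contributions can be illustrated with a minimal sketch. It assumes that each run is a ranked list of document identifiers for a single topic, that a document at rank k contributes (1 - p) * p^(k-1), and that the documents with the largest summed contributions are selected up to an assessment budget. The function names, the toy runs, and the tie-breaking rule are hypothetical and are not taken from the organizers' tooling.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def rbp_weight(rank, p=0.8):
    """RBP weight of a document at a 1-based rank."""
    return (1.0 - p) * p ** (rank - 1)


def rbpa_pool(runs, budget, p=0.8):
    """Pick the `budget` documents with the largest summed RBP weights.

    runs: list of ranked lists of document ids (best first), one per run,
    all for the same topic.
    """
    totals = defaultdict(float)
    for ranking in runs:
        for rank, doc_id in enumerate(ranking, start=1):
            totals[doc_id] += rbp_weight(rank, p)
    # Highest summed weight first; ties broken by document id (an assumption).
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:budget]]


# Two toy runs for one topic (hypothetical document ids).
runs = [["d1", "d2", "d3"], ["d2", "d4", "d1"]]
pool = rbpa_pool(runs, budget=2)
print(pool)
```
]]></preformat>
        <p>In this toy example, a document retrieved near the top by both runs is pooled before a document retrieved by only one run, which is the behaviour that distinguishes this strategy from fixed-depth pooling.</p>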
        <p>Since the 2018 topics were used in 2020, the pool used in the 2020 CHS task
was an extension of the 2018 pool. In other words, the merged 2018&amp;2020 pool
was used in 2020. For Subtasks 1 and 2, participants could submit up to 4 runs
in the TREC format. Evaluation measures were NDCG@10, BPref, and RBP.
Metrics such as uRBP were used to capture various relevance dimensions, as
elaborated below.</p>
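        <p>For readers unfamiliar with the TREC run format, each line of a run file conventionally holds six whitespace-separated fields: the topic identifier, the literal string Q0, the document identifier, the rank, the retrieval score, and a run tag. The following sketch writes such lines; the topic identifier, document identifiers, scores, and run tag are hypothetical placeholders.</p>
        <preformat><![CDATA[
```python
def format_trec_run(topic_id, scored_docs, run_tag):
    """Return TREC-format run lines for one topic.

    scored_docs: list of (doc_id, score) pairs, already sorted best first.
    """
    lines = []
    for rank, (doc_id, score) in enumerate(scored_docs, start=1):
        # Fields: topic, the literal "Q0", document, rank, score, run tag.
        lines.append(f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}")
    return lines


# Hypothetical topic, documents, and run tag.
lines = format_trec_run("151001", [("docA", 12.5), ("docB", 9.75)], "demo_run")
for line in lines:
    print(line)
```
]]></preformat>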
      </sec>
      <sec id="sec-3-4">
        <title>Human Assessments for Topicality, Understandability, and Credibility</title>
        <p>Relevance assessments were conducted on three relevance dimensions:
topicality, understandability, and credibility. Topicality referred to a classical relevance
dimension ensuring that the document and the query are on the same topic
and the document answers the query. Understandability was an estimation of
whether the document is understandable by a patient. In the assessment guidelines,
assessors were required to estimate how readable they thought the documents were
to a layperson, that is, a person without any medical background. Topicality
and understandability have been used as relevance dimensions in the CHS task
of CLEF eHealth for several years.</p>
        <p>
          This year, to take into consideration the phenomenon of the spread of
disinformation online (especially on health-related topics), we introduced a novel
dimension, that is, credibility. Over the years, the interest in studying the concept
of credibility has gradually moved from traditional communication environments,
characterized by interpersonal and persuasive communication, to mass
communication and interactive-mediated communication, with particular reference to
online communication [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. In this scenario, retrieving credible information is
becoming a fundamental issue [
          <xref ref-type="bibr" rid="ref18 ref23">18, 23, 38, 39, 59</xref>
          ], also in the health-related
context [44].
        </p>
        <p>
          In general, credibility is described in the literature as a perceived quality of
the information receiver [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], and it is composed of multiple dimensions that have
to be considered and evaluated together in the process of information credibility
assessment [
          <xref ref-type="bibr" rid="ref27 ref6">6, 27, 45</xref>
          ]. These dimensions usually include the source that
disseminates content, characteristics related to the message diffused, and some social
aspects if the information is disseminated through a virtual community [56].
        </p>
        <p>
          For this reason, when evaluating information credibility in the health-related
context, assessors were asked in the CLEF eHealth 2020 Task 2 to evaluate the
aforementioned multiple aspects by considering, at the same time:
1. any information available about the trustworthiness of the source [
          <xref ref-type="bibr" rid="ref2 ref25 ref3">2, 3, 25</xref>
          ] of
the health-related information (the fact that information comes from a Web
site with a good or bad reputation, or the level of expertise of an individual
answering on a blog or a question-answering system, etc.);
2. syntactic/semantic characteristics of the content [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] (in terms of, e.g.,
completeness, language register, or style); and
3. any information emerging from social interactions when available [37] (the
fact that the circle of social relationships of the author of a content is reliable
or not, the fact that the author is involved in many discussions, etc.).
Obviously, it must be taken into account that the ability to judge the credibility
of information related to health depends very much on the sociocultural
background of the assessor, on the availability of information about the social network
of the source of information, and on the ease versus complexity of identifying in
or inferring from the document the different dimensions of credibility.
        </p>
        <p>Assessors considered the three dimensions in assessments (i.e., topicality,
understandability, and credibility) on a three-level scale:
– not relevant/understandable/credible,
– somewhat relevant/understandable/credible, and
– highly relevant/understandable/credible.</p>
        <p>In addition, we added a fourth option for credibility to capture assessors' uncertainty: "I
am not able to judge". This was motivated by the fact that, as illustrated above,
the documents to be assessed may actually lack the minimum information
necessary to assess their level of credibility, or this information may not be entirely clear.</p>
        <p>
          Assessments were implemented online by expanding and customising the
Relevation! tool for relevance assessments [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] to capture our task dimensions, scales,
and other preferences. There were 30 assessors, of whom 12 were women
(40%) and 18 were men (60%). They were based in European countries and Australia.
Their expertise ranged from being a medical doctor (in different disciplines) or
a nurse to being a layperson with no or limited background in medicine or
healthcare. Each assessor was assigned 1 to 5 queries to be evaluated. Each
query (concerning a specific domain linked to health) was associated with 150
documents to be evaluated with respect to the three dimensions of relevance
mentioned above.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>
        CLEF eHealth IR/CHS tasks offered in 2013–2020 have brought together
researchers working on health information access topics. The tasks have provided
them with data and computational resources to work with and validate their
outcomes. These contributions have accelerated pathways from scientific ideas
through influencing research and development to societal impact. The task niche
has been addressing the health information needs of laypeople (including, but not
limited to, patients, their families, clinical staff, health scientists, and healthcare
policy makers), and not healthcare experts only, in a range of languages and
modalities, in retrieving and digesting valid and relevant eHealth information
to make health-centered decisions [
        <xref ref-type="bibr" rid="ref16 ref4 ref5">4, 16, 5, 48, 49</xref>
        ].
      </p>
      <p>
        Next, we report on the 2020 participants, method submissions, and their
resulting evaluation outcomes. This expands the brief results section of our lab
overview [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <sec id="sec-4-1">
        <title>Participation</title>
        <p>In 2020, 24 teams registered for CLEF eHealth Task 2. Of these teams, 3 took
part in the task. Registering for a CLEF task consisted of filling in a form on the
CLEF conference website with contact information and ticking boxes corresponding
to the labs of interest. This was done several months before run submission, which
partly explains the drop in the numbers.</p>
        <p>The task was also difficult and demanding, which is another explanation for
the drop.</p>
        <p>Although the interest and participation numbers were considerably smaller
than before [48–50], organizers were pleased with this newly introduced task,
with its novel spoken queries element attracting interest and submissions (Table
1). In addition, every participating team took the more traditional ad-hoc
task offered.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Participants' Method Submissions</title>
        <p>Among the five submissions to the 2020 CLEF eHealth Task 2, the ad-hoc IR
subtask was the most popular with its three submissions; the subtasks that used
transcriptions of the spoken queries and the original audio files received one
submission each. The submitting teams were from Australia,
France, and Italy and had 4, 5, and 3 team members, respectively.<sup>8</sup> Each team
had members from a single university without other partner organizations.</p>
        <p>
          The Italian submission to the ad-hoc search and spoken queries using
transcription subtasks was from the Information Management System (IMS) Group
of the University of Padua [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. Its members were Associate Professor Giorgio
Maria Di Nunzio, Stefano Marchesin, and Federica Vezzani. The submission to
the former task included BM25 of the original query; Reciprocal Rank fusion
with BM25, Query Language Model (QLM), and Divergence from Randomness
(DFR) approaches; Reciprocal Rank fusion with BM25, QLM, and DFR
approaches using pseudo relevance feedback with 10 documents and 10 terms (a
query weight of 0.5); and Reciprocal Rank fusion with BM25 run on manual
variants of the query. The submission to the latter task included
Reciprocal Rank fusion with BM25; Reciprocal Rank fusion with BM25 using pseudo
relevance feedback with 10 documents and 10 terms (a query weight of 0.5);
Reciprocal Rank fusion of BM25 with all transcriptions; and Reciprocal Rank
fusion of BM25 with all transcripts using pseudo relevance feedback with 10
documents and 10 terms (a query weight of 0.5).
8 Please note that these numbers are corrected from [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], based on the teams' finalised
working notes.
        </p>
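        <p>Reciprocal Rank fusion, used throughout the IMS runs, combines several rankings by summing, for each document, the reciprocal of its rank in every ranking. The sketch below illustrates the general technique only: the smoothing constant k = 60 is a commonly used default and an assumption here, since the value used by the team is not stated, and the three toy rankings are hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids using reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the rankings that contain it.
    k = 60 is a conventional default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from three retrieval models for one query.
bm25 = ["d1", "d2", "d3"]
qlm = ["d2", "d1", "d4"]
dfr = ["d2", "d3", "d1"]
fused = reciprocal_rank_fusion([bm25, qlm, dfr])
print(fused)
```
]]></preformat>
        <p>Because only ranks are used, the fusion needs no score normalisation across models, which makes it convenient for combining BM25, QLM, and DFR outputs.</p>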
        <p>
          The French team, LIG-Health, was formed by Dr Philippe Mulhem, Aidan
Mannion, Gabriela Gonzalez Saez, Dr Didier Schwab, and Jibril Frej from the
Laboratoire d'Informatique de Grenoble of the Univ. Grenoble Alpes [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. For the
ad-hoc search task, they submitted runs using Terrier BM25 as a baseline and
explored various query expansion methods: expansion using UMLS, expansion using the Consumer Health
Vocabulary, expansion using FastText, and RF (Bose-Einstein) weighted
expansion. For the spoken queries subtask, they applied the same models to
various transcriptions, opting for the best-performing ones based on the 2018 qrels. They
submitted merged runs for each query.
        </p>
        <p>
          The Australian team, called SandiDoc, was from the Our Health In Our
Hands (OHIOH) Big Data program, Research School of Computer Science,
College of Engineering and Computer Science, The Australian National
University [43]. Its members were Sandaru Seneviratne, Dr Eleni Daskalaki, Dr Artem
Lenskiy, and Dr Zakir Hossain. Unlike the other two teams,
SandiDoc took part in the ad-hoc search task only. Its IR method had three steps
and was founded on TF-IDF scoring: First, both the dataset and the queries
were pre-processed. Second, TF-IDF scores were computed for the queries and
used to retrieve the most similar documents for the queries. Third, the team
supplemented this method by working on the clefehealth2018 B dataset using
the medical skip-gram word embeddings provided. To represent the documents
and queries, the team used the average word vector representations as well as
the average of minimum and maximum vector representations of the document
or query. In documents, these representations were derived using the 100 most
frequent words in a document. For each representation, the team calculated the
similarity among documents and queries using the cosine measure to obtain the
final task results. The team's aim was to experiment with different vector
representations for text.
        </p>
        <p>
In addition to these participants' methods, we as organizers developed baseline
methods that were based on the renowned Okapi BM25 but now with query
expansion optimized in a REINFORCE [58] fashion. In this section, we introduce
the two steps of query expansion and document retrieval. First, we pre-trained
our query expansion model on generally available corpora. Similar to the
REINFORCE learning protocol as introduced in [58], this pre-training step was
done by iterating exploration trials and optimization of the current model by
rewarding the explorations. In each iteration, an input query was enriched by the query
expansion model from the last iteration, from which several candidate new
queries were generated. The system was rewarded or penalized by matching the
retrieved documents from these candidate queries against the ground truth
document ranking. To expand the trial of generating new queries and thus provide
more training sources, the system used the context words found in these newly
retrieved documents to construct queries for the next iteration. For this baseline
model, the same datasets from [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ], TREC-CAR, Jeopardy, and MSA were used
for pre-training.
        </p>
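        <p>The averaged word-vector representations and cosine ranking described for SandiDoc can be sketched as follows. This is an illustration under stated assumptions rather than the team's implementation: the three-dimensional embeddings are toy values, the "average of minimum and maximum" representation is interpreted as the element-wise mean of the minimum and maximum vectors, and the restriction to the 100 most frequent words of a document is omitted for brevity.</p>
        <preformat><![CDATA[
```python
import math

# Toy 3-dimensional word embeddings (hypothetical values for illustration).
EMBEDDINGS = {
    "heart": [0.9, 0.1, 0.0],
    "attack": [0.7, 0.3, 0.1],
    "cardiac": [0.8, 0.2, 0.0],
    "arrest": [0.6, 0.4, 0.1],
    "skin": [0.0, 0.2, 0.9],
    "rash": [0.1, 0.1, 0.8],
}


def _vectors(tokens):
    """Look up embeddings, skipping out-of-vocabulary tokens."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        raise ValueError("no token has an embedding")
    return vectors


def average_vector(tokens):
    """Element-wise mean of the word vectors."""
    vectors = _vectors(tokens)
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def min_max_average_vector(tokens):
    """Element-wise mean of the minimum and maximum vectors (assumed reading)."""
    vectors = _vectors(tokens)
    return [(min(column) + max(column)) / 2.0 for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


query = average_vector(["heart", "attack"])
docs = {"doc_cardiac": ["cardiac", "arrest"], "doc_skin": ["skin", "rash"]}
ranked = sorted(
    docs, key=lambda d: cosine(query, average_vector(docs[d])), reverse=True
)
print(ranked)
```
]]></preformat>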
        <p>
          One key challenge for information retrieval in the eHealth domain is that
laypeople may lack the professional knowledge to precisely describe medical
topics or symptoms. A layperson's input query into the system can be lengthy
and inaccurate, while the documents to be matched for these queries are usually
written by people with a medical background and are thus rigorous in
wording. The query expansion phase was added to increase the chance of matching
more candidate documents by enriching the original query. With this intuition
in mind, we employed a candidate query construction method and
optimization target similar to those introduced in [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. Given an original query q_0, the system
retrieved a set of ranked documents D_0. The system constructed a new
candidate query q'_0 by selecting words from the union of the words in the original query
q_0 and the context words from the retrieved documents D_0. The new query was fed
back into the retrieval system to fetch documents D'_0 as the learning source
for the next iteration. The system iteratively applied the retrieval of documents
and the reformulation of new candidate queries to create the supervision examples
{(q'_0, D'_0), (q'_1, D'_1), ...}. At each iteration, the system memorized the word selection
operations made in constructing the new query as actions to be judged. The
retrieved documents D'_k, along with their ranking, were then compared with the
ground truth document ranking. A correct ranking of the documents became a
reward for the new query and thus also for the actions that generated it. In particular,
the stochastic objective function for calculating the reward was:
          <disp-formula><tex-math><![CDATA[C_a = (R - \bar{R}) \sum_{t \in T} -\log P(t \mid q_0)]]></tex-math></disp-formula>
where R is the reward from the new query, R̄ is the baseline reward,
t ∈ T are the words of the new query, and P(t | q_0) is the probability of selecting word t given the original query, as in the formulation of [31]. With the actions and reward
defined, the system can be optimized under the REINFORCE learning
framework [58]. At the inference stage, the system greedily picks the optimal selection
operations to generate a few candidate queries from the input query.</p>
        <p>After enriching the input queries by the pre-trained query expansion model,
the second step of this baseline model reused the commonly used BM25
algorithm [41].</p>
      </sec>
      <sec id="sec-4-3">
        <title>Evaluating Topicality</title>
        <p>Table 2 shows the task metrics (MAP, BPref, NDCG, uRBP, and cRBP) for all the
participants' runs and the organizers' baselines.</p>
        <p>[Table 2: ad-hoc search task. Rows are grouped by team: Baseline, IMS, LIG, SandiDoc. Bold cells are the highest scores. Statistical significance tests were
conducted, but the highest participants' runs were not statistically significantly higher
than the best baselines.]</p>
        <p>the scores are not statistically significantly higher than the best baseline. Their
top run, original rm3 rrf, uses Reciprocal Rank fusion with BM25, QLM, and
DFR approaches using pseudo relevance feedback with 10 documents and 10
terms (query weight 0.5) and achieves 0.28 MAP and 0.43 BPref. The second
best run for MAP and BPref is also from IMS, using the same ranking system
without PRF. For NDCG, the organizers' baseline using ElasticSearch BM25
without query expansion obtains higher results. Interestingly, we observe in that
can give very different results (MAP ranging from 0.11 to 0.26).</p>
        <p>Since the ad-hoc task used the same topics and documents as 2018 but
intended to extend the 2018 pool, we compared teams' results for all the ad-hoc task
metrics in Figure 1. The figure shows that the extension of the pool had a
relatively limited impact on the performance of each submitted system, except
for the BPref measure, which shows a consistent decrease. BPref, correlated to the
average precision, is more robust to reduced pools [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and penalises systems for
ranking non-relevant documents above relevant ones. Therefore, this decrease
can be explained by the fact that the extension of the pool contained relevant
documents.
        </p>
        <p>
Understandability was measured with
understandability-ranked biased precision (uRBP) [61]. uRBP evaluates IR systems by taking into
account both the topicality and understandability dimensions of relevance.
        </p>
        <p>In particular, uRBP was calculated as:
          <disp-formula><label>(1)</label><tex-math>\mathrm{uRBP}(\rho) = (1-\rho)\sum_{k=1}^{K}\rho^{k-1}\, r(d@k)\cdot u(d@k)</tex-math></disp-formula>
where r(d@k) is the relevance of the document d at position k, u(d@k) is the
understandability value of the document d at position k, K is the depth of the ranking, and the persistence
parameter ρ models the user's desire to examine every answer. The parameter ρ was set to
0.50, 0.80 and 0.95 to obtain three versions of uRBP, corresponding to different user
behaviors.</p>
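        <p>As a minimal sketch of this computation, assuming the standard form from [61] in which the document at rank k contributes ρ^(k-1) · r(d@k) · u(d@k) and the sum is scaled by (1 - ρ), the measure can be computed as follows. The function name and the toy ranking are hypothetical, and the official scores were produced with a separate evaluation program rather than with this code.</p>
        <preformat>
```python
def urbp(relevance, understandability, persistence):
    """Understandability-biased RBP for one ranked list.

    relevance[k] and understandability[k] are the values for the document
    at rank k + 1; persistence is the parameter rho, strictly between 0 and 1.
    """
    total = 0.0
    for k, (r, u) in enumerate(zip(relevance, understandability)):
        # The weight rho^(k-1) for 1-based rank k becomes rho^k with 0-based k.
        total += (persistence ** k) * r * u
    return (1.0 - persistence) * total


# Hypothetical ranking of three documents: binary relevance, graded understandability.
rel = [1, 0, 1]
und = [1.0, 0.5, 0.5]
scores = {rho: urbp(rel, und, rho) for rho in (0.50, 0.80, 0.95)}
print(scores)
```
        </preformat>
        <p>A smaller ρ concentrates the score on the first ranks (an impatient user), while a larger ρ spreads the weight further down the ranking.</p>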
        <p>The understandability results for all participants are shown
in the second-to-last column of Table 2. Table 5 shows the top ten runs submitted
to the ad-hoc task. The best run, by team IMS, was obtained with Reciprocal Rank Fusion with
BM25 run on manual variants of the query. The BM25 baseline
gives very close performance. The run ranking does not differ from that of the
topicality evaluation. This could be because none of the submitted runs
included features specifically designed to assess the understandability of the results.</p>
        <p>These results were obtained with the binary relevance
assessment, the graded understandability assessment, and rbp_eval
0.5 as distributed by the RBP group. For further details, please refer to
https://github.com/jsc/rbp_eval.</p>
        <p>In this section, we report the results produced for the two subtasks: ad-hoc search
and spoken queries retrieval. In particular, for the ad-hoc subtask, only the
ad-hoc IR subtask is considered (no speech recognition). For each subtask, the
results of both the baseline methods and the team submissions in the context of
credibility assessment related to IR are reported and discussed. The measures
employed to assess credibility in the tasks considered are detailed below.</p>
        <p>In the literature, accuracy is the measure that has been used most
frequently to evaluate a classification task, and, as such, it has usually been
employed to evaluate the effectiveness of credibility assessment. In fact, to date, the
problem of assessing the credibility of information has mostly been approached
as a binary classification problem (by identifying credible versus non-credible
information) [56]. Some works have also proposed computing credibility
values for each piece of information considered, either by offering users a
credibility-based ranking of the information items, or by leaving to users the
choice of trusting the information items based on these values [36, 56, 60].
Once an adequate threshold has been chosen, these approaches can be
transformed into approaches that produce a binary classification.</p>
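        <p>The thresholding step mentioned above can be sketched as follows. This is only an illustration: the cut-off of 0.5, the function names, and the toy scores and gold labels are all hypothetical and are not taken from the cited works.</p>
        <preformat>
```python
def to_binary_labels(credibility_scores, threshold=0.5):
    """Map graded credibility scores to credible / non-credible labels.

    threshold is a hypothetical cut-off; scores at or above it count as credible.
    """
    return {doc_id: score >= threshold for doc_id, score in credibility_scores.items()}


def classification_accuracy(predicted, gold):
    """Fraction of documents whose predicted label matches the gold label."""
    correct = sum(1 for doc_id in gold if predicted.get(doc_id) == gold[doc_id])
    return correct / len(gold)


# Toy example: four documents with estimated scores and assessor labels.
scores = {"d1": 0.9, "d2": 0.4, "d3": 0.6, "d4": 0.1}
gold = {"d1": True, "d2": True, "d3": True, "d4": False}
predicted = to_binary_labels(scores)
accuracy = classification_accuracy(predicted, gold)
print(predicted, accuracy)
```
        </preformat>
        <p>The choice of threshold directly changes which documents are labelled credible, so the resulting accuracy is only meaningful relative to that choice.</p>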
        <p>However, our purpose in asking assessors to evaluate the documents from
the point of view of their credibility is to enable the development of IR systems that
can retrieve information that is credible as well as understandable and topically relevant.
For this reason, in CLEF eHealth 2020 we adapted the
understandability-ranked biased precision (uRBP) illustrated in [61] to credibility, by employing
the so-called cRBP measure. The function for calculating cRBP is the
same as that used to calculate uRBP (see Equation 1 in Section 3.5), with u(d@k) replaced
by c(d@k), the credibility value of the document d at position k:
          <disp-formula><label>(2)</label><tex-math>\mathrm{cRBP}(\rho) = (1-\rho)\sum_{k=1}^{K}\rho^{k-1}\, r(d@k)\cdot c(d@k)</tex-math></disp-formula>
As in uRBP, the persistence parameter ρ was set to three values, from an impatient user (0.50)
to more persistent users (0.80 and 0.95).</p>
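        <p>As a worked illustration, assume the usual rank-biased form in which the document at rank k contributes ρ^(k-1) · r(d@k) · c(d@k) and the sum is scaled by (1 - ρ). For a hypothetical ranking of three documents with relevance values (1, 1, 0) and credibility values (1, 0.5, 1), neither of which comes from the task data, the arithmetic for a persistent and an impatient user would be:</p>
        <preformat>
```latex
\begin{gather*}
\mathrm{cRBP}(0.80) = (1 - 0.80)\,\bigl(0.80^{0}\cdot 1\cdot 1 + 0.80^{1}\cdot 1\cdot 0.5 + 0.80^{2}\cdot 0\cdot 1\bigr)
  = 0.20 \times 1.40 = 0.28 \\
\mathrm{cRBP}(0.50) = (1 - 0.50)\,\bigl(0.50^{0}\cdot 1\cdot 1 + 0.50^{1}\cdot 1\cdot 0.5 + 0.50^{2}\cdot 0\cdot 1\bigr)
  = 0.50 \times 1.25 = 0.625
\end{gather*}
```
        </preformat>
        <p>The third document contributes nothing because it is not relevant, even though it is fully credible: under this form, credibility only counts for relevant documents.</p>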
        <p>It is important to underline that, in defining the IR approaches implemented by
both the organizers and the three groups that submitted runs, no explicit
reference was made to solutions for assessing the credibility of documents. Therefore,
any potential increase in the evaluation figures must be considered purely
coincidental.</p>
        <p>Evaluation with Accuracy. The results illustrated in this section were
obtained with a binary credibility assessment for the 2020 data and trustworthiness for the 2018
data. A document assessed with a credibility/trustworthiness value ≥ 50% was
considered credible. The accuracy of the credibility assessment was calculated
over the top 100 documents retrieved for each query q as follows:
          <disp-formula><tex-math>\mathrm{acc}(q) = \frac{\#\,\mathrm{credible\_retrieved\_docs\_top\_100}(q)}{\#\,\mathrm{retrieved\_docs\_top\_100}(q)}</tex-math></disp-formula>
        </p>
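        <p>A minimal sketch of this per-query accuracy is given below. It assumes that a document counts as credible when its assessed value is at or above 50%, and that retrieved documents without an assessment count as non-credible. The second assumption, the function name, and the toy data are ours and are not stated in the task definition.</p>
        <preformat>
```python
def credibility_accuracy(ranked_doc_ids, credibility, cutoff=100, threshold=50):
    """acc(q): share of the top-`cutoff` retrieved documents judged credible.

    credibility maps doc id -> assessed value in percent. Documents with no
    assessment are counted as non-credible here, which is an assumption.
    """
    top = ranked_doc_ids[:cutoff]
    if not top:
        return 0.0
    credible = sum(1 for d in top if credibility.get(d, 0) >= threshold)
    return credible / len(top)


# Toy run of four retrieved documents; d4 has no assessment.
run = ["d1", "d2", "d3", "d4"]
assessments = {"d1": 75, "d2": 25, "d3": 50}
acc = credibility_accuracy(run, assessments)
print(acc)
```
        </preformat>
        <p>Averaging this value over all queries would give a single accuracy figure per run.</p>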
        <p>Table 6, referring to the ad-hoc search subtask, shows that most of the
approaches tested achieved good accuracy with respect to the credibility
of the retrieved documents. However, SandiDoc's submission had
a very low accuracy value. With respect to the spoken queries IR subtask, the
results were available only for IMS and LIG, as follows:</p>
        <p>With respect to this second subtask, the accuracy values illustrated in Table
7 were lower than those for the previous subtask for all the
approaches tested by IMS and LIG. This is most likely due to errors from speech
recognition compounding in IR, similar to what we experienced in the CLEF
eHealth 2015 and 2016 tasks on speech recognition and IE to support nursing
shift-change handover communication [47, 54].</p>
        <p>Evaluation with cRBP. These results were obtained with the binary
relevance assessment and the graded credibility assessment, using the same
program referred to in Section 3.5 (rbp_eval 0.5). To obtain cRBP with different
persistence values, rbp_eval was run as follows for credibility:
          <preformat>rbp_eval -q -H qrels.credibility.clef201820201.test.binary runName</preformat>
        </p>
        <p>The cRBP values obtained at persistence 0.50, 0.80, and
0.95 for the ad-hoc search task, regardless of the specific approach used,
range over the 0.15–0.57 interval
for the baseline, with an average of 0.40; the 0.41–0.57 interval for IMS,
with an average of 0.50; the 0.40–0.49 interval for LIG, with an average of 0.45; and
the 0.16–0.23 interval for SandiDoc, with an average of 0.20. These results are
broadly consistent with those obtained for accuracy. For the spoken queries
retrieval subtask, results are again available only for IMS and LIG:
for IMS they range over the 0.37–0.50 interval, with an average of 0.44,
while for LIG they range over the 0.30–0.45 interval, with an average of 0.37.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>This year's challenge offered an ad-hoc search subtask and a spoken query
retrieval subtask. In Section 3 we analysed the results obtained by
the 3 teams who took part in these tasks and compared them with the results achieved by
several baselines provided by the task organizers. We consider three dimensions of relevance: topicality,
understandability, and credibility.</p>
      <p>The importance of considering several dimensions of relevance, beyond the
traditional topicality measure, is highlighted in the results obtained, where we
find that different retrieval techniques score higher under each of the relevance
dimensions (topicality, understandability, and credibility).</p>
      <p>As might be expected, retrieval performance is impacted when the queries are
presented in spoken form. Speech-to-text conversion is required before the queries
can be used in the developed retrieval approaches. The retrieval performance
then is impacted by the quality of the speech-to-text conversion. Future studies
will explore this phenomenon in greater detail.</p>
      <p>
        We next look at the limitations of this year's challenge. We then reflect on
prior editions of the challenge and the challenge's future, before concluding the
paper.
      </p>
      <p>
        As previously illustrated, the concept of credibility has been studied in many
different research fields, including both psychology/sociology and computer
science. Introducing the concept of credibility in the context of IR is not an easy
task. On the one hand, it represents a characteristic that can only be perceived
by human beings [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. On the other hand, it can be considered as an objective
property of an information item, which an automated system can only estimate,
and, as a consequence, the `certainty' of the correct estimate can be expressed
by numerical degrees [56]. For instance, let us consider the extreme example of
information seekers who think that Covid-19 does not exist and that, therefore,
any measure of social distancing is useless in preventing a non-existent
contagion. From the point of view of topical relevance, documents that affirm that the virus
is pure invention should be more relevant to them in a context of personalized
search; however, documents claiming that it is useless to take precautions against
infection should be assessed as non-credible in any case. In this context, the
assessment of the global relevance of documents, which should take into account
both objective and subjective dimensions of relevance, becomes highly
problematic. For the above reasons, it is difficult to measure and evaluate credibility
with the traditional measures used in IR.
      </p>
      <p>As illustrated in Section 2.4, the measures that have been used so far to evaluate
the effectiveness of a system in identifying credible or non-credible information
are the traditional ones used in machine learning for classification tasks,
in particular accuracy. To evaluate IR systems that are capable of
retrieving credible information (as well as understandable and topically relevant information),
one could consider more IR-oriented metrics. In CLEF eHealth 2020, we
adapted uRBP to credibility, by employing the so-called cRBP measure (as
illustrated in Section 3.6).</p>
      <p>However, this measure considers the credibility related to an information item
as subjective, while we believe that we should assess credibility in an objective
way. This makes this measure only partially suitable for our purposes (just as
accuracy was only partially suitable). In this scenario, it becomes essential
to develop measures that go beyond treating information credibility
as a binary value, as is done in classification systems. This problem calls for
advancing IR systems and developing related evaluation measures that factor in
the joint goodness of the ranking produced by a search engine with respect to
multiple dimensions of relevance. These include, but are not limited to, topicality,
understandability, and credibility. Including credibility is critical in order to
balance the subjectivity of assessors against the objectivity of a fact
reported in an information item. Consequently, its inclusion is certainly one of
the most ambitious goals we set for our future work.</p>
      <sec id="sec-5-1">
        <title>Comparison with Prior Work</title>
        <p>
          The inaugural CLEF eHealth CHS/IR task was organized in 2013 on the
foundation set by the 2012 CLEF eHealth workshop. The principal finding of this
workshop, set to prepare for future evaluation labs, was identifying laypeople's health
information needs and related patient-friendly health information access
methods as a theme of the community's research and development interest [46]. The
resulting CLEF eHealth tasks on CHS/IR, offered yearly from 2013 to 2020 [
          <xref ref-type="bibr" rid="ref11 ref12 ref13 ref19 ref20 ref21">53,
20, 11, 19, 12, 52, 21, 13</xref>
          ], brought together researchers to work on the theme by
providing them with timely task specifications, document collections, processing
methods, evaluation settings, relevance assessments, and other tools. Targeted
use scenarios for the designed, developed, and evaluated CHS/IR technologies in
these CLEF eHealth tasks included helping patients, their families, clinical staff,
health scientists, and healthcare policy makers access and understand
health information. As a result, the annual tasks accelerated technology
transfers from their conceptualisation in academia to generating societal impact [48,
49].
        </p>
        <p>
          This impact has led to CLEF eHealth establishing its presence and
becoming by 2020 one of the primary evaluation lab and workshop series for
all interdisciplinary actors of the ecosystem for producing, processing, and
consuming eHealth information [
          <xref ref-type="bibr" rid="ref16 ref4 ref5">4, 16, 5</xref>
          ]. Its niche in CHS/IR tasks is formed by
addressing the health information needs of laypeople, and not only healthcare experts,
in accessing and understanding eHealth information in multilingual,
multi-modal settings with simultaneous methodological contributions to
dimensions of relevance assessments (e.g., topicality, understandability, and credibility
of returned information).
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>A Vision for the Task beyond 2020</title>
        <p>
          Throughout the years from 2012 to 2020, the general purpose of the CLEF eHealth workshops and their preceding CHS/IR
tasks has been to assist laypeople in
finding and understanding health information in order to make enlightened
decisions concerning their health and/or healthcare [
          <xref ref-type="bibr" rid="ref11 ref12 ref13 ref19 ref20 ref21">46, 53, 20, 11, 19, 12, 51, 21, 13</xref>
          ].
In that sense, the evaluation challenge will focus in the coming years on
patient-centered IR in a setting that is both multilingual and multi-modal.
        </p>
        <p>Improving multilingual and multi-modal methods is crucial to guarantee
better access to information and a better understanding of it. Breaking language and
modality barriers has been a priority in CLEF eHealth over the years, and this will
continue. Text has been the major medium of interest, but as of 2020, speech
has also been included as a major new way for people to interact with the systems.</p>
        <p>
          The patient-centered IR/CHS task has been running since 2013; yet, every
edition has identified unique difficulties and challenges that have shaped
the task evolution [
          <xref ref-type="bibr" rid="ref11 ref12 ref13 ref19 ref20 ref21">53, 20, 11, 19, 12, 51, 21, 13</xref>
          ]. The task has considered in the
past, for example, multilingual queries, contextualized queries, spoken queries,
and query variants. The resources used to build these queries have also changed.
Further exploration of query construction, aiming at a better understanding of
information seekers' health information needs, is needed. The task will also
further explore relevance dimensions (e.g., topicality, understandability, and
credibility), with a particular emphasis on information credibility and methods to
take these dimensions into consideration.
        </p>
        <p>
This paper provided an overview of the CLEF eHealth 2020 Task 2 on IR/CHS.
The CLEF eHealth workshop series was established in 2012 as a scientific
workshop with the aim of establishing an evaluation lab [46]. Since 2013, this annual
workshop has been supplemented with two or more preceding shared tasks each
year, in other words, the CLEF eHealth 2013–2020 evaluation labs [
          <xref ref-type="bibr" rid="ref11 ref12 ref13 ref19 ref20 ref21">53, 20, 11, 19,
12, 51, 21, 13</xref>
          ]. These labs have offered a recurring contribution to the creation
and dissemination of text analytics resources, methods, test collections, and
evaluation benchmarks in order to ease and support patients, their next of kin,
clinical staff, and health scientists in understanding, accessing, and authoring
eHealth information in a multilingual setting.
        </p>
        <p>In 2020 the CLEF eHealth lab offered two shared tasks: one on multilingual
IE and the other on consumer health search. These tasks built on the IE and
IR tasks offered by the CLEF eHealth lab series since its inception in 2013. Test
collections generated by these shared tasks offered a specific task definition,
implemented in a dataset distributed together with an implementation of relevant
evaluation metrics to allow for direct comparability of the results reported by
systems evaluated on the collections.</p>
        <p>These established CLEF IE and IR tasks used a traditional shared task model
for evaluation in which a community-wide evaluation is executed in a controlled
setting: independent training and test datasets are used and all participants gain
access to the test data at the same time, following which no further updates to
systems are allowed. Shortly after releasing the test data (without labels or other
solutions), the participating teams submit their outputs from the frozen systems
to the task organizers, who evaluate these results and report the resulting
benchmarks to the community.</p>
        <p>The annual CLEF eHealth workshops and evaluation labs have matured and
established their presence in 2012–2020 in proposing novel tasks in IR/CHS.
Given the significance of the tasks, all problem specifications, test collections,
and text analytics resources associated with the lab have been made available
to the wider research community through our CLEF eHealth website<sup>9</sup>.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>We gratefully acknowledge the contribution of the people and organizations
involved in CLEF eHealth in 2012–2020 as participants or organizers. We thank the
CLEF Initiative, Benjamin Lecouteux (Université Grenoble Alpes), João Palotti
(Qatar Computing Research Institute), Harrisen Scells (University of
Queensland), and Guido Zuccon (University of Queensland). We thank the individuals
who generated spoken queries for the IR challenge. We also thank the individuals
at University of Queensland who contributed to the IR query generation tool and
process. We are very grateful to our assessors who helped despite the COVID-19
crisis: Paola Alberti, Vincent Arnone, Nathan Baran, Pierre Barbe, Francesco
Bartoli, Nicola Brew-Sam, Angela Calabrese, Sabrina Caldwell, Daniele
Cavalieri, Madhur Chhabra, Luca Cuffaro, Yerbolat Dalabayev, Emine Darici, Marco
Di Sarno, Mauro Guglielmo, Weiwei Hou, Yidong Huang, Zhengyang Liu,
Federico Moretti, Marie Revet, Paritosh Sharma, Haozhan Sun, Christophe Zeinaty.
The lab has been supported in part by The Australian National University
(ANU), College of Engineering and Computer Science, Research School of
Computer Science; the Our Health in Our Hands (OHIOH) initiative; and the CLEF
Initiative. OHIOH is a strategic initiative of The ANU which aims to transform
healthcare by developing new personalised health technologies and solutions in
collaboration with patients, clinicians, and healthcare providers. We
acknowledge the Encargo of Plan TL (SEAD) to CNIO and BSC for funding, and the
scientific committee for their valuable comments and guidance.</p>
      <p><sup>9</sup> https://clefehealth.imag.fr/</p>
      <p>
36. Pasi, G., De Grandis, M., Viviani, M.: Decision making over multiple criteria to
assess news credibility in microblogging sites. In: IEEE World Congress on
Computational Intelligence (WCCI) 2020, Proceedings. IEEE (2020)
37. Pasi, G., Viviani, M.: Information credibility in the social web: Contexts,
approaches, and open issues. arXiv preprint arXiv:2001.09473 (2020)
38. Popat, K., Mukherjee, S., Strötgen, J., Weikum, G.: Credibility assessment of
textual claims on the web. In: Proceedings of the 25th ACM International on
Conference on Information and Knowledge Management. pp. 2173–2178 (2016)
39. Popat, K., Mukherjee, S., Strötgen, J., Weikum, G.: Where the truth lies:
Explaining the credibility of emerging claims on the web and social media. In: Proceedings
of the 26th International Conference on World Wide Web Companion. pp. 1003–1012 (2017)
40. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N.,
Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., et al.: The Kaldi speech recognition
toolkit. In: IEEE 2011 Workshop on Automatic Speech Recognition and
Understanding. IEEE Signal Processing Society (2011)
41. Robertson, S.: The probabilistic relevance framework: BM25 and beyond.
Foundations and Trends® in Information Retrieval 3(4), 333–389 (2010).
https://doi.org/10.1561/1500000019</p>
      <p>
42. Rousseau, A., Deléglise, P., Estève, Y.: TED-LIUM: an automatic speech recognition
dedicated corpus. In: LREC. pp. 125–129 (2012)
43. Seneviratne, S., Daskalaki, E., A.L., Hossain, M.Z.: SandiDoc at CLEF
2020 - Consumer Health Search: AdHoc IR Task. In: Conference and Labs of
the Evaluation (CLEF) Working Notes. CEUR Workshop Proceedings
(CEUR-WS.org) (2020)
44. Sbaffi, L., Rowley, J.: Trust and credibility in web-based health information: a
review and agenda for future research. Journal of Medical Internet Research 19(6),
e218 (2017)
45. Self, C.C.: Credibility. In: An integrated approach to communication theory and
research, pp. 449–470. Routledge (2014)
46. Suominen, H.: CLEFeHealth2012 – The CLEF 2012 Workshop on Cross-Language
Evaluation of Methods, Applications, and Resources for eHealth Document
Analysis. In: Forner, P., Karlgren, J., Womser-Hacker, C., Ferro, N. (eds.) CLEF 2012
Working Notes. CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073,
http://ceur-ws.org/Vol-1178/ (2012)
47. Suominen, H., Hanlen, L., Goeuriot, L., Kelly, L., Jones, G.: Task 1a of the CLEF
eHealth evaluation lab 2015: Clinical speech recognition. In: Online Working Notes
of CLEF. CLEF (2015)
48. Suominen, H., Kelly, L., Goeuriot, L.: Scholarly influence of the Conference and
Labs of the Evaluation Forum eHealth Initiative: Review and bibliometric study
of the 2012 to 2017 outcomes. JMIR Research Protocols 7(7), e10961 (2018).
https://doi.org/10.2196/10961
49. Suominen, H., Kelly, L., Goeuriot, L.: The scholarly impact and strategic intent of
CLEF eHealth Labs from 2012 to 2017. In: Ferro, N., Peters, C. (eds.)
Information Retrieval Evaluation in a Changing World: Lessons Learned from 20 Years of
CLEF. pp. 333–363. Springer International Publishing, Cham (2019)
50. Suominen, H., Kelly, L., Goeuriot, L., Krallinger, M.: CLEF eHealth evaluation lab
2020. In: Jose, J.M., Yilmaz, E., Magalhães, J., Castells, P., Ferro, N., Silva, M.J.,
Martins, F. (eds.) Advances in Information Retrieval. pp. 587–594. Springer
International Publishing, Cham (2020)
51. Suominen, H., Kelly, L., Goeuriot, L., Névéol, A., Ramadier, L., Robert, A.,
Kanoulas, E., Spijker, R., Azzopardi, L., Li, D., Jimmy, Palotti, J., Zuccon, G.:
Overview of the CLEF eHealth evaluation lab 2018. In: International Conference of the
Cross-Language Evaluation Forum for European Languages, pp. 286–301. Springer
Berlin Heidelberg (2018)
52. Suominen, H., Kelly, L., Goeuriot, L., Névéol, A., Ramadier, L., Robert, A.,
Kanoulas, E., Spijker, R., Azzopardi, L., Li, D., Jimmy, Palotti, J., Zuccon, G.:
Overview of the CLEF eHealth evaluation lab 2018. In: Bellot, P., Trabelsi, C.,
Mothe, J., Murtagh, F., Nie, J.Y., Soulier, L., SanJuan, E., Cappellato, L., Ferro,
N. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction.
pp. 286–301. Springer International Publishing, Cham, Switzerland (2018)
53. Suominen, H., Salanterä, S., Velupillai, S., Chapman, W.W., Savova, G., Elhadad,
N., Pradhan, S., South, B.R., Mowery, D.L., Jones, G.J., Leveling, J., Kelly, L.,
Goeuriot, L., Martinez, D., Zuccon, G.: Overview of the ShARe/CLEF eHealth
evaluation lab 2013. In: Information Access Evaluation. Multilinguality,
Multimodality, and Visualization, pp. 212–231. Springer Berlin Heidelberg (2013)
54. Suominen, H., Zhou, L., Goeuriot, L., Kelly, L.: Task 1 of the CLEF eHealth
evaluation lab 2016: Handover information extraction. In: CLEF 2016 Evaluation
Labs and Workshop: Online Working Notes. CEUR-WS (2016)
55. Tiedemann, J.: Parallel data, tools and interfaces in OPUS. In: LREC. vol. 2012, pp.
2214–2218 (2012)</p>
      <p>
56. Viviani, M., Pasi, G.: Credibility in social media: opinions, news, and health
information – a survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge
Discovery 7(5), e1209 (2017)
57. Wayne, C., Doddington, G., et al.: TDT2 Multilanguage Text Version 4.0 LDC2001T57.
Philadelphia: Linguistic Data Consortium (LDC) (2001)
58. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist
reinforcement learning. In: Reinforcement Learning, pp. 5–32. Springer US (1992)
59. Yamamoto, Y., Tanaka, K.: Enhancing credibility judgment of web search results.
In: Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems. pp. 1235–1244 (2011)
60. Yang, K.C., Varol, O., Davis, C.A., Ferrara, E., Flammini, A., Menczer, F.: Arming
the public with artificial intelligence to counter social bots. Human Behavior and
Emerging Technologies 1(1), 48–61 (2019)
61. Zuccon, G.: Understandability biased evaluation for information retrieval. In:
Advances in Information Retrieval. pp. 280–292 (2016)
62. Zuccon, G., Palotti, J., Goeuriot, L., Kelly, L., Lupu, M., Pecina, P., Mueller, H.,
Budaher, J., Deacon, A.: The IR Task at the CLEF eHealth Evaluation Lab 2016:
User-centred Health Information Retrieval. In: CLEF 2016 Evaluation Labs and
Workshop: Online Working Notes, CEUR-WS (September 2016)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Buckley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Voorhees</surname>
            ,
            <given-names>E.M.:</given-names>
          </string-name>
          <article-title>Retrieval evaluation with incomplete information</article-title>
          .
          <source>In: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          . pp.
          <fpage>25</fpage>
          –
          <lpage>32</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Carminati</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferrari</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viviani</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A multi-dimensional and event-based model for trust computation in the social web</article-title>
          .
          <source>In: International Conference on Social Informatics</source>
          . pp.
          <fpage>323</fpage>
          –
          <lpage>336</lpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Damiani</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viviani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Trading anonymity for influence in open communities voting schemata</article-title>
          .
          <source>In: 2009 International Workshop on Social Informatics</source>
          . pp.
          <fpage>63</fpage>
          –
          <lpage>67</lpage>
          .
          <publisher-name>IEEE</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elhadad</surname>
          </string-name>
          , N.:
          <article-title>Aspiring to unintended consequences of natural language processing: A review of recent developments in clinical and consumer-generated text processing</article-title>
          .
          <source>Yearbook of Medical Informatics</source>
          <volume>1</volume>
          ,
          <fpage>224</fpage>
          –
          <lpage>233</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Filannino</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uzuner</surname>
            ,
            <given-names>O</given-names>
          </string-name>
          .
          <article-title>: Advancing the state of the art in clinical natural language processing through shared tasks</article-title>
          .
          <source>Yearbook of Medical Informatics</source>
          <volume>27</volume>
          (
          <issue>01</issue>
          ),
          <fpage>184</fpage>
          –
          <lpage>192</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Fogg</surname>
            ,
            <given-names>B.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tseng</surname>
          </string-name>
          , H.:
          <article-title>The elements of computer credibility</article-title>
          .
          <source>In: Proc. of SIGCHI</source>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fontanarava</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viviani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Feature analysis for fake review detection through supervised classification</article-title>
          .
          <source>In: 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA)</source>
          . pp.
          <fpage>658</fpage>
          –
          <lpage>666</lpage>
          .
          <publisher-name>IEEE</publisher-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leveling</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanbury</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Salantera</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>ShARe/CLEF eHealth Evaluation Lab 2013, Task 3: Information retrieval to address patients' questions when reading clinical reports</article-title>
          .
          <source>CLEF 2013 Online Working Notes</source>
          <volume>8138</volume>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leveling</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lupu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>An Analysis of Evaluation Campaigns in ad-hoc Medical Information Retrieval: CLEF eHealth 2013 and 2014</article-title>
          .
          <source>Information Retrieval Journal</source>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pecina</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanbury</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G.J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>ShARe/CLEF eHealth Evaluation Lab 2014, Task 3: User-centred health information retrieval</article-title>
          . In:
          <source>CLEF 2014 Evaluation Labs and Workshop: Online Working Notes</source>
          . Sheffield, UK
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanlen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grouin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Overview of the CLEF eHealth evaluation lab 2015</article-title>
          . In:
          <source>Information Access Evaluation. Multilinguality, Multimodality, and Visualization</source>
          . Springer Berlin Heidelberg (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanoulas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spijker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>CLEF 2017 eHealth evaluation lab overview</article-title>
          .
          <source>In: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          , pp.
          <fpage>291</fpage>
          –
          <lpage>303</lpage>
          . Springer Berlin Heidelberg (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miranda-Escalada</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzales Saez</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viviani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Overview of the CLEF eHealth evaluation lab 2020</article-title>
          . In:
          <string-name>
            <surname>Arampatzis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanoulas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsikrika</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vrochidis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joho</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lioma</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (eds.)
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Eleventh International Conference of the CLEF Association (CLEF 2020)</source>
          . Lecture Notes in Computer Science (LNCS), vol.
          <volume>12260</volume>
          . Springer, Heidelberg, Germany (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Graff</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kong</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maeda</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>English gigaword</article-title>
          .
          <source>Linguistic Data Consortium, Philadelphia</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ),
          <fpage>34</fpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Hanbury</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Muller, H.:
          <article-title>Khresmoi { multimodal multilingual medical information search</article-title>
          .
          In:
          <source>Medical Informatics Europe 2012 (MIE 2012), Village of the Future</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Community challenges in biomedical text mining over 10 years: Success, failure and the future</article-title>
          .
          <source>Briefings in Bioinformatics</source>
          <volume>17</volume>
          (
          <issue>1</issue>
          ),
          <fpage>132</fpage>
          –
          <lpage>144</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Jimmy</surname>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Overview of the CLEF 2018 consumer health search task</article-title>
          . In:
          <source>Working Notes of Conference and Labs of the Evaluation (CLEF) Forum</source>
          . CEUR Workshop Proceedings (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Kakol</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nielek</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wierzbicki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Understanding and predicting web content credibility using the content credibility corpus</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>53</volume>
          (
          <issue>5</issue>
          ),
          <fpage>1043</fpage>
          –
          <lpage>1061</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Overview of the CLEF eHealth evaluation lab 2016</article-title>
          . In:
          <source>International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          , pp.
          <fpage>255</fpage>
          –
          <lpage>266</lpage>
          . Springer Berlin Heidelberg (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schreck</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leroy</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mowery</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Overview of the ShARe/CLEF eHealth evaluation lab 2014</article-title>
          . In:
          <source>Information Access Evaluation. Multilinguality, Multimodality, and Visualization</source>
          , pp.
          <fpage>172</fpage>
          –
          <lpage>191</lpage>
          . Springer Berlin Heidelberg (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neves</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanoulas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azzopardi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spijker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scells</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Overview of the CLEF eHealth evaluation lab 2019</article-title>
          . In:
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Braschler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savoy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rauber</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Losada</surname>
            ,
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heinatz Bürki</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (eds.)
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          . pp.
          <fpage>322</fpage>
          –
          <lpage>339</lpage>
          . Springer International Publishing, Cham
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Koopman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Relevation!: an open source system for information retrieval relevance assessment</article-title>
          .
          <source>In: Proceedings of the 37th international ACM SIGIR conference on Research &amp; development in information retrieval</source>
          . pp.
          <fpage>1243</fpage>
          –
          <lpage>1244</lpage>
          . ACM
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Lewandowski</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Credibility in web search engines</article-title>
          . In:
          <source>Online credibility and digital ethos: Evaluating computer-mediated communication</source>
          , pp.
          <fpage>131</fpage>
          –
          <lpage>146</lpage>
          . IGI Global
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Lipani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lupu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanbury</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Fixed-cost pooling strategies based on IR evaluation measures</article-title>
          .
          <source>In: European Conference on Information Retrieval</source>
          . pp.
          <fpage>357</fpage>
          –
          <lpage>368</lpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Livraga</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viviani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Data confidentiality and information credibility in on-line ecosystems</article-title>
          .
          <source>In: Proceedings of the 11th International Conference on Management of Digital EcoSystems</source>
          . pp.
          <fpage>191</fpage>
          –
          <lpage>198</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Metzger</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <article-title>Making sense of credibility on the web: Models for evaluating online information and recommendations for future research</article-title>
          .
          <source>JASIST</source>
          <volume>58</volume>
          (
          <issue>13</issue>
          ),
          <fpage>2078</fpage>
          –
          <lpage>2091</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Metzger</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flanagin</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          :
          <article-title>Credibility and trust of information in online environments: The use of cognitive heuristics</article-title>
          .
          <source>Journal of Pragmatics</source>
          <volume>59</volume>
          ,
          <fpage>210</fpage>
          –
          <lpage>220</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Miranda-Escalada</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Armengol-Estapé</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of automatic clinical coding: annotations, guidelines, and solutions for non-English clinical cases at CodiEsp track of CLEF eHealth 2020</article-title>
          . In:
          <source>Conference and Labs of the Evaluation (CLEF) Working Notes. CEUR Workshop Proceedings (CEUR-WS.org)</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Moffat</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zobel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Rank-biased precision for measurement of retrieval effectiveness</article-title>
          .
          <source>ACM Trans. Inf. Syst.</source>
          <volume>27</volume>
          (
          <issue>1</issue>
          ),
          <fpage>2:1</fpage>
          –
          <lpage>2:27</lpage>
          (Dec
          <year>2008</year>
          ). https://doi.org/10.1145/1416950.1416952, http://doi.acm.org/10.1145/1416950.1416952
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Mulhem</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saez</surname>
            ,
            <given-names>G.N.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannion</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwab</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frej</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>LIG-Health at Adhoc and Spoken IR Consumer Health Search: expanding queries using UMLS and fastText</article-title>
          .
          <source>In: Conference and Labs of the Evaluation (CLEF) Working Notes. CEUR Workshop Proceedings (CEUR-WS.org)</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Nogueira</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Task-oriented query reformulation with reinforcement learning</article-title>
          .
          <source>In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          . Association for Computational Linguistics (
          <year>2017</year>
          ). https://doi.org/10.18653/v1/d17-1061
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Nunzio</surname>
            ,
            <given-names>G.M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marchesin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vezzani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A Study on Reciprocal Ranking Fusion in Consumer Health Search. IMS UniPD at CLEF eHealth 2020 Task 2</article-title>
          . In:
          <source>Conference and Labs of the Evaluation (CLEF) Working Notes. CEUR Workshop Proceedings (CEUR-WS.org)</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanbury</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lupu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pecina</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>CLEF eHealth Evaluation Lab 2015, Task 2: Retrieving Information about Medical Symptoms</article-title>
          .
          <source>In: CLEF 2015 Online Working Notes. CEUR-WS</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jimmy</surname>
          </string-name>
          ,
          <string-name>
            <surname>Pecina</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lupu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanbury</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>CLEF 2017 Task Overview: The IR Task at the eHealth Evaluation Lab</article-title>
          .
          <source>In: Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>On the distribution of user persistence for rank-biased precision</article-title>
          .
          <source>In: Proceedings of the 12th Australasian Document Computing Symposium</source>
          . pp.
          <fpage>17</fpage>–<lpage>24</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>