=Paper=
{{Paper
|id=Vol-1276/MedIR-SIGIR2014-08
|storemode=property
|title=Integrating Understandability in the Evaluation of Consumer Health Search Engines
|pdfUrl=https://ceur-ws.org/Vol-1276/MedIR-SIGIR2014-08.pdf
|volume=Vol-1276
|dblpUrl=https://dblp.org/rec/conf/sigir/ZucconK14
}}
==Integrating Understandability in the Evaluation of Consumer Health Search Engines==
Guido Zuccon (Queensland University of Technology, Brisbane, Australia, g.zuccon@qut.edu.au) and Bevan Koopman (Australian e-Health Research Centre, CSIRO, Brisbane, Australia, bevan.koopman@csiro.au)

Copyright is held by the author/owner(s). MedIR, July 11, 2014, Gold Coast, Australia. ACM SIGIR.

===Abstract===
In this paper we propose a method that integrates the notion of understandability, as a factor of document relevance, into the evaluation of information retrieval systems for consumer health search. We consider the gain-discount evaluation framework (RBP, nDCG, ERR) and propose two understandability-based variants (uRBP) of rank biased precision, characterised by an estimation of understandability based on document readability and by different models of how readability influences user understanding of document content. The proposed uRBP measures are empirically contrasted with RBP by comparing the system rankings obtained with each measure. The findings suggest that considering understandability along with topicality in the evaluation of information retrieval systems leads to different claims about system effectiveness than considering topicality alone.

Categories and Subject Descriptors: H.3 [Information Storage and Retrieval]: H.3.3 Information Search and Retrieval

General Terms: Evaluation.

===1. Introduction===
Searching for health advice on the Web is an increasingly common practice. Recent research found that in 2012 about 58% of US adults (72% of all US Internet users, up from 66% in 2011) consulted the Internet for health advice [5]; of these, 77% used search engines like Google, Bing or Yahoo! to gather health information, while only 13% started their health information seeking activities from specialised sites such as WebMD. It is, therefore, crucial to create and evaluate information retrieval (IR) systems that specifically support consumers searching for health advice on the Web. In this paper, we focus on the evaluation of IR systems for consumer health search.

Previous studies within health informatics have investigated online consumer health information beyond its topicality to specific health topics, in particular with respect to its understandability and reliability. For example, Wiener and Wiener-Pla [11] investigated the readability (measured by the SMOG reading index [7]) of Web pages concerning pregnancy and the periodontium as retrieved by Google, Bing and Yahoo!. Walsh and Volsko [10] showed that most online information sampled from five US consumer health organisations and related to the top five medical causes of death in the US is presented at a readability level (measured by the SMOG, FOG and Flesch-Kincaid reading indexes [7]) that exceeds that of the average US citizen (7th grade level). Ahmed et al. [1] highlighted the variability in readability (measured by the Flesch Reading Ease and Flesch-Kincaid reading indexes [7]) and quality of concussion information accessed through Google searches. The understandability and reliability of online health information are considered critical issues for supporting online consumer health search because (1) consumers may not benefit from health information that is not provided in an understandable way; and (2) the provision of unreliable, misleading or false information on a health topic, e.g., a medical condition or treatment, may lead to negative health outcomes. This previous research suggests that topicality should not be considered the only relevance factor when assessing the effectiveness of IR systems for consumer health search: other factors, such as understandability and reliability, should also be included in the evaluation framework.

Research on the user perception of document relevance has shown that users' relevance assessments are affected by a number of factors beyond topicality, although topicality has been found to be the essential relevance criterion. For example, Xu and Chen proposed and validated a five-factor model of relevance consisting of novelty, reliability, understandability and scope, along with topicality [12]. Their empirical findings highlight the importance of understandability, reliability and novelty, along with topicality, in the relevance judgements they collected. Nevertheless, typical evaluation of IR systems commonly considers relevance assessments in terms of topicality only (with the recent exception of novelty and diversity, e.g., [4]); this is also the case when evaluating systems for consumer health search, for example within CLEF eHealth 2013 [6].
In this paper, we aim to close this gap in the evaluation of IR systems, focusing on integrating understandability along with topicality in the evaluation of consumer health search engines. The integration of other factors influencing relevance, such as reliability, is left for future work.

The integration of understandability within the evaluation methodology is achieved by extending the general gain-discount framework synthesised by Carterette [3]; this framework encompasses the widely used nDCG, RBP and ERR. The result is a series of understandability-biased evaluation measures. Specifically, we examine one such measure, the understandability-based rank biased precision (uRBP), a variant of rank biased precision (RBP) [8]; variants of nDCG and ERR may also be derived within our framework.

The proposed evaluation measure is further instantiated by considering specific estimations of understandability based on readability measures computed for each retrieved document. While understandability encompasses other aspects in addition to text readability (e.g., prior knowledge), readability is a good first approximation of understandability. This choice is also supported by prior work in health informatics regarding the understandability of consumer health information (e.g., see [11, 10, 1]).

The impact of the proposed framework and of the specific resultant measures on the evaluation of IR systems is investigated in the context of the consumer health search task of CLEF eHealth 2013 [6]; empirical findings show that systems that are most effective according to uRBP are not necessarily as effective when considering topicality alone (i.e., RBP).
===2. Understandability-Based Evaluation===

====2.1 The gain-discount framework====
We tackle the problem of jointly evaluating topicality and understandability for measuring IR system effectiveness within the gain-discount framework synthesised by Carterette [3]. Within this framework, the effectiveness of a system, conveyed by a ranked list of documents, is measured by the evaluation measure M, defined as:

M = (1/N) Σ_{k=1..K} g(k) d(k)    (1)

where g(k) and d(k) are respectively the gain and discount functions computed for the document at rank k (for simplicity of notation, in the following we override k to represent either the rank position k or the document at rank k; the context of use determines the intended meaning), K is the depth of assessment at which the measure is evaluated, and 1/N is an (optional) normalisation factor, which serves to bound the value of the sum to the range [0, 1] (see [9]).

Different measures developed within the gain-discount framework are characterised by different instantiations of its components. For example, the discount function in RBP is modelled by d(k) = β^(k−1), where β ∈ [0, 1] reflects user behaviour (high values representing persistent users, low values representing impatient users); in nDCG the discount function is given by d(k) = 1/log2(1 + k), and in ERR by d(k) = 1/k. Similarly, instantiations of the gain function differ depending on the considered measure. In RBP, the gain function is binary-valued (i.e., g(k) = 1 if the document at rank k is relevant, g(k) = 0 otherwise); for nDCG, g(k) = 2^r(k) − 1, and for ERR, g(k) = (2^r(k) − 1)/2^rmax (with r(k) being the relevance grade of the document at rank k).

Without loss of generality, we can express the gain provided by a document at rank k as a function of its probability of relevance; for simplicity we shall write g(k) = f(P(R|k)), where P(R|k) is the probability of relevance given the document at rank k. Note that a similar form has been used to define the gain function of time-biased evaluation measures [9]. The specific instantiations of g(k) in measures like RBP, nDCG and ERR can then be seen as the application of different functions f(.) to estimations of P(R|k). Traditional TREC-style relevance assessors are instructed to consider topicality as the only (explicit) factor influencing relevance, thus P(R|k) = P(T|k), i.e., the probability that the document at k is topically relevant (to a query).

====2.2 Integrating understandability====
As discussed in previous work, e.g. [12], relevance is influenced by many factors, topicality being only one of them, although the most important. To integrate understandability into the gain-discount framework, we model P(R|k) as the joint P(T,U|k), i.e., the probability of relevance of a document (at rank k) is estimated using the joint probability of the document being topical and understandable.

To compute the joint probability we assume that topicality and understandability are compositional events and that their probabilities are independent, i.e., P(T,U|k) = P(T|k)P(U|k). This is a strong assumption and its limitations are briefly discussed in Section 4. Following this assumption, the gain function in the gain-discount framework is expressed as:

g(k) = f(P(R|k)) = f(P(T|k) P(U|k))    (2)

Different evaluation measures developed within this framework would instantiate f(P(T|k)P(U|k)) in different ways. In the following we propose two RBP-based instantiations; other instantiations are left for future work.
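To make the framework concrete, the following minimal Python sketch implements Equation (1) with the RBP, nDCG and ERR discounts listed above. The function names, the binary example gains and the handling of the normalisation factor N are our own illustrative choices; the paper itself provides no code.

```python
import math

def rbp_discount(k, beta=0.95):
    # RBP discount: d(k) = beta^(k-1); high beta models a persistent user.
    return beta ** (k - 1)

def ndcg_discount(k):
    # nDCG discount: d(k) = 1 / log2(1 + k).
    return 1.0 / math.log2(1 + k)

def err_discount(k):
    # ERR discount: d(k) = 1 / k.
    return 1.0 / k

def gain_discount_measure(gains, discount, N=1.0):
    # Equation (1): M = (1/N) * sum over k = 1..K of g(k) * d(k),
    # where `gains` lists g(k) for ranks 1..K.
    return sum(g * discount(k) for k, g in enumerate(gains, start=1)) / N

# Binary gains, as in RBP: g(k) = 1 iff the document at rank k is relevant.
gains = [1, 0, 1, 1, 0]
beta = 0.95
# For RBP, 1/N = (1 - beta), i.e. RBP = (1 - beta) * sum beta^(k-1) g(k).
rbp = gain_discount_measure(gains, lambda k: rbp_discount(k, beta),
                            N=1.0 / (1 - beta))
```

Passing the discount in as a function keeps Equation (1) as the single shared skeleton: each named measure is then just a particular choice of gain, discount and N.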
====2.3 Estimating understandability====
In the traditional TREC setting, assessments of the topicality of a document to a query are collected through the manual annotation of query-document pairs by assessors, i.e., binary or graded relevance assessments (recall that, although called "relevance assessments", in TREC-style assessments annotators are usually instructed to consider only the topicality of a document to a query, isolating this factor from the others that influence relevance in real settings); these are then turned into estimations of P(T|k). This process could be mimicked to collect understandability assessments; in this paper, however, we do not explore this possibility. Instead, we explore computing understandability as a property of a document and integrating it into the evaluation process along with standard relevance assessments. To this aim, readability is used as a proxy for understandability. (The limitations of this choice are briefly noted in Section 4.) Its use is nonetheless justifiable because readability is one of the aspects that influence the understanding of text.

To estimate readability (and thus understandability), we employ established general readability measures such as those used in [1, 10, 11], e.g., the SMOG, FOG and Flesch-Kincaid reading indexes. These measures consider the surface level of the text contained in Web pages, that is, the wording and syntax of sentences: the presence of long sentences, words with many syllables, and unpopular words are all indicators of text that is difficult to read [7]. In this paper, we use the FOG measure to estimate the readability of a text; the FOG reading level is computed as

FOG(d) = 0.4 · (avgslen(d) + phw(d))    (3)

where avgslen(d) is the average length of the sentences in a document d and phw(d) is the percentage of hard words (i.e., words with more than two syllables) in d.

The use of such general readability measures to assess the readability of documents concerning health information has been questioned, as they do not seem to adequately correlate with human judgements for documents in this domain [13]. Nevertheless, the adoption of standard readability measures in this paper is a first step towards demonstrating the use of the proposed understandability-biased measures and analysing how system rankings change accordingly. In addition, their usage is partially supported by previous work within health informatics on assessing the readability of online health advice [1, 10, 11].
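As a worked example of Equation (3), here is a rough Python sketch of the FOG computation. The regular-expression tokenisation and the vowel-group syllable counter are crude assumptions of ours; the paper does not specify these details, and a careful implementation would use a proper sentence splitter and a pronunciation dictionary.

```python
import re

def count_syllables(word):
    # Crude heuristic: one syllable per group of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fog(text):
    # Equation (3): FOG(d) = 0.4 * (avgslen(d) + phw(d)).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    avgslen = len(words) / len(sentences)  # average sentence length, in words
    hard = sum(1 for w in words if count_syllables(w) > 2)  # "hard" words
    phw = 100.0 * hard / len(words)        # percentage of hard words
    return 0.4 * (avgslen + phw)

print(fog("The provision of unreliable information may lead "
          "to negative health outcomes."))
```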
====2.4 Modelling P(U|k)====
Given the readability score of the document at rank k, P(U|k) needs to be estimated; this is achieved by considering user models that encode different ways in which a user is affected by document readability.

We first consider a user model, P1(U|k), in which a user is characterised by a readability threshold th: every document with a readability score below th is considered certainly understandable, i.e., P1(U|k) = 1, while documents with readability above th are considered not understandable, i.e., P1(U|k) = 0. This is a (Heaviside) step function centred at th; the function is depicted in Figure 1 (P1(U|k)) with th = 20, along with the FOG readability score distribution for documents from CLEF eHealth 2013 [6]. The use of a step function to model P(U|k) is akin to the gain function in RBP (also a step function). The understandability-based RBP for user model one is then given by:

uRBP1 = (1 − β) Σ_{k=1..K} β^(k−1) r(k) u1(k)    (4)

where, for simplicity of notation, u1(k) indicates the value of P1(U|k) and r(k) is the (topical) relevance assessment of document k (alternatively, the value of P(T|k)); thus g(k) = f(P(T|k)P(U|k)) = P(T|k)P(U|k) = r(k)u1(k).

A second user model, P2(U|k), is proposed, in which the probability estimation is similar to a step function but smoothed in the surroundings of the threshold value; this provides a more realistic transition between readable and non-readable content:

P2(U|k) ∝ 1/2 − arctan(FOG(k) − th)/π    (5)

where arctan is the arctangent trigonometric function and FOG(k) is the FOG readability score of the document at rank k; other readability scores could be used instead of FOG. The distribution of P2(U|k) values is shown in Figure 1. Equation 5 is not a proper probability distribution, but one can be obtained by normalising Equation 5 by its integral over [min FOG(k), max FOG(k)]; Equation 5 is, however, rank equivalent to such a distribution and does not change the behaviour of the uRBP variant. These settings lead to the formulation of a second understandability-based RBP, uRBP2, based on the second user model, obtained by simply substituting u2(k) = P2(U|k) for u1(k) in Equation 4.

Note that in both understandability-based measures (as well as in the original RBP) the contribution of an irrelevant document is zero, irrespective of its P(U|k). The contribution (to the gain) of a relevant document with a readability score above th is 1 for RBP, 0 for uRBP1, and less than 0.5 for uRBP2 (for uRBP2 the contribution quickly tends to 0 the further the readability score exceeds the threshold value).

Finally, note that it is possible to design other user models of how readability influences document understandability; the challenge is to determine which model better represents the relationship between readability and document understanding.

''Figure 1: Distributions of P1(U|k) and P2(U|k) with respect to threshold th = 20 (y-axis: P(U|k); x-axis: readability score of the document at k), along with the density distribution of readability scores (computed using FOG) for the documents in the CLEF eHealth 2013 qrels.''
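Continuing the sketch under the same assumptions, the two user models and Equation (4) can be written as follows; the parallel-list input format and the example relevance values and FOG scores are illustrative only.

```python
import math

def u1(fog_score, th):
    # User model 1: Heaviside step, P1(U|k) = 1 iff FOG(k) is below th.
    return 1.0 if fog_score < th else 0.0

def u2(fog_score, th):
    # User model 2, Equation (5) (unnormalised):
    # P2(U|k) proportional to 1/2 - arctan(FOG(k) - th) / pi.
    return 0.5 - math.atan(fog_score - th) / math.pi

def urbp(rels, fog_scores, u, th, beta=0.95):
    # Equation (4): uRBP = (1 - beta) * sum_k beta^(k-1) * r(k) * u(k).
    return (1 - beta) * sum(
        (beta ** k) * r * u(f, th)              # k is 0-based here
        for k, (r, f) in enumerate(zip(rels, fog_scores))
    )

rels = [1, 0, 1, 1, 0]                # topical relevance r(k), ranks 1..5
fogs = [8.2, 25.0, 14.1, 21.3, 9.9]   # FOG score of the document at each rank
urbp1 = urbp(rels, fogs, u1, th=15)
urbp2 = urbp(rels, fogs, u2, th=15)
```

In this toy ranking with th = 15, the relevant document at rank 4 (FOG 21.3) contributes nothing to uRBP1 but a small amount (u2 ≈ 0.05) to uRBP2, illustrating the smoothing effect of the arctan transition.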
===3. Empirical Analysis===

====3.1 Experiment design and settings====
To understand how accounting for understandability influences the evaluation of IR systems tailored to searching for health advice on the Web, we consider the runs submitted to CLEF eHealth 2013 [6], which specifically aimed at evaluating systems for this task. Our empirical experiments and subsequent analysis focus on the changes in system rankings obtained when evaluating with a standard measure (RBP) and with the understandability-based measures (uRBP1 and uRBP2). System rankings are compared using Kendall rank correlation (τ) and AP correlation [14] (τAP), which gives higher weight to rank changes that affect top systems. We do not experiment with different values of β in RBP, and set β = .95 across RBP and uRBP.

The document collection used in CLEF eHealth 2013 has been retired due to the removal of duplicate and copyrighted documents; to allow reproducibility of the reported results we thus use the CLEF eHealth 2014 collection (a subset of the CLEF eHealth 2013 collection), together with the 2013 qrels for relevance assessment. For each document in the collection, the FOG readability score (Equation 3) was computed; the score distribution for all documents in the CLEF eHealth 2013 qrels is shown in Figure 1. Three thresholds on the FOG readability values were explored for the computation of the two alternative formulations of uRBP: th = 10, 15, 20. Documents with a FOG score below 10 should be near-universally understandable, while FOG scores above 15 and 20 increasingly restrict the audience able to understand the text.
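For the ranking comparison, Kendall's τ is available off the shelf (e.g., scipy.stats.kendalltau), while τAP can be implemented directly from its definition in [14]. The sketch below, including the toy score dictionaries, is our own illustration of this comparison methodology, not the authors' code.

```python
from scipy.stats import kendalltau

def tau_ap(true_order, pred_order):
    # AP rank correlation [14]: like Kendall's tau, but discordances near
    # the top of the ranking are penalised more heavily. Both arguments
    # list system ids, best system first; ties are ignored for brevity.
    true_pos = {s: i for i, s in enumerate(true_order)}
    n = len(pred_order)
    total = 0.0
    for i in range(1, n):
        # C(i): systems ranked above position i in pred_order that the
        # reference ranking also places above the system at position i.
        c = sum(1 for s in pred_order[:i]
                if true_pos[s] < true_pos[pred_order[i]])
        total += c / i
    return 2.0 * total / (n - 1) - 1.0

# Toy example: three systems scored by RBP and by uRBP (higher is better).
rbp_scores = {"sysA": 0.30, "sysB": 0.25, "sysC": 0.10}
urbp_scores = {"sysA": 0.12, "sysB": 0.20, "sysC": 0.05}
systems = sorted(rbp_scores)
tau, _ = kendalltau([rbp_scores[s] for s in systems],
                    [urbp_scores[s] for s in systems])
by_rbp = sorted(systems, key=rbp_scores.get, reverse=True)
by_urbp = sorted(systems, key=urbp_scores.get, reverse=True)
print(tau, tau_ap(by_rbp, by_urbp))
```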
====3.2 Results and analysis====
Figure 2 reports RBP vs. uRBP for the IR systems participating in CLEF eHealth 2013, for the two user models proposed in Section 2.4 and the three readability thresholds considered in the experiments. Similarly, Table 1 reports the values of Kendall rank correlation (τ) and AP correlation (τAP) between the system rankings obtained with RBP and with the two versions of uRBP.

''Figure 2: RBP vs. uRBP of CLEF eHealth 2013 systems (left: uRBP1; right: uRBP2) for varying values of the threshold on the readability scores (th = 10, 15, 20).''

''Table 1: Correlation coefficients (τ and τAP) between system rankings obtained with RBP and uRBP1 or uRBP2 for different values of the readability threshold.''

| | th = 10 | th = 15 | th = 20 |
|---|---|---|---|
| RBP vs. uRBP1 | τ = .1277, τAP = −.0255 | τ = .5603, τAP = .2746 | τ = .9574, τAP = .9261 |
| RBP vs. uRBP2 | τ = .5887, τAP = .2877 | τ = .6791, τAP = .4102 | τ = .9574, τAP = .9407 |

Higher correlation between the system rankings obtained with RBP and uRBP is observed for higher values of th, irrespective of the uRBP version (see Table 1). This is expected: the higher the threshold, the more documents are characterised by P(U|k) = 1 (or ≈ 1 for uRBP2), thus reducing uRBP to RBP. The fact that, in general, uRBP2 is more strongly correlated with RBP than uRBP1 is highlights the effect of the smoothing obtained with the arctan function; specifically, extending the range of readability scores for which P(U|k) is not zero beyond th narrows the scope for ranking differences between systems. These observations are confirmed in Figure 2, where only a few changes in the rank of systems are visible for th = 20 (× in Figure 2), while more changes are found for th = 10 (◦) and th = 15 (+). Note that the small differences in the absolute effectiveness values recorded by uRBP with th = 10 should not be interpreted as a lack of discriminative power: when th = 10, only 1.4% of the documents in the CLEF eHealth 2013 qrels are both relevant and readable, and thus contribute to uRBP.

Figure 2 demonstrates the importance of considering understandability along with topicality in the evaluation of systems for the considered task. The system ranked highest according to RBP (MEDINFO.1.3.noadd) is second to a number of systems according to uRBP if user understandability of up to FOG level 15 is wanted. Specifically, the highest uRBP1 for th = 10 is achieved by UTHealth_CCB.1.3.noadd, which is ranked 28th according to RBP; the highest uRBP1 for th = 15 is achieved by teamAEHRC.6.3, which is ranked 19th according to RBP and also achieves the highest uRBP2 for th = 10, 15.

===4. Limitations and Conclusions===
In this paper, we have investigated how understandability can be integrated into the gain-discount framework for evaluating IR systems. The approach studied here is general and can be applied to other factors of relevance, such as reliability. Information reliability plays an important role in consumer health advice search; its integration will be studied in future work.

In the proposed approach, relevance (P(R|k)) was modelled as the joint probability P(T,U|k). The two events were assumed to be compositional and their probabilities independent, thus allowing us to derive P(T,U|k) = P(T|k)P(U|k) and to treat topicality and understandability separately. This is a strong assumption and it is not necessarily true; alternatives are under investigation, e.g. [2].

The approach was demonstrated by deriving understandability-based variants of RBP; other measures can also be extended, e.g., nDCG and ERR. Note, however, that nDCG-style versions would require normalising the gain function by the ideal gain, which in turn requires finding the optimal ranking based on two criteria, relevance score and understandability, instead of one as in the standard nDCG.

Xu and Chen [12] have noted that the factors of relevance influence relevance assessments in different proportions; in their study, for example, topicality was found to be more influential than understandability. The specific uRBP measures studied here do not consider this aspect; however, a weighting of the different factors could be accomplished through a different f(.) function for converting P(T,U|k) into gain values.

In this paper, we have used readability as a proxy for understandability, but readability is only one of the aspects that influence understandability [12]; future work may explore other factors, e.g., users' prior knowledge, as well as the presence of images that further explain the textual information. Furthermore, readability was estimated using general, surface-level readability measures, and previous work has shown that these measures are often not suitable for evaluating the readability of health information. For example, Yan et al. [13] claim that people experience the greatest readability difficulties at the word level rather than at the sentence level; they further propose a new metric based on concept-based readability, specifically instantiated in the health domain. A number of alternative approaches that measure text readability beyond the surface characteristics of text have been proposed; future work will investigate their use to estimate P(U|k), along with actual readability assessments collected from users.
===References===
[1] O. H. Ahmed, S. J. Sullivan, A. G. Schneiders, and P. R. McCrory. Concussion information online: evaluation of information quality, content and readability of concussion-related websites. British Journal of Sports Medicine, 46(9):675–683, 2012.
[2] P. D. Bruza, G. Zuccon, and L. Sitbon. Modelling the information seeking user by the decisions they make. In MUBE 2013, pages 5–6. ACM, 2013.
[3] B. Carterette. System effectiveness, user models, and user utility: a conceptual framework for investigation. In SIGIR'11, pages 903–912, 2011.
[4] C. L. Clarke, N. Craswell, and I. Soboroff. Overview of the TREC 2009 web track. In TREC'09, 2009.
[5] S. Fox and M. Duggan. Health online 2013. Tech. Rep., Pew Research Center's Internet & American Life Project, 2013.
[6] L. Goeuriot, G. Jones, L. Kelly, J. Leveling, A. Hanbury, H. Müller, S. Salanterä, H. Suominen, and G. Zuccon. ShARe/CLEF eHealth evaluation lab 2013, task 3: Information retrieval to address patients' questions when reading clinical reports. In CLEF, 2013.
[7] D. R. McCallum and J. L. Peterson. Computer-based readability indexes. In ACM'82 Conf., pages 44–48, 1982.
[8] A. Moffat and J. Zobel. Rank-biased precision for measurement of retrieval effectiveness. TOIS, 27(1):2, 2008.
[9] M. D. Smucker and C. L. Clarke. Time-based calibration of effectiveness measures. In SIGIR'12, pages 95–104, 2012.
[10] T. M. Walsh and T. A. Volsko. Readability assessment of internet-based consumer health information. Respiratory Care, 53(10):1310–1315, 2008.
[11] R. C. Wiener and R. Wiener-Pla. Literacy, pregnancy and potential oral health changes: The internet and readability levels. Maternal and Child Health Journal, pages 1–6, 2013.
[12] Y. C. Xu and Z. Chen. Relevance judgment: What do information users consider beyond topicality? JASIST, 57(7):961–973, 2006.
[13] X. Yan, D. Song, and X. Li. Concept-based document readability in domain specific information retrieval. In CIKM'06, pages 540–549, 2006.
[14] E. Yilmaz, J. A. Aslam, and S. Robertson. A new rank correlation coefficient for information retrieval. In SIGIR'08, pages 587–594, 2008.