<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring Understandability Features to Personalize Consumer Health Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Joao Palotti</string-name>
          <email>palotti@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Navid Rekabsaz</string-name>
          <email>rekabsaz@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vienna University of Technology (TUW) Favoritenstrasse 9-11/188</institution>
          <addr-line>1040 Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This paper describes the participation of the Vienna University of Technology (TUW)
in CLEF eHealth 2017 Task 3 [
        <xref ref-type="bibr" rid="ref5 ref9">5,9</xref>
        ]. This track has run annually since 2013 (see
[
        <xref ref-type="bibr" rid="ref12 ref3 ref4 ref7">3,4,7,12</xref>
        ]) and this year’s challenge is a continuation of the 2016 edition. The
Information Retrieval task of the CLEF eHealth Lab aims to foster research on search by
health consumers, emphasizing crucial aspects of this domain such as document
understandability and trustworthiness.
      </p>
      <p>In 2016, fifty topics were extracted from real user posts/interactions in the
AskDocs section of Reddit (https://www.reddit.com/r/AskDocs/). Each topic was presented
to six query creators with different medical expertise. Their job was to read a post
(usually containing a medical question) and formulate a query using their medical
background knowledge, if any. In total, 300 queries were created.</p>
      <p>
        This year, the track has four subtasks (named IRTasks, see [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for a full
description of each task) and TUW submitted runs for two of them, IRTask 1 and IRTask 2.
IRTask 1 is the ad-hoc task with the same topics as last year, aiming to increase
the number of assessed documents for the collection. IRTask 2 is a new task whose
goal is to personalize the results for each query creator according to their
medical expertise.
      </p>
      <p>The experiments conducted by TUW aim to investigate two research directions:
1. IRTask 1: Can understandability metrics be used to improve retrieval?
2. IRTask 2: How can retrieval be personalized in a learning to rank setting,
according to different reading profiles and user expertise?</p>
      <p>
        For IRTask 1, a previous study conducted in the context of CLEF eHealth
2014 and 2015 ([
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) showed promising improvements when using a small set of
understandability estimators in a learning to rank context. Here we expand both the
set of understandability features used and the non-understandability features
(see Section 2.2). Our aim is to investigate whether the improvements first seen in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
would also occur in this dataset. For IRTask 2, we propose to explicitly define
learning to rank features based on different user profiles, and we study the effect of
the suggested features on system effectiveness.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>In this section we describe our learning to rank approach, the feature set devised,
and our functions to map topical and understandability assessments into a single
relevance label.</p>
      <p>Learning to Rank. Our learning to rank approach is based on three items: (1) a
set of features, (2) a set of &lt;document, label&gt; pairs, and (3) a learning to rank
algorithm. The set of features is described in Section 2.2. We consider in this work
three different functions to label documents: for IRTask 1, we only use the pure
topical relevance as judged in 2016; for IRTask 2, we define two understandability-biased
functions (named boost and float). Given a document with topical relevance T
and understandability score U, and a user with a reading goal G, we define
boost and float as:
boost(T, U) = 2 · T if |G − U| ≤ 0.2, and boost(T, U) = T if |G − U| &gt; 0.2 (1)</p>
      <p>float(T, U) = T · (1.0 − |G − U|) (2)</p>
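      <p>To make the two labeling functions concrete, the following minimal Python sketch (an
illustration under the definitions above, not the exact code used in our experiments)
computes the boost and float labels from a document's topical relevance, its
understandability score, and the user's reading goal:</p>
      <preformat>
def boost_label(topical, understandability, goal):
    """Equation 1: double the topical relevance when the document's
    understandability is within 0.2 of the user's reading goal."""
    if abs(goal - understandability) &lt;= 0.2:
        return 2 * topical
    return topical


def float_label(topical, understandability, goal):
    """Equation 2: scale the topical relevance by how close the document's
    understandability is to the user's reading goal."""
    return topical * (1.0 - abs(goal - understandability))


# Example: a highly relevant document (T = 2) with understandability 0.5,
# for a user whose reading goal is 0.6.
print(boost_label(2, 0.5, 0.6))  # 4
print(float_label(2, 0.5, 0.6))  # 1.8
      </preformat>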
      <p>
        As topical relevance scores are either 0, 1 or 2, and the understandability
scores are float numbers from 0.0 to 1.0, the possible values of the boost function
are the integers 4, 2, 1 and 0, while the possible values of the float function are any
floating-point number between 0.0 and 2.0. All experiments used the pairwise
learning to rank algorithm based on gradient boosting implemented in XGBoost
(https://github.com/dmlc/xgboost/tree/master/demo/rank), with NDCG@20 as the metric
to be optimized. Differently from past work [
        <xref ref-type="bibr" rid="ref10 ref6">10,6</xref>
        ], we
consider up to 1000 documents when re-ranking.
      </p>
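      <p>A minimal sketch of this setup with the XGBoost low-level API is shown below. The
feature matrix, labels, and per-query group sizes are random placeholders, and the
parameter values are illustrative rather than the exact ones used in our submitted runs:</p>
      <preformat>
import numpy as np
import xgboost as xgb

# X: one row of 91 features per query-document pair.
# y: relevance labels (topical labels for IRTask 1; boost/float labels for IRTask 2).
# groups: number of candidate documents per query (we re-rank up to 1000 per query).
X = np.random.rand(3000, 91)
y = np.random.randint(0, 3, size=3000)
groups = [1000, 1000, 1000]

dtrain = xgb.DMatrix(X, label=y)
dtrain.set_group(groups)

params = {
    "objective": "rank:pairwise",  # pairwise learning to rank
    "eval_metric": "ndcg@20",      # optimize NDCG at cutoff 20
    "eta": 0.1,                    # illustrative learning rate
    "max_depth": 6,                # illustrative tree depth
}
model = xgb.train(params, dtrain, num_boost_round=100)

# Re-rank the candidates of one query: score them and sort by descending score.
scores = model.predict(xgb.DMatrix(X[:1000]))
reranked = np.argsort(-scores)
      </preformat>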
      <p>
        Features. We devised 91 features from three distinct groups: traditional information
retrieval features, understandability-related features, and the modified output of
regression algorithms trained to estimate the understandability of a document.
More elaborate features based on recent advances in semantic similarity, as used in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
are left as future work. A comprehensive list of all features used in this work is
shown in Table 1.
      </p>
      <p>Table 1. Features used in this work, grouped into IR Features (12), Understandability
Features (72), and Regression Features (7); each feature category is followed by its feature names.</p>
      <p>IR Features (12):
Common IR Models (7): BM25, PL2, TF-IDF, LemurTF-IDF, DirichletLM, DFRee, Hiemstra LM.
Query Independent (3): Document Length, Document Spam Score, Document PageRank.
Document Score Modifiers (2): Markov Random Field, Divergence from Randomness.</p>
      <p>Understandability Features (72):
Traditional Formulas (8): ARI Index, Coleman-Liau Index, Dale-Chall Score, Flesch Reading Ease, Flesch-Kincaid Grade, Gunning Fog Index, LIX Index, SMOG Index.
Surface Measures (25): # Characters ♦†, # Sentences ♦, # Syllables ♦†, # Words ♦†, # (Syllables(Word) &gt; 3) ♦†, # (| Word | &gt; 4) ♦†, # (| Word | &gt; 6) ♦†, # (| Word | &gt; 10) ♦†, # (| Word | &gt; 13) ♦†.
General Vocabulary Related (12): English Dictionary ♦†, Stopwords ♦†, Numbers ♦†, Dale-Chall List ♦†.
Medical Vocabulary Related (27): Acronyms ♦†, MeSH ♦†, DrugBank ♦†, ICD-10 (International Classification of Diseases) ♦†, Medical Prefixes ♦†, Medical Suffixes ♦†, Consumer Health Vocabulary ♦†, Sum(CHV Score) ♦†, Mean(CHV Score) ♦†.</p>
      <p>Regression Features (7):
Modified Regression Scores (7): Ada Boosting Regressor, Extra Tree Regressor, Gradient Boosting Regressor, K-Nearest Neighbors Regressor, Linear Regression, Support Vector Machine Regressor, Random Forest Regressor.</p>
      <p>
        IR Features: Regularly used information retrieval features are considered in
this work. This list includes many commonly used retrieval models and document-specific
values, such as Spam scores [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and PageRank scores (http://www.lemurproject.org/clueweb12/PageRank.php).
      </p>
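      <p>The retrieval model scores themselves are produced by the search engine used to index
the collection (a Terrier index, as described in Section 4). Purely as an illustration of one
of the Common IR Models listed in Table 1, a minimal BM25 scorer could look like the
following sketch, with the usual default values for k1 and b (not necessarily those of our index):</p>
      <preformat>
import math

def bm25(query_terms, doc_terms, doc_freqs, n_docs, avg_doc_len, k1=1.2, b=0.75):
    """Classic BM25 score of one document for one query.
    doc_terms: tokens of the document; doc_freqs: term -&gt; document frequency."""
    score = 0.0
    doc_len = len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        df = doc_freqs.get(term, 0)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score
      </preformat>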
      <p>
        Understandability Features: All HTML pages were preprocessed with
Boilerpipe (https://pypi.python.org/pypi/boilerpipe) to remove undesirable boilerplate content, as suggested in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Then,
a series of traditional readability metrics was calculated [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], as well as a number
of basic syntactic and lexical features that are important components of such
readability metrics. Finally, we measure the occurrence of words in different
vocabularies, both medical and non-medical ones.
      </p>
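      <p>As an illustration of the understandability features, the sketch below computes one of the
traditional formulas (Flesch Reading Ease) and a few of the surface measures of Table 1 from
pre-computed sentence, word, and syllable counts; it assumes the text has already been cleaned
with Boilerpipe and tokenized, which is a simplification of the actual pipeline:</p>
      <preformat>
def flesch_reading_ease(n_sentences, n_words, n_syllables):
    # Standard Flesch Reading Ease formula: higher scores mean easier text.
    return 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)

def surface_measures(words, syllables_per_word):
    # A few of the surface measures listed in Table 1 (raw counts only).
    return {
        "n_words": len(words),
        "n_long_words_4": sum(1 for w in words if len(w) &gt; 4),
        "n_long_words_10": sum(1 for w in words if len(w) &gt; 10),
        "n_polysyllabic": sum(1 for s in syllables_per_word if s &gt; 3),
    }
      </preformat>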
      <p>Regression Features: We adapted the output of regression algorithms to
create personalized features. The 2016 judgements were used as labels for a
number of regression algorithms (the list of algorithms used is shown in Table 1).
Models were trained on a Latent Semantic Analysis (LSA) applied to the words
of 3,549 documents marked as topically relevant in the QRels from 2016, whose
understandability labels varied from 0 (easy to understand) to 100 (hard to
understand). The number of LSA dimensions varied from 40 to 240, according to the best
result of a 10-fold cross-validation experiment. In order to avoid interference from the
training set in the learning to rank algorithm, scores for the documents in the training
set were also predicted in a 10-fold cross-validation fashion. The personalization
step consisted in calculating the absolute difference between the estimated score
and the goal score, which is defined by the user. For example, if the score estimated
by a regression algorithm for a document D was 0.45 and the reading goal of
a user U was 0.80, we used as feature the value 0.35 (the absolute difference
between 0.80 and 0.45). We want to evaluate whether features like these can help the
learning to rank model adapt to the reading skills of a user.</p>
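      <p>A simplified sketch of how one such regression feature could be produced is shown below,
using scikit-learn with TruncatedSVD as the LSA step, one of the regressors from Table 1, and
cross_val_predict for the out-of-fold predictions; the corpus, labels, and parameter values
are random placeholders:</p>
      <preformat>
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict

# Placeholder corpus and understandability labels (0 = easy, 100 = hard).
vocab = ["word%d" % i for i in range(500)]
docs = [" ".join(random.choices(vocab, k=200)) for _ in range(200)]
labels = [random.randint(0, 100) for _ in range(200)]

# LSA representation followed by one of the regressors listed in Table 1.
pipeline = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=40),  # LSA dimensionality tuned between 40 and 240
    RandomForestRegressor(),
)

# Out-of-fold predictions for the training documents (10-fold cross-validation).
predicted = cross_val_predict(pipeline, docs, labels, cv=10)

# Personalization step: absolute difference between the (normalized) estimated
# score and the user's reading goal, used as a learning to rank feature.
goal = 0.80
features = [abs(goal - score / 100.0) for score in predicted]
      </preformat>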
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>Evaluation Metrics. We consider a large number of evaluation metrics in this work.
As topical relevance-centred evaluation metrics, we consider Precision at 10 (P@10) and Rank
Biased Precision with the μ parameter set to 0.8 (RBP(0.8)). Because a
learning to rank algorithm has the potential to bring many unjudged documents
to the top of the ranking list, we also consider a modified version of P@10, Only
Judged P@10, which calculates P@10 considering only the first 10 judged
documents of each topic.</p>
      <p>
        As modified metrics that take understandability scores into account, we
consider the understandability-biased Rank Biased Precision, also with the μ parameter
set to 0.8 (uRBP(0.8)), as proposed in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and propose three new metrics for
personalized search.
      </p>
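      <p>For reference, a simple sketch of RBP and of its understandability-biased variant uRBP
(following the general form proposed in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]; the per-rank relevance and understandability gains are simplified placeholders here) is:</p>
      <preformat>
def rbp(gains, p=0.8):
    """Rank Biased Precision: gains is a list of per-rank relevance gains in [0, 1]."""
    return (1 - p) * sum(g * p ** i for i, g in enumerate(gains))

def urbp(rel_gains, und_gains, p=0.8):
    """Understandability-biased RBP: the gain at each rank combines the topical
    relevance gain with an understandability gain (both in [0, 1])."""
    return (1 - p) * sum(r * u * p ** i
                         for i, (r, u) in enumerate(zip(rel_gains, und_gains)))
      </preformat>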
      <p>The first personalization-aware metric is a specialization of uRBP, auRBP,
which uses an α parameter to model the kind of documents a user
wants to read. We assume that α models the understandability
profile of an entity: a low α is assigned to items/documents/users that are
expert, while a high α indicates the opposite. We assume that a user with a low α
wants to read specialized documents to the detriment of easy and introductory
documents, while laypeople want the opposite. In auRBP we model a penalty
for the case in which a low α document is presented to a user who wants high
α documents, and vice versa. While we are still investigating which function best
models this penalty, we currently assume a Gaussian penalty. Figure 1 shows
an example in which a user wants to read documents with α=20;
other values of α are penalized according to a normal curve with mean 20 and
standard deviation 30. We use a standard deviation of 30 in all of our experiments,
and ways to estimate a suitable value for it are left as future work.</p>
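      <p>A minimal sketch of the Gaussian penalty we assume is given below; the exact scaling
used inside auRBP may differ, and here the gain is simply normalized so that a perfect match
with the user's target α yields 100:</p>
      <preformat>
import math

def gaussian_gain(doc_alpha, user_alpha, sigma=30.0):
    """Gain in [0, 100]: 100 when the document matches the user's target alpha,
    decaying as a normal curve (standard deviation sigma) as they diverge."""
    return 100.0 * math.exp(-((doc_alpha - user_alpha) ** 2) / (2.0 * sigma ** 2))

# Example: user who wants to read documents with alpha = 20 (as in Figure 1).
print(gaussian_gain(20, 20))   # 100.0 (perfect match)
print(gaussian_gain(80, 20))   # strongly penalized
      </preformat>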
      <p>The second and third personalization-aware metrics are simple modifications
of Precision at depth X. For the relevant documents found in the top X, we
inspect how far the understandability label of each document is from the value
expected by the user. We can penalize the absolute difference linearly
(LinUndP@X) or using the same Gaussian curve as in auRBP (GaussianUndP@X).
Note that lower values are better for LinUndP@10, meaning that the distance
from the required understandability value is small, and higher values are better
for GaussianUndP@10, as a value of 100 is the best one could reach.</p>
      <p>[Figure 1: Gaussian penalty curve for a user who wants to read documents with α=20;
α values from 0 to 100 on the x-axis are mapped to a gain between 0 and 100 according to a
normal curve with mean 20 and standard deviation 30.]</p>
      <p>Seven runs were submitted to IRTask 1 and another seven were submitted to
IRTask 2. Tables 2 and 3 present a summary of each approach and the results using the
2016 QRels, for the submissions to IRTask 1 and IRTask 2, respectively.</p>
    </sec>
    <sec id="sec-4">
      <title>Discussion and Conclusion</title>
      <p>As shown in Tables 2 and 3, we based our runs on the BM25 implementation from
a Terrier 4.2 index of ClueWeb 12-B. The results of the runs using relevance feedback are
high because the documents judged as relevant appear at the top of the ranking
list of each topic, but this does not necessarily mean that these approaches will be
much better than a plain BM25 for 2017, as the already judged documents will
be discarded by the organizers.</p>
      <p>[Table 2: Results on CLEF eHealth 2016 QRels.]</p>
      <p>[Table 3: Results on CLEF eHealth 2016 QRels.]</p>
      <p>We observe that higher RBP(0.8) values are accompanied by higher uRBP(0.8) and
auRBP(0.8) values. This means that our efforts to retrieve more topically relevant documents
also increase uRBP and auRBP, but do not affect the LinUnd. and GaussianUnd. metrics.</p>
      <p>We look forward to evaluating our results with the 2017 QRels; as
the 2017 assessments are still being conducted, an analysis of the official
results will be posted online at
https://github.com/joaopalotti/tuw_at_clef_ehealth_2017.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Gordon V.</given-names>
            <surname>Cormack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mark D.</given-names>
            <surname>Smucker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Charles L. A.</given-names>
            <surname>Clarke</surname>
          </string-name>
          .
          <article-title>Efficient and effective spam filtering and re-ranking for large web datasets</article-title>
          .
          <source>CoRR, abs/1004.5168</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>William H.</given-names>
            <surname>Dubay</surname>
          </string-name>
          .
          <article-title>The principles of readability</article-title>
          . Costa Mesa, CA: Impact Information,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Lorraine</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          , Gareth JF Jones,
          <string-name>
            <surname>Liadh Kelly</surname>
          </string-name>
          , Johannes Leveling, Allan Hanbury, Henning Müller, Sanna Salantera, Hanna Suominen, and Guido Zuccon.
          <source>ShARe/CLEF eHealth Evaluation Lab</source>
          <year>2013</year>
          ,
          <article-title>Task 3: Information retrieval to address patients' questions when reading clinical reports</article-title>
          .
          <source>CLEF 2013 Online Working Notes</source>
          ,
          <volume>8138</volume>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Lorraine</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          , Liadh Kelly,
          <string-name>
            <given-names>Wei</given-names>
            <surname>Lee</surname>
          </string-name>
          , Joao Palotti, Pavel Pecina, Guido Zuccon, Allan Hanbury, Henning Mueller, and
          <string-name>
            <given-names>Gareth J.F.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <source>ShARe/CLEF eHealth Evaluation Lab</source>
          <year>2014</year>
          ,
          <article-title>Task 3: User-centred health information retrieval</article-title>
          .
          <source>In CLEF 2014 Evaluation Labs and Workshop:</source>
          Online Working Notes, Sheffield, UK,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Lorraine</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          , Liadh Kelly, Hanna Suominen, Aurélie Névéol, Aude Robert, Evangelos Kanoulas, Rene Spijker, Joao Palotti, and
          <string-name>
            <given-names>Guido</given-names>
            <surname>Zuccon</surname>
          </string-name>
          .
          <article-title>CLEF 2017 eHealth Evaluation Lab overview</article-title>
          .
          <source>In Proceedings of CLEF 2017 - 8th Conference and Labs of the Evaluation Forum. Lecture Notes in Computer Science (LNCS)</source>
          , Springer,
          <year>September 2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Joao</given-names>
            <surname>Palotti</surname>
          </string-name>
          , Lorraine Goeuriot, Guido Zuccon, and
          <string-name>
            <given-names>Allan</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>Ranking health web pages with relevance and understandability</article-title>
          .
          <source>In Proceedings of the 39th international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>965</fpage>
          -
          <lpage>968</lpage>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Joao</given-names>
            <surname>Palotti</surname>
          </string-name>
          , Guido Zuccon, Lorraine Goeuriot, Liadh Kelly, Allan Hanbury,
          <string-name>
            <given-names>Gareth J.F.</given-names>
            <surname>Jones</surname>
          </string-name>
          , Mihai Lupu, and
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Pecina</surname>
          </string-name>
          .
          <source>CLEF eHealth Evaluation Lab</source>
          <year>2015</year>
          ,
          <article-title>Task 2: Retrieving Information about Medical Symptoms</article-title>
          .
          <source>In CLEF 2015 Online Working Notes. CEUR-WS</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Joao</given-names>
            <surname>Palotti</surname>
          </string-name>
          , Guido Zuccon, and
          <string-name>
            <given-names>Allan</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>The influence of pre-processing on the estimation of readability of web documents</article-title>
          .
          <source>In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</source>
          , pages
          <fpage>1763</fpage>
          -
          <lpage>1766</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Joao</given-names>
            <surname>Palotti</surname>
          </string-name>
          , Guido Zuccon, Jimmy, Pavel Pecina, Mihai Lupu, Lorraine Goeuriot, Liadh Kelly, and
          <string-name>
            <given-names>Allan</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>CLEF 2017 task overview: The IR task at the eHealth evaluation lab</article-title>
          .
          <source>In Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings. Proceedings of CLEF 2017 - 8th Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Luca</given-names>
            <surname>Soldaini</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nazli</given-names>
            <surname>Goharian</surname>
          </string-name>
          .
          <article-title>Learning to Rank for Consumer Health Search: A Semantic Approach</article-title>
          , pages
          <fpage>640</fpage>
          -
          <lpage>646</lpage>
          . Springer International Publishing,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Guido</given-names>
            <surname>Zuccon</surname>
          </string-name>
          .
          <article-title>Understandability biased evaluation for information retrieval</article-title>
          .
          <source>In Proc. of ECIR</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Guido</given-names>
            <surname>Zuccon</surname>
          </string-name>
          , Joao Palotti, Lorraine Goeuriot, Liadh Kelly, Mihai Lupu, Pavel Pecina, Henning Mueller, Julie Budaher, and
          <string-name>
            <given-names>Anthony</given-names>
            <surname>Deacon</surname>
          </string-name>
          .
          <article-title>The IR Task at the CLEF eHealth Evaluation Lab 2016: User-centred Health Information Retrieval</article-title>
          .
          <source>In CLEF 2016 Evaluation Labs and Workshop: Online Working Notes, CEUR-WS</source>
          ,
          <year>September 2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>