<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving Personalized Consumer Health Search:</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hua Yang</string-name>
          <email>huayangchn@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Teresa Goncalves</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, University of Evora</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ZhongYuan University of Technology</institution>
          ,
          <addr-line>Zhengzhou</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The CLEF 2018 eHealth Consumer Health Search task aims to investigate the effectiveness of information retrieval systems in providing health information to common health consumers. Compared to previous years, this year's task includes five subtasks and adopts a new data corpus and a new set of queries. This paper presents the work of the University of Evora participating in two subtasks: IRtask-1 and IRtask-2. It explores the use of learning to rank techniques as well as query expansion approaches. A number of field-based features are used for training a learning to rank model, and a medical concept model proposed in previous work is re-employed for this year's new task. Word vectors and UMLS are used as query expansion sources. Four runs were submitted to each task accordingly.</p>
      </abstract>
      <kwd-group>
        <kwd>health information search</kwd>
        <kwd>learning to rank</kwd>
        <kwd>query expansion</kwd>
        <kwd>UMLS</kwd>
        <kwd>word vectors</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The CLEF 2018 eHealth Consumer Health Search (CHS) task is a continuation of
the previous CLEF eHealth information retrieval (IR) tasks that started in
2013 [
        <xref ref-type="bibr" rid="ref3 ref8">8, 3</xref>
        ]. Search engines are commonly used by health consumers seeking a
better understanding of health problems or medical conditions. This task
investigates the problem of retrieving web pages relevant to a health consumer's
information needs. The 2018 CHS task includes 5 subtasks and uses a
new web corpus and a new set of queries [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>This paper describes the University of Evora (UEvora) approach to the CLEF
2018 eHealth CHS subtasks IRTask-1 and IRtask-2. While IRTask-1 is a standard
ad-hoc search task that aims at retrieving information relevant to people seeking
health advice on the web, IRTask-2 is a personalized search task: it builds
on top of IRTask-1 and aims to personalize the retrieved list of search results
to match user expertise.</p>
      <p>
        The following questions were investigated by conducting experiments on the
CLEF 2018 eHealth CHS task:
1. How does a model learned from the 2016 and 2017 CLEF eHealth
IR task data [
        <xref ref-type="bibr" rid="ref6">6, 12</xref>
        ] perform on this year's new data collection and new set of
queries (exploring learning to rank features)?
2. When applying query expansion techniques, will domain-specific word
embeddings (built from a medical-related training corpus) outperform a
domain-specific thesaurus as an expansion source?
3. How does the medical concepts model proposed in previous work [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] perform
on a new task?
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>
        To answer the questions proposed in the previous section, different approaches
were employed in this work. To answer the first question, learning to rank
techniques were used: a number of features were explored and assessment results
from the 2016 and 2017 CLEF eHealth IR tasks were used for training a model. For
the second question, a pre-trained word vectors model was used as a source of
query expansion and the result was compared to the one retrieved with query
expansion using the domain-specific Unified Medical Language System (UMLS)
thesaurus. Finally, to tackle the third question, the model proposed in previous
work [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] was re-employed and tested on this year's new task (composed of a
new data corpus and a new set of queries).
      </p>
      <sec id="sec-2-1">
        <title>Pre-processing</title>
        <p>All queries were pre-processed by lower-casing characters, removing stop
words and stemming with the Porter stemmer. The default stop words list available
in the IR platform Terrier 4.2 was used.</p>
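        <p>As an illustration, this pipeline can be sketched in a few lines of Python. This is a dependency-free sketch, not the Terrier implementation: the stop-word list below is a tiny stand-in for Terrier's default list, and the stemmer is reduced to an approximation of Porter's plural-stripping rule.</p>
        <preformat>
```python
# Sketch of the query pre-processing: lower-casing, stop-word removal
# and stemming. STOP_WORDS is a small stand-in for Terrier's default
# list; strip_plural() approximates only the first rule of the Porter
# stemmer (the actual runs used the full algorithm).
STOP_WORDS = {"the", "a", "an", "of", "for", "in", "on", "and", "to", "is"}

def strip_plural(token):
    # Porter step 1a (simplified): sses -> ss, ies -> i, s -> ''
    if token.endswith("sses") or token.endswith("ies"):
        return token[:-2]
    if token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def preprocess(query):
    tokens = query.lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [strip_plural(t) for t in tokens]

print(preprocess("Treatments for common colds"))
# -> ['treatment', 'common', 'cold']
```
        </preformat>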
      </sec>
      <sec id="sec-2-2">
        <title>Learning to rank</title>
        <p>The assessment results from the 2016 and 2017 CLEF eHealth IR tasks were
used to train a learning to rank model; a number of field-based features were
explored in this work.</p>
        <p>
          Features extracted. In this work, a simple group of features on different fields
was extracted for training a learning to rank model. Three information fields
were considered: title, H1 and else. One kind of feature applies a standard
weighting model to a single field; BM25 and PL2 were used as the weighting
models, applying each one to every field. Query-independent features
and field length were also taken into account; the DL weighting model, implementing
a simple document length feature, was used [
          <xref ref-type="bibr" rid="ref4 ref5">5, 4</xref>
          ].
        </p>
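        <p>A minimal sketch of the resulting feature vector follows. The collection statistics (number of documents, document frequencies, average field lengths) are toy values, and a simplified BM25 stands in for Terrier's BM25, PL2 and DL implementations; it only illustrates the per-field feature layout described above.</p>
        <preformat>
```python
# Sketch of the field-based feature vector: one weighting-model score per
# field plus the field length as a query-independent feature. The stats
# dictionary holds illustrative toy values.
import math

def bm25(tf, df, num_docs, doc_len, avg_len, k1=1.2, b=0.75):
    # Simplified BM25 term score; a stand-in for Terrier's implementation.
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    norm = tf + k1 * (1 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / norm

def features_for(query_terms, doc, stats):
    feats = []
    for field in ("title", "h1", "else"):
        text = doc[field].lower().split()
        score = sum(
            bm25(text.count(t), stats["df"].get(t, 1),
                 stats["num_docs"], len(text), stats["avg_len"][field])
            for t in query_terms)
        feats.append(score)       # weighting-model score on this field
        feats.append(len(text))   # field length (query-independent)
    return feats

doc = {"title": "flu symptoms", "h1": "flu",
       "else": "fever cough and sore throat"}
stats = {"num_docs": 1000, "df": {"flu": 40, "symptoms": 120},
         "avg_len": {"title": 5, "h1": 3, "else": 300}}
print(features_for(["flu", "symptoms"], doc, stats))
```
        </preformat>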
        <sec id="sec-2-2-1">
          <title>1 http://terrier.org/</title>
          <p>
            Training a model. Different learning to rank algorithms were previously
explored (logistic regression, random forests, LambdaMART, AdaRank and
ListNet) and, among all, the LambdaMART algorithm presented the best performance [
            <xref ref-type="bibr" rid="ref1 ref7">1,
7</xref>
            ]. As such, this algorithm was employed in this work.
          </p>
          <p>
            The assessment results from the 2016 and 2017 CLEF eHealth IR tasks [
            <xref ref-type="bibr" rid="ref6">12, 6</xref>
            ] were
used as the training data. For IRtask-1, the topical relevance results were used;
the result documents were scored with 0, 1 or 2, representing not relevant, relevant
or highly relevant, respectively. For IRtask-2, the understandability scores were
used; the scores ranged from -1 to 100, with a higher score representing higher
understandability. These results were used directly for training and no extra or
further processing was performed.
          </p>
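            <p>The training data described above can be serialized in the LETOR/SVM-light layout that most learning to rank tools (including common LambdaMART implementations) consume: one line per query-document pair with the graded label, a query id and the field-based features. The sample values below are illustrative, not taken from the actual collections.</p>
            <preformat>
```python
# Sketch of formatting (label, qid, features) triples as LETOR/SVM-light
# lines. For IRtask-1 the label is the graded topical relevance
# (0 = not relevant, 1 = relevant, 2 = highly relevant); for IRtask-2
# the understandability score would be used instead.
def to_letor(samples):
    lines = []
    for label, qid, feats in samples:
        feat_str = " ".join(f"{i}:{v}" for i, v in enumerate(feats, start=1))
        lines.append(f"{label} qid:{qid} {feat_str}")
    return "\n".join(lines)

samples = [
    (2, 101, [4.2, 1.1, 0.3]),   # highly relevant document for query 101
    (0, 101, [0.7, 0.0, 0.1]),   # not relevant
    (1, 102, [2.5, 0.9, 0.2]),   # relevant document for query 102
]
print(to_letor(samples))
```
        </preformat>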
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Query expansion with a medical concepts model</title>
        <p>
          A medical concepts model [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] was employed as a source for query expansion. First,
cTAKES, a Natural Language Processing tool, was used to identify the medical
concepts present in a query. Next, the following techniques were applied:
medical phrase concepts processing, medical term concepts processing and query
expansion. Finally, the new terms were added, building the new expanded query.
        </p>
        <p>
          Two different expansion sources were used: UMLS and a word vectors model.
For the UMLS-based expansion, selected terms were added to the original query,
following the approach of our previous work [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The word vectors model was trained
on the 2011 and 2012 TREC Medical Records Track collections, using Word2vec with
a skip-gram architecture as the training tool [11]. The vector dimension
was set to 1000 and a total of 25,469 vectors were included in the model.
        </p>
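        <p>The word-vector expansion step amounts to a nearest-neighbour lookup in the embedding space. The sketch below illustrates it with toy 3-dimensional vectors standing in for the 1000-dimensional skip-gram model; the vocabulary and values are invented for the example.</p>
        <preformat>
```python
# Sketch of word-vector query expansion: for each query term, the most
# similar terms in the embedding space (by cosine similarity) are
# appended to the query. VECTORS holds toy embeddings; the real model
# contained 25,469 vectors of dimension 1000.
import math

VECTORS = {
    "flu":       [0.9, 0.1, 0.0],
    "influenza": [0.85, 0.15, 0.05],
    "fever":     [0.6, 0.5, 0.1],
    "fracture":  [0.0, 0.1, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def expand(term, k=1):
    # Return the k nearest vocabulary terms to `term` (excluding itself).
    if term not in VECTORS:
        return []
    sims = [(cosine(VECTORS[term], v), w)
            for w, v in VECTORS.items() if w != term]
    sims.sort(reverse=True)
    return [w for _, w in sims[:k]]

query = ["flu"]
print(query + expand("flu"))
# -> ['flu', 'influenza']
```
        </preformat>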
      </sec>
      <sec id="sec-2-4">
        <title>Pseudo Relevance Feedback</title>
        <p>Besides the query expansion techniques, pseudo relevance feedback was also
tested for automatic expansion during the retrieval process. In the Terrier 4.2
platform, the number of expansion words was set to 10 and the number of top-ranked
documents from which those words were extracted was set to 3.</p>
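        <p>The mechanism can be sketched as follows; note that Terrier's actual query expansion models weight candidate terms more carefully (e.g. divergence-based models), while plain term frequency is used here for brevity.</p>
        <preformat>
```python
# Sketch of pseudo relevance feedback: the top-ranked documents are
# assumed relevant and the most frequent non-query terms extracted from
# them are appended to the query. num_docs=3 and num_terms=10 mirror the
# settings reported above; the documents are toy examples.
from collections import Counter

def prf_expand(query_terms, ranked_docs, num_docs=3, num_terms=10):
    counts = Counter()
    for doc in ranked_docs[:num_docs]:   # top-ranked documents only
        for term in doc.lower().split():
            if term not in query_terms:
                counts[term] += 1
    expansion = [t for t, _ in counts.most_common(num_terms)]
    return list(query_terms) + expansion

docs = ["flu symptoms include fever and cough",
        "fever and chills are common flu symptoms",
        "treating fever at home",
        "unrelated document about fractures"]
print(prf_expand(["flu"], docs))
```
        </preformat>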
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments and Results</title>
      <p>This section first briefly presents the IR platform employed in this work, the
dataset and queries for the task, as well as the evaluation measures used for the
assessments. Next is the description of the techniques employed in our
experiments.</p>
      <sec id="sec-3-1">
        <title>IR model</title>
        <p>The Terrier platform, version 4.2, was chosen as the IR model of the system.
The Okapi BM25 weighting model was used with all parameters set to their default
values.</p>
        <sec id="sec-3-1-1">
          <title>2 http://ctakes.apache.org/index.html</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Dataset</title>
        <p>The dataset used in the CLEF 2018 CHS task consisted of web pages acquired
from CommonCrawl. By submitting the task queries to the Microsoft Bing APIs
repeatedly over a period of time, an initial list of websites used for acquisition
was returned. Some URL domains were excluded and a number of known reliable
health websites were added. In total, 1,903 sites were included in the list.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Queries</title>
        <p>
          The basic query set used for the CLEF 2018 CHS task consisted of 50 queries
written in English, issued by the general public to a search service [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>For IRtask-1, this basic set of 50 queries was used as the input to participating
systems. An example of a query is shown in Figure 1.</p>
        <p>IRtask-2 was based on IRtask-1, with 7 variations for each query. The first
4 variations were issued by people with no medical background while the
remaining ones were issued by medical experts. An example for IRtask-2 is shown in
Figure 2.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Evaluation Measures</title>
        <p>Different evaluation measures were used for IRtask-1 and IRtask-2. For IRtask-1
they were: Normalized Discounted Cumulative Gain at depth 10 (NDCG@10),
the binary preference-based measure (Bpref) and Rank Biased Precision (RBP) [10].</p>
        <p>For IRtask-2, uRBPgr with a parameter alpha was used for the assessment.
Based on RBP, uRBPgr is calculated as</p>
        <p>uRBP = (1 - ρ) Σ_{k=1}^{K} ρ^{k-1} r(k) u(k)    (1)</p>
        <p>where the u(k) function is a graded gain function for the understandability
dimension. The parameter ρ attempts to model user behaviour and was set to</p>
        <sec id="sec-3-4-1">
          <title>3 https://sites.google.com/view/clef-ehealth-2018/task-3-consumer-health-search/</title>
          <p>0.8. The r(k) function is the standard RBP gain function: its value is 1 if the
document at rank k is relevant and 0 if it is irrelevant [10].</p>
          <p>
            In IRTask-2, each topic has 7 query variations. A parameter alpha
capturing user expertise is used when evaluating results for query variations.
By setting alpha to increasing values, an increasing level of medical expertise across
the query variations is modeled [
            <xref ref-type="bibr" rid="ref6">6, 10</xref>
            ]. For this task, alpha was set to
{0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0} for query variations 1 to 7, respectively.
          </p>
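          <p>Equation (1) can be computed directly. The sketch below assumes a linear understandability gain u(k) in [0, 1] as one plausible choice; the exact graded gain function used by uRBPgr is not reproduced here.</p>
          <preformat>
```python
# Sketch of understandability-biased RBP, Eq. (1):
# uRBP = (1 - rho) * sum over k of rho^(k-1) * r(k) * u(k),
# with rho = 0.8 as in the task setup. r(k) is binary relevance at
# rank k; u(k) is an understandability gain assumed linear in [0, 1].
def urbp(relevance, understandability, rho=0.8):
    total = 0.0
    for k, (r, u) in enumerate(zip(relevance, understandability), start=1):
        total += rho ** (k - 1) * r * u
    return (1 - rho) * total

rel = [1, 0, 1, 1]            # binary relevance r(k) at ranks 1..4
und = [1.0, 0.5, 0.8, 0.2]    # understandability gain u(k) at ranks 1..4
print(round(urbp(rel, und), 5))
# -> 0.32288
```
          </preformat>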
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>Experiments</title>
        <p>Four experiments were conducted for each sub-task. The next paragraphs
discuss the techniques used.</p>
        <p>
          Runs for IRTask-1. UEvoraIRTask1Run1 is based on the Medical Concepts
Model presented in previous work [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. First, cTAKES is used to identify the
medical concepts in a pre-processed query. Next, the identified concepts are
expanded with UMLS; extra weights of 2.0 and 1.5 were set for words expanded
from a phrase concept or from a term concept, respectively. Then, a phrase is
reconstructed into a loose phrase with the maximum number of interval words set to 2;
the loose phrase is regarded as a must-check item during the retrieval process.
Finally, these processed phrases and terms with extra weights are added to the
query.
        </p>
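        <p>As an illustration, the expanded query for Run 1 could be assembled as a single query string. The operator syntax shown (term^weight for boosts, "a b"~n for proximity, + for a required item) follows Terrier-style query language conventions; treat the exact operators, and the example concepts, as assumptions of this sketch rather than the literal run configuration.</p>
        <preformat>
```python
# Sketch of building the expanded query string: expansion words from a
# phrase concept get weight 2.0, words from a term concept get 1.5, and
# the reconstructed loose phrase (up to 2 intervening words) is marked
# as required. Operators and example terms are illustrative assumptions.
def build_query(base_terms, phrase, phrase_expansions, term_expansions):
    parts = list(base_terms)
    parts.append('+"%s"~2' % " ".join(phrase))          # loose phrase, must match
    parts += ["%s^2.0" % w for w in phrase_expansions]  # from phrase concepts
    parts += ["%s^1.5" % w for w in term_expansions]    # from term concepts
    return " ".join(parts)

q = build_query(["heart", "attack"], ["heart", "attack"],
                ["myocardial", "infarction"], ["cardiac"])
print(q)
# -> heart attack +"heart attack"~2 myocardial^2.0 infarction^2.0 cardiac^1.5
```
        </preformat>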
        <p>For UEvoraIRTask1Run2 the same techniques are used but the expansion is
performed with the pre-trained word vectors model.</p>
        <p>UEvoraIRTask1Run3 uses a ranking model trained with the topical
assessments from 2016 and 2017 CLEF eHealth IR task (see sub-section 2.2).</p>
        <p>UEvoraIRTask1Run4 uses techniques similar to UEvoraIRTask1Run3,
but applies five-fold cross-validation to obtain the learning to rank model.
Runs for IRTask-2. Similar techniques and parameters were used for
IRTask-2. UEvoraIRTask2Run1 uses the same approach employed in UEvoraIRTask1Run1;
the queries and their variations were processed and issued to the retrieval
system. UEvoraIRTask2Run2 uses a similar approach to UEvoraIRTask2Run1 with
queries expanded using the pre-trained word vectors model, and for
UEvoraIRTask2Run3 the understandability assessments from the 2016 and 2017 CLEF eHealth
IR tasks were used for training a learning to rank model. Finally, for
UEvoraIRTask2Run4, the same techniques of UEvoraIRTask2Run3 were employed and
five-fold cross-validation was performed when training the learning to rank model.</p>
      </sec>
      <sec id="sec-3-6">
        <title>Results</title>
        <p>The assessments for the CLEF eHealth 2018 IR tasks are still being conducted,
so results are not available at this time.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and future work</title>
      <p>This working note reports the UEvora team participation in two different subtasks
of the CLEF 2018 eHealth CHS task. A number of field-based features were explored
while applying learning to rank techniques. Based on previous work, both UMLS and
a word vectors model were applied for performing query expansion.</p>
      <p>As future work, the methods proposed in this paper will be further
analyzed: different learning to rank features will be explored and an ensemble
algorithm will be investigated.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>This work was supported by EACEA under the Erasmus Mundus Action 2,
Strand 1 project LEADER – Links in Europe and Asia for engineering,
eDucation, Enterprise and Research exchanges.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Olivier</given-names>
            <surname>Chapelle</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yi</given-names>
            <surname>Chang</surname>
          </string-name>
          . "
          <article-title>Yahoo! learning to rank challenge overview</article-title>
          ".
          <source>In: Proceedings of the Learning to Rank Challenge</source>
          .
          <year>2011</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Lorraine</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          et al. "
          <article-title>Meta-analysis of the second phase of empirical and user-centered evaluations</article-title>
          ".
          <source>In: Public Technical Report, Khresmoi Project</source>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Jimmy</surname>
          </string-name>
          et al. "
          <article-title>Overview of the CLEF 2018 Consumer Health Search Task</article-title>
          ".
          <source>In: CLEF 2018 Evaluation Labs and Workshop: Online Working Notes, CEUR-WS</source>
          , September
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Craig</given-names>
            <surname>Macdonald</surname>
          </string-name>
          , Rodrygo LT Santos, and Iadh Ounis. "
          <article-title>The whens and hows of learning to rank for web search</article-title>
          ".
          <source>In: Information Retrieval 16.5</source>
          (
          <year>2013</year>
          ), pp.
          <fpage>584</fpage>
          -
          <lpage>628</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Craig</given-names>
            <surname>Macdonald</surname>
          </string-name>
          et al. "
          <article-title>About learning models with multiple query dependent features</article-title>
          ".
          <source>In: ACM Transactions on Information Systems (TOIS) 31.3</source>
          (
          <year>2013</year>
          ), p.
          <fpage>11</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Joao</given-names>
            <surname>Palotti</surname>
          </string-name>
          et al. "
          <article-title>CLEF 2017 task overview: the IR task at the eHealth evaluation lab</article-title>
          ".
          <source>In: Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings</source>
          .
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Luca</given-names>
            <surname>Soldaini</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nazli</given-names>
            <surname>Goharian</surname>
          </string-name>
          . "
          <article-title>Learning to rank for consumer health search: a semantic approach</article-title>
          ".
          <source>In: European Conference on Information Retrieval</source>
          . Springer.
          <year>2017</year>
          , pp.
          <fpage>640</fpage>
          -
          <lpage>646</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Hanna</given-names>
            <surname>Suominen</surname>
          </string-name>
          et al. "
          <article-title>Overview of the CLEF eHealth Evaluation Lab 2018</article-title>
          ".
          <source>In: CLEF 2018 - 8th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science (LNCS)</source>
          , Springer, September
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Hua</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Teresa</given-names>
            <surname>Goncalves</surname>
          </string-name>
          . "
          <article-title>UEvora at CLEF eHealth 2017 Task 3</article-title>
          ".
          <source>In: Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings</source>
          .
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Guido</given-names>
            <surname>Zuccon</surname>
          </string-name>
          . "
          <article-title>Understandability biased evaluation for information retrieval</article-title>
          ".
          <source>In: European Conference on Information Retrieval</source>
          . Springer.
          <year>2016</year>
          , pp.
          <fpage>280</fpage>
          -
          <lpage>292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Guido</given-names>
            <surname>Zuccon</surname>
          </string-name>
          et al. "
          <article-title>Integrating and evaluating neural word embeddings in information retrieval</article-title>
          ".
          <source>In: Proceedings of the 20th Australasian Document Computing Symposium. ACM</source>
          .
          <year>2015</year>
          , p.
          <fpage>12</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Guido</given-names>
            <surname>Zuccon</surname>
          </string-name>
          et al. "
          <article-title>The IR Task at the CLEF eHealth evaluation lab 2016: user-centred health information retrieval</article-title>
          ".
          <source>In: CLEF 2016 - Conference and Labs of the Evaluation Forum</source>
          . Vol.
          <volume>1609</volume>
          .
          <year>2016</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>