<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>QUT IElab at CLEF 2018 Consumer Health Search Task: Knowledge Base Retrieval for Consumer Health Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jimmy</string-name>
          <email>jimmy@hdr.qut.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guido Zuccon</string-name>
          <email>g.zuccon@qut.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bevan Koopman</string-name>
          <email>bevan.koopman@csiro.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Australian E-Health Research Centre, CSIRO</institution>
          ,
          <addr-line>Brisbane</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Queensland University of Technology</institution>
          ,
          <addr-line>Brisbane</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Surabaya (UBAYA)</institution>
          ,
          <addr-line>Surabaya</addr-line>
          ,
          <country country="ID">Indonesia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we describe our participation in the CLEF 2018 Consumer Health Search Task, subtask IRTask1. This track aims to evaluate and advance search technologies that support consumers in finding health advice online. Our solution addressed this challenge by extending the Entity Query Feature Expansion model (EQFE), a knowledge base (KB) query expansion method. In previous work we showed that Wikipedia, UMLS and CHV can be effective as the basis for CHS query expansions within the EQFE model. To obtain the query expansion terms, we first mapped entity mentions to KB entities by performing exact matching. After mapping, we used the title of the mapped KB entities as the source of expansion terms. For our first three expanded query sets, we expanded the original queries with expansion terms sourced from each of Wikipedia, the UMLS, and the CHV. For our fourth expanded query set, we combined expansion terms from Wikipedia and the CHV.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The CLEF 2018 Consumer Health Search (CHS) Task aims to retrieve
information relevant to people seeking health advice on the web [11, 13], and is a
continuation of the similar task in CLEF 2017 [
        <xref ref-type="bibr" rid="ref3 ref8">3, 8</xref>
        ], but with a new, more
focused document corpus in place of the more general Clueweb12B document
corpus. To address this task we applied and extended the Entity Query Feature
Expansion model (EQFE), a knowledge base (KB) query expansion method [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
which we recently found to perform competitively on the previous CLEF
e-Health IR challenges [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. By producing query expansions using EQFE, we seek
to overcome the common issue of poor query formulation in CHS; EQFE does
so by reformulating the consumer's health query with more effective terms (e.g.,
less ambiguous terms or synonyms).
      </p>
      <p>[Figure 1: Features of the Wikipedia, UMLS, and CHV KBs. A query (Q) yields words (W) and entity mentions (M), which are mapped to entities (E). Wikipedia features: E, C, L, A, B; UMLS features: E, A, B, P, R; CHV features: E, A, B, P, R. Legend: C: categories; L: links; A: aliases; B: body; P: parent; R: related.]</p>
      <p>One of the major challenges in CHS is the vocabulary mismatch between
people's query terms and the terms used in high quality health web resources.
One source of high quality health related terms is the Unified Medical Language
System (UMLS) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]; in our approach we use the UMLS as one of the sources
for query expansion. However, UMLS concepts are rarely mentioned in consumer
health queries: Keselman et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] showed that only 8.1% of the possible n-grams
constructed from consumer queries can be mapped (i.e., exact match) to UMLS
concepts.
      </p>
      <p>
        In contrast, Wikipedia is a crowdsourced, general purpose KB allowing
people to promote and describe new concepts or augment existing concepts. While
general purpose, Wikipedia contains considerable and detailed health
information that has been effectively used in health related information retrieval [
        <xref ref-type="bibr" rid="ref5 ref9">5, 9</xref>
        ];
in our approach we use Wikipedia as one of the sources for query expansion.
      </p>
      <p>
        In addition to UMLS and Wikipedia, we also used the Consumer Health
Vocabulary (CHV) [
        <xref ref-type="bibr" rid="ref6">12, 6</xref>
        ] which was built to provide a mapping between consumer
health terms and UMLS concepts. This mapping was constructed by extracting
n-grams from MedlinePlus queries and various health-focused bulletin boards,
and then automatically mapping these n-grams to the UMLS via exact match
comparison. Any unmapped n-grams were then manually mapped to the UMLS [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Since
2007, the CHV has been available as part of the UMLS, as entries with "CHV" as source
(i.e., tuples in table MRCONSO with attribute "SAB" equal to "CHV").
      </p>
      <p>Our KB Query Expansion Model for CLEF 2018</p>
      <p>We implemented the Entity Query Feature Expansion model for retrieval using
Wikipedia, the UMLS, and the CHV as KBs. For the Wikipedia KB, a single entity
is represented by a single Wikipedia page (the page title identifies the entity).
Beyond titles, Wikipedia also contains many page features useful in a retrieval
scenario: entity title (E), categories (C), links (L), aliases (A), and body (B).
For the UMLS and CHV KBs, a single entity is represented by the most
frequently used term for a single concept unique identifier (CUI). The features of
a UMLS or CHV KB entity are aliases (A), body (B), parent concepts (P),
and related concepts (R). Figure 1 shows the features we used for mapping the
queries to entities in the KB and as the source of expansion terms. We formally
define the query expansion model as:
        <disp-formula><label>(1)</label><tex-math>\hat{\vartheta}_q = \sum_{M} \sum_{f} \lambda_f \, \vartheta_{f(E_M; S_E)}</tex-math></disp-formula>
where M are the entity mentions, comprising the uni-, bi-, and tri-grams generated
from the query; f is a function used to extract the expansion terms;
<inline-formula><tex-math>\lambda_f \in (0, 1)</tex-math></inline-formula> is a weighting factor; and
<inline-formula><tex-math>\vartheta_{f(E_M; S_E)}</tex-math></inline-formula> is a function that maps entity mention M to the
KB features <inline-formula><tex-math>E_M</tex-math></inline-formula> (e.g., "Title", "Aliases", "Links", "Body") and extracts
expansion terms from the source of expansion <inline-formula><tex-math>S_E</tex-math></inline-formula> (e.g., "Title", "Aliases").
      </p>
      <p>Description of Runs</p>
      <p>We submitted 4 runs as described in Table 1.</p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Run Id</th><th>Source of Expansion Terms</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>The title of Wikipedia KB entities</td></tr>
            <tr><td>2</td><td>The title of UMLS KB entities</td></tr>
            <tr><td>3</td><td>The title of CHV KB entities</td></tr>
            <tr><td>4</td><td>The combination of expansion terms from Wikipedia and CHV KBs</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        To produce this submission, we
indexed the CLEF2018 corpus using Elasticsearch 5.1.1, with stopping and Porter
stemming. As the underlying retrieval model we used BM25F, with
<inline-formula><tex-math>b_{title} = 0.90</tex-math></inline-formula>,
<inline-formula><tex-math>b_{body} = 0.45</tex-math></inline-formula> and
<inline-formula><tex-math>k_1 = 1.2</tex-math></inline-formula>, as these settings were found to be optimal for the
CLEF 2016 eHealth collection. Further, BM25F allows boosting factors to be
specified for matches occurring in different fields of the indexed web page. We consider
only the title field and the body field, with boost factors 1 and 3, respectively.
These were found to be the optimal weights for BM25F for the CLEF 2016
eHealth collection [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; we hope these values translate well to the new
CLEF 2018 CHS collection.
      </p>
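      <p>The index configuration described above might be sketched as follows. This is a minimal illustration and not the configuration used for the submitted runs: only the numeric values (b, k1, boosts) and the use of stopping and Porter stemming come from the text. The names "bm25_title", "bm25_body", "stop_porter", and "page", the lowercase filter, and the choice of a multi_match query are assumptions made for the example. Note also that Elasticsearch scores each field with BM25 and combines the field scores, which only approximates BM25F.</p>
      <preformat><![CDATA[
```python
import json

# Numeric values (b, k1) and the stop/Porter filters follow the text.
# All names ("bm25_title", "bm25_body", "stop_porter", "page") are hypothetical,
# and the "lowercase" filter is an assumption added so stopword removal works
# on mixed-case text.
INDEX_BODY = {
    "settings": {
        "index": {
            "similarity": {
                "bm25_title": {"type": "BM25", "b": 0.90, "k1": 1.2},
                "bm25_body": {"type": "BM25", "b": 0.45, "k1": 1.2},
            }
        },
        "analysis": {
            "analyzer": {
                "stop_porter": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop", "porter_stem"],
                }
            }
        },
    },
    "mappings": {
        "page": {
            "properties": {
                "title": {
                    "type": "text",
                    "analyzer": "stop_porter",
                    "similarity": "bm25_title",
                },
                "body": {
                    "type": "text",
                    "analyzer": "stop_porter",
                    "similarity": "bm25_body",
                },
            }
        }
    },
}


def build_query(query_text, title_boost=1, body_boost=3):
    """Build a search body that boosts matches in the title and body fields."""
    return {
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": [
                    "title^{}".format(title_boost),
                    "body^{}".format(body_boost),
                ],
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON bodies that would be sent to Elasticsearch.
    print(json.dumps(INDEX_BODY, indent=2))
    print(json.dumps(build_query("heart attack symptoms"), indent=2))
```
]]></preformat>
      <p>Keeping the settings as plain dictionaries makes it easy to inspect the per-field parameters before sending them to a cluster with whichever client is in use.</p>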
    </sec>
    <sec id="sec-2">
      <title>To obtain Run 1, we:</title>
      <p>1. indexed Wikipedia pages with the Medicine infobox type and pages whose infobox
contains links to medical terminologies such as MeSH, UMLS, and SNOMED
CT.
2. extracted uni-, bi-, and tri-grams of the original query that matched CHV
entities.
3. exact-matched the extracted n-grams against the Wikipedia aliases.
4. used the title of the matched entities as expansion terms.</p>
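      <p>The n-gram extraction, exact matching, and title-based expansion steps can be illustrated with a short sketch. It is not the authors' implementation: the function names, the whitespace tokenisation, the toy alias table, and the choice to append expansion terms to the original query are all assumptions made for the example, and the filtering of n-grams against CHV entities (step 2) is omitted.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List


def ngrams(query: str, max_n: int = 3) -> List[str]:
    """Return all uni-, bi-, and tri-grams of a whitespace-tokenised query."""
    tokens = query.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


def expansion_terms(query: str, alias_to_title: Dict[str, str]) -> List[str]:
    """Exact-match each n-gram against KB aliases and collect entity titles."""
    titles = []
    for gram in ngrams(query):
        title = alias_to_title.get(gram)
        if title is not None and title not in titles:
            titles.append(title)
    return titles


def expand_query(query: str, alias_to_title: Dict[str, str]) -> str:
    """Append the matched entity titles to the original query (an assumption)."""
    return " ".join([query] + expansion_terms(query, alias_to_title))


# Hypothetical toy alias table mapping lower-cased aliases to entity titles.
TOY_ALIASES = {
    "heart attack": "Myocardial infarction",
    "flu": "Influenza",
}

if __name__ == "__main__":
    print(expand_query("heart attack symptoms", TOY_ALIASES))
```
]]></preformat>
      <p>In a real run the alias table would be built from the indexed KB (Wikipedia, UMLS, or CHV aliases) instead of a hand-written dictionary.</p>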
    </sec>
    <sec id="sec-3">
      <title>To obtain Run 2, we:</title>
      <p>
        1. indexed all English and non-obsolete UMLS concepts.
2. extracted uni-, bi-, and tri-grams of the original query that matched entities in
the UMLS (via QuickUMLS [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]).
3. exact-matched the extracted n-grams against the UMLS aliases.
4. used the title of the matched entities as expansion terms.
      </p>
      <p>
        To obtain Run 3, we:
1. indexed English and non-obsolete CHV concepts associated with the four
key aspects of medical decision criteria (i.e., symptoms, diagnostic tests,
diagnoses, and treatments), as used in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
2. extracted uni-, bi-, and tri-grams of the original query that matched entities in
the CHV.
3. exact-matched the extracted n-grams against the CHV aliases.
4. used the title of the matched entities as expansion terms.
      </p>
      <p>To obtain Run 4, we combined the expansion terms obtained from the Wikipedia
and CHV KBs (Run 1 and Run 3). Using the CLEF 2016 collection, we found that this
combination performed best when compared to the other possible combinations.</p>
      <sec id="sec-3-1">
        <title>Discussion</title>
        <p>[...] highly (positively) correlated. This may be because queries in Run 4
were expanded mostly with terms from the CHV KB (as used in Run
3).</p>
        <p>1 https://github.com/jimmyoentung/RunsCorrelation</p>
      </sec>
      <sec id="sec-3-2">
        <title>Conclusions</title>
        <p>In this working notes paper we have discussed the methods used by the QUT
IElab team in their participation in the CLEF 2018 Consumer Health Search
task (subtask 1: ad-hoc retrieval). We submitted a total of four runs; evaluation
results are not available at this stage.</p>
        <p>Acknowledgment: Jimmy conducted this research as part of his doctoral study,
which is sponsored by the Indonesia Endowment Fund for Education (Lembaga
Pengelola Dana Pendidikan / LPDP).</p>
        <p>11. Suominen, H., Kelly, L., Goeuriot, L., Kanoulas, E., Azzopardi, L., Spijker, R., Li,
D., Neveol, A., Ramadier, L., Robert, A., Palotti, J., Jimmy, Zuccon, G.: Overview
of the CLEF eHealth evaluation lab 2018. CLEF 2018 - 8th Conference and Labs
of the Evaluation Forum, Lecture Notes in Computer Science (LNCS), Springer
(September 2018)</p>
        <p>12. Zeng, Q.T., Tse, T.: Exploring and developing consumer health vocabularies.
Journal of the American Medical Informatics Association 13(1), 24-29 (2006)</p>
        <p>13. Jimmy, Zuccon, G., Palotti, J., Goeuriot, L., Kelly, L.: Overview of the CLEF 2018
Consumer Health Search Task. In: CLEF 2018 Evaluation Labs and Workshop:
Online Working Notes. CEUR-WS (2018)</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bodenreider</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>The unified medical language system (UMLS): integrating biomedical terminology</article-title>
          .
          <source>Nucleic acids research 32(suppl 1)</source>
          ,
          <source>D267-D270</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dalton</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dietz</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allan</surname>
          </string-name>
          , J.:
          <article-title>Entity Query Feature Expansion Using Knowledge Base Links</article-title>
          .
          <source>In: SIGIR'14</source>
          . pp.
          <volume>365</volume>
          -
          <issue>374</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanoulas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spijker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
          </string-name>
          , G.:
          <article-title>CLEF 2017 eHealth evaluation lab overview</article-title>
          . In:
          <article-title>International Conference of the Cross-Language Evaluation Forum for European Languages</article-title>
          . pp.
          <volume>291</volume>
          -
          <fpage>303</fpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Jimmy</surname>
            , Zuccon,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koopman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Boosting Titles Does Not Generally Improve Retrieval Effectiveness</article-title>
          .
          <source>In: ADCS'16</source>
          . pp.
          <volume>25</volume>
          -
          <issue>32</issue>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Jimmy</surname>
            , Zuccon,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koopman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , et al.:
          <article-title>Choices in knowledge-base retrieval for consumer health search</article-title>
          .
          <source>In: European Conference on Information Retrieval</source>
          . pp.
          <volume>72</volume>
          -
          <fpage>85</fpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Keselman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Divita</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Browne</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leroy</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeng-Treitler</surname>
          </string-name>
          , Q.:
          <article-title>Consumer health concepts that do not map to the UMLS: where do they fit?</article-title>
          <source>Journal of the American Medical Informatics Association</source>
          <volume>15</volume>
          (
          <issue>4</issue>
          ),
          <volume>496</volume>
          -
          <fpage>505</fpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Limsopatham</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macdonald</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ounis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Inferring conceptual relationships to improve medical records search</article-title>
          .
          <source>In: Proceedings of the Tenth Conference on Open Research Areas in Information Retrieval</source>
          . pp.
          <volume>1</volume>
          -
          <issue>8</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jimmy</surname>
            ,
            <given-names>P.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lupu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanbury</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>CLEF 2017 Task Overview: The IR task at the eHealth evaluation lab</article-title>
          . In:
          <string-name>
            <surname>CEUR-WS Proceedings</surname>
          </string-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Soldaini</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yates</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goharian</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frieder</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Retrieving medical literature for clinical decision support</article-title>
          .
          <source>In: European Conference on Information Retrieval</source>
          . pp.
          <volume>538</volume>
          -
          <fpage>549</fpage>
          . Springer (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Soldaini</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goharian</surname>
          </string-name>
          , N.:
          <article-title>QuickUMLS: a fast, unsupervised approach for medical concept extraction</article-title>
          . In: MedIR workshop, SIGIR (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>