<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Team GU-IRLAB at CLEF eHealth 2016: Task 3</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luca Soldaini</string-name>
          <email>luca@ir.cs.georgetown.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Will Edman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nazli Goharian</string-name>
          <email>nazli@ir.cs.georgetown.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Georgetown University</institution>
          ,
          <addr-line>Washington, DC</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent surveys have shown that a growing number of internet users seek medical help online. Yet, recent research [12] has shown that many commercial search engines still struggle to fully satisfy the information needs of these users. In this work, we present a study on the use of medical terms for query reformulation. We use synonyms and hypernyms from a large medical ontology to generate alternative formulations of a query; results obtained by the reformulated queries are fused using the Borda rank aggregation algorithm.</p>
      </abstract>
      <kwd-group>
        <kwd>medical information retrieval</kwd>
        <kwd>query reformulation</kwd>
        <kwd>Borda rank aggregation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        As reported by a 2013 Pew Survey [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], growing numbers of Internet users look
for medical advice online, many with little or no medical experience.
However, search systems fail to bridge the gap between the lay terms Internet
users employ to describe their conditions (e.g., "lump with blood spots on nose") and the
illness or disorder they are afflicted by (e.g., "basal cell carcinoma"), as shown
by Zuccon et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        In this manuscript, we present our efforts at the 2016 CLEF eHealth
Information Retrieval Task [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. We propose a system that generates alternative
formulations of each query using the Unified Medical Language System<sup>1</sup> (UMLS).
UMLS has previously been exploited to process medical content generated by
laypeople in information retrieval (e.g., [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]), question answering (e.g., [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]), and
information extraction (e.g., [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]) tasks. Thus, we take advantage of it in our
system.
      </p>
      <p>
        In detail, for each query, we use synonyms and hypernyms extracted from
UMLS to produce alternative formulations (example in Table 1); then, we
retrieve results for each generated query; finally, we combine the retrieved results
using the Borda rank aggregation algorithm [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This approach was chosen due to
its encouraging performance on the 2015 CLEF eHealth Task 2 dataset [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and
on a small set of query results annotated by the authors.
      </p>
      <p>
        <sup>1</sup> https://www.nlm.nih.gov/research/umls/
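      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th>original query</th>
              <th>reformulation using UMLS hypernyms</th>
              <th>reformulation using UMLS synonyms</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>infant labored breathing and tight wheezing cough</td>
              <td>infant labored breathing and tight wheezing pulmonary / upper respiratory disease</td>
              <td>infant labored respiration and tight wheezing cough</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>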
In this section, we detail our methodology. A summary of the runs submitted to
the shared task is shown in Table 2.
As previously mentioned, we reformulate each query using synonyms and hypernyms
from the UMLS Metathesaurus. To identify concepts in the query, we use MetaMap
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a tool that extracts medical concepts from text documents and maps them
to specific UMLS concepts. To prevent query drift in our modified queries, we
considered only UMLS concepts from 16 semantic types that are typically associated with
the four aspects of the medical decision criteria (namely symptoms, diagnostic
tests, diagnoses, and treatments), as suggested by Limsopatham et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>For each expression e<sub>i</sub> that is linked to a concept c<sub>i</sub> in UMLS, we consider the
set of atoms associated with c<sub>i</sub> as candidate synonyms for e<sub>i</sub>. To obtain
hypernyms for an expression e<sub>i</sub> associated with concept c<sub>i</sub>, we use the UMLS relationships
database to obtain any concept c<sub>j</sub> such that there exists a relationship of type
PAR<sup>2</sup> between c<sub>i</sub> and c<sub>j</sub>.</p>
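      <p>As an illustration only, the candidate generation and substitution steps can be sketched in Python. The concept identifiers, the <monospace>ATOMS</monospace> and <monospace>PARENTS</monospace> dictionaries, and all helper names are hypothetical stand-ins for the UMLS atom and relationship tables; this is a minimal sketch under those assumptions, not the code used for the submitted runs.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Set

# Toy stand-ins for UMLS data; identifiers and strings are illustrative only.
ATOMS: Dict[str, Set[str]] = {
    "C_BREATHING": {"breathing", "respiration"},
    "C_COUGH": {"cough"},
    "C_RESP_DISEASE": {"upper respiratory disease"},
}
# PARENTS[c] lists the concepts linked to c by a PAR (parent) relationship.
PARENTS: Dict[str, Set[str]] = {"C_COUGH": {"C_RESP_DISEASE"}}


def synonyms(expression: str, concept: str) -> List[str]:
    """Atoms of the concept, excluding the expression itself."""
    return sorted(a for a in ATOMS.get(concept, set()) if a != expression)


def hypernyms(concept: str) -> List[str]:
    """Atoms of every concept that is a PAR parent of the given concept."""
    found: Set[str] = set()
    for parent in PARENTS.get(concept, set()):
        found |= ATOMS.get(parent, set())
    return sorted(found)


def reformulate(query: str, expression: str, candidates: List[str]) -> List[str]:
    """Substitute the expression with each candidate, one query per candidate."""
    return [query.replace(expression, candidate) for candidate in candidates]


query = "infant labored breathing and tight wheezing cough"
syn_queries = reformulate(query, "breathing", synonyms("breathing", "C_BREATHING"))
hyp_queries = reformulate(query, "cough", hypernyms("C_COUGH"))
```
]]></preformat>
      <p>In this sketch, each candidate yields one alternative query in which the matched expression is substituted, mirroring the example reformulations of the query "infant labored breathing and tight wheezing cough".</p>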
      <p>It is often the case that synonyms and hypernyms identified through UMLS
are quite similar to each other. This is due to the design of UMLS, which favors
redundancy over correctness in aggregating multiple thesauri. To prevent
duplicate queries, we use the Porter stemmer to ensure that no two added terms have
the same stem, and we only add terms with an edit distance greater than four
from other added terms.</p>
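      <p>The two near-duplicate checks can be sketched as follows. This is a hedged illustration: <monospace>toy_stem</monospace> is a crude suffix stripper standing in for a real Porter stemmer, the function names are hypothetical, and processing candidates in list order is an assumption.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Callable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def toy_stem(term: str) -> str:
    """Crude suffix stripper; a stand-in for a real Porter stemmer."""
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term


def deduplicate(terms: List[str],
                stem: Callable[[str], str] = toy_stem,
                min_distance: int = 5) -> List[str]:
    """Keep a term only if its stem is new and it is far from all kept terms."""
    kept: List[str] = []
    seen_stems = set()
    for term in terms:
        stemmed = stem(term)
        if stemmed in seen_stems:
            continue  # same stem as an already added term
        if any(edit_distance(term, other) < min_distance for other in kept):
            continue  # edit distance is not greater than four
        kept.append(term)
        seen_stems.add(stemmed)
    return kept


kept = deduplicate(
    ["respiration", "respirations", "breathing", "breathings", "dyspnea"]
)
```
]]></preformat>
      <p>With these toy inputs, "respirations" is rejected by the stem check and "breathings" by the edit-distance check, so three terms survive.</p>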
      <p>
        To further prevent query drift, we reformulate the query using only those
synonyms and hypernyms that have been deemed useful. Usefulness was
estimated by considering the inverse document frequency (idf) of each expression
in the collection [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Only those expressions whose idf is greater than 4 are used
to modify the original query. Finally, we limit the number of modified queries
for each concept to 8 and omit substitute expressions with idf &gt; 11, as extremely
rare synonym/hypernym concepts are less likely to find relevant results.
      </p>
      <p>
        <sup>2</sup> A PAR edge signifies that the returned concept is a parent, or hypernym, of the
original concept.
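      </p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <table>
          <thead>
            <tr>
              <th>Task</th>
              <th>Run</th>
              <th>Preprocessing</th>
              <th>Query Reformulation</th>
              <th>Rank Aggregation</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>IRTask1</td>
              <td>guir en run1*</td>
              <td>stemming + case folding</td>
              <td>n/a</td>
              <td>n/a</td>
            </tr>
            <tr>
              <td>IRTask1</td>
              <td>guir en run2</td>
              <td>stemming + case folding</td>
              <td>UMLS synonyms</td>
              <td>Borda</td>
            </tr>
            <tr>
              <td>IRTask1</td>
              <td>guir en run3</td>
              <td>stemming + case folding</td>
              <td>UMLS hypernyms</td>
              <td>Borda</td>
            </tr>
            <tr>
              <td>IRTask2</td>
              <td>guir en run1*</td>
              <td>stemming + case folding</td>
              <td>n/a</td>
              <td>Borda</td>
            </tr>
            <tr>
              <td>IRTask2</td>
              <td>guir en run3</td>
              <td>stemming + case folding</td>
              <td>UMLS synonyms</td>
              <td>Borda</td>
            </tr>
            <tr>
              <td>IRTask2</td>
              <td>guir en run3</td>
              <td>stemming + case folding</td>
              <td>UMLS hypernyms</td>
              <td>Borda</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>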
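      <p>The idf-based selection of substitute expressions (idf greater than 4, at most 11, and no more than 8 per concept) can be sketched as below. Several details are assumptions made for illustration: idf is taken as log<sub>2</sub>(N / df) because the base and any smoothing are not stated here, candidates are kept in their given order when the limit of 8 is reached, and the document frequencies are invented toy values.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from typing import Dict, List


def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency; the base-2 logarithm is an assumption."""
    return math.log2(num_docs / doc_freq)


def select_candidates(candidates: List[str],
                      doc_freq: Dict[str, int],
                      num_docs: int,
                      min_idf: float = 4.0,
                      max_idf: float = 11.0,
                      limit: int = 8) -> List[str]:
    """Keep candidates with min_idf < idf <= max_idf, up to `limit` of them."""
    selected: List[str] = []
    for candidate in candidates:
        df = doc_freq.get(candidate, 0)
        if df == 0:
            continue  # never occurs in the collection, so idf is undefined
        if min_idf < idf(df, num_docs) <= max_idf:
            selected.append(candidate)
        if len(selected) == limit:
            break
    return selected


# Toy collection statistics (hypothetical values, chosen as powers of two).
NUM_DOCS = 2 ** 20
DOC_FREQ = {
    "respiration": 2 ** 12,        # idf = 8: kept
    "breathing": 2 ** 17,          # idf = 3: too common
    "pulmonary disease": 2 ** 10,  # idf = 10: kept
    "rare term": 2 ** 8,           # idf = 12: too rare
}
selected = select_candidates(list(DOC_FREQ), DOC_FREQ, NUM_DOCS)
```
]]></preformat>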
      <p>
        Finally, once all the reformulated queries are generated, we submit each of
them to a search engine and retrieve up to 1000 results per query. We used the
Terrier index kindly provided by the organizers to retrieve relevant documents.
We decided to score the queries with PL2, the Divergence from Randomness (DFR)
model that combines a Poisson model with Laplace after-effect and normalization 2, as it
has been shown to be very effective for tasks that require early precision [
        <xref ref-type="bibr" rid="ref10 ref6">6, 10</xref>
        ].
We use the Borda rank aggregation algorithm to combine the results retrieved by the
modified queries. In detail, for each modified query, each retrieved document is
given a score equal to the number of documents ranked below it. The total
score of a document is computed by summing its scores
across all modified queries, and the aggregate ranking is created by sorting documents
by total score in descending order. For Task 2, we use Borda ranking to combine results
from all queries in a topic.
      </p>
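      <p>The Borda aggregation described above can be sketched in a few lines of Python. The function name and the tie-breaking rule (ties resolved by document identifier) are assumptions for illustration; a document absent from a result list simply receives no points from that list.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict
from typing import Dict, List


def borda_fuse(rankings: List[List[str]]) -> List[str]:
    """Fuse ranked lists: each document scores the number of documents below it."""
    totals: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        size = len(ranking)
        for position, doc_id in enumerate(ranking):
            totals[doc_id] += size - position - 1
    # Sort by total score, descending; break ties by document identifier.
    return sorted(totals, key=lambda doc_id: (-totals[doc_id], doc_id))


# Three toy result lists, one per reformulated query.
fused = borda_fuse([
    ["d1", "d2", "d3"],
    ["d2", "d3", "d1"],
    ["d2", "d4"],
])
```
]]></preformat>
      <p>In this toy example the totals are d2 = 4, d1 = 2, d3 = 1, and d4 = 0, which gives the fused order d2, d1, d3, d4.</p>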
      <p>While we experimented with other forms of rank aggregation on the 2015
CLEF eHealth dataset (average rank, Kemeny rank aggregation), we ultimately decided
to use Borda for all three submitted runs, as it yielded the best precision and
nDCG after ten retrieved documents.</p>
    </sec>
    <sec id="sec-2">
      <title>Experimental Results</title>
      <p>As the ground truth for this task was not available at submission time, we could
not provide any experimental results for the proposed approach.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>This work was partially supported by the National Science Foundation through
grant CNS-1204347 and REU award IIP-1362046.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Aronson</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.-M.</given-names>
            <surname>Lang</surname>
          </string-name>
          .
          <article-title>An overview of MetaMap: historical perspective and recent advances</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          ,
          <volume>17</volume>
          (
          <issue>3</issue>
          ):
          <fpage>229</fpage>
          -
          <lpage>236</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Can</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Baykal</surname>
          </string-name>
          .
          <article-title>MedicoPort: A medical search engine for all</article-title>
          .
          <source>Computer Methods and Programs in Biomedicine</source>
          ,
          <volume>86</volume>
          (
          <issue>1</issue>
          ):
          <fpage>73</fpage>
          -
          <lpage>86</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Sivakumar</surname>
          </string-name>
          .
          <article-title>Rank aggregation methods for the web</article-title>
          .
          <source>In Proceedings of the 10th International Conference on World Wide Web</source>
          , pages
          <fpage>613</fpage>
          -
          <lpage>622</lpage>
          . ACM,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>S.</given-names>
            <surname>Fox</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Duggan</surname>
          </string-name>
          .
          <source>Health Online 2013</source>
          . http://www.pewinternet.org/Reports/2013/Health-online.aspx,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Grossman</surname>
          </string-name>
          and
          <string-name>
            <given-names>O.</given-names>
            <surname>Frieder</surname>
          </string-name>
          .
          <source>Information Retrieval: Algorithms and Heuristics</source>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <article-title>Term frequency normalisation tuning for BM25 and DFR models</article-title>
          .
          <source>In Advances in Information Retrieval</source>
          , pages
          <fpage>200</fpage>
          -
          <lpage>214</lpage>
          . Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>N.</given-names>
            <surname>Limsopatham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <article-title>Inferring conceptual relationships to improve medical records search</article-title>
          .
          <source>In Proceedings of the 10th Conference on Open Research Areas in Information Retrieval, OAIR '13</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>L.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Akbari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.-S.</given-names>
            <surname>Chua</surname>
          </string-name>
          .
          <article-title>A joint local-global approach for medical terminology assignment</article-title>
          .
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          .
          <article-title>CLEF eHealth evaluation lab 2015, task 2: Retrieving information about medical symptoms</article-title>
          .
          <source>CLEF</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>V.</given-names>
            <surname>Plachouras</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <article-title>Usefulness of hyperlink structure for web information retrieval</article-title>
          .
          <source>In Proceedings of ACM SIGIR</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A.</given-names>
            <surname>Yates</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Goharian</surname>
          </string-name>
          .
          <article-title>ADRTrace: detecting expected and unexpected adverse drug reactions from user reviews on social media sites</article-title>
          .
          <source>In Advances in Information Retrieval</source>
          , pages
          <fpage>816</fpage>
          -
          <lpage>819</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Koopman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          .
          <article-title>Diagnose this if you can</article-title>
          .
          <source>In Advances in Information Retrieval (ECIR)</source>
          , pages
          <fpage>562</fpage>
          -
          <lpage>567</lpage>
          . Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Budaher</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Deacon</surname>
          </string-name>
          .
          <article-title>The IR task at the CLEF eHealth evaluation lab 2015 user-centred health information retrieval</article-title>
          .
          <source>In CLEF 2016 Evaluation Labs and Workshop: Online Working Notes</source>
          . CLEF, CEUR-WS, September
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>