<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Study on Query Expansion with MeSH Terms and Elasticsearch. IMS Unipd at CLEF eHealth Task 3</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgio Maria Di Nunzio</string-name>
          <email>giorgiomaria.dinunzio@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandru Moldovan</string-name>
          <email>alexandru.moldovan@studenti.unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Information Engineering</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Padua</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe the first participation of the Information Management Systems (IMS) group at CLEF eHealth 2018 Task 3, Consumer Health Search Task. In particular, we participated in the subtask IRTask 1: Ad-hoc Search, which is a standard ad-hoc search task aiming at retrieving information relevant to people seeking health advice on the web. The goal of our work is to evaluate 1) different query expansion strategies based on the recognition of Medical Subject Headings (MeSH) terms present in the original query; 2) different approaches to combine multiple ranking lists given the query expansions. We used Elasticsearch as the search engine and the indexes provided by the organizers of this task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In this paper, we report the experimental results of our first participation in
the CLEF eHealth Lab Task 3 [
        <xref ref-type="bibr" rid="ref3 ref4">4, 3</xref>
        ]: "Consumer Health Search". This task
investigates the problem of retrieving documents to support information needs of
health consumers that are confronted with a health problem.
      </p>
      <p>This work is part of a Master's degree thesis in Computer Engineering whose
main goal is to test the effectiveness of some variants of query expansion
approaches based on the recognition of MeSH terms present in the original query.</p>
      <p>
        The contribution of our experiments to this task can be summarized as
follows:
- A study of several query expansion approaches that take into account the
relationships between MeSH terms [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ];
- An evaluation of different document scoring strategies given the multiple
ranked lists produced by the query expansions [
        <xref ref-type="bibr" rid="ref1 ref6">1, 6</xref>
        ].
      </p>
      <p>The remainder of the paper will introduce the methodology and a brief
summary of the experimental settings that we used in order to create the runs that
we submitted for the task.</p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>
        In this section, we describe the query expansion approaches [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] as well as the
document scoring strategies [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that we used to create the expanded queries and
the ranked lists.
      </p>
      <p>2.1 Query expansion approaches</p>
      <p>For the query expansion experiments, we used the English
version of the information need. Before running the automatic expansion algorithms,
we performed a manual check in order to correct spelling errors. For example,
for topic 189001, the term gonorrhea is misspelt as gonhrrea.</p>
      <p>Identification of MeSH terms. After the manual cleaning process, we use the
MeSH on Demand<sup>1</sup> API to identify the MeSH terms present in the query.</p>
      <p>For example, for topic 188001 "caffeine high blood pressure" we obtain the
following MeSH terms:
- Caffeine<sup>2</sup>
- Hypertension<sup>3</sup>
plus one additional term:
- Blood pressure<sup>4</sup></p>
      <p>Finding related MeSH terms. For each MeSH term found in the previous
step, we use the MeSH RDF<sup>5</sup> database to look for semantically related MeSH
terms. Figure 1, for example, shows part of the tree of
relations among concepts related to the MeSH term Caffeine.</p>
      <p>We choose a subset of all the possible relations (predicates) between terms
in the MeSH RDF database<sup>6</sup>, and we use this subset of predicates for query
expansion in the following way:
- Baseline: the original query is used without any additional term.
- Simple Expansion (SE): given a MeSH term identified in the first step, all
the MeSH entries related to that term are kept, except for those reached through the predicates
'meshv:Qualifier', 'meshv:seeAlso', 'meshv:broader' and 'meshv:broaderDescriptor'.
Then we re-apply SE just once (not recursively) for each 'child' node.
- SE broader: like SE; in addition, only for the original MeSH terms (those that
appear in the query), we expand the MeSH terms by adding the predicates
'meshv:broader' and 'meshv:broaderDescriptor'.</p>
      <sec id="sec-2-1">
        <p>1 https://meshb.nlm.nih.gov/MeSHonDemand
2 https://meshb.nlm.nih.gov/record/ui?name=Caffeine
3 https://meshb.nlm.nih.gov/record/ui?name=Hypertension
4 https://meshb.nlm.nih.gov/record/ui?name=Blood%Pressure
5 https://id.nlm.nih.gov/mesh/
6 https://hhs.github.io/meshrdf/predicates</p>
        <p>- SE also: like SE broader, but instead of adding the 'broader' predicates, we add
the MeSH terms related through the predicate 'meshv:seeAlso'.
- SE broader also: a combination of SE broader and SE also.
- SE child broader: first, apply SE; then, for each 'child' node, search for
'parents' different from the original MeSH terms.
- SE recursive down: like SE, but for each 'child' we recursively apply SE until leaf
nodes are reached (no more recursion).
- All in one: all the approaches at once.</p>
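        <p>The Simple Expansion and its 'broader' and 'also' variants can be sketched on a toy graph as follows. This is only one possible reading of the descriptions above, not the code used for the runs: the dictionary-based graph, the node names, the placeholder predicate ex:related and all function names are assumptions made for illustration; only the four meshv predicates come from the text.</p>
        <preformat>
```python
from typing import Callable, Dict, List, Tuple

# Toy stand-in for the MeSH RDF graph: node -> list of (predicate, related node).
# Node names and the "ex:related" predicate are made up for illustration.
Graph = Dict[str, List[Tuple[str, str]]]

BROADER = {"meshv:broader", "meshv:broaderDescriptor"}
SEE_ALSO = {"meshv:seeAlso"}
# Predicates that Simple Expansion (SE) does not follow, as listed in the text.
EXCLUDED = BROADER | SEE_ALSO | {"meshv:Qualifier"}


def follow(graph: Graph, node: str, keep: Callable[[str], bool]) -> List[str]:
    """Return the nodes linked to `node` by predicates accepted by `keep`."""
    return [target for predicate, target in graph.get(node, []) if keep(predicate)]


def add_new(expanded: List[str], candidates: List[str], origin: str) -> None:
    """Append unseen candidates in order, skipping the original term."""
    for candidate in candidates:
        if candidate != origin and candidate not in expanded:
            expanded.append(candidate)


def simple_expansion(graph: Graph, term: str) -> List[str]:
    """SE: follow every non-excluded predicate, then repeat once per child."""
    allowed = lambda predicate: predicate not in EXCLUDED
    expanded: List[str] = []
    add_new(expanded, follow(graph, term, allowed), term)
    for child in list(expanded):  # copy: one extra level only, no recursion
        add_new(expanded, follow(graph, child, allowed), term)
    return expanded


def se_broader(graph: Graph, term: str) -> List[str]:
    """SE plus the 'broader' predicates, applied to the original term only."""
    expanded = simple_expansion(graph, term)
    add_new(expanded, follow(graph, term, lambda p: p in BROADER), term)
    return expanded


def se_also(graph: Graph, term: str) -> List[str]:
    """SE plus 'meshv:seeAlso', applied to the original term only."""
    expanded = simple_expansion(graph, term)
    add_new(expanded, follow(graph, term, lambda p: p in SEE_ALSO), term)
    return expanded


toy_graph: Graph = {
    "A": [("ex:related", "B"), ("meshv:seeAlso", "C"), ("meshv:broader", "P")],
    "B": [("ex:related", "D"), ("meshv:broader", "A")],
    "D": [("ex:related", "E")],
}

print(simple_expansion(toy_graph, "A"))  # ['B', 'D']
print(se_broader(toy_graph, "A"))        # ['B', 'D', 'P']
print(se_also(toy_graph, "A"))           # ['B', 'D', 'C']
```
</preformat>
        <p>In this toy graph, node E is not reached by SE because the expansion of the 'child' nodes is applied only once; a recursive variant would keep following the allowed predicates until no new node is found.</p>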
        <p>At the end of a query expansion process, we have three main data objects:
- the original query $q$;
- a vector $m = (m_1, m_2, \ldots, m_n)$ of MeSH terms associated with the original
query;
- a list $t$ of expanded terms of $n$ elements, where each element $t_i$ is another
vector of terms resulting from the iteration of the expansion approach, $t_i =
(t_{i1}, t_{i2}, \ldots, t_{ij})$.</p>
        <p>Given these three objects, we create a set of expanded queries that will be
used to create a number of ranked lists, as explained in the following subsections.</p>
        <p>Building expanded query. Given the vector of MeSH terms $m$ and the list of
expanded terms $t$, we create a set of expanded queries by means of the following
procedure:
- for each term $m_i$ of the vector of MeSH terms associated with the
original query,
- we substitute $m_i$ with one of the terms in $t_i$, for example $t_{i1}$,
- then, we build the expanded query by joining the original query with the
new vector $m_i = (m_1, \ldots, t_{i1}, \ldots, m_n)$, $q = q \cup m_i$.</p>
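        <p>The substitution procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name is hypothetical, the union of the query and the new vector is approximated by string concatenation, and the expansion terms in the example are invented placeholders rather than actual output of the expansion step.</p>
        <preformat>
```python
from typing import List


def build_expanded_queries(query: str, mesh_terms: List[str],
                           expansions: List[List[str]]) -> List[str]:
    """Create one expanded query per expansion term.

    mesh_terms plays the role of the vector m and expansions the role of
    the list t: expansions[i] holds the terms that may replace mesh_terms[i].
    """
    if len(mesh_terms) != len(expansions):
        raise ValueError("one expansion vector is needed per MeSH term")
    expanded_queries = []
    for i, candidates in enumerate(expansions):
        for candidate in candidates:
            variant = list(mesh_terms)   # copy of m
            variant[i] = candidate       # substitute m_i with a term of t_i
            # Assumption: joining q with the new vector is done by appending
            # the terms of the vector to the original query string.
            expanded_queries.append(query + " " + " ".join(variant))
    return expanded_queries


# Placeholder data: the expansion terms below are illustrative only.
queries = build_expanded_queries(
    "caffeine high blood pressure",
    ["Caffeine", "Hypertension"],
    [["Xanthines", "Coffee"], ["Blood Pressure"]],
)
for expanded in queries:
    print(expanded)
```
</preformat>
        <p>With two candidate terms for the first MeSH term and one for the second, the sketch produces three expanded queries, that is, one query per element of the vectors in the list of expanded terms.</p>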
        <p>Therefore, at the end of the process we generate a set $V$ of vectors of expanded
queries, where the cardinality $|V|$ is the sum of the numbers of elements in the vectors of
the list $t$. For each vector $v_k \in V$ we obtain a list $l_k$ of ranked documents.</p>
        <p>Merging ranked lists. We use different approaches to merge the $|V|$ ranked
lists into a single list based on the combination of the scores of the documents.
- Average: given a document present in one or more lists, the scores associated
with the document are averaged. Then the documents are sorted in decreasing
order of this new score.
- Sum: given a document present in one or more lists, we sum the scores
associated with the document.
- Normalized sum: like Sum, but the sum of the scores is normalized by the
highest score in the ranked lists.
- Round robin: for each rank $r$, we take the document of each list $l_k$ at $r$ and
add it to the new ranked list if it has not already been seen.</p>
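        <p>The four merging strategies can be sketched as follows. This is an illustrative reading rather than the implementation used for the runs: the function names, the tie-breaking by document identifier and the example scores are assumptions, and "the highest score in the ranked lists" is interpreted here as the maximum score over all the lists.</p>
        <preformat>
```python
from collections import defaultdict
from typing import Dict, List, Tuple

RankedList = List[Tuple[str, float]]  # (document id, score), best first


def collect_scores(lists: List[RankedList]) -> Dict[str, List[float]]:
    """Gather, for every document, the scores it received in the lists."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for ranked in lists:
        for doc, score in ranked:
            scores[doc].append(score)
    return scores


def order(scores: Dict[str, float]) -> RankedList:
    """Sort by decreasing score; ties are broken by document id (assumption)."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def merge_average(lists: List[RankedList]) -> RankedList:
    collected = collect_scores(lists)
    return order({doc: sum(s) / len(s) for doc, s in collected.items()})


def merge_sum(lists: List[RankedList]) -> RankedList:
    return order({doc: sum(s) for doc, s in collect_scores(lists).items()})


def merge_normalized_sum(lists: List[RankedList]) -> RankedList:
    # Assumption: normalize by the highest score found in any of the lists.
    top = max((score for ranked in lists for _, score in ranked), default=0.0) or 1.0
    return order({doc: sum(s) / top for doc, s in collect_scores(lists).items()})


def merge_round_robin(lists: List[RankedList]) -> List[str]:
    """Visit rank 0 of every list, then rank 1, and so on, skipping seen documents."""
    merged: List[str] = []
    seen = set()
    depth = max((len(ranked) for ranked in lists), default=0)
    for rank in range(depth):
        for ranked in lists:
            if len(ranked) > rank and ranked[rank][0] not in seen:
                seen.add(ranked[rank][0])
                merged.append(ranked[rank][0])
    return merged


# Made-up scores for two ranked lists.
l1 = [("d1", 5.0), ("d2", 2.0)]
l2 = [("d2", 4.0), ("d3", 1.0)]
print(merge_average([l1, l2]))         # d1 first (5.0), then d2 (3.0), d3 (1.0)
print(merge_sum([l1, l2]))             # d2 first (6.0), then d1 (5.0), d3 (1.0)
print(merge_normalized_sum([l1, l2]))  # same order as Sum, scores divided by 5.0
print(merge_round_robin([l1, l2]))     # ['d1', 'd2', 'd3']
```
</preformat>
        <p>Under this reading, Normalized sum rescales the scores without changing the order given by Sum; normalizing each list by its own maximum before summing would be a different interpretation and could change the order.</p>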
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>For the experiments, we used the Elasticsearch search engine<sup>7</sup> and the indexes
provided by the organizers of the task. We used the BM25 ranking function with
default parameters.</p>
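      <p>As a rough illustration of this setting, the sketch below builds the body of an Elasticsearch match query for one (expanded) query. The field name "content", the result size and the function name are hypothetical, since the structure of the provided indexes is not described here. No similarity needs to be configured in the request: BM25 has been the default similarity in Elasticsearch since version 5.0, with k1 = 1.2 and b = 0.75.</p>
      <preformat>
```python
import json
from typing import Any, Dict


def build_match_query(query_text: str, field: str = "content",
                      size: int = 1000) -> Dict[str, Any]:
    """Build the JSON body of a full-text match query.

    The field name and size are placeholders; scoring relies on the
    similarity configured in the index (BM25 by default).
    """
    return {
        "size": size,
        "query": {"match": {field: {"query": query_text}}},
    }


body = build_match_query("caffeine high blood pressure")
print(json.dumps(body, indent=2))
```
</preformat>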
      <p>Given the constraint on the number of runs that could be
submitted to the task (four in total), we submitted one baseline run and three query expansion
variants with the same scoring approach. We will evaluate all the other
combinations as soon as the qrels are made available.</p>
      <p>In particular, we submitted the following runs that use the Sum (of the
scores) as the document scoring approach:
- baseline.exp, a baseline run (plain BM25 with no expansion),
- sum score simple.exp, simple expansion,
- sum score broader also, simple expansion plus the two predicates broader
and also,
- sum score recursive.exp, a recursive down approach.</p>
      <p>In Table ??, we show the number of query variants that are generated per
topic by three approaches.</p>
      <sec id="sec-3-1">
        <p>7 https://www.elastic.co/products/elasticsearch</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Final remarks and Future Work</title>
      <p>
        The aim of our first participation in the CLEF eHealth Task 3 was to test the
effectiveness of different query expansion approaches that use the MeSH term
RDF graph, as well as different approaches to merging the ranked lists produced
by the query expansion approach. As future work, we will study the combination
of the MeSH term expansion with the help of medical terminological records [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
for technologically assisted systematic reviews [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Nick J.</given-names>
            <surname>Belkin</surname>
          </string-name>
          , Paul Kantor, Edward A.
          <string-name>
            <surname>Fox</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Joseph A.</given-names>
            <surname>Shaw</surname>
          </string-name>
          .
          <article-title>Combining the evidence of multiple query representations for information retrieval</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>31</volume>
          (
          <issue>3</issue>
          ):
          <fpage>431</fpage>
          -
          <lpage>448</lpage>
          ,
          <year>1995</year>
          . The Second Text Retrieval Conference (TREC-2).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Giorgio Maria</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          .
          <article-title>A study of an automatic stopping strategy for technologically assisted medical reviews</article-title>
          .
          <source>In Advances in Information Retrieval - 40th European Conference on IR Research</source>
          , ECIR
          <year>2018</year>
          , Grenoble, France, March 26-29,
          <year>2018</year>
          , Proceedings, pages
          <fpage>672</fpage>
          -
          <lpage>677</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jimmy</surname>
          </string-name>
          , Guido Zuccon, Joao Palotti, Lorraine Goeuriot, and Liadh Kelly, editors.
          <article-title>Overview of the CLEF 2018 Consumer Health Search Task</article-title>
          .
          <source>CLEF 2018 Evaluation Labs and Workshop: Online Working Notes</source>
          . CEUR-WS,
          <year>September 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Hanna</given-names>
            <surname>Suominen</surname>
          </string-name>
          , Liadh Kelly, Lorraine Goeuriot, Evangelos Kanoulas, Leif Azzopardi, Rene Spijker,
          <string-name>
            <given-names>Dan</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Aurelie</given-names>
            <surname>Neveol</surname>
          </string-name>
          , Lionel Ramadier, Aude Robert, Joao Palotti, Jimmy, and Guido Zuccon, editors.
          <source>Overview of the CLEF eHealth Evaluation Lab</source>
          <year>2018</year>
          .
          <source>CLEF 2018 - 8th Conference and Labs of the Evaluation Forum, volume Lecture Notes in Computer Science (LNCS)</source>
          . Springer,
          <year>September 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Federica</given-names>
            <surname>Vezzani</surname>
          </string-name>
          , Giorgio Maria Di Nunzio, and
          <string-name>
            <given-names>Genevieve</given-names>
            <surname>Henrot</surname>
          </string-name>
          .
          <article-title>Trimed: A multilingual terminological database</article-title>
          .
          <source>In Proceedings of the Eleventh International Conference on Language Resources and Evaluation</source>
          ,
          LREC 2018, Miyazaki, Japan, May 7-12,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Theodore B</given-names>
            <surname>Wright</surname>
          </string-name>
          , David Ball, and
          <string-name>
            <given-names>William</given-names>
            <surname>Hersh</surname>
          </string-name>
          .
          <article-title>Query expansion using MeSH terms for dataset retrieval: OHSU at the bioCADDIE 2016 dataset retrieval challenge</article-title>
          .
          <source>Database</source>
          ,
          <year>2017</year>
          :bax065,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>