<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Study on Reciprocal Ranking Fusion in Consumer Health Search. IMS UniPD at CLEF eHealth 2020 Task 2</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgio Maria Di Nunzio</string-name>
          <email>giorgiomaria.dinunzio@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Marchesin</string-name>
          <email>stefano.marchesin@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federica Vezzani</string-name>
          <email>federica.vezzani@unipd.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Information Engineering</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Linguistic and Literary Studies</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Dept. of Mathematics</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Padua</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>In this paper, we describe the results of the participation of the Information Management Systems (IMS) group at CLEF eHealth 2020 Task 2, the Consumer Health Search Task. In particular, we participated in both subtasks: Ad-hoc IR and Spoken queries retrieval. The goal of our work was to evaluate the reciprocal ranking fusion approach over 1) different query variants; 2) different retrieval functions; 3) with/without pseudo-relevance feedback. The results show that, on average, the best performances are obtained by a ranking fusion approach together with pseudo-relevance feedback.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        CLEF eHealth is an evaluation challenge whose goal is to provide researchers
with datasets, evaluation frameworks, and events to evaluate the performance of
IR systems in the medical IR domain. In the CLEF eHealth 2020 edition [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the
organizers set up two tasks to evaluate retrieval systems on different domains. In
this paper, we report the results of our participation in Task 2, "Consumer
Health Search" [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This task investigates the problem of retrieving documents
to support the needs of health consumers who are confronted with a health issue.
In particular, we participated in both available subtasks: the Ad-hoc IR task
and the Spoken queries retrieval task.
      </p>
      <p>
        The contribution of our experiments to both subtasks can be summarized as
follows:
– A study of a manual query variation approach similar to [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ];
– An evaluation of a ranking fusion approach [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] on different document retrieval
strategies, with or without pseudo-relevance feedback [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Manual query variants for the first two queries (151001, 152001) of the Ad-hoc IR subtask.</p>
        </caption>
        <table>
          <thead>
            <tr><th>id</th><th>type</th><th>text</th></tr>
          </thead>
          <tbody>
            <tr><td>151001</td><td>original</td><td>anemia diet therapy</td></tr>
            <tr><td>151001</td><td>variant 1</td><td>anaemia diet cure</td></tr>
            <tr><td>151001</td><td>variant 2</td><td>diet treatment for the decrease in the total amount of red blood cells (RBCs) or hemoglobin in the blood</td></tr>
            <tr><td>152001</td><td>original</td><td>emotional and mental disorders</td></tr>
            <tr><td>152001</td><td>variant 1</td><td>psychiatric disorder</td></tr>
            <tr><td>152001</td><td>variant 2</td><td>psychological disorder</td></tr>
            <tr><td>152001</td><td>variant 3</td><td>mental illness</td></tr>
            <tr><td>152001</td><td>variant 4</td><td>mental disease</td></tr>
            <tr><td>152001</td><td>variant 5</td><td>mental disorder</td></tr>
            <tr><td>152001</td><td>variant 6</td><td>nervous breakdown</td></tr>
            <tr><td>152001</td><td>variant 7</td><td>emotional disturbance such as: anxiety, bipolar, conduct, eating, obsessive-compulsive (OCD) and psychotic disorders</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The remainder of the paper introduces the methodology and a brief
summary of the experimental settings that we used to create the official
runs that we submitted for this task.</p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>In this section, we describe the methodology for merging the ranking lists provided
by different retrieval methods for different query variants.</p>
      <sec id="sec-2-1">
        <title>Subtask 1: Ad-hoc IR</title>
        <p>
          Query variants: In this subtask, we asked an expert in the field of medical
terminology to rewrite the original English query into as many variants as she
preferred. The aim of the query rewriting was to describe in the best possible way
(given the knowledge of the user) the information need expressed by the query.
In Table 1, we show the variants for the first two queries (151001, 152001). These
examples show how the number of variants, as well as the complexity of the
request (from a few keywords to complex sentences), may change across queries.
        </p>
        <p>
          Retrieval models: For each query, we run three different retrieval models: the
Okapi BM25 model [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], the divergence from randomness model [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], and the language
model using Dirichlet priors [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. We used the RM3 Positional Relevance model
to implement a pseudo-relevance feedback strategy including query expansion [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>
          Ranking fusion: Given the different ranking lists, we used the reciprocal rank
fusion (RRF) approach to merge them [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
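        <p>As an illustration, the following minimal Python sketch shows the RRF scoring scheme, assuming the smoothing constant k = 60 proposed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]; the document identifiers are illustrative and this is not our exact implementation.</p>
        <preformat>
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over all lists that return
    it; k = 60 follows Cormack et al. [2] (an assumption here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse BM25, QLM, and DFR rankings for one query.
fused = reciprocal_rank_fusion([
    ["d1", "d2", "d3"],  # BM25
    ["d2", "d1", "d4"],  # QLM (Dirichlet)
    ["d2", "d3", "d1"],  # DFR
])  # d2 comes first: it is ranked high by all three models
        </preformat>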
      </sec>
      <sec id="sec-2-2">
        <title>Subtask 2: Spoken Queries Retrieval</title>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Three examples of participant variants (out of six) for the first two queries of the Spoken queries retrieval subtask.</p>
          </caption>
          <table>
            <thead>
              <tr><th>id</th><th>type</th><th>text</th></tr>
            </thead>
            <tbody>
              <tr><td>151001</td><td>participant 1</td><td>anemia diet changes</td></tr>
              <tr><td>151001</td><td>participant 2</td><td>Diet for anemia</td></tr>
              <tr><td>151001</td><td>participant 3</td><td>What food can i eat on this diet</td></tr>
              <tr><td>152001</td><td>participant 1</td><td>causes of withdrawal</td></tr>
              <tr><td>152001</td><td>participant 2</td><td>What diseases may cause mental health?</td></tr>
              <tr><td>152001</td><td>participant 3</td><td>what mental health conditions can cause mood alterations cause somebody to become more withdrawn</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
          Query variants: In this subtask, a number of query variants, recorded as audio by six users,
are already available. For this task, we used the different transcriptions of these
audio files: clean transcript, default variant, phone enhanced variant, and video
enhanced variant. In Table 2, we show three examples of variants (out of six) for
the first two queries.
        </p>
        <p>Retrieval models: For this subtask, we used only the Okapi BM25 retrieval
model and the RM3 pseudo-relevance feedback model.</p>
        <p>Ranking fusion: Given the different participants and the different transcripts,
we used the RRF approach to merge their rankings.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>In this section, we describe the experimental settings and the results for each
subtask.</p>
      <sec id="sec-3-1">
        <title>Search Engine</title>
        <p>For all the experiments, we used the Elasticsearch search engine
(https://www.elastic.co/products/elasticsearch) and the indexes
provided by the organizers of the task. We used the following parameter settings
for each retrieval model:
– BM25: k1 = 1.2, b = 0.75
– LMDirichlet: μ = 2000
– DFR: basic model = if, after effect = b, normalization = h2</p>
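        <p>For illustration, the following Python sketch shows how these three similarities can be declared when creating an Elasticsearch index via the Python client; the index name, the field names, and the h2 c value are assumptions, not the configuration of the indexes provided by the organizers.</p>
        <preformat>
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local instance

# Declare one similarity per retrieval model and attach each one to a
# text field, so the same content can be scored by each model.
es.indices.create(
    index="clef2020",  # placeholder name
    body={
        "settings": {
            "index": {
                "similarity": {
                    "sim_bm25": {"type": "BM25", "k1": 1.2, "b": 0.75},
                    "sim_lm": {"type": "LMDirichlet", "mu": 2000},
                    "sim_dfr": {
                        "type": "DFR",
                        "basic_model": "if",
                        "after_effect": "b",
                        "normalization": "h2",
                        "normalization.h2.c": "1.0",  # c value assumed
                    },
                }
            }
        },
        "mappings": {
            "properties": {
                "content": {"type": "text", "similarity": "sim_bm25"},
                "content_lm": {"type": "text", "similarity": "sim_lm"},
                "content_dfr": {"type": "text", "similarity": "sim_dfr"},
            }
        },
    },
)
        </preformat>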
        <p>The RM3 pseudo-relevance feedback model was implemented with the
following strategy: pick the 10 most relevant terms from the top 10 ranked documents,
add these terms to the original query with a weight equal to 0.5 (while the original
terms are weighted 1.0), run the expanded query, and produce the final ranking
list.</p>
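        <p>A minimal sketch of this expansion step follows; it approximates term selection with a plain frequency count over the top-ranked documents, whereas our runs used the positional relevance model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and the "content" field name is a placeholder.</p>
        <preformat>
from collections import Counter

def expand_query(es, index, query_text, n_docs=10, n_terms=10, weight=0.5):
    """Pick the 10 most frequent terms from the top 10 documents and add
    them to the query with weight 0.5; original terms keep weight 1.0."""
    hits = es.search(index=index, body={
        "query": {"match": {"content": query_text}},
        "size": n_docs,
    })["hits"]["hits"]
    counts = Counter()
    for hit in hits:
        counts.update(hit["_source"]["content"].lower().split())
    expansion = " ".join(term for term, _ in counts.most_common(n_terms))
    # Weighted combination of original and expansion terms via boosts.
    return {"bool": {"should": [
        {"match": {"content": {"query": query_text, "boost": 1.0}}},
        {"match": {"content": {"query": expansion, "boost": weight}}},
    ]}}
        </preformat>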
      </sec>
      <sec id="sec-3-2">
        <title>Runs</title>
        <p>For each subtask, we submitted four runs.</p>
        <p>Subtask 1. For the Ad-hoc retrieval subtask, the runs are:
– clef bm25 orig: only BM25 (no rank fusion) using the original query only;
– clef original rrf: reciprocal rank fusion with the BM25, QLM, and DFR models and
the original query;
– clef original rm3 rrf: reciprocal rank fusion with the BM25, QLM, and DFR
models using RM3 pseudo-relevance feedback and the original query;
– clef variant rrf: BM25 and reciprocal rank fusion on the rankings produced
by the original and manual variants of the query.</p>
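        <p>As an illustration, the clef original rm3 rrf run could be assembled as in the following hypothetical sketch, which reuses the expansion and fusion functions sketched earlier; keeping one index per retrieval model is an assumption, and the index names are placeholders.</p>
        <preformat>
def run_original_rm3_rrf(es, query_text,
                         indexes=("clef_bm25", "clef_qlm", "clef_dfr")):
    """Expand the original query (expand_query, Section 3.1), rank with
    each retrieval model, and merge with reciprocal_rank_fusion
    (Section 2.1)."""
    rankings = []
    for index in indexes:
        expanded = expand_query(es, index, query_text)
        hits = es.search(index=index,
                         body={"query": expanded, "size": 1000})
        rankings.append([h["_id"] for h in hits["hits"]["hits"]])
    return reciprocal_rank_fusion(rankings)
        </preformat>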
        <p>Subtask 2. For the spoken queries retrieval subtask, the runs are:
– bm25 rrf: reciprocal rank fusion with BM25 on the six variants of the query;
– bm25 rrf rm3: reciprocal rank fusion with BM25 on the six variants of the
query using pseudo-relevance feedback with 10 documents and 10 terms
(query weight 0.5);
– bm25 all rrf: reciprocal rank fusion with BM25 on all transcripts of the six
variants of the query (a total of 18 variants per query);
– bm25 all rrf rm3: reciprocal rank fusion of BM25 with all transcripts using
RM3 pseudo-relevance feedback.</p>
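        <p>Similarly, a hypothetical sketch of the bm25 all rrf run is shown below: one BM25 ranking per transcript of each spoken variant of a query, all merged with the reciprocal_rank_fusion function sketched in Section 2.1; index and field names are, again, placeholders.</p>
        <preformat>
def run_bm25_all_rrf(es, index, transcripts):
    """transcripts: the list of transcript strings available for one
    query (up to 18 per query, as described above)."""
    rankings = []
    for text in transcripts:
        hits = es.search(index=index, body={
            "query": {"match": {"content": text}},
            "size": 1000,
        })["hits"]["hits"]
        rankings.append([h["_id"] for h in hits])
    return reciprocal_rank_fusion(rankings)
        </preformat>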
      </sec>
      <sec id="sec-3-3">
        <title>Results</title>
        <p>The organizers of this task provided the results (averaged across topics) achieved
by several baselines, compared with the runs of each participant. In Table 3, we show
a summary of these results.</p>
        <p>A preliminary analysis of the results shows that, in terms of standard
evaluation measures such as MAP, Rprec, and bpref, the use of the RM3 relevance
feedback model improves the effectiveness of the search engine (see Table 3).</p>
        <p>For subtask 1, the use of reciprocal rank fusion together with RM3 produced
satisfactory results, in most cases better than any baseline on many performance
measures. The run with manual query variants without relevance feedback did
not show any significant improvements.</p>
        <p>For subtask 2, the use of pseudo-relevance feedback achieved better results.
It is interesting to see that, despite the noise in the formulation of the query
by different participants, Precision@5 (P@5) was, in general, better than most of
the baselines.</p>
        <p>In terms of understandability (rRBP) and credibility (cRBP) of the retrieved
results [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], we report in Table 4 the values of these two measures by cut-off
(0.50, 0.50, 0.95), ordered by MAP (same ordering as Table 3). From this set
of results, one interesting thing emerges: the understandability of the results retrieved
with the manual query variants of the Ad-hoc subtask seems to improve compared
to the runs that use the original query. Investigating this finding will be part of
our future work.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>This work was partially supported by the ExaMode Project, as a part of the
European Union Horizon 2020 Program under Grant 825292.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Gianni</given-names>
            <surname>Amati</surname>
          </string-name>
          and
          <string-name>
            <given-names>Cornelis Joost</given-names>
            <surname>Van Rijsbergen</surname>
          </string-name>
          .
          <article-title>Probabilistic models of information retrieval based on measuring the divergence from randomness</article-title>
          .
          <source>ACM Trans. Inf. Syst.</source>
          ,
          <volume>20</volume>
          (
          <issue>4</issue>
          ):
          <fpage>357</fpage>
          -
          <lpage>389</lpage>
          ,
          <year>October 2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Gordon V.</given-names>
            <surname>Cormack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Charles L. A.</given-names>
            <surname>Clarke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Buettcher</surname>
          </string-name>
          .
          <article-title>Reciprocal rank fusion outperforms condorcet and individual rank learning methods</article-title>
          .
          <source>In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '09</source>
          , pages
          <fpage>758</fpage>
          -
          <lpage>759</lpage>
          , New York, NY, USA,
          <year>2009</year>
          . Association for Computing Machinery.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D. Frank</given-names>
            <surname>Hsu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Isak</given-names>
            <surname>Taksa</surname>
          </string-name>
          .
          <article-title>Comparing rank and score combination methods for data fusion in information retrieval</article-title>
          .
          <source>Information Retrieval</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ):
          <fpage>449</fpage>
          -
          <lpage>480</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Lorraine</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          , Hanna Suominen, Liadh Kelly, Zhengyang Liu, Gabriella Pasi, Gabriela Saez Gonzales, Marco Viviani, and
          <string-name>
            <given-names>Chenchen</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <article-title>Overview of the CLEF eHealth 2020 task 2: Consumer health search with ad hoc and spoken queries</article-title>
          .
          <source>In Working Notes of Conference and Labs of the Evaluation (CLEF) Forum, CEUR Workshop Proceedings</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Lorraine</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          , Hanna Suominen, Liadh Kelly, Antonio Miranda-Escalada, Martin Krallinger, Zhengyang Liu, Gabriella Pasi, Gabriela Saez Gonzales, Marco Viviani, and
          <string-name>
            <given-names>Chenchen</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <article-title>Overview of the CLEF eHealth evaluation lab 2020</article-title>
          . In Avi Arampatzis, Evangelos Kanoulas, Theodora Tsikrika, Stefanos Vrochidis, Hideo Joho, Christina Lioma, Carsten Eickhoff, Aurelie Neveol, Linda Cappellato, and Nicola Ferro, editors,
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the Eleventh International Conference of the CLEF Association (CLEF 2020)</source>
          , LNCS volume
          <volume>12260</volume>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Yuanhua</given-names>
            <surname>Lv</surname>
          </string-name>
          and
          <string-name>
            <given-names>ChengXiang</given-names>
            <surname>Zhai</surname>
          </string-name>
          .
          <article-title>Positional relevance model for pseudo-relevance feedback</article-title>
          .
          <source>In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '10</source>
          , pages
          <fpage>579</fpage>
          -
          <lpage>586</lpage>
          , New York, NY, USA,
          <year>2010</year>
          . Association for Computing Machinery.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Giorgio Maria</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          , Federica Beghini, Federica Vezzani, and
          <string-name>
            <given-names>Genevieve</given-names>
            <surname>Henrot</surname>
          </string-name>
          .
          <article-title>An interactive two-dimensional approach to query aspects rewriting in systematic reviews. IMS unipd at CLEF ehealth task 2</article-title>
          .
          <source>In Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum</source>
          , Dublin, Ireland, September 11-14,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Giorgio Maria</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          , Giacomo Ciuffreda, and Federica Vezzani.
          <article-title>Interactive sampling for systematic reviews. IMS unipd at CLEF 2018 ehealth task 2</article-title>
          .
          <source>In Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum</source>
          , Avignon, France, September 10-14,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Stephen E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          .
          <article-title>The probabilistic relevance framework: BM25 and beyond</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          ,
          <volume>3</volume>
          (
          <issue>4</issue>
          ):
          <fpage>333</fpage>
          -
          <lpage>389</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Ian</given-names>
            <surname>Ruthven</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mounia</given-names>
            <surname>Lalmas</surname>
          </string-name>
          .
          <article-title>A survey on the use of relevance feedback for information access systems</article-title>
          .
          <source>Knowl. Eng. Rev.</source>
          ,
          <volume>18</volume>
          (
          <issue>2</issue>
          ):
          <fpage>95</fpage>
          -
          <lpage>145</lpage>
          ,
          <year>June 2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Chengxiang</given-names>
            <surname>Zhai</surname>
          </string-name>
          and
          <string-name>
            <given-names>John</given-names>
            <surname>Lafferty</surname>
          </string-name>
          .
          <article-title>A study of smoothing methods for language models applied to ad hoc information retrieval</article-title>
          .
          <source>In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '01</source>
          , pages
          <fpage>334</fpage>
          -
          <lpage>342</lpage>
          , New York, NY, USA,
          <year>2001</year>
          . Association for Computing Machinery.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Guido</given-names>
            <surname>Zuccon</surname>
          </string-name>
          .
          <article-title>Understandability biased evaluation for information retrieval</article-title>
          . In Nicola Ferro, Fabio Crestani, Marie-Francine Moens, Josiane Mothe, Fabrizio Silvestri, Giorgio Maria Di Nunzio, Claudia Hauff, and Gianmaria Silvello, editors,
          <source>Advances in Information Retrieval</source>
          , pages
          <fpage>280</fpage>
          -
          <lpage>292</lpage>
          , Cham,
          <year>2016</year>
          . Springer International Publishing.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>