<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning to rank for Consumer Health Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hua Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaoming Liu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Binbin Zheng</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guan Yang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Informatics, University of Évora.</institution>
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science, Zhongyuan University of Technology</institution>
          ,
          <addr-line>Zhengzhou</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The CLEF 2021 eHealth Consumer Health Search task aims to investigate the effectiveness of information retrieval systems in providing health information to ordinary health consumers. Compared to previous years, this year's task includes three subtasks and adopts a new document corpus and set of queries. This paper presents the work of the Zhongyuan University of Technology team participating in Subtask 1. It explores the use of learning to rank techniques in consumer health search. A number of retrieval features are used, and eight different learning to rank algorithms are applied to train models. The four best-performing models are used to re-rank the documents, and four runs are submitted to the subtask.</p>
      </abstract>
      <kwd-group>
        <kwd>consumer health</kwd>
        <kwd>information retrieval</kwd>
        <kwd>learning to rank</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <p>
        In the information retrieval area, machine learning techniques can be applied to build ranking
models for information retrieval systems; this is known as Learning to Rank (LTR) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>Typically, the training data consists of three elements: the training queries Q, the associated
documents D, and the corresponding relevance judgments (the gold-standard qrel file) for
the query-document pairs. A learning algorithm is then used to generate a learning
to rank model. The testing data used for evaluation is created in much the same way as
the training data, from the testing queries and their associated documents. For these
testing queries, the learning to rank model is used jointly with a retrieval model to sort the
documents according to their relevance to the query and return a ranked list of
documents as the response.</p>
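<p>As a minimal illustration of how these three elements fit together, the sketch below pairs each judged query-document combination with its relevance grade from the qrel file. All names and the single term-overlap feature are illustrative stand-ins, not the features actually used in this work.</p>

```python
# Hypothetical sketch: each judged (query, document) pair becomes a feature
# vector labelled with its relevance grade from the qrel file.

def build_training_set(queries, documents, qrels, extract_features):
    """Pair every judged query-document combination with its grade."""
    training_set = []
    for qid in queries:
        for docid, grade in qrels.get(qid, {}).items():
            features = extract_features(queries[qid], documents[docid])
            training_set.append((qid, features, grade))
    return training_set

# Toy single-feature extractor: count of query terms appearing in the document.
def overlap(query, doc):
    return [len(set(query.split()) & set(doc.split()))]

queries = {"q1": "diabetes diet"}
documents = {"d1": "diet advice for diabetes patients", "d2": "flu symptoms"}
qrels = {"q1": {"d1": 2, "d2": 0}}  # 2 = highly relevant, 0 = not relevant

data = build_training_set(queries, documents, qrels, overlap)
# data → [("q1", [2], 2), ("q1", [0], 0)]
```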
      <p>
        Learning to rank methods have been proposed based on different machine learning algorithms.
Existing learning to rank methods are typically categorized into three main groups: pointwise,
pairwise, and listwise approaches. The pointwise approaches, for example, MART [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and
Random Forests [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], regard the relevance degrees as numerical or ordinal scores, and
formulate the learning to rank problem as a regression or a classification problem. The pairwise
approaches, for example, RankBoost [9] and RankNet [11], treat document pairs as training
instances and train models by minimizing a pairwise loss. The listwise approaches, for example,
ListNet [12], AdaRank [13], and LambdaMART [10], regard the entire set of documents associated
with a query as a training instance and train a ranking function by minimizing a listwise loss function.
Table 1 summarizes a number of widely used algorithms for each LTR approach.
      </p>
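<p>The pairwise idea can be made concrete with a small sketch of the RankNet-style pair loss: a pair (i, j) in which document i is more relevant than document j is one training instance, and the model is penalised when it scores j above i. The scores here are plain numbers rather than neural network outputs; this is an illustration of the loss, not the implementation used in the paper.</p>

```python
import math

def ranknet_pair_loss(score_i, score_j):
    """Cross-entropy loss for the pair, with target P(i ranked above j) = 1."""
    # Modelled probability that i is ranked above j: logistic function of
    # the score difference.
    p_ij = 1.0 / (1.0 + math.exp(-(score_i - score_j)))
    return -math.log(p_ij)

# A well-ordered pair incurs a small loss, an inverted pair a large one:
good = ranknet_pair_loss(2.0, 0.5)  # relevant doc scored higher
bad = ranknet_pair_loss(0.5, 2.0)   # relevant doc scored lower
```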
      <p>In this paper, the dataset and the assessment results from the 2018 CLEF eHealth IR task are
used for training the learning to rank models. A number of retrieval features are explored.</p>
      <sec id="sec-2-1">
        <title>2.1. Features Explored for Learning to Rank</title>
        <p>In this work, only regularly used information retrieval features are used to train the learning
to rank models. They are extracted from a group of 22 different retrieval models [14, 15], as
presented in Table 2.</p>
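<p>A hedged sketch of the feature-extraction idea: each query-document pair is described by the scores of several standard weighting models. Only a few toy models are shown here (the actual work uses 22 models from Terrier); the functions below are illustrative, not the paper's feature set.</p>

```python
import math

def tf(term, doc):
    """Raw term frequency of a term in a tokenized document."""
    return doc.count(term)

def idf(term, docs):
    """Smoothed inverse document frequency over the collection."""
    n = sum(1 for d in docs if term in d)
    return math.log((len(docs) + 1) / (n + 1))

def features(query, doc, docs):
    """One feature per weighting model, summed over query terms."""
    return [
        sum(tf(t, doc) for t in query),                 # term frequency
        sum(idf(t, docs) for t in query),               # inverse doc frequency
        sum(tf(t, doc) * idf(t, docs) for t in query),  # TF-IDF
        len(doc),                                       # document length
    ]

docs = [["diabetes", "diet"], ["flu", "symptoms"]]
vec = features(["diabetes"], docs[0], docs)
```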
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Training Learning to Rank Models</title>
        <p>
          We build models using eight state-of-the-art learning to rank methods: two
pointwise algorithms, two pairwise algorithms, and four listwise algorithms. The pointwise
algorithms are MART [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], which utilizes gradient boosting regression trees, and Random Forests [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ],
which uses regression. The pairwise algorithms are RankNet [11], which employs relative entropy as the
loss function and gradient descent to train a neural network model, and RankBoost [9], which is based
on boosting. The listwise algorithms are AdaRank [13], based on boosting; Coordinate
Ascent [16], where the ranking scores are calculated as weighted combinations of the feature
values; LambdaMART [10], which combines MART and LambdaRank and directly optimizes NDCG during
training; and ListNet [12], based on neural networks.
        </p>
        <p>
          The dataset and the topical relevance assessments of the 2018 CLEF eHealth IR task [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] are
used as the training data. In the assessment files, the corresponding documents are scored with
0, 1, or 2, which represent not relevant, relevant, or highly relevant, respectively.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Results</title>
      <p>This section first presents the experimental settings, the dataset and queries for the subtask,
and the evaluation measures used for the assessments. Then we describe the experiments we
performed and analyze the results.</p>
      <sec id="sec-3-1">
        <title>3.1. Experimental Settings</title>
        <p>The Terrier platform (version 5.4) is used as the IR framework of the system. The Okapi BM25 weighting
model is used as the retrieval model, with all parameters set to their default values (k_1 = 1.2,
k_3 = 8, b = 0.75). All learning to rank models are implemented with RankLib
version 2.15 (https://sourceforge.net/p/lemur/wiki/RankLib/).</p>
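<p>To show what the Okapi BM25 weighting with these parameters computes, a minimal sketch follows. It uses the common Robertson-style IDF; Terrier's own implementation differs in details such as its exact IDF formulation, so this is an illustration rather than the system's code.</p>

```python
import math

def bm25_score(query_terms, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a tokenized query."""
    avg_len = sum(len(d) for d in docs) / len(docs)
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)
        n = sum(1 for d in docs if term in d)
        idf = math.log((len(docs) - n + 0.5) / (n + 0.5) + 1)  # Robertson IDF
        # Term-frequency saturation (k1) and length normalisation (b):
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
    return score

docs = [["diabetes", "diet", "advice"], ["flu", "symptoms"]]
relevant = bm25_score(["diabetes"], docs[0], docs)    # positive score
irrelevant = bm25_score(["diabetes"], docs[1], docs)  # zero: term absent
```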
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset</title>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Topics</title>
        <p>
          The dataset of the CLEF 2021 CHS task is basically constructed using the collection introduced
in CLEF 2018 IR task, and extended with additional webpages and social media content. Totally,
the collection consists of over 5 million medical webpages from selected domains acquired from
the CommonCrawl and other resources [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>Totally 55 topics are used in the CLEF 2021 CHS task, and they are based on realistic search
scenarios. These topics are divided into two sets. The reddit-topics set includes 25 topics that are
based on use cases from discussion forums. These queries are extracted and manually selected
from Google trends to best fit each use case. The patients-topics set includes 30 topics which are
based on discussions with multiple sclerosis and diabetes patients. These queries are manually
generated by experts from established search scenarios. Figure 1 presents the example topics
used in the task.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Pre-processing</title>
        <p>All queries are pre-processed with lower-casing, stop-word removal, and Porter
stemming. The default stop-word list available in the Terrier 5.4 platform is used.</p>
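<p>The pipeline can be sketched as below. The stop-word list and the suffix rules are illustrative stand-ins: in the actual system Terrier 5.4 supplies the stop-word list and a full Porter stemmer, which is far more elaborate than this toy suffix-stripper.</p>

```python
# Toy stop-word list; the real one comes from Terrier.
STOPWORDS = {"the", "a", "an", "of", "for", "is", "what"}

def crude_stem(token):
    """A toy suffix-stripper, NOT the real Porter algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(query):
    """Lower-case, remove stop words, then stem each remaining token."""
    tokens = query.lower().split()
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("What is the Treatment for Diabetes"))
# → ['treatment', 'diabet']
```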
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Evaluation Measures</title>
        <p>The task takes into account three dimensions in the relevance evaluation: topical relevance,
understandability, and credibility. Both the ability of systems to retrieve relevant, readable, and credible
documents for the topics and the ability of systems to retrieve all kinds of documents (web or
social media) are considered. The evaluation measures used are NDCG@10, BPref, and RBP, as
well as metrics adapted to the other relevance dimensions, such as uRBP.</p>
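<p>As a reference point for the headline measure, a short sketch of NDCG@10 computed from the graded relevance labels (0/1/2) of the top-ranked documents follows; this is the standard exponential-gain formulation, which may differ in minor details from the task's official evaluation scripts.</p>

```python
import math

def dcg(grades, k=10):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))

def ndcg(grades, k=10):
    """DCG normalised by the DCG of the ideal (descending-grade) ordering."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0

# A perfect ranking scores 1.0; placing a non-relevant document first
# lowers the score:
perfect = ndcg([2, 1, 0])
swapped = ndcg([0, 1, 2])
```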
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Experiments</title>
        <p>Using the data from the CLEF 2018 eHealth IR task, we train a total of eight learning to rank
models. NDCG@10 is used as the training metric for the learning to rank models. We choose
the four best-performing LTR models and use them in this year's task. The evaluation of these
top four LTR models is presented in Table 3.</p>
        <p>For each query, the top 1,000 documents are retrieved using the BM25 retrieval model
in Terrier. The selected four models are then used to re-rank these initial BM25 results, and
four runs are generated for the final submission.</p>
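<p>The run-generation step can be summarized in a short sketch: BM25 retrieves the candidate pool per query, and a trained LTR model re-scores and re-orders it. Here bm25_search, extract_features, and ltr_model are placeholders for the Terrier and RankLib components, not their real APIs.</p>

```python
def rerank_run(queries, bm25_search, extract_features, ltr_model, depth=1000):
    """Re-rank each query's BM25 candidates by the LTR model's scores."""
    run = {}
    for qid, query in queries.items():
        candidates = bm25_search(query, k=depth)  # initial BM25 ranking
        scored = [(docid, ltr_model(extract_features(query, docid)))
                  for docid in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # LTR order
        run[qid] = [docid for docid, _ in scored]
    return run

# Toy usage with stand-in components:
toy_scores = {"d1": 0.1, "d2": 0.9, "d3": 0.5}
run = rerank_run(
    {"q1": "some query"},
    bm25_search=lambda q, k: ["d1", "d2", "d3"],
    extract_features=lambda q, d: d,
    ltr_model=lambda feats: toy_scores[feats],
)
# run["q1"] → ["d2", "d3", "d1"]
```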
      </sec>
      <sec id="sec-3-7">
        <title>3.7. Results</title>
        <p>For each topic, 250 documents have been assessed along the three relevance dimensions. We
compare our four runs to the six baselines, as shown in Table 4.</p>
        <p>We first compare the performance of our four implemented models. The best result
was obtained by the model m_rf, which uses the Random Forests learning to rank algorithm,
followed by the model r_rb with the RankBoost algorithm and the model m_lm with the LambdaMART
algorithm. On average, the model m_mr with the MART algorithm achieved the worst result,
although it showed somewhat better results on MAP and the two cRBP measures when compared
to the model m_lm.</p>
        <p>We then compare the best model, m_rf, with the baselines. On MAP, this
model surpassed all baselines. On Bpref, the model showed better results than the
DirichletLM_qe baseline but fell behind the other baselines. On the rRBP measures, the model
showed better results than the two DirichletLM baselines. On the cRBP and RBP measures,
the model surpassed the BM25 baseline and the two DirichletLM baselines.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>This paper reports on the ZUT team's participation in CLEF 2021 eHealth CHS Subtask 1. Using
the data from the CLEF 2018 eHealth IR task, a number of retrieval features are explored and
eight learning to rank algorithms are used to train LTR models. The top-performing LTR
models are then used in the CLEF 2021 eHealth IR task Subtask 1. In future work, the methods
proposed in this paper will be analyzed further: different learning to rank features will be
explored, and an ensemble algorithm will be investigated.</p>
      <p>[9] Y. Freund, R. Iyer, R. E. Schapire, Y. Singer, An efficient boosting algorithm for combining
preferences, Journal of Machine Learning Research 4 (2003) 933–969.
[10] Q. Wu, C. J. Burges, K. M. Svore, J. Gao, Adapting boosting for information retrieval
measures, Information Retrieval 13 (2010) 254–270.
[11] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, G. Hullender, Learning
to rank using gradient descent, in: Proceedings of the 22nd International Conference on
Machine Learning, 2005, pp. 89–96.
[12] Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, H. Li, Learning to rank: from pairwise approach
to listwise approach, in: Proceedings of the 24th International Conference on Machine
Learning, ACM, 2007, pp. 129–136.
[13] J. Xu, H. Li, AdaRank: a boosting algorithm for information retrieval, in: Proceedings of
the 30th Annual International ACM SIGIR Conference on Research and Development in
Information Retrieval, ACM, 2007, pp. 391–398.
[14] C. Macdonald, R. L. Santos, I. Ounis, B. He, About learning models with multiple query
dependent features, ACM Transactions on Information Systems (TOIS) 31 (2013) 11.
[15] C. Macdonald, R. L. Santos, I. Ounis, The whens and hows of learning to rank for web
search, Information Retrieval 16 (2013) 584–628.
[16] D. Metzler, W. B. Croft, Linear feature-based models for information retrieval, Information
Retrieval 10 (2007) 257–274.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Suominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Alemany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bassani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Brew-Sam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cotik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Filippo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>González-Sáez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Luque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          , G. Pasi,
          <string-name>
            <given-names>R.</given-names>
            <surname>Roller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Seneviratne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Upadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vivaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Viviani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Overview of the clef ehealth evaluation lab 2021</article-title>
          ,
          <source>in: CLEF 2021 - 11th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science (LNCS)</source>
          , Springer,
          <year>September 2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Suominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Spijker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Névéol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ramadier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Robert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          , Jimmy,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuccon</surname>
          </string-name>
          ,
          <article-title>Overview of the clef ehealth evaluation lab 2018</article-title>
          ,
          <source>in: CLEF 2018 - 8th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science (LNCS)</source>
          , Springer, September
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Jimmy</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Kelly</surname>
          </string-name>
          ,
          <article-title>Overview of the clef 2018 consumer health search task., in: CLEF 2018 Evaluation Labs</article-title>
          and Workshop: Online Working Notes, CEUR-WS,
          <year>September 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          , G. Pasi,
          <string-name>
            <given-names>H.</given-names>
            <surname>Suominen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bassani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Brew-Sam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gonzalez-Saez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Upadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mulhem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Seneviratne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Viviani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Consumer health search at clef ehealth 2021, in: CLEF 2021 Evaluation Labs</article-title>
          and Workshop: Online Working Notes, CEUR-WS,
          <year>September 2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jimmy</surname>
          </string-name>
          , G. Zuccon,
          <string-name>
            <given-names>J.</given-names>
            <surname>Palotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <article-title>Overview of the clef 2018 consumer health search task (</article-title>
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          , et al.,
          <article-title>Learning to rank for information retrieval</article-title>
          ,
          <source>Foundations and Trends® in Information Retrieval</source>
          <volume>3</volume>
          (
          <year>2009</year>
          )
          <fpage>225</fpage>
          -
          <lpage>331</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <article-title>Greedy function approximation: a gradient boosting machine</article-title>
          ,
          <source>Annals of statistics</source>
          (
          <year>2001</year>
          )
          <fpage>1189</fpage>
          -
          <lpage>1232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          , Random forests,
          <source>Machine learning 45</source>
          (
          <year>2001</year>
          )
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>