<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Ranking Learning-to-Rank Methods∗</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Djoerd Hiemstra</string-name>
          <email>hiemstra@cs.utwente.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Niek Tax</string-name>
          <email>n.tax@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sander Bockting</string-name>
          <email>bockting.sander@kpmg.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <addr-line>Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KPMG Netherlands</institution>
          ,
          <addr-line>Amstelveen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Twente</institution>
          ,
          <addr-line>Enschede</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>1</volume>
      <issue>2017</issue>
      <abstract>
        <p>We present a cross-benchmark comparison of learning-to-rank methods using two evaluation measures: the Normalized Winning Number and the Ideal Winning Number. Evaluation results of 87 learning-to-rank methods on 20 datasets show that ListNet, SmoothRank, FenchelRank, FSMRank, LRUF and LARF are Pareto optimal learning-to-rank methods, listed in increasing order of Normalized Winning Number and decreasing order of Ideal Winning Number.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Learning to rank;</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>Like most information retrieval methods, learning-to-rank methods
are evaluated on benchmark datasets, such as the many datasets
provided by Microsoft and the datasets provided by Yahoo and Yandex.
These learning-to-rank datasets offer feature-set representations of
the to-be-ranked documents instead of the documents themselves.
Therefore, any difference in ranking performance is due to the
ranking algorithm and not the features used. This opens up a unique
opportunity for cross-benchmark comparison of learning-to-rank
methods. In this paper, we compare learning-to-rank methods based
on a sparse set of evaluation results on many benchmark datasets.
∗The full version of this work was published by Tax, Bockting and Hiemstra [1].</p>
      <p>The Normalized Winning Number is the Winning Number divided
by the Ideal Winning Number. The Normalized Winning Number
gives insight into the ranking accuracy of the learning-to-rank
method, while the Ideal Winning Number gives insight into the degree
of certainty concerning that ranking accuracy. We report the best
performing methods by Normalized Winning Number and by Ideal
Winning Number.</p>
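      <p>As a minimal illustration of these definitions, the Winning Number and Ideal Winning Number can be computed from a sparse table of evaluation scores. The method names, scores, and counting rule below are a hypothetical sketch of the pairwise comparison described above, not the paper's exact implementation:</p>
```python
# Sketch of Winning Number computation on a sparse score table.
# scores[method][dataset] holds one evaluation score (e.g. NDCG@10);
# a missing entry means the method was not evaluated on that dataset.
# All names and values here are illustrative, not the paper's data.
scores = {
    "ListNet":    {"D1": 0.71, "D2": 0.64, "D3": 0.58},
    "SmoothRank": {"D1": 0.73, "D2": 0.61},
    "LRUF":       {"D2": 0.69, "D3": 0.62},
}

def winning_numbers(scores, method):
    """Return (winning_number, ideal_winning_number) for one method.

    The Winning Number counts how often the method beats another
    method on a dataset where both were evaluated; the Ideal Winning
    Number counts all such available comparisons, i.e. the maximum
    the method could have won given the sparse evaluations."""
    wn = iwn = 0
    for other, other_scores in scores.items():
        if other == method:
            continue
        for dataset, score in scores[method].items():
            if dataset in other_scores:
                iwn += 1
                if score > other_scores[dataset]:
                    wn += 1
    return wn, iwn

for m in scores:
    wn, iwn = winning_numbers(scores, m)
    print(m, wn, iwn, wn / iwn)  # Normalized Winning Number = wn / iwn
```
      <p>On this toy table, a method evaluated on few datasets can reach a high Normalized Winning Number while its Ideal Winning Number, and hence our certainty about the result, stays low.</p>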
    </sec>
    <sec id="sec-3">
      <title>RESULTS</title>
      <p>The figure shows that LRUF beats almost all other methods, with
an Ideal Winning Number of almost 500 measure-dataset combinations.
Moving to the right of the figure increases our confidence
in the results. That is, we are more confident about the results of
ListNet, as its Ideal Winning Number is close to 1000 measure-dataset
combinations. However, ListNet is outperformed on about half, so
roughly 500, of those datasets and measures.</p>
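      <p>The Pareto-optimal set reported in the abstract can be identified mechanically: a method is Pareto optimal when no other method is at least as good on both Normalized Winning Number and Ideal Winning Number and strictly better on at least one. A minimal sketch, with hypothetical (nwn, iwn) values rather than the paper's measurements:</p>
```python
# Identify Pareto-optimal methods in the NWN/IWN plane.
# Higher is better on both axes: NWN measures ranking accuracy,
# IWN measures how much evidence supports that accuracy.
# The points below are hypothetical, not the paper's results.
points = {
    "A": (0.40, 300),
    "B": (0.55, 250),
    "C": (0.70, 600),
    "D": (0.50, 900),
}

def pareto_front(points):
    """Return the names of methods not dominated by any other method."""
    front = []
    for name, (nwn, iwn) in points.items():
        dominated = any(
            n2 >= nwn and i2 >= iwn and (n2 > nwn or i2 > iwn)
            for other, (n2, i2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)
```
      <p>Sorting the resulting front by Normalized Winning Number automatically yields the methods in decreasing order of Ideal Winning Number, which is the trade-off the abstract describes.</p>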
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <mixed-citation>Niek Tax, Sander Bockting, and Djoerd Hiemstra. 2015. A cross-benchmark comparison of 87 learning to rank methods. Information Processing &amp; Management 51, 6 (2015), 757&#8211;772.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>