<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Ranking Learning-to-Rank Methods∗</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Djoerd Hiemstra</string-name>
          <email>hiemstra@cs.utwente.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Niek Tax</string-name>
          <email>n.tax@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sander Bockting</string-name>
          <email>bockting.sander@kpmg.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <addr-line>Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KPMG Netherlands</institution>
          ,
          <addr-line>Amstelveen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Twente</institution>
          ,
          <addr-line>Enschede</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>1</volume>
      <issue>2017</issue>
      <abstract>
        <p>We present a cross-benchmark comparison of learning-to-rank methods using two evaluation measures: the Normalized Winning Number and the Ideal Winning Number. Evaluation results of 87 learning-to-rank methods on 20 datasets show that ListNet, SmoothRank, FenchelRank, FSMRank, LRUF and LARF are Pareto optimal learning-to-rank methods, listed in increasing order of Normalized Winning Number and decreasing order of Ideal Winning Number.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Learning to rank;</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>Like most information retrieval methods, learning-to-rank methods
are evaluated on benchmark datasets, such as the many datasets
provided by Microsoft and the datasets provided by Yahoo and Yandex.
These learning-to-rank datasets offer feature-set representations of
the to-be-ranked documents instead of the documents themselves.
Therefore, any difference in ranking performance is due to the
ranking algorithm and not the features used. This opens up a unique
opportunity for cross-benchmark comparison of learning-to-rank
methods. In this paper, we compare learning-to-rank methods based
on a sparse set of evaluation results on many benchmark datasets.
∗The full version of this work was published by Tax, Bockting and Hiemstra [1].</p>
      <p>The Normalized Winning Number is the Winning Number divided
by the Ideal Winning Number. The Normalized Winning Number
gives insight into the ranking accuracy of the learning-to-rank
method, while the Ideal Winning Number gives insight into the degree
of certainty concerning that ranking accuracy. We report the best
performing methods by Normalized Winning Number and by Ideal
Winning Number.</p>
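      <p>As a minimal illustration of these definitions, the Winning Number and Ideal Winning Number can be computed from a sparse table of evaluation scores. The method names, scores, and counting rule below are a hypothetical sketch of the pairwise comparison described above, not the paper's exact implementation:</p>
```python
# Sketch of Winning Number computation on a sparse score table.
# scores[method][dataset] holds one evaluation score (e.g. NDCG@10);
# a missing entry means the method was not evaluated on that dataset.
# All names and values here are illustrative, not the paper's data.
scores = {
    "ListNet":    {"D1": 0.71, "D2": 0.64, "D3": 0.58},
    "SmoothRank": {"D1": 0.73, "D2": 0.61},
    "LRUF":       {"D2": 0.69, "D3": 0.62},
}

def winning_numbers(scores, method):
    """Return (winning_number, ideal_winning_number) for one method.

    The Winning Number counts how often the method beats another
    method on a dataset where both were evaluated; the Ideal Winning
    Number counts all such available comparisons, i.e. the maximum
    the method could have won given the sparse evaluations."""
    wn = iwn = 0
    for other, other_scores in scores.items():
        if other == method:
            continue
        for dataset, score in scores[method].items():
            if dataset in other_scores:
                iwn += 1
                if score > other_scores[dataset]:
                    wn += 1
    return wn, iwn

for m in scores:
    wn, iwn = winning_numbers(scores, m)
    print(m, wn, iwn, wn / iwn)  # Normalized Winning Number = wn / iwn
```
      <p>On this toy table, a method evaluated on few datasets can reach a high Normalized Winning Number while its Ideal Winning Number, and hence our certainty about the result, stays low.</p>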
    </sec>
    <sec id="sec-3">
      <title>RESULTS</title>
      <p>The figure shows that LRUF beats almost all other methods, with
an Ideal Winning Number of almost 500 measure-dataset combinations.
Moving to the right of the figure increases our confidence
in the results. That is, we are more confident about the results of
ListNet, as its Ideal Winning Number is close to 1000 measure-dataset
combinations. However, ListNet is outperformed on about half, so
roughly 500, of those datasets and measures.</p>
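      <p>The Pareto-optimal set reported in the abstract can be identified mechanically: a method is Pareto optimal when no other method is at least as good on both Normalized Winning Number and Ideal Winning Number and strictly better on at least one. A minimal sketch, with hypothetical (nwn, iwn) values rather than the paper's measurements:</p>
```python
# Identify Pareto-optimal methods in the NWN/IWN plane.
# Higher is better on both axes: NWN measures ranking accuracy,
# IWN measures how much evidence supports that accuracy.
# The points below are hypothetical, not the paper's results.
points = {
    "A": (0.40, 300),
    "B": (0.55, 250),
    "C": (0.70, 600),
    "D": (0.50, 900),
}

def pareto_front(points):
    """Return the names of methods not dominated by any other method."""
    front = []
    for name, (nwn, iwn) in points.items():
        dominated = any(
            n2 >= nwn and i2 >= iwn and (n2 > nwn or i2 > iwn)
            for other, (n2, i2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)
```
      <p>Sorting the resulting front by Normalized Winning Number automatically yields the methods in decreasing order of Ideal Winning Number, which is the trade-off the abstract describes.</p>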
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <mixed-citation>Niek Tax, Sander Bockting, and Djoerd Hiemstra. 2015. A cross-benchmark comparison of 87 learning to rank methods. Information Processing &amp; Management 51, 6 (2015), 757&#8211;772.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>