=Paper=
{{Paper
|id=Vol-2007/LEARNER2017_keynote_1
|storemode=property
|title=Ranking Learning-to-Rank Methods
|pdfUrl=https://ceur-ws.org/Vol-2007/LEARNER2017_keynote_1.pdf
|volume=Vol-2007
|authors=Djoerd Hiemstra,Niek Tax,Sander Bockting
|dblpUrl=https://dblp.org/rec/conf/ictir/HiemstraTB17
}}
==Ranking Learning-to-Rank Methods==
Ranking Learning-to-Rank Methods∗

Djoerd Hiemstra, University of Twente, Enschede, The Netherlands, hiemstra@cs.utwente.nl
Niek Tax, Eindhoven University of Technology, Eindhoven, The Netherlands, n.tax@tue.nl
Sander Bockting, KPMG Netherlands, Amstelveen, The Netherlands, bockting.sander@kpmg.nl

∗ The full version of this work was published by Tax, Bockting and Hiemstra [1].
LEARNER’17, October 1, 2017, Amsterdam, The Netherlands. Copyright ©2017 for this paper by its authors. Copying permitted for private and academic purposes.

ABSTRACT
We present a cross-benchmark comparison of learning-to-rank methods using two evaluation measures: the Normalized Winning Number and the Ideal Winning Number. Evaluation results of 87 learning-to-rank methods on 20 datasets show that ListNet, SmoothRank, FenchelRank, FSMRank, LRUF and LARF are Pareto optimal learning-to-rank methods, listed in increasing order of Normalized Winning Number and decreasing order of Ideal Winning Number.

CCS CONCEPTS
• Information systems → Learning to rank;

1 INTRODUCTION
Like most information retrieval methods, learning-to-rank methods are evaluated on benchmark datasets, such as the many datasets provided by Microsoft and the datasets provided by Yahoo and Yandex. These learning-to-rank datasets offer feature set representations of the to-be-ranked documents instead of the documents themselves. Therefore, any difference in ranking performance is due to the ranking algorithm and not to the features used. This opens up a unique opportunity for a cross-benchmark comparison of learning-to-rank methods. In this paper, we compare learning-to-rank methods based on a sparse set of evaluation results on many benchmark datasets.

2 DATASETS AND METHODS
Evaluation results of 87 learning-to-rank methods on 20 well-known benchmark datasets were collected using a systematic literature review [1]. We included papers that report the mean average precision or nDCG at 3, 5 or 10 documents retrieved. Papers that used different or additional features, or that reported no baseline performance that allowed us to check the validity of the results, were excluded from the analysis.

The Winning Number of a learning-to-rank method is defined as the number of other methods that the method beats over the set of datasets. So, a method with a high Winning Number beats many other methods on many datasets. For every method, we find a different set of datasets on which the method was evaluated. The Ideal Winning Number is the maximum Winning Number that the method can achieve on the datasets on which it was evaluated. The Normalized Winning Number is the Winning Number divided by the Ideal Winning Number. The Normalized Winning Number gives insight into the ranking accuracy of the learning-to-rank method; the Ideal Winning Number gives insight into the degree of certainty concerning that ranking accuracy. We report the best performing methods by Normalized Winning Number and Ideal Winning Number.
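As a rough sketch of how these quantities can be computed from a sparse table of reported scores, consider the following Python fragment. This is not the authors' implementation, and the method names and scores are invented purely for illustration; the counting follows the definitions above, with one comparison per dataset-measure combination that two methods share.

# Sketch: Winning Number (WN), Ideal Winning Number (IWN) and Normalized
# Winning Number (NWN) from a sparse table of reported scores.
# Method names and scores are invented for illustration; they are not from [1].
results = {
    "MethodA": {("OHSUMED", "nDCG@10"): 0.44, ("MQ2007", "MAP"): 0.46},
    "MethodB": {("OHSUMED", "nDCG@10"): 0.41},
    "MethodC": {("MQ2007", "MAP"): 0.48, ("MQ2008", "nDCG@5"): 0.49},
}

def winning_numbers(results):
    stats = {}
    for method, scores in results.items():
        wn, iwn = 0, 0
        for other, other_scores in results.items():
            if other == method:
                continue
            for key, score in scores.items():
                if key in other_scores:      # both methods were evaluated here
                    iwn += 1                 # best case: this comparison is won
                    if score > other_scores[key]:
                        wn += 1              # this comparison is actually won
        stats[method] = {"WN": wn, "IWN": iwn, "NWN": wn / iwn if iwn else 0.0}
    return stats

def pareto_optimal(stats):
    # Pareto optimal: no other method scores higher on both NWN and IWN.
    return [m for m, s in stats.items()
            if not any(o["NWN"] > s["NWN"] and o["IWN"] > s["IWN"]
                       for n, o in stats.items() if n != m)]

stats = winning_numbers(results)
print(stats)
print(pareto_optimal(stats))

The same Pareto check, applied to the 87 methods, is what determines which algorithms are labeled in Figure 1 below.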
3 RESULTS
Figure 1 shows the Normalized Winning Number as a function of the Ideal Winning Number for 87 learning-to-rank methods over 20 datasets and all investigated evaluation measures: Mean Average Precision and nDCG at 3, 5 and 10. The figure labels the Pareto optimal algorithms and also, in a smaller font, the Rank-2 Pareto optima, which are the algorithms with exactly one algorithm having a higher value on both axes. In addition, Linear Regression and the ranking method of simply sorting on the best single feature are labeled as baselines.

Figure 1: Winning numbers of 87 learning-to-rank methods.

The figure shows that LRUF beats almost all other methods, with an Ideal Winning Number of almost 500 measures and datasets. If we move to the right of the figure, we increase our confidence in the results. That is, we are more confident about the results of ListNet, as its Ideal Winning Number is close to 1000 measures and datasets. However, ListNet is outperformed on about half of these, roughly 500 dataset-measure combinations.

4 CONCLUSION
Based on a cross-benchmark comparison of 87 learning-to-rank methods on 20 datasets, we conclude that ListNet, SmoothRank, FenchelRank, FSMRank, LRUF and LARF are Pareto optimal learning-to-rank methods, listed in increasing order of Normalized Winning Number and decreasing order of Ideal Winning Number [1].

REFERENCES
[1] Niek Tax, Sander Bockting, and Djoerd Hiemstra. 2015. A cross-benchmark comparison of 87 learning to rank methods. Information Processing & Management 51, 6 (2015), 757–772. (Awarded IPM Best Paper of 2015.)