<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Comprehensive Dataset for Modern Learning to Rank Solutions (Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Domenico Dato</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sean MacAvaney</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Franco Maria Nardini</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raffaele Perego</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Tonellotto</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Glasgow</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Pisa</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Istella</institution>
          and
          <institution>ISTI-CNR</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent years, interest in neural Learning-to-Rank (LtR) approaches based on pre-trained language models has grown. These techniques have been demonstrated to be very effective at various ranking tasks, such as question answering and ad-hoc document ranking. The main reason for this success is the ability of deep neural networks to understand complex language patterns and to learn to extract effective features from text. In the same time frame, feature-based LtR methods reached maturity, and research in this area focused primarily on specific aspects such as efficiency or diversification. These two research areas have progressed almost entirely disjointly, and the effectiveness of neural LtR approaches compared to traditional feature-based LtR methods has not yet been well established. A major reason the two areas have remained separate is the lack of publicly available datasets enabling a direct comparison: LtR datasets providing query-document feature vectors do not contain the raw query and document text, while the benchmarks often used for evaluating neural models, e.g., MS MARCO, TREC Robust, etc., provide text but do not provide query-document feature vectors. In this presentation, we introduce Istella22, a new dataset that enables such comparisons by providing both query/document text and strong query-document feature vectors used by an industrial search engine. The dataset, detailed in a resource paper that will be presented at ACM SIGIR 2022 [1], consists of a comprehensive corpus of 8.4M web documents, a collection of query-document pairs including 220 hand-crafted features, relevance judgments on a 5-graded scale, and a set of 2,198 textual queries used for testing purposes. Istella22 enables a fair evaluation of traditional learning-to-rank and transfer ranking techniques on the same data: LtR models exploit the feature-based representations of the training samples, while pre-trained transformer-based neural rankers can be evaluated on the corresponding textual content of queries and documents. Through preliminary experiments on Istella22, we find that neural re-ranking approaches lag behind LtR models in terms of effectiveness; however, LtR models identify the scores of neural models as strong signals.</p>
      </abstract>
    </article-meta>
  </front>
  <body />
  <back>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <mixed-citation>D. Dato, S. MacAvaney, F. M. Nardini, R. Perego, N. Tonellotto. The Istella22 Dataset: Bridging Traditional and Neural Learning to Rank Evaluation. In: Proc. ACM SIGIR, 2022.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>