
When Truncated Rankings Are Better and How to Measure That - Abstract⋆

Enrique Amigó

Stefano Mizzaro

Damiano Spina

UNED NLP & IR Group, Madrid, Spain

RMIT University, Melbourne, Australia

University of Udine, Italy

2022


In this work we provide both theoretical and experimental contributions to the evaluation of truncated rankings, where systems apply a stopping criterion to truncate the ranking at the right position and avoid retrieving irrelevant documents at the end. We first define formal properties to analyze how effectiveness metrics behave when evaluating truncated rankings. Our theoretical analysis shows that de facto standard metrics do not satisfy the properties desirable for evaluating truncated rankings: only Observational Information Effectiveness (OIE), a metric based on Shannon's information theory, satisfies them all. We then perform experiments to compare several metrics on nine TREC data sets. According to our experimental results, the most appropriate metrics for truncated rankings are OIE and a novel extension of Rank-Biased Precision that adds a user effort factor penalizing the retrieval of irrelevant documents.

Keywords: Information Retrieval, Evaluation, Evaluation measures, Ranking Cutoff
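
To illustrate why a standard metric may fail to reward truncation, the following is a minimal sketch of the standard Rank-Biased Precision formula, RBP = (1 - p) * sum_i r_i * p^(i-1). It is not the extension proposed in this work, whose user effort factor is not specified in this abstract. The function name `rbp`, the persistence value p = 0.8, and the toy relevance lists are assumptions made purely for illustration.

```python
def rbp(relevance, p=0.8):
    """Standard Rank-Biased Precision for binary relevance labels.

    relevance: list of 0/1 labels, top-ranked document first (toy input).
    p: user persistence parameter (0.8 is an arbitrary illustrative choice).
    """
    # enumerate starts at 0, so p ** i equals p^(rank - 1) for 1-based ranks.
    return (1 - p) * sum(rel * p ** i for i, rel in enumerate(relevance))


# Hypothetical rankings: the second one stops before the irrelevant tail.
full = [1, 0, 1, 0, 0]
truncated = [1, 0, 1]

print(rbp(full), rbp(truncated))
```

Under these assumptions both rankings receive the same score, because irrelevant documents contribute zero to the sum. Standard RBP therefore gives no credit to the system that stopped early, which is the kind of behavior the effort factor described above is meant to penalize.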