<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Living Ranking: from online to real-time information retrieval evaluation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lamjed Ben Jabeur</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laure Soulier</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Mousset</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lynda Tamine</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>The Living Labs for Information Retrieval Evaluation (LL4IR) initiative has provided a novel framework for evaluating retrieval models with real users. In this position paper, we propose an extension to the LL4IR framework that enables the evaluation of real-time IR.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Extending LL4IR with Living Ranking</title>
      <p>The "Living Ranking" is an extension component that can be plugged into the LL4IR framework, as illustrated in Figure 1. In contrast to offline rankings, which must be provided in advance through the LL4IR participant API, the new component acts as an additional source that supplies the API with rankings generated on-the-fly for each online submitted query, while preserving the initial framework workflow.</p>
      <p>[Figure 1: The LL4IR framework extended with the Living Ranking component, connecting the participant, the API, and the experimental context.]</p>
      <p>To do so, participants must provide a ranking algorithm that can be executed online via the Living Ranking component. Ranking algorithms provided by participants must conform to a standard interface with well-defined input and output formats. For instance, the rankings issued by the Living Ranking component could be structured in the format currently required from participants. We note that this architecture restricts the visibility of real-time submitted queries, and possibly of documents, which avoids bias in algorithm design and gives more credibility to the evaluation results. However, this component might be resource-consuming. One solution could be to execute ranking algorithms on demand, for instance when changes occur in the result set. Such an on-demand strategy may balance efficiency and effectiveness.</p>
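A possible shape for such a standard interface is sketched below. This is purely illustrative: the class and field names (`Query`, `Ranking`, `RankingAlgorithm`) are hypothetical and not part of the LL4IR API; the toy baseline only shows how a participant algorithm could plug into a well-defined input/output contract.

```python
# Illustrative sketch (not the LL4IR API): a standard interface that a
# participant's ranking algorithm could implement so the Living Ranking
# component can execute it online. All names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Query:
    qid: str    # query identifier assigned by the central API
    text: str   # raw query string submitted online by the real user


@dataclass
class Ranking:
    qid: str
    docids: list = field(default_factory=list)  # document ids, best first


class RankingAlgorithm:
    """Interface every participant algorithm would implement."""

    def rank(self, query: Query, candidates: list) -> Ranking:
        raise NotImplementedError


class FrequencyBaseline(RankingAlgorithm):
    """Toy example: order candidate documents by query-term overlap."""

    def __init__(self, doc_texts):
        self.doc_texts = doc_texts  # docid -> document text

    def rank(self, query, candidates):
        terms = set(query.text.lower().split())

        def score(docid):
            words = self.doc_texts.get(docid, "").lower().split()
            return sum(1 for w in words if w in terms)

        ordered = sorted(candidates, key=score, reverse=True)
        return Ranking(qid=query.qid, docids=ordered)
```

The key point is that the component only needs to call `rank` on each online query; the output format can mirror the ranking format already required from participants.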
      <p>The integration of the Living Ranking component within the LL4IR framework suggests some changes and brings further enhancements, detailed below:
- Framework architecture: Living Ranking should offer a flexible interface so that participants can easily implement their algorithms without requiring a complex infrastructure for all tiers. We suggest implementing algorithms as sandbox-based scripts (e.g., in JavaScript) that support online execution under strict constraints.</p>
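One of the "strict constraints" mentioned above is a hard time budget per query. The following minimal sketch (Unix-only; the function name is hypothetical) shows the idea using a wall-clock alarm; a production sandbox would additionally restrict memory, filesystem, and network access.

```python
# Minimal sketch of executing a participant's ranking function under a strict
# wall-clock budget, enforced with SIGALRM (Unix-only). Name is illustrative,
# not part of the LL4IR framework.
import signal


def run_with_time_budget(func, args=(), budget_s=1):
    """Run func(*args), raising TimeoutError once the budget (in seconds) expires."""
    def _on_alarm(signum, frame):
        raise TimeoutError("ranking algorithm exceeded its time budget")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(budget_s)      # arm the wall-clock limit
    try:
        return func(*args)
    finally:
        signal.alarm(0)         # disarm the limit
        signal.signal(signal.SIGALRM, previous)
```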
      <p>- Challenge organization: Since test queries and the produced rankings may not be visible, we suggest introducing a debugging phase with simulated queries prior to uploading ranking algorithms. This would help participants validate the effectiveness and efficiency of their algorithms.</p>
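Such a debugging phase could be sketched as a small validation harness: replay simulated queries against the participant's ranking function and check that each produced ranking is well-formed and returned within the time budget. The function and report format below are assumptions for illustration only.

```python
# Hypothetical sketch of the suggested debugging phase: replay simulated
# queries and check each produced ranking before the algorithm is uploaded.
import time


def validate(rank_fn, simulated_queries, candidates, budget_s=1.0):
    """Return a per-query report of (qid, ok, elapsed_seconds)."""
    report = []
    for qid, text in simulated_queries:
        start = time.perf_counter()
        ranking = rank_fn(qid, text, candidates)
        elapsed = time.perf_counter() - start
        ok = (
            isinstance(ranking, list)
            and all(d in candidates for d in ranking)  # only known documents
            and len(ranking) == len(set(ranking))      # no duplicates
            and elapsed <= budget_s                    # within time budget
        )
        report.append((qid, ok, elapsed))
    return report
```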
      <p>- Evaluation metrics: The Living Ranking component makes it possible to produce additional evaluation metrics concerning the computational resources consumed by each algorithm (e.g., execution time and memory usage). Although this type of metric is not commonly used in IR, we believe such metrics are relevant for evaluating real-time IR models.</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>In this paper, we propose to extend the LL4IR framework with a Living Ranking component, with the aim of providing an evaluation framework for real-time ranking. This approach may add some technical complexity, and we are aware of the additional effort required for benchmark organization. Nevertheless, we believe the proposed extension would open LL4IR to other retrieval tasks that attract considerable interest in the IR community, namely real-time search tasks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Turrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Serény</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brodt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seiler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          .
          <article-title>Stream-based recommendations: Online and offline evaluation as a service</article-title>
          .
          <source>In CLEF</source>
          <year>2015</year>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          .
          <article-title>Overview of the Living Labs for Information Retrieval Evaluation (LL4IR) CLEF Lab 2015</article-title>
          .
          <source>In CLEF</source>
          <year>2015</year>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>