=Paper=
{{Paper
|id=None
|storemode=property
|title=Cumulated Relative Position: A Metric for Ranking Evaluation
|pdfUrl=https://ceur-ws.org/Vol-964/paper9.pdf
|volume=Vol-964
|dblpUrl=https://dblp.org/rec/conf/iir/AngeliniFJKPSS13
}}
==Cumulated Relative Position: A Metric for Ranking Evaluation==
Cumulated Relative Position: A Metric for Ranking Evaluation (Extended Abstract)? Marco Angelini3 , Nicola Ferro1 , Kalervo Järvelin2 , Heikki Keskustalo2 , Ari Pirkola2 , Giuseppe Santucci3 , and Gianmaria Silvello1 1 University of Padua, Italy {ferro,silvello}@dei.unipd.it 2 University of Tampere, Finland {kalervo.jarvelin,heikki.keskustalo,ari.pirkola}@uta.fi 3 “La Sapienza” University of Rome, Italy {angelini,santucci}@dis.uniroma1.it Abstract. The development of multilingual and multimedia informa- tion access systems calls for proper evaluation methodologies to ensure that they meet the expected user requirements and provide the desired effectiveness. In this paper, we propose a new metric for ranking evalu- ation, the CRP. 1 Introduction and Motivations The development of information access systems calls for proper evaluation method- ologies in particular for what is concerned with the evaluation of rankings. A range of evaluation metrics, such as MAP and nDCG, are widely used and they are particularly suitable to the evaluation of Information Retrieval (IR) tech- niques in terms of the quality of the output ranked lists, and often to some degree suitable to the evaluation of user experience regarding retrieval. Unfor- tunately, the traditional metrics do not take deviations from optimal document ranking sufficiently into account. We think that a proper evaluation metric for ranked result lists in IR should: (a) explicitly handle graded relevance including negative gains for unhelpful documents, and (b) explicitly take into account doc- ument misplacements in ranking either too early or too late given their degree of relevance and the optimal ranking. In the present paper, we propose such a new evaluation metric, the Cumulated Relative Position (CRP). We start with the observation that a document of a given degree of relevance may be ranked too early or too late regarding the ideal ranking of documents for a query. Its relative position may be negative, indicating too early ranking, zero indicating correct ranking, or positive, indicating too late ranking. By cu- mulating these relative rankings we indicate, at each ranked position, the net effect of document displacements, the CRP. CRP explicitly handles: (a) graded ? The extended version of this abstract has been published in [1]. relevance, and (b) document misplacements either too early or too late given their degree of relevance and the ideal ranking. Thereby, CRP offers several ad- vantages in IR evaluation: (i) at any number of retrieved documents examined (rank) for a given query, it is obvious to interpret and it gives an estimate of ranking performance; (ii) it is not dependent on outliers since it focuses on the ranking of the result list; (iii) it is directly user-oriented in reporting the devia- tion from ideal ranking when examining a given number of documents; the effort wasted in examining a suboptimal ranking is made explicit. 2 Definition of Cumulated Relative Position We define the set of relevance degrees as (REL, ≤) such that there is an order between the elements of REL. For example, for the set REL = {nr, pr, fr, hr}, nr stands for “non relevant”, pr for “partially relevant”, fr for “fairly relevant”, hr stands for “highly relevant”, and it holds nr ≤ pr ≤ fr ≤ hr. We define a function RW : REL → Z as a monotonic function which maps each relevance degree (rel ∈ REL) into an relevance weight (wrel ∈ Z), e.g. RW(hr) = 3. This function allows us to associate an integer number to a rele- vance degree. We define with D the set of documents we take into account, with N ∈ N a natural number, and with DN the set of all possible vectors of length N containing different orderings of the documents in D. We can also say that a vector in DN represents a ranking list of length N of the documents D retrieved by an IR system. Let us consider a vector v ∈ DN , a natural number j ∈ [1, N ], and a relevance degree rel ∈ REL, then the ground truth function is defined as: GT :DN × N → REL (1) v[j] 7→ rel Equation 1 allows us to associate a relevance degree to the document d ∈ D retrieved at position j of the vector v, i.e. it associates a relevance judgment to each retrieved document in a ranked list. In the following, we define with r ∈ DN the vector of documents retrieved and ranked by a run r, with i ∈ DN the ideal vector containing the best ranking of the documents in the pool (e.g. all highly relevant documents are grouped together in the beginning of the vector followed by fairly relevant ones and so on and so forth), and with w ∈ DN the worst-case vector containing the worst rank of the documents retrieved by the pool (e.g. all the relevant documents are put in the end of the vector in the inverse relevance order). From function GT we can point out a set called relevance support defined as: RS(v, rel) = {j ∈ [1, N ] | GT(v, j) = rel} (2) N which, given a vector v ∈ D – it can be a run vector r, the ideal vector i, or the worst-case vector w – and a relevance degree rel, contains the indexes j of the documents of v with which the given relevance degree (rel) relevance is associated. Given the ideal vector i and a relevance degree rel, we can define the mini- mum rank in i as the first position in which we find a document with relevance degree equal to rel. In the same way, we can define the maximum rank in i as the last position in which we find a document with relevance degree equal to rel. In formulas, they become: mini (rel) = min RS(i, rel) (3) maxi (rel) = max RS(i, rel) Given a vector v and a document at position j ∈ [1, N ], we can define the Relative Position (RP) as: 0 if mini GT(v, j) ≤ j ≤ maxi GT(v, j) RP(v, j) = j − mini (GT v, j) if j < mini GT(v, j) (4) j − maxi (GT v, j) if j > maxi GT(v, j) RP allows for pointing out misplaced documents and understanding how much they are misplaced with respect to the ideal case i . Zero values denote documents which are within the ideal interval, positive values denote documents which are ranked below their ideal interval, and negative values denote docu- ments which are above their ideal interval. Note that the greater the absolute value of RP(v, j) is, the bigger is the distance of the document at position j from its ideal interval. From equation 4, it follows that RP(i, j) = 0, ∀j ∈ [1, N ]. Given a vector v and a document at position j ∈ [1, N ], we can define the Cumulated Relative Position (CRP) as: j X CRP(v, j) = RP(v, k) (5) k=1 For each position j, CRP sums the values of RP up to position j included. From equation 5, it follows that CRP(i, j) = 0, ∀j ∈ [1, N ]. We can point out the following properties for CRP: – CRP can only be zero or negative before reaching the rank of the recall base (R); – the faster the CRP curve goes down before R, the worse the run is; – after R the CRP curve is non-decreasing; – after that the last relevant document has been encountered, CRP remains constant; – the sooner we reach the x-axis (balance point: br ), the better the run is. In Figure 1 we can see a sketch of the CRP for a topic of a run. For a given topic there are two fixed values which are the rank of recall base (R) and the 4 Cumulated Relative Position − Run APL985LC−360.txt; Topic 360 x 10 1.5 CRP min R br N 1 Cumulated Relative Position 0.5 Ideal Rank 0 CRP(r, R) Run −0.5 CRP(w, R) Worst −1 0 100 200 300 400 500 600 700 800 900 1000 Rank N = Number of Retrieved Documents (fixed) R = Rank of the Recall Base (fixed) br = Rank of the Balance Point of the run CRP(r, R) = Loss Value of the run (CRP@R) CRP(w, R) = Loss Value of the worst-case min = Rank of the Turn-Around Point of the run Fig. 1. Cumulative Relative Position sketch for a topic of a given run (on the left) and the CRP curve of a real run taken from TREC7. number of retrieved documents (N ); this allows us to compare systems on the R basis. The principal indicator describing the CRP curve of a topic for a given run which is the recovery value (ρ) defined as the ratio between R and br : ρ = bRr . The recovery-value is always between 0 and 1 (0 < ρ ≤ 1) where ρ = 1 indicates a perfect ranking and ρ → 0 a progressively worse ranking. Please note that ρ → 0 when br → ∞. 3 Final Remarks We think that the CRP offers several advantages in IR evaluation because (a) it is obvious to interpret and it gives an estimate of ranking performance as a single measure; (b) it is independent on outliers since it focuses on the ranking of the result list; (c) it directly reports the effort wasted in examining suboptimal rankings; (d) it is based on graded relevance. Acknowledgements The work reported in this paper has been supported by the PROMISE network of excellence (contract n. 258191) project as a part of the 7th Framework Program of the European commission (FP7/2007-2013). References 1. M. Angelini, N. Ferro, K. Järvelin, H. Keskustalo, A. Pirkola, G. Santucci, and G. Silvello. Cumulated Relative Position: A Metric for Ranking Evaluation. In In- formation Access Evaluation meets Multilinguality, Multimodality, and Visual An- alytics. Proc. of the 3rd Int. Conf. of the CLEF Initiative (CLEF 2012). Lecture Notes in Computer Science (LNCS) 7488, Springer, Heidelberg, Germany, 2012.