=Paper= {{Paper |id=None |storemode=property |title=Cumulated Relative Position: A Metric for Ranking Evaluation |pdfUrl=https://ceur-ws.org/Vol-964/paper9.pdf |volume=Vol-964 |dblpUrl=https://dblp.org/rec/conf/iir/AngeliniFJKPSS13 }} ==Cumulated Relative Position: A Metric for Ranking Evaluation== https://ceur-ws.org/Vol-964/paper9.pdf
      Cumulated Relative Position: A Metric for
      Ranking Evaluation (Extended Abstract)?

    Marco Angelini3 , Nicola Ferro1 , Kalervo Järvelin2 , Heikki Keskustalo2 , Ari
            Pirkola2 , Giuseppe Santucci3 , and Gianmaria Silvello1
                              1
                             University of Padua, Italy
                         {ferro,silvello}@dei.unipd.it
                         2
                           University of Tampere, Finland
            {kalervo.jarvelin,heikki.keskustalo,ari.pirkola}@uta.fi
                     3
                       “La Sapienza” University of Rome, Italy
                      {angelini,santucci}@dis.uniroma1.it



        Abstract. The development of multilingual and multimedia informa-
        tion access systems calls for proper evaluation methodologies to ensure
        that they meet the expected user requirements and provide the desired
        effectiveness. In this paper, we propose a new metric for ranking evalu-
        ation, the CRP.




1     Introduction and Motivations

The development of information access systems calls for proper evaluation method-
ologies in particular for what is concerned with the evaluation of rankings. A
range of evaluation metrics, such as MAP and nDCG, are widely used and they
are particularly suitable to the evaluation of Information Retrieval (IR) tech-
niques in terms of the quality of the output ranked lists, and often to some
degree suitable to the evaluation of user experience regarding retrieval. Unfor-
tunately, the traditional metrics do not take deviations from optimal document
ranking sufficiently into account. We think that a proper evaluation metric for
ranked result lists in IR should: (a) explicitly handle graded relevance including
negative gains for unhelpful documents, and (b) explicitly take into account doc-
ument misplacements in ranking either too early or too late given their degree
of relevance and the optimal ranking. In the present paper, we propose such a
new evaluation metric, the Cumulated Relative Position (CRP).
    We start with the observation that a document of a given degree of relevance
may be ranked too early or too late regarding the ideal ranking of documents
for a query. Its relative position may be negative, indicating too early ranking,
zero indicating correct ranking, or positive, indicating too late ranking. By cu-
mulating these relative rankings we indicate, at each ranked position, the net
effect of document displacements, the CRP. CRP explicitly handles: (a) graded
?
    The extended version of this abstract has been published in [1].
relevance, and (b) document misplacements either too early or too late given
their degree of relevance and the ideal ranking. Thereby, CRP offers several ad-
vantages in IR evaluation: (i) at any number of retrieved documents examined
(rank) for a given query, it is obvious to interpret and it gives an estimate of
ranking performance; (ii) it is not dependent on outliers since it focuses on the
ranking of the result list; (iii) it is directly user-oriented in reporting the devia-
tion from ideal ranking when examining a given number of documents; the effort
wasted in examining a suboptimal ranking is made explicit.


2    Definition of Cumulated Relative Position

We define the set of relevance degrees as (REL, ≤) such that there is an order
between the elements of REL. For example, for the set REL = {nr, pr, fr, hr},
nr stands for “non relevant”, pr for “partially relevant”, fr for “fairly relevant”,
hr stands for “highly relevant”, and it holds nr ≤ pr ≤ fr ≤ hr.
   We define a function RW : REL → Z as a monotonic function which maps
each relevance degree (rel ∈ REL) into an relevance weight (wrel ∈ Z), e.g.
RW(hr) = 3. This function allows us to associate an integer number to a rele-
vance degree.
   We define with D the set of documents we take into account, with N ∈ N
a natural number, and with DN the set of all possible vectors of length N
containing different orderings of the documents in D. We can also say that a
vector in DN represents a ranking list of length N of the documents D retrieved
by an IR system. Let us consider a vector v ∈ DN , a natural number j ∈ [1, N ],
and a relevance degree rel ∈ REL, then the ground truth function is defined as:


                               GT :DN × N → REL
                                                                                  (1)
                                      v[j] 7→ rel

    Equation 1 allows us to associate a relevance degree to the document d ∈ D
retrieved at position j of the vector v, i.e. it associates a relevance judgment to
each retrieved document in a ranked list.
    In the following, we define with r ∈ DN the vector of documents retrieved
and ranked by a run r, with i ∈ DN the ideal vector containing the best ranking
of the documents in the pool (e.g. all highly relevant documents are grouped
together in the beginning of the vector followed by fairly relevant ones and so
on and so forth), and with w ∈ DN the worst-case vector containing the worst
rank of the documents retrieved by the pool (e.g. all the relevant documents are
put in the end of the vector in the inverse relevance order).
    From function GT we can point out a set called relevance support defined as:

                    RS(v, rel) = {j ∈ [1, N ] | GT(v, j) = rel}                   (2)
                                  N
    which, given a vector v ∈ D – it can be a run vector r, the ideal vector i,
or the worst-case vector w – and a relevance degree rel, contains the indexes j
 of the documents of v with which the given relevance degree (rel) relevance is
 associated.
     Given the ideal vector i and a relevance degree rel, we can define the mini-
 mum rank in i as the first position in which we find a document with relevance
 degree equal to rel. In the same way, we can define the maximum rank in i as
 the last position in which we find a document with relevance degree equal to rel.
 In formulas, they become:
                                                       
                            mini (rel) = min RS(i, rel)
                                                                              (3)
                            maxi (rel) = max RS(i, rel)
    Given a vector v and a document at position j ∈ [1, N ], we can define the
 Relative Position (RP) as:

                                                                   
          0
          
                             
                                if mini GT(v, j) ≤ j ≤ maxi GT(v, j)
                                                    
RP(v, j) = j − mini (GT v, j)   if j < mini GT(v, j)                   (4)
                                                   
           j − maxi (GT v, j) if j > maxi GT(v, j)
          

    RP allows for pointing out misplaced documents and understanding how
 much they are misplaced with respect to the ideal case i . Zero values denote
 documents which are within the ideal interval, positive values denote documents
 which are ranked below their ideal interval, and negative values denote docu-
 ments which are above their ideal interval. Note that the greater the absolute
 value of RP(v, j) is, the bigger is the distance of the document at position j
 from its ideal interval. From equation 4, it follows that RP(i, j) = 0, ∀j ∈ [1, N ].
    Given a vector v and a document at position j ∈ [1, N ], we can define the
 Cumulated Relative Position (CRP) as:

                                           j
                                           X
                             CRP(v, j) =         RP(v, k)                         (5)
                                           k=1

    For each position j, CRP sums the values of RP up to position j included.
 From equation 5, it follows that CRP(i, j) = 0, ∀j ∈ [1, N ].
    We can point out the following properties for CRP:

  – CRP can only be zero or negative before reaching the rank of the recall base
    (R);
  – the faster the CRP curve goes down before R, the worse the run is;
  – after R the CRP curve is non-decreasing;
  – after that the last relevant document has been encountered, CRP remains
    constant;
  – the sooner we reach the x-axis (balance point: br ), the better the run is.

    In Figure 1 we can see a sketch of the CRP for a topic of a run. For a given
 topic there are two fixed values which are the rank of recall base (R) and the
                                                                                                                   4         Cumulated Relative Position − Run APL985LC−360.txt; Topic 360
                                                                                                               x 10
                                                                                                         1.5
            CRP           min R                   br          N

                                                                                                          1




                                                                          Cumulated Relative Position
                                                                                                         0.5

                  Ideal
                                                                   Rank

                                                                                                          0

    CRP(r, R)
                                     Run
                                                                                                        −0.5



 CRP(w, R)
                                       Worst
                                                                                                         −1
                                                                                                               0       100   200      300       400     500       600       700      800     900   1000
                                                                                                                                                        Rank




                  N = Number of Retrieved Documents (fixed)
                  R = Rank of the Recall Base (fixed)
                  br = Rank of the Balance Point of the run
                  CRP(r, R) = Loss Value of the run (CRP@R)
                  CRP(w, R) = Loss Value of the worst-case
                  min = Rank of the Turn-Around Point of the run



Fig. 1. Cumulative Relative Position sketch for a topic of a given run (on the left) and
the CRP curve of a real run taken from TREC7.


number of retrieved documents (N ); this allows us to compare systems on the
R basis.
   The principal indicator describing the CRP curve of a topic for a given run
which is the recovery value (ρ) defined as the ratio between R and br : ρ = bRr .
   The recovery-value is always between 0 and 1 (0 < ρ ≤ 1) where ρ = 1
indicates a perfect ranking and ρ → 0 a progressively worse ranking. Please note
that ρ → 0 when br → ∞.

3       Final Remarks
We think that the CRP offers several advantages in IR evaluation because (a)
it is obvious to interpret and it gives an estimate of ranking performance as a
single measure; (b) it is independent on outliers since it focuses on the ranking of
the result list; (c) it directly reports the effort wasted in examining suboptimal
rankings; (d) it is based on graded relevance.

Acknowledgements The work reported in this paper has been supported by
the PROMISE network of excellence (contract n. 258191) project as a part of
the 7th Framework Program of the European commission (FP7/2007-2013).

References
1. M. Angelini, N. Ferro, K. Järvelin, H. Keskustalo, A. Pirkola, G. Santucci, and
   G. Silvello. Cumulated Relative Position: A Metric for Ranking Evaluation. In In-
   formation Access Evaluation meets Multilinguality, Multimodality, and Visual An-
   alytics. Proc. of the 3rd Int. Conf. of the CLEF Initiative (CLEF 2012). Lecture
   Notes in Computer Science (LNCS) 7488, Springer, Heidelberg, Germany, 2012.