=Paper=
{{Paper
|id=Vol-1653/paper_4
|storemode=property
|title=Improving the Efficiency of Retrieval Effectiveness Evaluation: Finding a Few Good Topics with Clustering?
|pdfUrl=https://ceur-ws.org/Vol-1653/paper_4.pdf
|volume=Vol-1653
|authors=Kevin Roitero,Stefano Mizzaro
|dblpUrl=https://dblp.org/rec/conf/iir/RoiteroM16
}}
==Improving the Efficiency of Retrieval Effectiveness Evaluation: Finding a Few Good Topics with Clustering?==
Kevin Roitero¹ and Stefano Mizzaro²

¹ Dept. of Maths, Computer Science, and Physics, University of Udine, Italy. kevin.roitero@outlook.com
² Dept. of Maths, Computer Science, and Physics, University of Udine, Italy. mizzaro@uniud.it

Abstract. We consider the issue of using fewer topics in the effectiveness evaluation of information retrieval systems. Previous work has shown that using fewer topics is theoretically possible; one of the main issues that remains to be solved is how to find such a small set of a few good topics. To this aim, in this paper we try a novel approach based on clustering of topics. We consider various algorithms, metrics, and various features of topics that can be helpful in identifying such a set.

1 Introduction

Effectiveness evaluation is of paramount importance in the Information Retrieval (IR) field. The evaluation method commonly used by the Text REtrieval Conference (TREC) and other similar initiatives is based on a set of topics, which are textual descriptions of information needs, and on relevance assessments computed for these topics over a pooled set of documents. Participants run their own retrieval systems on the topics provided and return a ranked list of the retrieved documents. These are then pooled and their relevance is judged by human assessors. The process of creating these relevance assessments is very expensive in terms of money (circa $30M from 1999 to 2009 according to [3]) and in terms of human effort; every technique developed for reducing the topic set size is therefore a significant improvement in the efficiency of the whole evaluation process. We present a novel approach, based on clustering, to select such a good subset of topics.

This paper is structured as follows. Sect. 2 describes the state of the art and the research question, Sect. 3 describes the experiments performed. We conclude in Sect. 4, where we also provide some directions for future work.

2 Background and Research Question

The starting point of our research is shown in Table 1. Each row represents a system (or, in TREC terms, a run). Each column represents a topic. Each AP(si, tj) represents the effectiveness of the system si on the topic tj. AP stands for the Average Precision measure, while MAP stands for Mean AP.

Table 1: AP and MAP, for n topics and m systems (from [2], Table 1)

  APs   t1           ...  tn           MAP
  s1    AP(s1, t1)   ...  AP(s1, tn)   MAP(s1)
  ...   ...          ...  ...          ...
  sm    AP(sm, t1)   ...  AP(sm, tn)   MAP(sm)

Fig. 1: Best, worst, and average-topic set for AH99 (from [2], Figures 2 and 3)

Guiver et al. [2] show that a topic set reduction is possible, and that some topic sets are in fact more informative than others. Starting with a larger set of topics, it is possible to discover a good-topic subset, that is, a subset whose MAP values have a correlation with those on the full topic set of at least 0.96 for Pearson's ρ (henceforth, ρ) and above 0.85 for Kendall's τ (henceforth, τ). (These thresholds are justified on the basis of previous proposals: [4] states that evaluation schemes with τ above 0.9 should be considered equivalent, while there are noticeable changes in the ranking with values below 0.8; [1] chose the value in the middle, 0.85; since ρ values are generally higher, they chose 0.96 for ρ.) This behavior is shown in Fig. 1; the plot shows the ρ and τ values for three types of topic subsets at each topic set size: the best/worst, that is, the topic subset with the highest/lowest correlation with the full topic set, and the average, that is, the average ρ and τ values over all the topic subsets considered.
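To make the measurement behind Fig. 1 concrete, the following is a minimal sketch (our own illustration, not the code used in [2]) of how the correlation between the MAP computed on a topic subset and the MAP computed on the full topic set can be obtained. The array ap, its layout (one row per system, one column per topic, as in Table 1), and the example subset are assumptions of the sketch.

```python
# Minimal sketch: correlation between MAP on a topic subset and MAP on the full set.
# `ap` is a hypothetical NumPy array with ap[i, j] = AP(s_i, t_j), as in Table 1.
import numpy as np
from scipy.stats import kendalltau, pearsonr

def subset_correlations(ap, topic_subset):
    """Pearson's rho and Kendall's tau between MAP on `topic_subset` and MAP on all topics."""
    map_full = ap.mean(axis=1)                   # MAP(s_i) over the full topic set
    map_sub = ap[:, topic_subset].mean(axis=1)   # MAP(s_i) over the subset only
    rho, _ = pearsonr(map_sub, map_full)
    tau, _ = kendalltau(map_sub, map_full)
    return rho, tau

# Toy example: 96 systems and 50 topics, as in AH99, with random AP values.
rng = np.random.default_rng(0)
ap = rng.random((96, 50))
print(subset_correlations(ap, topic_subset=[0, 3, 7, 12, 21]))
```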
This result was found by exhaustive search (and by heuristic search when the exhaustive one was not feasible): enumerating all subsets and measuring the correlation of the MAP obtained on those subsets only with the MAP on the full topic set. Berto et al. [1] provide further confirmation and generalizations of these results.

It is reasonable to conjecture that similar topics will evaluate systems in a similar way, and therefore they might be redundant: by removing some of the similar topics, the final system evaluation should not change much. When looking for smaller topic subsets, a natural strategy is then: (i) to cluster topics into a set (henceforth we call this set a cluster-set of topics), and (ii) to select a topic representative of each cluster and form a topic subset (henceforth we call this set a one-for-cluster topic subset). Therefore, we inquire into the following research question (RQ): "Can clustering help to find a good-topic subset?". Clustering may depend on several aspects; in this paper we show some preliminary exploration of this space. Table 2 shows the datasets used in these experiments.

Table 2: Composition of each TREC dataset

  Dataset  Description                             Topics  Systems
  AH99     "Ad Hoc" track from TREC-8                  50       96
  TB06     "Terabyte" track from TREC 2006            149       49
  R04      "Robust" track from TREC 2004              249       82
  MQ07     "Million Query" track from TREC 2007      1153       26

3 Clustering of Topics

To answer our research question we build a cluster-set of topics. We use the columns of Table 1 as vectors, so that each topic is represented by an m-dimensional vector tk = (AP(s1, tk), ..., AP(sm, tk)), where AP(si, tj) represents the effectiveness of the system si on the topic tj. We use hierarchical clustering, with the complete method to join clusters and the cosine distance. Then, in order to compare the one-for-cluster set with the average set (similarly to the comparisons done in [1, 2]), we compare the correlation values of the one-for-cluster topic subsets and of the average-topic subsets, across each topic set size.

Figure 2 shows the results for two datasets (the other two show similar behavior). On the x axis we find the topic set size (which is equal to the number of clusters), while on the y axis we find the ρ and τ correlations of the MAP values with those on the full topic set. The one-for-cluster topic subset does not feature a higher correlation than the average-topic subset; on the contrary, it usually has a lower correlation. This can be caused by the fact that, as we found experimentally, there is one cluster that contains many more topics than the others: this cluster "attracts" the MAP of the average set, penalizing the one-for-cluster set, which takes only one item from that cluster. Furthermore, there is a non-negligible overlap between the best and the worst set of topics, as discussed in [2].

We also tried other methods and features to cluster topics: we considered other methods to join clusters (i.e., single, average, Ward); we considered another clustering algorithm, k-means; and we changed from the cosine distance to other distances (i.e., Euclidean, Manhattan, ...).
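The baseline procedure of this section can be sketched as follows; this is our own illustrative rendering, not the exact experimental code. It reuses the hypothetical ap matrix and subset_correlations from the sketch above, and choosing the topic closest to its cluster centroid as the representative is an assumption, since the text only says that one representative per cluster is selected.

```python
# Sketch of the cluster-set / one-for-cluster procedure (our illustration):
# topics (columns of Table 1) are clustered hierarchically with complete
# linkage and cosine distance, then one topic per cluster is kept.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def one_for_cluster(ap, k):
    """Cluster the topics into k clusters and return one representative topic per cluster."""
    topics = ap.T                                    # one m-dimensional vector per topic
    Z = linkage(topics, method="complete", metric="cosine")
    labels = fcluster(Z, t=k, criterion="maxclust")  # cluster label of each topic
    reps = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        centroid = topics[members].mean(axis=0)
        # Assumption: take the topic closest to the cluster centroid as the representative.
        reps.append(members[np.argmin(np.linalg.norm(topics[members] - centroid, axis=1))])
    return sorted(reps)

# Compare the one-for-cluster subset of size 10 with the full topic set,
# reusing `ap` and `subset_correlations` from the previous sketch.
print(subset_correlations(ap, one_for_cluster(ap, k=10)))
```

The single and average linkage variants mentioned above correspond to changing the method argument (Ward linkage, however, requires the Euclidean distance), and other distances correspond to changing the metric argument.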
We also used other topic features: we considered the set formed by the identifiers (and the number) of the documents (and of the relevant ones) in the pooled set, and the textual description of topics (both the title field only and the full description), plus a set of experiments in which we considered combinations of all the features and all the parameters. In all these experiments the one-for-cluster topic subset shows the same (or worse) correlation values as the average-topic subset. These results seem to rule out the possibility of using clustering to select a few good topics, at least when the columns of Table 1 are used as vectors.

Fig. 2: ρ and τ for the one-for-cluster and the average set (panels: AH99 and R04; x axis: topic set size (cardinality); y axis: correlation (ρ, τ); series: Kendall's τ and Pearson's ρ for the average and the one-per-cluster subsets)

4 Conclusions and Future Work

In summary, we did not find any evidence supporting our research question: clustering does not seem usable to find a good-topic subset. Furthermore, our results appear to be independent of the parameters used to form the cluster-set and of the topic features used. We started from the hypothesis that clustering can be used to choose a subset of a few informative topics for the evaluation of information retrieval effectiveness. We believe that we have provided convincing evidence that clustering is not effective in the choice of a good-topic subset, at least when straightforward clustering strategies are used. It might be that more complex approaches, which we intend to try in the future, provide better results: for example, we did not yet try dimensionality reduction.

Bibliography

[1] A. Berto, S. Mizzaro, and S. Robertson. On using fewer topics in information retrieval evaluations. In Proceedings of the 2013 Conference on the Theory of Information Retrieval, pages 30–37. ACM, 2013.
[2] J. Guiver, S. Mizzaro, and S. Robertson. A few good topics: Experiments in topic set reduction for retrieval evaluation. ACM Transactions on Information Systems (TOIS), 27(4):1–26, 2009.
[3] G. Tassey, B. R. Rowe, D. W. Wood, A. N. Link, and D. A. Simoni. Economic impact assessment of NIST's Text REtrieval Conference (TREC) program. Report prepared for the National Institute of Standards and Technology (NIST), 2010.
[4] E. M. Voorhees. Evaluation by highly relevant documents. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 74–82. ACM, 2001.