<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEBD</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Dimension Importance Estimation-based Framework for Query Performance Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guglielmo Faggioli</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Ferro</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raffaele Perego</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Tonellotto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Pisa</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ISTI-CNR</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Padua</institution>
          ,
          <addr-line>Padua</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>33</volume>
      <fpage>16</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>Recent developments in the dense Information Retrieval (IR) domain have shown the ties between the latent dimensions and the retrieval effectiveness. In detail, Dimension IMportance Estimators (DIMEs) have been proposed to identify a subspace of the original dense representation space where the retrieval is more effective. On a different research line, Query Performance Prediction (QPP) techniques focus on determining the performance of an IR system in the absence of human-made relevance judgements. In this extended abstract, we illustrate the effectiveness of QPP models that exploit the DIME mechanisms to formulate the predictions. In particular, the QPPs illustrated here rely on measuring how much the retrieval insists on dimensions considered relevant by a DIME model to establish how likely the retrieval was effective. To evaluate the effectiveness of the proposed approach, we consider two well-known IR collections, TREC Deep Learning '19 and '20, and two dense IR approaches, TAS-B and Contriever, and show that the DIME-based QPPs achieve state-of-the-art results when predicting the performance of both IR systems on both collections.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        suboptimal retrieval that can lead to low performance. Our experimentation suggests that our hypothesis
holds. By measuring the correlation between the representations of the retrieved documents with the
importance of the dimensions determined using a DIME, we can effectively predict the performance of
a set of state-of-the-art IR systems. More in detail, we can overcome the current state-of-the-art QPPs
when predicting the performance for two popular dense encoders, Contriever [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and TAS-B [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], on
two TREC collections, Deep Learning 2019 and 2020.
      </p>
      <p>The remainder of this work is organized as follows: Section 2 introduces the DIME framework and
describes the QPPs employed in this work. Section 3 reports our experimental evaluation, while in
Section 4, we draw our conclusions and outline future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>
        We introduce here the notation and background on DIMEs as proposed in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and we describe the QPPs proposed in this work.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Background on Dimension IMportance Estimation</title>
        <p>Consider a query q for which the user wants to retrieve documents from a corpus C. We define ℛ(q, C; M)
as the ranked list produced by the IR model M in response to q. Assuming relevance judgments are available,
we can compute a measure ℳ(ℛ(q, C; M)) that takes as input the list of retrieved documents and
outputs a performance score. This work focuses on dense IR models employing an encoder E to project
the text (i.e., the query and the documents) into an n-dimensional embedding space ℝⁿ. The encoder is
often a neural network trained with the objective of maximising the dot product between a query and
corresponding relevant documents. Therefore, the score assigned to a document d in response to a
query q is s(q, d) = ⟨E(q), E(d)⟩. With an abuse of notation, we call ℛ(q, C; E, ⟨⟩) the ranker that
takes in input the query and the corpus, embeds them in the n-dimensional space using E, computes
the dot product ⟨⟩ between the query and each document, and ranks the documents accordingly. We
define the masked dot product ⟨a⃗, b⃗⟩∖{i} = Σ_{j=1; j≠i} aⱼ · bⱼ as the dot product between two arbitrary vectors
a⃗ and b⃗, where the i-th dimension is ignored. Faggioli et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] experimentally showed that, given a
query q, there exists a set D ⊂ {1, ..., n} s.t.
        </p>
        <p>ℳ(ℛ(q, C; E, ⟨⟩)) &lt; ℳ(ℛ(q, C; E, ⟨⟩∖D))   (1)</p>
        <p>
          In other terms, given an encoder E and a query q, there is a set of dimensions that are harmful to
the retrieval: by simply discarding those dimensions when computing the dot product, it is possible to
improve the quality of the retrieval (Faggioli et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] conjecture that the optimal subspace can be any subspace of the original embedding space but, to make the
problem tractable, they focus only on linear subspaces where some dimensions are removed). Faggioli et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] showed that the improvement depends on the
collection considered and on the encoder E, reaching peaks as big as +0.30 nDCG@10 points, moving
from 0.5 to 0.8. Furthermore, they observe that the optimal dimensions are query-dependent, with
each query being optimized by a different set of dimensions. While discarding some dimensions allows
astonishing retrieval improvements, e.g. up to +73.4% in nDCG@10 when using TAS-B for RB ‘04 queries
with 40% of the dimensions, determining which dimensions are optimal is not trivial. Therefore, Faggioli et al.
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] propose a novel class of models, called “Dimension IMportance Estimators (DIMEs)”, that rely on
heuristics to determine which dimensions to preserve/remove. A DIME is a function Δ : (ℝⁿ; θ) → ℝⁿ
that takes in input a representation of a query E(q) ∈ ℝⁿ (and possibly some additional parameters
θ) and outputs a vector u⃗ ∈ ℝⁿ that describes how much each dimension is important. The relation
between the importances uᵢ and uⱼ of, respectively, dimensions i and j is defined as follows:
        </p>
        <p>ℳ(ℛ(q, C; E, ⟨⟩∖{i})) &lt; ℳ(ℛ(q, C; E, ⟨⟩∖{j})) =⇒ uᵢ &gt; uⱼ   (2)</p>
        <p>
          In other terms, the i-th dimension is more important than the j-th (uᵢ &gt; uⱼ) if the DIME considers
it more likely that the result will be worse by removing i instead of j when computing the dot product.
        </p>
        <p>The most effective DIMEs, according to Faggioli et al., are the Active Feedback DIME (Δ_AF) and
the LLM Pseudo Relevant Feedback DIME (Δ_LLM). The former employs a relevant document r (e.g.,
obtained by inspecting a query log or the user’s clicks) and the importance of a dimension is defined as
follows:</p>
        <p>Δ_AF(q; r)ᵢ = E(q)ᵢ · E(r)ᵢ,   (3)</p>
        <p>where E(q)ᵢ and E(r)ᵢ are respectively the i-th dimensions of the query and the relevant document’s
representations. Similarly, Δ_LLM is based on generating a pseudo-relevant document LLM(q) by
feeding the query to an LLM. The dimension importance in this case is defined as:</p>
        <p>Δ_LLM(q; LLM)ᵢ = E(q)ᵢ · E(LLM(q))ᵢ   (4)</p>
        <p>
          Faggioli et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] employ the proposed DIMEs by selecting the m most important dimensions, with m fixed.
        </p>
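        <p>As an illustration of the mechanics above, the following NumPy sketch (not the original implementation; the encoder, the embedding size, and the variable names are assumptions made for the example) computes the masked dot product and the two DIMEs of Eq. (3) and (4), and selects the m most important dimensions:</p>
        <preformat>
import numpy as np

def masked_dot(u, v, removed):
    """Dot product of u and v ignoring the dimensions in `removed` (masked dot of Section 2.1)."""
    mask = np.ones_like(u)
    mask[list(removed)] = 0.0
    return float(np.sum(u * v * mask))

def af_dime(query_emb, rel_doc_emb):
    """Active Feedback DIME (Eq. 3): importance of dimension i is E(q)_i * E(r)_i."""
    return query_emb * rel_doc_emb

def llm_dime(query_emb, pseudo_doc_emb):
    """LLM DIME (Eq. 4): as above, but the relevant document is generated by an LLM."""
    return query_emb * pseudo_doc_emb

def top_dimensions(importance, m):
    """Indices of the m most important dimensions according to a DIME vector."""
    return set(np.argsort(-importance)[:m].tolist())

# Toy usage with random vectors standing in for the encoder E (768 dimensions assumed).
rng = np.random.default_rng(0)
q, r = rng.normal(size=768), rng.normal(size=768)
u = af_dime(q, r)                                   # importance vector
removed = set(range(768)) - top_dimensions(u, 300)  # dimensions to mask out
score = masked_dot(q, r, removed)                   # score on the 300 most important dimensions
        </preformat>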
      </sec>
      <sec id="sec-2-2">
        <title>2.2. The Proposed Query Performance Predictors</title>
        <p>The predictors proposed here comprise an input and an aggregator component:
• The input component describes which vectors are used to compute the prediction. There are three options: the
query vector, the document vectors, or the interaction vectors (the Hadamard product between
the query and document vectors).</p>
        <p>• The aggregator component describes how to combine the input vectors with the DIME.</p>
        <p>All the predictors are instantiated by first inputting a query and computing the dimension importance
using a DIME. Such DIME values are combined with the input vectors using the aggregator function.
A predictor can be described as aggregator(input; DIME). In terms of notation, the predictors are
identified by ⟨input ID⟩-⟨aggregator ID⟩-⟨DIME ID⟩; for example, D-C-LLM indicates the QPP that
uses documents as input (D), relies on the correlation aggregator (C), and estimates the dimension
importance using Δ_LLM as DIME. We now describe each class of components in more detail.</p>
        <sec id="sec-2-2-1">
          <title>2.2.1. The input component.</title>
          <p>Our predictors can be based on a single vector (e.g., they can consider only the representation of the
query) or can employ multiple vectors. In our framework, in the former case, a statistic is computed
for each vector to formulate the prediction. In the latter case, each vector’s statistic is computed
individually and then aggregated by computing the mean. More in detail, let us call A(v⃗) the score
that an aggregator component assigns to a vector v⃗. For the moment, we consider A : ℝⁿ → ℝ an
arbitrary function that takes in input a vector and outputs a real number. Based on this definition, we
can define three input components: “Q-” (Query), “D-” (Document), and “I-” (Interaction). Given an
arbitrary aggregator function A, the input components are defined as follows:
• “Q-” input: given a query q, the prediction is Q-A(q) = A(E(q)) (i.e., the aggregator, directly
applied on the query vector).
• “D-” input: given k documents d₁, ..., dₖ, the prediction is D-A(d₁, ..., dₖ) = (1/k) Σᵢ A(E(dᵢ)).
In this case, the aggregator is applied separately on each document vector and then averaged.
• “I-” input: given a query q and the k documents d₁, ..., dₖ, the prediction is I-A(q, d₁, ..., dₖ) =
(1/k) Σᵢ A(E(q) ∘ E(dᵢ)), where ∘ represents the Hadamard product (i.e., the element-wise
multiplication between the two vectors).</p>
          <p>Since the predictors based on the Q- input employ only the representation of the query and do
not require access to the retrieved list of documents, they can be considered pre-retrieval predictors.
On the contrary, predictors based on D- and I- input employ the top-k documents retrieved, making
them post-retrieval predictors. Additionally, notice that D- and I- predictors will have the number of
documents considered, k, as an additional hyper-parameter.</p>
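          <p>A minimal sketch of the three input components, assuming precomputed query and document embeddings and an arbitrary aggregator function A passed as a Python callable (illustrative names, not the authors’ code):</p>
          <preformat>
import numpy as np

def q_input(agg, query_emb):
    """Q- input: the aggregator applied directly to the query vector."""
    return float(agg(query_emb))

def d_input(agg, doc_embs):
    """D- input: the aggregator applied to each top-k document vector, then averaged."""
    return float(np.mean([agg(d) for d in doc_embs]))

def i_input(agg, query_emb, doc_embs):
    """I- input: the aggregator applied to the Hadamard product between query and document."""
    return float(np.mean([agg(query_emb * d) for d in doc_embs]))
          </preformat>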
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. The aggregator component</title>
          <p>Negative Importance (NI) aggregator. If a DIME considers a dimension to be detrimental, i.e., it
would be better to remove it to increase the retrieval performance, this dimension should be as small as
possible to obtain the best performance. Vice-versa, observing a high absolute value on such dimensions
suggests non-effective retrieval. Therefore, the Negative Importance (NI) aggregator correlates the
performance of the query with the inverse of the magnitude of the unimportant dimensions according
to the DIME. We focus on the absolute value of the dimension: if the DIME would like to exclude it, the
best case occurs when the absolute value is close to zero.</p>
          <p>Let us call D⁻ ⊂ {1, ..., n}, with |D⁻| = m, the set of m dimensions having the smallest relevance score
uᵢ according to an arbitrary DIME. In this case, the aggregator function can be defined as follows:</p>
          <p>NI(v⃗; m, D⁻) = m / Σ_{i∈D⁻} abs(vᵢ)   (5)</p>
          <p>Where v⃗ is the input vector for which we want to compute the aggregation. As mentioned before,
this value can be used to instantiate a predictor based on Q-, D-, or I- input (respectively Q-NI, D-NI,
and I-NI). Notice that the NI aggregator function has m as a hyperparameter.</p>
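          <p>A possible NumPy rendering of the NI aggregator of Eq. (5), assuming v is the input vector and importance is the DIME vector u⃗ (illustrative, not the original implementation):</p>
          <preformat>
import numpy as np

def ni_aggregator(v, importance, m):
    """Negative Importance (Eq. 5): m divided by the total magnitude of the m least
    important dimensions; small values on detrimental dimensions yield high scores."""
    d_minus = np.argsort(importance)[:m]   # the m dimensions with the lowest DIME score
    return m / float(np.sum(np.abs(v[d_minus])))
          </preformat>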
          <p>Positive Importance (PI) aggregator. The second aggregator function, called the Positive
Importance (PI) aggregator, associates good performance with vectors having a large absolute value on
dimensions considered important by the DIME. It can be considered the opposite of the NI aggregator.
In line with the NI aggregator, we define D⁺ ⊂ {1, ..., n}, with |D⁺| = m, the set of m dimensions
having the highest relevance score uᵢ according to an arbitrary DIME. The aggregator function in
this case is:</p>
          <p>PI(v⃗; m, D⁺) = Σ_{i∈D⁺} abs(vᵢ) / m   (6)</p>
          <p>As before, the predictor has m as a hyperparameter. The predictors are called Q-PI, D-PI, and I-PI,
depending on which vector v⃗ is fed in input.</p>
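          <p>Analogously, a sketch of the PI aggregator of Eq. (6) under the same assumptions:</p>
          <preformat>
import numpy as np

def pi_aggregator(v, importance, m):
    """Positive Importance (Eq. 6): average magnitude of the m most important dimensions."""
    d_plus = np.argsort(-importance)[:m]   # the m dimensions with the highest DIME score
    return float(np.sum(np.abs(v[d_plus]))) / m
          </preformat>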
          <p>Ratio (R) aggregator. This aggregator computes the product between NI and PI. It is based on
the same rationale as the previous two: large important dimensions are a positive signal, while large
detrimental dimensions are a negative one. This aggregator is defined as follows:</p>
          <p>R(v⃗; m₁, m₂, D⁺_{m₁}, D⁻_{m₂}) = PI(v⃗; m₁, D⁺_{m₁}) · NI(v⃗; m₂, D⁻_{m₂}) = (m₂ · Σ_{i∈D⁺_{m₁}} abs(vᵢ)) / (m₁ · Σ_{i∈D⁻_{m₂}} abs(vᵢ))   (7)</p>
          <p>Notice that, from a technical point of view, the two hyper-parameters m₁ and m₂ can be considered
independent. To reduce the number of possible combinations to be tested, align it with the other solutions,
and make the approach more stable, we set m₁ = m and m₂ = n − m, reducing the hyper-parameters to only
m. In other terms, the first m dimensions are considered useful, while the remaining n − m dimensions are
deemed detrimental. As in the other cases, the three variants of this approach are called Q-R, D-R, and
I-R.</p>
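          <p>A sketch of the Ratio aggregator of Eq. (7) with the simplification m₁ = m and m₂ = n − m described above (again illustrative, under the same assumptions as the previous snippets):</p>
          <preformat>
import numpy as np

def r_aggregator(v, importance, m):
    """Ratio (Eq. 7) with m1 = m and m2 = n - m: PI on the top-m dimensions times
    NI on the remaining n - m dimensions."""
    n = v.shape[0]
    order = np.argsort(-importance)        # dimensions sorted from most to least important
    d_plus, d_minus = order[:m], order[m:]
    pi = np.sum(np.abs(v[d_plus])) / m
    ni = (n - m) / np.sum(np.abs(v[d_minus]))
    return float(pi * ni)
          </preformat>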
          <p>Alignment (A) aggregator. The Alignment aggregator measures the cosine similarity between
the representation fed as input and a second vector constructed using the dimensions considered
important by the DIME. More in detail, we call D⁺ the set of the m most relevant dimensions according to the
DIME. We then construct a masking vector w⃗ s.t. wᵢ = 1 if i ∈ D⁺, and 0 otherwise. Then the score is
computed as:</p>
          <p>A(v⃗; m, D⁺) = ⟨abs(v⃗), w⃗⟩ / (|w⃗| · |abs(v⃗)|)   (8)</p>
          <p>Similarly to the R aggregator, also in this case, we have a contribution both from negative and
positive dimensions. Still, while the contribution of the positive dimensions is explicit through the dot
product, the negative dimensions play a role in changing the normalisation value |abs(v⃗)|. As before, m is a
hyperparameter.</p>
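          <p>A sketch of the Alignment aggregator of Eq. (8), under the same assumptions as the previous snippets:</p>
          <preformat>
import numpy as np

def a_aggregator(v, importance, m):
    """Alignment (Eq. 8): cosine similarity between abs(v) and the 0/1 mask of the m most
    important dimensions; the unimportant dimensions only enter through the norm of abs(v)."""
    w = np.zeros_like(v)
    w[np.argsort(-importance)[:m]] = 1.0
    av = np.abs(v)
    return float(np.dot(av, w) / (np.linalg.norm(av) * np.linalg.norm(w)))
          </preformat>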
          <p>Correlation (C) aggregator. Our final aggregator measures the correlation between the input
vector v⃗ and the DIME importance vector u⃗:</p>
          <p>C(v⃗; u⃗) = corr(abs(v⃗), u⃗)   (9)</p>
          <p>Multiple correlation functions can be used and, in our experiments, we consider Kendall’s τ and
Pearson’s ρ correlations. More in detail, we do not select the correlation function explicitly but we treat
it as a hyperparameter, choosing the optimal one according to the validation procedure described in
Section 3.</p>
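          <p>Finally, a sketch of the Correlation aggregator of Eq. (9) and of how a full predictor such as D-C-LLM can be assembled from an input, an aggregator, and a DIME; the SciPy correlation functions and the helper names are assumptions of the example, not the authors’ implementation:</p>
          <preformat>
import numpy as np
from scipy.stats import kendalltau, pearsonr

def c_aggregator(v, importance, corr=pearsonr):
    """Correlation (Eq. 9) between abs(v) and the DIME importance vector; the correlation
    function (Pearson or Kendall) is itself treated as a hyperparameter."""
    return float(corr(np.abs(v), importance)[0])

def d_c_llm(query_emb, doc_embs, pseudo_doc_emb, corr=pearsonr):
    """D-C-LLM: Document input, Correlation aggregator, LLM DIME (Eq. 4)."""
    importance = query_emb * pseudo_doc_emb
    return float(np.mean([c_aggregator(d, importance, corr) for d in doc_embs]))
          </preformat>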
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Evaluation</title>
      <sec id="sec-3-1">
        <title>3.1. Experimental Setup</title>
        <p>
          In our experiments, we use our predictors to predict the performance of two IR models, Contriever [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]
and TAS-B [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], on two collections, TREC Deep Learning 2019 (DL’ 19) [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and TREC Deep Learning
2020 (DL’ 20) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], and with respect to two evaluation measures P@10 and nDCG@10. We consider 5
state-of-the-art baselines, Clarity [
          <xref ref-type="bibr" rid="ref14">14</xref>
], n(σ%) [15], Weighted Information Gain (WIG) [16], Normalized
Query Commitment (NQC) [17], and Score Magnitude and Variance (SMV) [18] as well as their Utility
Estimation Framework (UEF) [19] enhanced counterparts. We consider a state-of-the-art QPP for dense
models (DCWIG [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]) and BERTQPP [20]. To optimize the QPP hyperparameters, we adopt the
well-known two-fold cross-validation procedure in which queries are disjointly divided into two folds
and, in turn, one fold is used to choose the hyperparameters and the other as the test set. The final
performance is averaged across 30 repetitions, as commonly done in this setting [17, 21, 22, 23]. In terms
of QPP evaluation measures, we report Pearson’s ρ and Kendall’s τ between the actual and predicted
performance. Additionally, instead of sMARE, we report 1-sMARE [24, 25], so that, in line with Pearson’s
ρ and Kendall’s τ, bigger values indicate more favourable results. All the results have been validated
statistically using ANOVA [26] and Tukey’s honestly significant difference post-hoc comparison [27]
with significance at 0.05 to correct for multiple comparisons. For all predictors, we validate the number
of documents considered, k ∈ {5, 10, 25, 50, 100, 250, 500}. For the number of important dimensions m,
we validate its value as a fraction of the total number of dimensions, m/n ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}.
        </p>
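        <p>The evaluation protocol can be sketched as follows, assuming a generic predict(query, hyperparameter) function and a dictionary mapping each query to its actual effectiveness (all names are illustrative; this is not the code used for the reported experiments):</p>
        <preformat>
import numpy as np
from scipy.stats import pearsonr

def two_fold_qpp_eval(predict, queries, true_perf, grid, repetitions=30, seed=0):
    """predict(query, hp) returns a predicted score; true_perf maps a query to, e.g., its nDCG@10."""
    rng, scores = np.random.default_rng(seed), []
    for _ in range(repetitions):
        perm = rng.permutation(len(queries))
        half = len(queries) // 2
        folds = (perm[:half], perm[half:])
        for train, test in (folds, folds[::-1]):
            # choose the hyperparameter maximising Pearson's rho on the training fold ...
            best_hp = max(grid, key=lambda hp: pearsonr(
                [predict(queries[i], hp) for i in train],
                [true_perf[queries[i]] for i in train])[0])
            # ... and measure the correlation on the held-out fold
            scores.append(pearsonr(
                [predict(queries[i], best_hp) for i in test],
                [true_perf[queries[i]] for i in test])[0])
    return float(np.mean(scores))
        </preformat>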
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Determining the Optimal QPP input and aggregator Function</title>
        <p>We start our analysis by considering the aggregated performance based on the different input and
aggregator components proposed in this paper.</p>
        <p>[Figure: Pearson’s ρ of the proposed predictors, aggregated by input and aggregator component.]</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Comparison With the State-of-the-Art</title>
        <p>The previous section highlighted how the best input components are either D- or I-, while the best
aggregators appear to be A, R, and C. Therefore, in the remainder of this paper, we focus on the
combinations of such approaches. Table 1 reports the performance of the current state-of-the-art
approaches (top) compared to the proposed predictors based on either the LLM DIME (centre) or the
Active Feedback DIME (bottom). Across all scenarios, predictors based on DIMEs are the most effective
solutions. In general, the approaches based on the Active Feedback DIME (indicated with -REL) that
employ a relevant document are more effective, regardless of the input and aggregator used. This
makes sense considering that this DIME effectively employs a stronger relevance signal, compared to the
LLM-based DIME, which uses only pseudo-relevance information. Nevertheless, with a few exceptions
(e.g., WIG on DL’ 19 or NQC on DL’ 20 when predicting P@10 and evaluating with Pearson’s ρ), the
predictors employing the LLM-based DIME are capable of overcoming all the state-of-the-art approaches.
To predict Precision, the most effective solutions are those employing the Active Feedback DIME, using
the set of retrieved documents as input and the Ratio aggregator. When it comes to predicting
nDCG@10, the most effective solutions are either the one based on the Interaction input, Correlation
aggregator, and the LLM-based DIME for DL’ 19, or the approach based on the Document input,
Correlation aggregator, and Active Feedback DIME for DL’ 20.</p>
        <p>[Table 1: performance of the baselines (n(σ%), Clarity, SMV, NQC, WIG, UEFClarity, UEFNQC, UEFSMV, UEFWIG, BERTQPP, DCWIG) and of the proposed predictors (D-R-LLM, D-A-LLM, D-C-LLM, I-R-LLM, I-A-LLM, I-C-LLM, and their -REL counterparts).]</p>
        <p>Since the Active Feedback-based DIME employs one relevant document, we also report in Figure 3 the
average rank across different experimental settings for the predictors, excluding the ones based on the
Active Feedback DIME.</p>
        <p>[Figure 3: average rank across experimental settings of the predictors, excluding those based on the Active Feedback DIME.]</p>
        <p>We can observe how predictors based on the LLM DIME are ranked, on average,
above all the baseline predictors. In particular, the D-C-LLM predictor is on average ranked the highest
(average rank 2.6). Nevertheless, the other predictors based on the Ratio and Correlation aggregators
(I-C-LLM, D-R-LLM, and I-R-LLM) are statistically equivalent (according to the Wilcoxon signed rank
test) to the best, followed by the predictors based on the Alignment aggregator and DCWIG. The
first four approaches have statistically significantly higher ranks than any baseline.</p>
        <p>Researchers and practitioners interested in using the DIME-based predictors should consider the
following:
• While using only the query representation as input (Q-) is suboptimal, the document (D-) and
interaction (I-) inputs exhibit comparable results: the practitioner can validate the input depending
on their setting.
• Approaches based on the Alignment (-A-), Negative Importance (-NI-), and Positive Importance
(-PI-) should be avoided, as their performance is suboptimal compared to the approaches based
on Ratio (-R-) and Correlation (-C-). Similarly to the input, the optimal aggregator between
Ratio and Correlation should be validated.
• If the practitioner has access to at least one relevant document for the query, the approaches
based on the Active Feedback should be favoured (-REL). Nevertheless, predictors relying on the
LLM-based DIME (-LLM) still overcome the current state-of-the-art performance.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this work, we investigated how to employ DIMEs to carry out QPP. In particular, DIMEs are a class
of models meant to determine the (query-specific) importance of each dimension in a latent embedding
space used for dense IR. The predictors proposed in this work rely on measuring the alignment between
the vectors involved in the retrieval and the importance estimated according to a DIME. The proposed
QPPs can be instantiated with different inputs (the query, the documents, or their interaction vectors)
and rely on different aggregations of such inputs. The most effective predictors are those based on either
the document or the interaction representations that compute the correlation between such vectors and the
dimension importance. The proposed approaches remarkably outperform the current state-of-the-art when
predicting the performance of two well-known dense models (Contriever and TAS-B) on two collections,
DL’ 19 and DL’ 20. Among future work, we are interested in applying other DIMEs to instantiate
our predictors, as well as in developing QPP models that can serve as DIMEs themselves, for example, by
providing insight on how each dimension contributes to the predicted performance of the system.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          ,
          <article-title>Predicting the effectiveness of queries and retrieval systems</article-title>
          ,
          <source>SIGIR Forum 44</source>
          (
          <year>2010</year>
          )
          <article-title>88</article-title>
          . URL: https://doi.org/10.1145/1842890.1842906. doi:
          <volume>10</volume>
          .1145/1842890.1842906.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Carmel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yom-Tov</surname>
          </string-name>
          ,
          <article-title>Estimating the Query Difficulty for Information Retrieval</article-title>
          ,
          <source>Synthesis Lectures on Information Concepts</source>
          , Retrieval, and Services, Morgan &amp; Claypool Publishers,
          <year>2010</year>
          . URL: https://doi.org/10.2200/S00235ED1V01Y201004ICR015. doi:
          <volume>10</volume>
          .2200/ S00235ED1V01Y201004ICR015.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Formal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clinchant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. Piwowarski,</surname>
          </string-name>
          <article-title>Query Performance Prediction for Neural IR: Are We There Yet?</article-title>
          ,
          <source>in: Advances in Information Retrieval - 45th European Conference on IR Research</source>
          , ECIR
          <year>2023</year>
          , Dublin, Ireland, April 2-
          <issue>6</issue>
          ,
          <year>2023</year>
          ,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          . URL: https://arxiv.org/abs/2302.09947. doi:
          <volume>10</volume>
          .48550/ARXIV.2302.09947.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Formal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lupart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clinchant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Piwowarski</surname>
          </string-name>
          ,
          <article-title>Towards query performance prediction for neural information retrieval: Challenges and opportunities</article-title>
          , in: M.
          <string-name>
            <surname>Yoshioka</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Kiseleva</surname>
          </string-name>
          , M. Aliannejadi (Eds.),
          <source>Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval</source>
          ,
          <string-name>
            <surname>ICTIR</surname>
          </string-name>
          <year>2023</year>
          , Taipei, Taiwan, 23
          <source>July</source>
          <year>2023</year>
          , ACM,
          <year>2023</year>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>63</lpage>
          . URL: https://doi.org/10.1145/3578337.3605142. doi:
          <volume>10</volume>
          .1145/3578337.3605142.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Datta</surname>
          </string-name>
          , S. MacAvaney,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Greene</surname>
          </string-name>
          ,
          <article-title>A 'pointwise-query, listwise-document' based query performance prediction approach</article-title>
          ,
          <source>in: Proceedings of 45th international ACM SIGIR conference research development in information retrieval</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>2148</fpage>
          --
          <lpage>2153</lpage>
          . URL: https: //doi.org/10.1145/3477495.3531821. doi:
          <volume>10</volume>
          .1145/3477495.3531821.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Greene</surname>
          </string-name>
          ,
          <article-title>A relative information gain-based query performance prediction framework with generated query variants</article-title>
          ,
          <source>ACM Trans. Inf. Syst</source>
          .
          <volume>41</volume>
          (
          <year>2023</year>
          )
          <volume>38</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          :
          <fpage>31</fpage>
          . URL: https://doi.org/10.1145/3545112. doi:
          <volume>10</volume>
          .1145/3545112.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Arabzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Rad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khodabakhsh</surname>
          </string-name>
          , E. Bagheri,
          <article-title>Noisy perturbations for estimating query difficulty in dense retrievers</article-title>
          , in: I.
          <string-name>
            <surname>Frommholz</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Hopfgartner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Oakes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Lalmas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Zhang</surname>
          </string-name>
          , R. L. T. Santos (Eds.),
          <source>Proceedings of the 32nd ACM International Conference on Information and Knowledge Management</source>
          ,
          <string-name>
            <surname>CIKM</surname>
          </string-name>
          <year>2023</year>
          , Birmingham, United Kingdom,
          <source>October 21-25</source>
          ,
          <year>2023</year>
          , ACM,
          <year>2023</year>
          , pp.
          <fpage>3722</fpage>
          -
          <lpage>3727</lpage>
          . URL: https://doi.org/10.1145/3583780.3615270. doi:
          <volume>10</volume>
          . 1145/3583780.3615270.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Perego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          ,
          <article-title>Dimension importance estimation for dense information retrieval</article-title>
          ,
          <source>in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24)</source>
          ,
          <source>July 14-18</source>
          ,
          <year>2024</year>
          , Washington, DC, USA, ACM,
          <year>2024</year>
          . URL: https://doi.org/10.1145/3626772.3657691. doi:
          <volume>10</volume>
          .1145/3626772.3657691.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Perego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          ,
          <article-title>Codime: a counterfactual approach for dimension importance estimation through click logs</article-title>
          ,
          <source>in: SIGIR '25: The 48th International ACM SIGIR Conference on Research and Development in Information Retrieval July 13-18</source>
          ,
          <year>2025</year>
          , ACM,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Caron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hosseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , E. Grave,
          <article-title>Towards unsupervised dense information retrieval with contrastive learning</article-title>
          ,
          <source>CoRR abs/2112</source>
          .09118 (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2112.09118. arXiv:
          <volume>2112</volume>
          .
          <fpage>09118</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hofstätter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <article-title>Efficiently teaching an effective dense retriever with balanced topic aware sampling</article-title>
          , in: F. Diaz,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Suel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Castells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jones</surname>
          </string-name>
          , T. Sakai (Eds.),
          <source>SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , Virtual Event, Canada,
          <source>July 11-15</source>
          ,
          <year>2021</year>
          , ACM,
          <year>2021</year>
          , pp.
          <fpage>113</fpage>
          -
          <lpage>122</lpage>
          . URL: https://doi.org/10.1145/3404835.3462891. doi:
          <volume>10</volume>
          .1145/3404835.3462891.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          ,
          <article-title>Overview of the TREC 2019 deep learning track</article-title>
          , CoRR abs/
          <year>2003</year>
          .07820 (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/
          <year>2003</year>
          .07820. arXiv:
          <year>2003</year>
          .07820.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <article-title>Overview of the TREC 2020 deep learning track</article-title>
          ,
          <source>CoRR abs/2102</source>
          .07662 (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2102.07662. arXiv:
          <volume>2102</volume>
          .
          <fpage>07662</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cronen-Townsend</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , W. B.
          <string-name>
            <surname>Croft</surname>
          </string-name>
          ,
          <article-title>Predicting query performance</article-title>
          , in: K. Järvelin,
          <string-name>
            <given-names>M.</given-names>
            <surname>Beaulieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          Myaeng (Eds.),
          <source>SIGIR 2002: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , August 11-15, 2002, Tampere, Finland, ACM,
          <year>2002</year>
          , pp.
          <fpage>299</fpage>
          -
          <lpage>306</lpage>
          . URL: https://doi.org/10.1145/564376.564429. doi:10.1145/564376.564429.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] R. Cummins, J. Jose, C. O’Riordan, Improved query performance prediction using standard deviation, in: W. Ma, J. Nie, R. Baeza-Yates, T. Chua, W. B. Croft (Eds.), Proceeding of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, Beijing, China, July 25-29, 2011, ACM, 2011, pp. 1089–1090. URL: https://doi.org/10.1145/2009916.2010063. doi:10.1145/2009916.2010063.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Y. Zhou, W. B. Croft, Query performance prediction in web search environments, in: W. Kraaij, A. P. de Vries, C. L. A. Clarke, N. Fuhr, N. Kando (Eds.), SIGIR 2007: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands, July 23-27, 2007, ACM, 2007, pp. 543–550. URL: https://doi.org/10.1145/1277741.1277835. doi:10.1145/1277741.1277835.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] A. Shtok, O. Kurland, D. Carmel, F. Raiber, G. Markovits, Predicting query performance by query-drift estimation, ACM Trans. Inf. Syst. 30 (2012) 11:1–11:35. URL: https://doi.org/10.1145/2180868.2180873. doi:10.1145/2180868.2180873.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] Y. Tao, S. Wu, Query performance prediction by considering score magnitude and variance together, in: J. Li, X. S. Wang, M. N. Garofalakis, I. Soboroff, T. Suel, M. Wang (Eds.), Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China, November 3-7, 2014, ACM, 2014, pp. 1891–1894. URL: https://doi.org/10.1145/2661829.2661906. doi:10.1145/2661829.2661906.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] A. Shtok, O. Kurland, D. Carmel, Using statistical decision theory and relevance models for query-performance prediction, in: F. Crestani, S. Marchand-Maillet, H. Chen, E. N. Efthimiadis, J. Savoy (Eds.), Proceeding of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, Geneva, Switzerland, July 19-23, 2010, ACM, 2010, pp. 259–266. URL: https://doi.org/10.1145/1835449.1835494. doi:10.1145/1835449.1835494.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] N. Arabzadeh, M. Khodabakhsh, E. Bagheri, BERT-QPP: contextualized pre-trained transformers for query performance prediction, in: G. Demartini, G. Zuccon, J. S. Culpepper, Z. Huang, H. Tong (Eds.), CIKM ’21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1 - 5, 2021, ACM, 2021, pp. 2857–2861. URL: https://doi.org/10.1145/3459637.3482063. doi:10.1145/3459637.3482063.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] O. Zendel, A. Shtok, F. Raiber, O. Kurland, J. S. Culpepper, Information needs, queries, and query performance prediction, in: B. Piwowarski, M. Chevalier, É. Gaussier, Y. Maarek, J. Nie, F. Scholer (Eds.), Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019, Paris, France, July 21-25, 2019, ACM, 2019, pp. 395–404. URL: https://doi.org/10.1145/3331184.3331253. doi:10.1145/3331184.3331253.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] H. Zamani, W. B. Croft, J. S. Culpepper, Neural query performance prediction using weak supervision from multiple signals, in: K. Collins-Thompson, Q. Mei, B. D. Davison, Y. Liu, E. Yilmaz (Eds.), The 41st International ACM SIGIR Conference on Research &amp; Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08-12, 2018, ACM, 2018, pp. 105–114. URL: https://doi.org/10.1145/3209978.3210041. doi:10.1145/3209978.3210041.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] G. Faggioli, N. Ferro, C. Muntean, R. Perego, N. Tonellotto, A Geometric Framework for Query Performance Prediction in Conversational Search, in: Proceedings of the 46th International ACM SIGIR Conference on Research &amp; Development in Information Retrieval, SIGIR 2023, July 23-27, 2023, Taipei, Taiwan, ACM, 2023. doi:10.1145/3539618.3591625.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] G. Faggioli, O. Zendel, J. S. Culpepper, N. Ferro, F. Scholer, An enhanced evaluation framework for query performance prediction, in: D. Hiemstra, M. Moens, J. Mothe, R. Perego, M. Potthast, F. Sebastiani (Eds.), Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28 - April 1, 2021, Proceedings, Part I, volume 12656 of Lecture Notes in Computer Science, Springer, 2021, pp. 115–129. URL: https://doi.org/10.1007/978-3-030-72113-8_8. doi:10.1007/978-3-030-72113-8_8.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] G. Faggioli, O. Zendel, J. S. Culpepper, N. Ferro, F. Scholer, sMARE: a new paradigm to evaluate and understand query performance prediction methods, Inf. Retr. J. 25 (2022) 94–122. URL: https://doi.org/10.1007/s10791-022-09407-w. doi:10.1007/s10791-022-09407-w.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] A. Rutherford, ANOVA and ANCOVA: a GLM approach, John Wiley &amp; Sons, 2011.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] J. W. Tukey, Comparing individual means in the analysis of variance, Biometrics 5 (1949) 99–114. URL: http://www.jstor.org/stable/3001913.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>