<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEBD</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Dimension Importance Estimation-based Framework for Query Performance Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guglielmo Faggioli</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Ferro</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raffaele Perego</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Tonellotto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Pisa</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ISTI-CNR</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Padua</institution>
          ,
          <addr-line>Padua</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>33</volume>
      <fpage>16</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>Recent developments in the dense Information Retrieval (IR) domain have shown the ties between the latent dimensions and the retrieval effectiveness. In detail, Dimension IMportance Estimators (DIMEs) have been proposed to identify a subspace of the original dense representation space where the retrieval is more effective. On a different research line, Query Performance Prediction (QPP) techniques focus on determining the performance of an IR system in the absence of human-made relevance judgements. In this extended abstract, we illustrate the effectiveness of QPP models that exploit the DIME mechanisms to formulate the predictions. In particular, the QPPs illustrated here rely on measuring how much the retrieval insists on dimensions considered relevant by a DIME model to establish how likely the retrieval was effective. To evaluate the effectiveness of the proposed approach, we consider two well-known IR collections, TREC Deep Learning '19 and '20, and two dense IR approaches, TAS-B and Contriever, and show that the DIME-based QPPs achieve state-of-the-art results when predicting the performance of both IR systems on both collections.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        suboptimal retrieval that can lead to low performance. Our experimentation suggests that our hypothesis
holds. By measuring the correlation between the representations of the retrieved documents with the
importance of the dimensions determined using a DIME, we can effectively predict the performance of
a set of state-of-the-art IR systems. More in detail, we can overcome the current state-of-the-art QPPs
when predicting the performance for two popular dense encoders, Contriever [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and TAS-B [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], on
two TREC collections, Deep Learning 2019 and 2020.
      </p>
      <p>The remainder of this work is organized as follows: Section 2 introduces the DIME framework and
describes the QPPs employed in this work. Section 3 reports our experimental evaluation, while in
Section 4, we draw our conclusions and outline future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>
        We introduce here the notation and background on DIMEs as proposed in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and we describe the QPPs proposed in this work.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Background on Dimension IMportance Estimation</title>
        <p>Consider a query q for which the user wants to retrieve documents from a corpus C. We define ℛ(q, C; M)
as the ranked list produced by the IR model M in response to q. Assuming relevance judgments are available,
we can compute a measure ℳ(ℛ(q, C; M)) that takes as input the list of retrieved documents and
outputs a performance score. This work focuses on dense IR models employing an encoder E to project
the text (i.e., the query and the documents) into an n-dimensional embedding space ℝⁿ. The encoder is
often a neural network trained with the objective of maximising the dot product between a query and
corresponding relevant documents. Therefore, the score assigned to a document d in response to a
query q is s(q, d) = ⟨E(q), E(d)⟩. With an abuse of notation, we call ℛ(q, C; E, ⟨⟩) the ranker that
takes in input the query and the corpus, embeds them in the n-dimensional space using E, computes
the dot product ⟨⟩ between the query and each document, and ranks the documents accordingly. We
define the masked dot product ⟨a⃗, b⃗⟩∖{i} = Σ_{j=1; j≠i} aⱼ · bⱼ as the dot product between two arbitrary vectors
a⃗ and b⃗, where the i-th dimension is ignored. Faggioli et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] experimentally showed that, given a
query q, there exists a set D ⊂ {1, ..., n} s.t.
        </p>
        <p>ℳ(ℛ(q, C; E, ⟨⟩)) &lt; ℳ(ℛ(q, C; E, ⟨⟩∖D))   (1)</p>
        <p>
          In other terms, given an encoder E and a query q, there is a set of dimensions that are harmful to
the retrieval: by simply discarding those dimensions when computing the dot product, it is possible to
improve the quality of the retrieval (Faggioli et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] conjecture that the optimal subspace can be any subspace of the original embedding space but, to make the
problem tractable, they focus only on linear subspaces where some dimensions are removed). Faggioli et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] showed that the improvement depends on the
collection considered and on the encoder E, reaching peaks as big as +0.30 nDCG@10 points, moving
from 0.5 to 0.8. Furthermore, they observe that the optimal dimensions are query-dependent, with
each query being optimized by a different set of dimensions. While discarding some dimensions allows
astonishing retrieval improvements, e.g. up to +73.4% in nDCG@10 when using TAS-B for RB ‘04 queries
with 40% of the dimensions, determining which dimensions are optimal is not trivial. Therefore, Faggioli et al.
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] propose a novel class of models, called “Dimension IMportance Estimators (DIMEs)”, that rely on
heuristics to determine which dimensions to preserve/remove. A DIME is a function Δ : (ℝⁿ; θ) → ℝⁿ
that takes in input a representation of a query E(q) ∈ ℝⁿ (and possibly some additional parameters
θ) and outputs a vector u⃗ ∈ ℝⁿ that describes how much each dimension is important. The relation
between the importances uᵢ and uⱼ of, respectively, dimensions i and j is defined as follows:
        </p>
        <p>ℳ(ℛ(q, C; E, ⟨⟩∖{i})) &lt; ℳ(ℛ(q, C; E, ⟨⟩∖{j})) =⇒ uᵢ &gt; uⱼ   (2)</p>
        <p>
          In other terms, the i-th dimension is more important than the j-th (uᵢ &gt; uⱼ) if the DIME considers
it more likely that the result will be worse by removing i instead of j when computing the dot product.
        </p>
        <p>The most effective DIMEs, according to Faggioli et al., are the Active Feedback DIME (Δ_AF) and
the LLM Pseudo Relevant Feedback DIME (Δ_LLM). The former employs a relevant document r (e.g.,
obtained by inspecting a query log or the user’s clicks) and the importance of a dimension is defined as
follows:</p>
        <p>Δ_AF(q; r)ᵢ = E(q)ᵢ · E(r)ᵢ,   (3)</p>
        <p>where E(q)ᵢ and E(r)ᵢ are respectively the i-th dimensions of the query and the relevant document’s
representations. Similarly, Δ_LLM is based on generating a pseudo-relevant document LLM(q) by
feeding the query to an LLM. The dimension importance in this case is defined as:</p>
        <p>Δ_LLM(q; LLM)ᵢ = E(q)ᵢ · E(LLM(q))ᵢ   (4)</p>
        <p>
          Faggioli et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] employ the proposed DIMEs by selecting the m most important dimensions, with m fixed.
        </p>
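        <p>As an illustration of the mechanics above, the following NumPy sketch (not the original implementation; the encoder, the embedding size, and the variable names are assumptions made for the example) computes the masked dot product and the two DIMEs of Eq. (3) and (4), and selects the m most important dimensions:</p>
        <preformat>
import numpy as np

def masked_dot(u, v, removed):
    """Dot product of u and v ignoring the dimensions in `removed` (masked dot of Section 2.1)."""
    mask = np.ones_like(u)
    mask[list(removed)] = 0.0
    return float(np.sum(u * v * mask))

def af_dime(query_emb, rel_doc_emb):
    """Active Feedback DIME (Eq. 3): importance of dimension i is E(q)_i * E(r)_i."""
    return query_emb * rel_doc_emb

def llm_dime(query_emb, pseudo_doc_emb):
    """LLM DIME (Eq. 4): as above, but the relevant document is generated by an LLM."""
    return query_emb * pseudo_doc_emb

def top_dimensions(importance, m):
    """Indices of the m most important dimensions according to a DIME vector."""
    return set(np.argsort(-importance)[:m].tolist())

# Toy usage with random vectors standing in for the encoder E (768 dimensions assumed).
rng = np.random.default_rng(0)
q, r = rng.normal(size=768), rng.normal(size=768)
u = af_dime(q, r)                                   # importance vector
removed = set(range(768)) - top_dimensions(u, 300)  # dimensions to mask out
score = masked_dot(q, r, removed)                   # score on the 300 most important dimensions
        </preformat>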
      </sec>
      <sec id="sec-2-2">
        <title>2.2. The Proposed Query Performance Predictors</title>
        <p>The predictors proposed here comprise an input and an aggregator component:
• The input component describes which vectors are used to compute the prediction. There are three options: the
query vector, the document vectors, or the interaction vectors (the Hadamard product between
the query and document vectors).</p>
        <p>• The aggregator component describes how to combine the input vectors with the DIME.</p>
        <p>All the predictors are instantiated by first inputting a query and computing the dimension importance
using a DIME. Such DIME values are combined with the input vectors using the aggregator function.
A predictor can be described as aggregator(input; DIME). In terms of notation, the predictors are
identified by ⟨input ID⟩-⟨aggregator ID⟩-⟨DIME ID⟩; for example, D-C-LLM indicates the QPP that
uses documents as input (D), relies on the correlation aggregator (C), and estimates the dimension
importance using Δ_LLM as DIME. We now describe each class of components in more detail.</p>
        <sec id="sec-2-2-1">
          <title>2.2.1. The input component.</title>
          <p>Our predictors can be based on a single vector (e.g., they can consider only the representation of the
query) or can employ multiple vectors. In our framework, in the former case, a statistic is computed
for each vector to formulate the prediction. In the latter case, each vector’s statistic is computed
individually and then aggregated by computing the mean. More in detail, let us call A(v⃗) the score
that an aggregator component assigns to a vector v⃗. For the moment, we consider A : ℝⁿ → ℝ an
arbitrary function that takes in input a vector and outputs a real number. Based on this definition, we
can define three input components: “Q-” (Query), “D-” (Document), and “I-” (Interaction). Given an
arbitrary aggregator function A, the input components are defined as follows:
• “Q-” input: given a query q, the prediction is Q-A(q) = A(E(q)) (i.e., the aggregator, directly
applied on the query vector).
• “D-” input: given k documents d₁, ..., dₖ, the prediction is D-A(d₁, ..., dₖ) = (1/k) Σᵢ A(E(dᵢ)).
In this case, the aggregator is applied separately on each document vector and then averaged.
• “I-” input: given a query q and the k documents d₁, ..., dₖ, the prediction is I-A(q, d₁, ..., dₖ) =
(1/k) Σᵢ A(E(q) ∘ E(dᵢ)), where ∘ represents the Hadamard product (i.e., the element-wise
multiplication between the two vectors).</p>
          <p>Since the predictors based on the Q- input employ only the representation of the query and do
not require access to the retrieved list of documents, they can be considered pre-retrieval predictors.
On the contrary, predictors based on D- and I- input employ the top-k documents retrieved, making
them post-retrieval predictors. Additionally, notice that D- and I- predictors will have the number of
documents considered, k, as an additional hyper-parameter.</p>
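          <p>A minimal sketch of the three input components, assuming precomputed query and document embeddings and an arbitrary aggregator function A passed as a Python callable (illustrative names, not the authors’ code):</p>
          <preformat>
import numpy as np

def q_input(agg, query_emb):
    """Q- input: the aggregator applied directly to the query vector."""
    return float(agg(query_emb))

def d_input(agg, doc_embs):
    """D- input: the aggregator applied to each top-k document vector, then averaged."""
    return float(np.mean([agg(d) for d in doc_embs]))

def i_input(agg, query_emb, doc_embs):
    """I- input: the aggregator applied to the Hadamard product between query and document."""
    return float(np.mean([agg(query_emb * d) for d in doc_embs]))
          </preformat>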
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. The aggregator component</title>
          <p>Negative Importance (NI) aggregator. If a DIME considers a dimension to be detrimental, i.e., it
would be better to remove it to increase the retrieval performance, this dimension should be as small as
possible to obtain the best performance. Vice-versa, observing a high absolute value on such dimensions
suggests non-effective retrieval. Therefore, the Negative Importance (NI) aggregator correlates the
performance of the query with the inverse of the magnitude of the unimportant dimensions according
to the DIME. We focus on the absolute value of the dimension: if the DIME would like to exclude it, the
best case occurs when the absolute value is close to zero.</p>
          <p>Let us call D⁻ ⊂ {1, ..., n}, with |D⁻| = m, the set of m dimensions having the smallest relevance score
uᵢ according to an arbitrary DIME. In this case, the aggregator function can be defined as follows:</p>
          <p>NI(v⃗; m, D⁻) = m / Σ_{i∈D⁻} abs(vᵢ)   (5)</p>
          <p>Where v⃗ is the input vector for which we want to compute the aggregation. As mentioned before,
this value can be used to instantiate a predictor based on Q-, D-, or I- input (respectively Q-NI, D-NI,
and I-NI). Notice that the NI aggregator function has m as a hyperparameter.</p>
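          <p>A possible NumPy rendering of the NI aggregator of Eq. (5), assuming v is the input vector and importance is the DIME vector u⃗ (illustrative, not the original implementation):</p>
          <preformat>
import numpy as np

def ni_aggregator(v, importance, m):
    """Negative Importance (Eq. 5): m divided by the total magnitude of the m least
    important dimensions; small values on detrimental dimensions yield high scores."""
    d_minus = np.argsort(importance)[:m]   # the m dimensions with the lowest DIME score
    return m / float(np.sum(np.abs(v[d_minus])))
          </preformat>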
          <p>Positive Importance (PI) aggregator. The second aggregator function, called the Positive
Importance (PI) aggregator, associates good performance with vectors having a large absolute value on
dimensions considered important by the DIME. It can be considered the opposite of the NI aggregator.
In line with the NI aggregator, we define D⁺ ⊂ {1, ..., n}, with |D⁺| = m, the set of m dimensions
having the highest relevance score uᵢ according to an arbitrary DIME. The aggregator function in
this case is:</p>
          <p>PI(v⃗; m, D⁺) = Σ_{i∈D⁺} abs(vᵢ) / m   (6)</p>
          <p>As before, the predictor has m as a hyperparameter. The predictors are called Q-PI, D-PI, and I-PI,
depending on which vector v⃗ is fed in input.</p>
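          <p>Analogously, a sketch of the PI aggregator of Eq. (6) under the same assumptions:</p>
          <preformat>
import numpy as np

def pi_aggregator(v, importance, m):
    """Positive Importance (Eq. 6): average magnitude of the m most important dimensions."""
    d_plus = np.argsort(-importance)[:m]   # the m dimensions with the highest DIME score
    return float(np.sum(np.abs(v[d_plus]))) / m
          </preformat>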
          <p>Ratio (R) aggregator. This aggregator computes the product between NI and PI. It is based on
the same rationale as the previous two: large important dimensions are a positive signal, while large
detrimental dimensions are a negative one. This aggregator is defined as follows:</p>
          <p>R(v⃗; m₁, m₂, D⁺_{m₁}, D⁻_{m₂}) = PI(v⃗; m₁, D⁺_{m₁}) · NI(v⃗; m₂, D⁻_{m₂}) = (m₂ · Σ_{i∈D⁺_{m₁}} abs(vᵢ)) / (m₁ · Σ_{i∈D⁻_{m₂}} abs(vᵢ))   (7)</p>
          <p>Notice that, from a technical point of view, the two hyper-parameters m₁ and m₂ can be considered
independent. To reduce the number of possible combinations to be tested, align it with the other solutions,
and make the approach more stable, we set m₁ = m and m₂ = n − m, reducing the hyper-parameters to only
m. In other terms, the first m dimensions are considered useful, while the remaining n − m dimensions are
deemed detrimental. As in the other cases, the three variants of this approach are called Q-R, D-R, and
I-R.</p>
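          <p>A sketch of the Ratio aggregator of Eq. (7) with the simplification m₁ = m and m₂ = n − m described above (again illustrative, under the same assumptions as the previous snippets):</p>
          <preformat>
import numpy as np

def r_aggregator(v, importance, m):
    """Ratio (Eq. 7) with m1 = m and m2 = n - m: PI on the top-m dimensions times
    NI on the remaining n - m dimensions."""
    n = v.shape[0]
    order = np.argsort(-importance)        # dimensions sorted from most to least important
    d_plus, d_minus = order[:m], order[m:]
    pi = np.sum(np.abs(v[d_plus])) / m
    ni = (n - m) / np.sum(np.abs(v[d_minus]))
    return float(pi * ni)
          </preformat>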
          <p>Alignment (A) aggregator. The Alignment aggregator measures the cosine similarity between
the representation fed as input and a second vector constructed using the dimensions considered
important by the DIME. More in detail, we call D⁺ the set of the m most relevant dimensions according to the
DIME. We then construct a masking vector w⃗ s.t. wᵢ = 1 if i ∈ D⁺, and 0 otherwise. Then the score is
computed as:</p>
          <p>A(v⃗; m, D⁺) = ⟨abs(v⃗), w⃗⟩ / (|w⃗| · |abs(v⃗)|)   (8)</p>
          <p>Similarly to the R aggregator, also in this case, we have a contribution both from negative and
positive dimensions. Still, while the contribution of the positive dimensions is explicit through the dot
product, the negative dimensions play a role in changing the normalisation value |abs(v⃗)|. As before, m is a
hyperparameter.</p>
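          <p>A sketch of the Alignment aggregator of Eq. (8), under the same assumptions as the previous snippets:</p>
          <preformat>
import numpy as np

def a_aggregator(v, importance, m):
    """Alignment (Eq. 8): cosine similarity between abs(v) and the 0/1 mask of the m most
    important dimensions; the unimportant dimensions only enter through the norm of abs(v)."""
    w = np.zeros_like(v)
    w[np.argsort(-importance)[:m]] = 1.0
    av = np.abs(v)
    return float(np.dot(av, w) / (np.linalg.norm(av) * np.linalg.norm(w)))
          </preformat>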
          <p>Correlation (C) aggregator. Our final aggregator measures the correlation between the input
vector v⃗ and the DIME importance vector u⃗:</p>
          <p>C(v⃗; u⃗) = corr(abs(v⃗), u⃗)   (9)</p>
          <p>Multiple correlation functions can be used and, in our experiments, we consider Kendall’s τ and
Pearson’s ρ correlations. More in detail, we do not select the correlation function explicitly but we treat
it as a hyperparameter, choosing the optimal one according to the validation procedure described in
Section 3.</p>
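          <p>Finally, a sketch of the Correlation aggregator of Eq. (9) and of how a full predictor such as D-C-LLM can be assembled from an input, an aggregator, and a DIME; the SciPy correlation functions and the helper names are assumptions of the example, not the authors’ implementation:</p>
          <preformat>
import numpy as np
from scipy.stats import kendalltau, pearsonr

def c_aggregator(v, importance, corr=pearsonr):
    """Correlation (Eq. 9) between abs(v) and the DIME importance vector; the correlation
    function (Pearson or Kendall) is itself treated as a hyperparameter."""
    return float(corr(np.abs(v), importance)[0])

def d_c_llm(query_emb, doc_embs, pseudo_doc_emb, corr=pearsonr):
    """D-C-LLM: Document input, Correlation aggregator, LLM DIME (Eq. 4)."""
    importance = query_emb * pseudo_doc_emb
    return float(np.mean([c_aggregator(d, importance, corr) for d in doc_embs]))
          </preformat>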
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Evaluation</title>
      <sec id="sec-3-1">
        <title>3.1. Experimental Setup</title>
        <p>
          In our experiments, we use our predictors to predict the performance of two IR models, Contriever [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]
and TAS-B [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], on two collections, TREC Deep Learning 2019 (DL’ 19) [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and TREC Deep Learning
2020 (DL’ 20) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], and with respect to two evaluation measures P@10 and nDCG@10. We consider 5
state-of-the-art baselines, Clarity [
          <xref ref-type="bibr" rid="ref14">14</xref>
], n(σ%) [15], Weighted Information Gain (WIG) [16], Normalized
Query Commitment (NQC) [17], and Score Magnitude and Variance (SMV) [18] as well as their Utility
Estimation Framework (UEF) [19] enhanced counterparts. We consider a state-of-the-art QPP for dense
models (DCWIG [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]) and BERTQPP [20]. To optimize the QPP hyperparameters, we adopt the
well-known two-fold cross-validation procedure in which queries are disjointly divided into two folds
and, in turn, one fold is used to choose the hyperparameters and the other as the test set. The final
performance is averaged across 30 repetitions, as commonly done in this setting [17, 21, 22, 23]. In terms
of QPP evaluation measures, we report Pearson’s ρ and Kendall’s τ between the actual and predicted
performance. Additionally, instead of sMARE, we report 1-sMARE [24, 25], so that, in line with Pearson’s
ρ and Kendall’s τ, bigger values indicate more favourable results. All the results have been validated
statistically using ANOVA [26] and Tukey’s honestly significant difference post-hoc comparison [27]
with significance at 0.05 to correct for multiple comparisons. For all predictors, we validate the number
of documents considered, k ∈ {5, 10, 25, 50, 100, 250, 500}. For the number of important dimensions m,
we validate its value as a fraction of the total number of dimensions, m/n ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}.
        </p>
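        <p>The evaluation protocol can be sketched as follows, assuming a generic predict(query, hyperparameter) function and a dictionary mapping each query to its actual effectiveness (all names are illustrative; this is not the code used for the reported experiments):</p>
        <preformat>
import numpy as np
from scipy.stats import pearsonr

def two_fold_qpp_eval(predict, queries, true_perf, grid, repetitions=30, seed=0):
    """predict(query, hp) returns a predicted score; true_perf maps a query to, e.g., its nDCG@10."""
    rng, scores = np.random.default_rng(seed), []
    for _ in range(repetitions):
        perm = rng.permutation(len(queries))
        half = len(queries) // 2
        folds = (perm[:half], perm[half:])
        for train, test in (folds, folds[::-1]):
            # choose the hyperparameter maximising Pearson's rho on the training fold ...
            best_hp = max(grid, key=lambda hp: pearsonr(
                [predict(queries[i], hp) for i in train],
                [true_perf[queries[i]] for i in train])[0])
            # ... and measure the correlation on the held-out fold
            scores.append(pearsonr(
                [predict(queries[i], best_hp) for i in test],
                [true_perf[queries[i]] for i in test])[0])
    return float(np.mean(scores))
        </preformat>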
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Determining the Optimal QPP input and aggregator Function</title>
        <p>We start our analysis by considering the aggregated performance based on the different input and
aggregator components proposed in this paper.</p>
        <p>[Figure: Pearson’s ρ of the proposed predictors, aggregated by input and aggregator component.]</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Comparison With the State-of-the-Art</title>
        <p>The previous section highlighted how the best input components are either D- or I-, while the best
aggregators appear to be A, R, and C. Therefore, in the remainder of this paper, we focus on the
combinations of such approaches. Table 1 reports the performance of the current state-of-the-art
approaches (top) compared to the proposed predictors based on either the LLM DIME (centre) or the
Active Feedback DIME (bottom). Across all scenarios, predictors based on DIMEs are the most effective
solutions. In general, the approaches based on the Active Feedback DIME (indicated with -REL) that
employ a relevant document are more effective, regardless of the input and aggregator used. This
makes sense considering that this DIME effectively employs a stronger relevance signal, compared to the
LLM-based DIME, which uses only pseudo-relevance information. Nevertheless, with a few exceptions
(e.g., WIG on DL’ 19 or NQC on DL’ 20 when predicting P@10 and evaluating with Pearson’s ρ), the
predictors employing the LLM-based DIME are capable of overcoming all the state-of-the-art approaches.
To predict Precision, the most effective solutions are those employing the Active Feedback DIME, using
the set of retrieved documents as input and the Ratio aggregator. When it comes to predicting
nDCG@10, the most effective solutions are either the one based on the Interaction input, Correlation
aggregator, and the LLM-based DIME for DL’ 19, or the approach based on the Document input,
Correlation aggregator, and Active Feedback DIME for DL’ 20.</p>
        <p>[Table 1: performance of the baselines (n(σ%), Clarity, SMV, NQC, WIG, UEFClarity, UEFNQC, UEFSMV, UEFWIG, BERTQPP, DCWIG) and of the proposed predictors (D-R-LLM, D-A-LLM, D-C-LLM, I-R-LLM, I-A-LLM, I-C-LLM, and their -REL counterparts).]</p>
        <p>Since the Active Feedback-based DIME employs one relevant document, we also report in Figure 3 the
average rank across different experimental settings for the predictors, excluding the ones based on the
Active Feedback DIME.</p>
        <p>[Figure 3: average rank across experimental settings of the predictors, excluding those based on the Active Feedback DIME.]</p>
        <p>We can observe how predictors based on the LLM DIME are ranked, on average,
above all the baseline predictors. In particular, the D-C-LLM predictor is on average ranked the highest
(average rank 2.6). Nevertheless, the other predictors based on the Ratio and Correlation aggregators
(I-C-LLM, D-R-LLM, and I-R-LLM) are statistically equivalent (according to the Wilcoxon signed rank
test) to the best, followed by the predictors based on the Alignment aggregator and DCWIG. The
first four approaches have statistically significantly higher ranks than any baseline.</p>
        <p>Researchers and practitioners interested in using the DIME-based predictors should consider the
following:
• While using only the query representation as input (Q-) is suboptimal, the document (D-) and
interaction (I-) inputs exhibit comparable results: the practitioner can validate the input depending
on their setting.
• Approaches based on the Alignment (-A-), Negative Importance (-NI-), and Positive Importance
(-PI-) should be avoided, as their performance is suboptimal compared to the approaches based
on Ratio (-R-) and Correlation (-C-). Similarly to the input, the optimal aggregator between
Ratio and Correlation should be validated.
• If the practitioner has access to at least one relevant document for the query, the approaches
based on the Active Feedback should be favoured (-REL). Nevertheless, predictors relying on the
LLM-based DIME (-LLM) still overcome the current state-of-the-art performance.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this work, we investigated how to employ DIMEs to carry out QPP. In particular, DIMEs are a class
of models meant to determine the (query-specific) importance of each dimension in a latent embedding
space used for dense IR. The predictors proposed in this work rely on measuring the alignment between
the vectors involved in the retrieval and the importance estimated according to a DIME. The proposed
QPPs can be instantiated with different inputs (the query, the documents, or their interaction vectors)
and rely on different aggregations of such inputs. The most effective predictors are those based on either
the document or the interaction representations that compute the correlation between such vectors and the
dimension importance. The proposed approaches remarkably outperform the current state-of-the-art when
predicting the performance of two well-known dense models (Contriever and TAS-B) on two collections,
DL’ 19 and DL’ 20. Among future work, we are interested in applying other DIMEs to instantiate
our predictors, as well as in developing QPP models that can serve as DIMEs themselves, for example, by
providing insight on how each dimension contributes to the predicted performance of the system.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hauff</surname>
          </string-name>
          ,
          <article-title>Predicting the effectiveness of queries and retrieval systems</article-title>
          ,
          <source>SIGIR Forum 44</source>
          (
          <year>2010</year>
          )
          <article-title>88</article-title>
          . URL: https://doi.org/10.1145/1842890.1842906. doi:
          <volume>10</volume>
          .1145/1842890.1842906.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Carmel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yom-Tov</surname>
          </string-name>
          ,
          <article-title>Estimating the Query Difficulty for Information Retrieval</article-title>
          ,
          <source>Synthesis Lectures on Information Concepts</source>
          , Retrieval, and Services, Morgan &amp; Claypool Publishers,
          <year>2010</year>
          . URL: https://doi.org/10.2200/S00235ED1V01Y201004ICR015. doi:
          <volume>10</volume>
          .2200/ S00235ED1V01Y201004ICR015.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Formal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clinchant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. Piwowarski,</surname>
          </string-name>
          <article-title>Query Performance Prediction for Neural IR: Are We There Yet?</article-title>
          ,
          <source>in: Advances in Information Retrieval - 45th European Conference on IR Research</source>
          , ECIR
          <year>2023</year>
          , Dublin, Ireland, April 2-
          <issue>6</issue>
          ,
          <year>2023</year>
          ,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          . URL: https://arxiv.org/abs/2302.09947. doi:
          <volume>10</volume>
          .48550/ARXIV.2302.09947.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Formal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lupart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Marchesin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clinchant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Piwowarski</surname>
          </string-name>
          ,
          <article-title>Towards query performance prediction for neural information retrieval: Challenges and opportunities</article-title>
          , in: M.
          <string-name>
            <surname>Yoshioka</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Kiseleva</surname>
          </string-name>
          , M. Aliannejadi (Eds.),
          <source>Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval</source>
          ,
          <string-name>
            <surname>ICTIR</surname>
          </string-name>
          <year>2023</year>
          , Taipei, Taiwan, 23
          <source>July</source>
          <year>2023</year>
          , ACM,
          <year>2023</year>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>63</lpage>
          . URL: https://doi.org/10.1145/3578337.3605142. doi:
          <volume>10</volume>
          .1145/3578337.3605142.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Datta</surname>
          </string-name>
          , S. MacAvaney,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Greene</surname>
          </string-name>
          ,
          <article-title>A 'pointwise-query, listwise-document' based query performance prediction approach</article-title>
          ,
          <source>in: Proceedings of 45th international ACM SIGIR conference research development in information retrieval</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>2148</fpage>
          --
          <lpage>2153</lpage>
          . URL: https: //doi.org/10.1145/3477495.3531821. doi:
          <volume>10</volume>
          .1145/3477495.3531821.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Greene</surname>
          </string-name>
          ,
          <article-title>A relative information gain-based query performance prediction framework with generated query variants</article-title>
          ,
          <source>ACM Trans. Inf. Syst</source>
          .
          <volume>41</volume>
          (
          <year>2023</year>
          )
          <volume>38</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          :
          <fpage>31</fpage>
          . URL: https://doi.org/10.1145/3545112. doi:
          <volume>10</volume>
          .1145/3545112.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Arabzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Rad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khodabakhsh</surname>
          </string-name>
          , E. Bagheri,
          <article-title>Noisy perturbations for estimating query difficulty in dense retrievers</article-title>
          , in: I.
          <string-name>
            <surname>Frommholz</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Hopfgartner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Oakes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Lalmas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Zhang</surname>
          </string-name>
          , R. L. T. Santos (Eds.),
          <source>Proceedings of the 32nd ACM International Conference on Information and Knowledge Management</source>
          ,
          <string-name>
            <surname>CIKM</surname>
          </string-name>
          <year>2023</year>
          , Birmingham, United Kingdom,
          <source>October 21-25</source>
          ,
          <year>2023</year>
          , ACM,
          <year>2023</year>
          , pp.
          <fpage>3722</fpage>
          -
          <lpage>3727</lpage>
          . URL: https://doi.org/10.1145/3583780.3615270. doi:
          <volume>10</volume>
          . 1145/3583780.3615270.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Perego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          ,
          <article-title>Dimension importance estimation for dense information retrieval</article-title>
          ,
          <source>in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24)</source>
          ,
          <source>July 14-18</source>
          ,
          <year>2024</year>
          , Washington, DC, USA, ACM,
          <year>2024</year>
          . URL: https://doi.org/10.1145/3626772.3657691. doi:
          <volume>10</volume>
          .1145/3626772.3657691.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Perego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          ,
          <article-title>Codime: a counterfactual approach for dimension importance estimation through click logs</article-title>
          ,
          <source>in: SIGIR '25: The 48th International ACM SIGIR Conference on Research and Development in Information Retrieval July 13-18</source>
          ,
          <year>2025</year>
          , ACM,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Caron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hosseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , E. Grave,
          <article-title>Towards unsupervised dense information retrieval with contrastive learning</article-title>
          ,
          <source>CoRR abs/2112</source>
          .09118 (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2112.09118. arXiv:
          <volume>2112</volume>
          .
          <fpage>09118</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hofstätter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <article-title>Efficiently teaching an effective dense retriever with balanced topic aware sampling</article-title>
          , in: F. Diaz,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Suel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Castells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jones</surname>
          </string-name>
          , T. Sakai (Eds.),
          <source>SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , Virtual Event, Canada,
          <source>July 11-15</source>
          ,
          <year>2021</year>
          , ACM,
          <year>2021</year>
          , pp.
          <fpage>113</fpage>
          -
          <lpage>122</lpage>
          . URL: https://doi.org/10.1145/3404835.3462891. doi:
          <volume>10</volume>
          .1145/3404835.3462891.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          ,
          <article-title>Overview of the TREC 2019 deep learning track</article-title>
          , CoRR abs/
          <year>2003</year>
          .07820 (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/
          <year>2003</year>
          .07820. arXiv:
          <year>2003</year>
          .07820.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <article-title>Overview of the TREC 2020 deep learning track</article-title>
          ,
          <source>CoRR abs/2102</source>
          .07662 (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2102.07662. arXiv:
          <volume>2102</volume>
          .
          <fpage>07662</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cronen-Townsend</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , W. B.
          <string-name>
            <surname>Croft</surname>
          </string-name>
          ,
          <article-title>Predicting query performance</article-title>
          , in: K. Järvelin,
          <string-name>
            <given-names>M.</given-names>
            <surname>Beaulieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          Myaeng (Eds.),
          <source>SIGIR 2002: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , August 11-15, 2002, Tampere, Finland, ACM,
          <year>2002</year>
          , pp.
          <fpage>299</fpage>
          -
          <lpage>306</lpage>
          . URL: https://doi.org/10.1145/564376.564429. doi:10.1145/564376.564429.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] R. Cummins, J. Jose, C. O’Riordan, Improved query performance prediction using standard deviation, in: W. Ma, J. Nie, R. Baeza-Yates, T. Chua, W. B. Croft (Eds.), Proceeding of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, Beijing, China, July 25-29, 2011, ACM, 2011, pp. 1089–1090. URL: https://doi.org/10.1145/2009916.2010063. doi:10.1145/2009916.2010063.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Y. Zhou, W. B. Croft, Query performance prediction in web search environments, in: W. Kraaij, A. P. de Vries, C. L. A. Clarke, N. Fuhr, N. Kando (Eds.), SIGIR 2007: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands, July 23-27, 2007, ACM, 2007, pp. 543–550. URL: https://doi.org/10.1145/1277741.1277835. doi:10.1145/1277741.1277835.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] A. Shtok, O. Kurland, D. Carmel, F. Raiber, G. Markovits, Predicting query performance by query-drift estimation, ACM Trans. Inf. Syst. 30 (2012) 11:1–11:35. URL: https://doi.org/10.1145/2180868.2180873. doi:10.1145/2180868.2180873.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] Y. Tao, S. Wu, Query performance prediction by considering score magnitude and variance together, in: J. Li, X. S. Wang, M. N. Garofalakis, I. Soboroff, T. Suel, M. Wang (Eds.), Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China, November 3-7, 2014, ACM, 2014, pp. 1891–1894. URL: https://doi.org/10.1145/2661829.2661906. doi:10.1145/2661829.2661906.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] A. Shtok, O. Kurland, D. Carmel, Using statistical decision theory and relevance models for query-performance prediction, in: F. Crestani, S. Marchand-Maillet, H. Chen, E. N. Efthimiadis, J. Savoy (Eds.), Proceeding of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, Geneva, Switzerland, July 19-23, 2010, ACM, 2010, pp. 259–266. URL: https://doi.org/10.1145/1835449.1835494. doi:10.1145/1835449.1835494.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] N. Arabzadeh, M. Khodabakhsh, E. Bagheri, BERT-QPP: contextualized pre-trained transformers for query performance prediction, in: G. Demartini, G. Zuccon, J. S. Culpepper, Z. Huang, H. Tong (Eds.), CIKM ’21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1 - 5, 2021, ACM, 2021, pp. 2857–2861. URL: https://doi.org/10.1145/3459637.3482063. doi:10.1145/3459637.3482063.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] O. Zendel, A. Shtok, F. Raiber, O. Kurland, J. S. Culpepper, Information needs, queries, and query performance prediction, in: B. Piwowarski, M. Chevalier, É. Gaussier, Y. Maarek, J. Nie, F. Scholer (Eds.), Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019, Paris, France, July 21-25, 2019, ACM, 2019, pp. 395–404. URL: https://doi.org/10.1145/3331184.3331253. doi:10.1145/3331184.3331253.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] H. Zamani, W. B. Croft, J. S. Culpepper, Neural query performance prediction using weak supervision from multiple signals, in: K. Collins-Thompson, Q. Mei, B. D. Davison, Y. Liu, E. Yilmaz (Eds.), The 41st International ACM SIGIR Conference on Research &amp; Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08-12, 2018, ACM, 2018, pp. 105–114. URL: https://doi.org/10.1145/3209978.3210041. doi:10.1145/3209978.3210041.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] G. Faggioli, N. Ferro, C. Muntean, R. Perego, N. Tonellotto, A Geometric Framework for Query Performance Prediction in Conversational Search, in: Proceedings of the 46th International ACM SIGIR Conference on Research &amp; Development in Information Retrieval, SIGIR 2023, July 23-27, 2023, Taipei, Taiwan, ACM, 2023. doi:10.1145/3539618.3591625.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] G. Faggioli, O. Zendel, J. S. Culpepper, N. Ferro, F. Scholer, An enhanced evaluation framework for query performance prediction, in: D. Hiemstra, M. Moens, J. Mothe, R. Perego, M. Potthast, F. Sebastiani (Eds.), Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28 - April 1, 2021, Proceedings, Part I, volume 12656 of Lecture Notes in Computer Science, Springer, 2021, pp. 115–129. URL: https://doi.org/10.1007/978-3-030-72113-8_8. doi:10.1007/978-3-030-72113-8_8.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] G. Faggioli, O. Zendel, J. S. Culpepper, N. Ferro, F. Scholer, sMARE: a new paradigm to evaluate and understand query performance prediction methods, Inf. Retr. J. 25 (2022) 94–122. URL: https://doi.org/10.1007/s10791-022-09407-w. doi:10.1007/s10791-022-09407-w.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] A. Rutherford, ANOVA and ANCOVA: a GLM approach, John Wiley &amp; Sons, 2011.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] J. W. Tukey, Comparing individual means in the analysis of variance, Biometrics 5 (1949) 99–114. URL: http://www.jstor.org/stable/3001913.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>