<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>IIR</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Counterfactual Dimension Importance Estimation (CoDIME) for Dense Information Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guglielmo Faggioli</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Ferro</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raffaele Perego</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Tonellotto</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ISTI-CNR</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Padova</institution>
          ,
          <addr-line>Padova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Pisa</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>15</volume>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Contextual dense representation models have revolutionized text processing by providing deeper semantic insights and enhancing Information Retrieval (IR) capabilities. These models represent text within a latent space, where shared underlying concepts are encoded beyond the explicit wording of the text. Nevertheless, previous studies indicate that certain dimensions within these dense embeddings can be irrelevant, or even detrimental, to retrieval success, depending on the specific information needs of the query. Limiting retrieval to a linear subspace that omits these less useful dimensions has been shown to improve performance. To tackle this issue, Dimension IMportance Estimators (DIMEs) were introduced to detect and remove harmful dimensions, thereby refining the representations of queries and documents to highlight only the valuable aspects. Current DIMEs mostly rely on pseudo-relevance feedback, which can be unreliable, or on explicit relevance judgments, which are often impractical to gather. Drawing inspiration from counterfactual analysis, we present Counterfactual DIMEs (CoDIMEs), a new technique that leverages noisy implicit feedback to assess the significance of each dimension. The CoDIME framework approximates the connection between how frequently a document is clicked and its alignment with particular query dimensions through a linear model. Empirical evidence demonstrates that CoDIME consistently outperforms traditional pseudo-relevance feedback-based DIMEs and other unsupervised counterfactual methods that make use of implicit signals.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Dense text representations have demonstrated remarkable capability in capturing semantic meaning,
emerging as the dominant technology across numerous text-related tasks in Information Retrieval (IR)
and Natural Language Processing (NLP). These representation models are based on neural networks
that project the text onto a dense representation space where semantically similar contents tend to be
arranged closely. While these novel representations are more effective than traditional lexical approaches
(e.g., BM25 [16] and TF-IDF [10]) in handling the semantic gap, they are far less interpretable, even if
the dimensions of the representations are assumed to be associated with some latent semantic meaning.
Starting from this, Faggioli et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] propose the so-called Manifold Clustering Hypothesis, which posits
that it is possible to find a query-wise subspace of the dense representation space where retrieval
is more effective, i.e., where the representations of the query and its relevant documents are better aligned.
To find such a subspace, Faggioli et al. define the concept of Dimension IMportance Estimator (DIME): a
model explicitly meant to estimate the query-dependent importance of each dimension, so as to preserve only
the most important ones while discarding the others. In particular, Faggioli et al. propose DIMEs based on
Pseudo-Relevance Feedback (PRF) or relying on explicit feedback. The former is known to have variable
and not always consistent effectiveness, especially when it comes to dense models [12]. The latter, on the
other hand, can be much more challenging to gather. To overcome this limitation, in this paper we propose
to employ an intermediate relevance signal, more reliable than PRF and far more available than explicit
feedback: implicit feedback. Implicit feedback leverages the analysis of user interactions, such as clicks
and dwell times, to infer weak relevance signals for retrieved content. Akin to past efforts in the domain,
we employ a set of simulated click logs to estimate the frequency of clicks on the links in a Search Engine
Result Page (SERP). By relying on such click frequencies, we devise a counterfactual modelling of the click
probabilities. This model is then used as a source of implicit feedback information, and we exploit it to
determine the importance of the dimensions in the dense representation space, thus instantiating a set of
novel Counterfactual DIMEs (CoDIMEs). In particular, we design a set of linear CoDIMEs that quantify
the importance of a dimension by considering the characteristics of a linear model that regresses the
documents’ click frequency on the interaction between the query and the documents on such a dimension.
      </p>
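      <p>To make the DIME idea concrete, the filtering step can be sketched as follows: given a vector of per-dimension importance scores, retrieval is restricted to the most important dimensions by zeroing out the others in the query representation. This is a minimal illustration under our own assumptions; the helper name, the toy vectors, and the retained fraction are hypothetical, not the authors' implementation:</p>

```python
import numpy as np

def apply_dime(query_vec, importance, alpha=0.5):
    """Keep only the top alpha fraction of dimensions by importance,
    zeroing out the rest of the query representation (hypothetical
    helper illustrating the DIME filtering step)."""
    query_vec = np.asarray(query_vec, dtype=float)
    importance = np.asarray(importance, dtype=float)
    k = max(1, int(round(alpha * query_vec.size)))
    keep = np.argsort(-importance)[:k]   # indices of the k most important dims
    mask = np.zeros_like(query_vec)
    mask[keep] = 1.0
    return query_vec * mask

# Example: keep the 2 most important of 4 dimensions (dims 0 and 2 survive).
q = np.array([0.5, -0.2, 0.8, 0.1])
imp = np.array([0.9, -0.3, 0.7, 0.0])
filtered = apply_dime(q, imp, alpha=0.5)
```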
      <p>Compared to CoRocchio [22], a state-of-the-art counterfactual approach, the CoDIME framework
achieves up to +0.235 nDCG@10 points, moving from 0.404 to 0.639 (+58%) (Dragon and Robust ‘04)
and +0.117 nDCG@100 points, moving from 0.356 to 0.473 (+33%) (Dragon and Robust ‘04).</p>
    </sec>
    <sec id="sec-2">
      <title>2. The CoDIME Framework</title>
      <p>
        We refer to the list of the top n documents retrieved in response to a query q as ℛ_q = {d_1, ..., d_n}, where d_i is
the i-th retrieved document. All users u ∈ U_q who submitted the query q interact with ℛ_q, each one generating
a click log c_u such that c_{u,d} is either 1 or 0, depending on whether the user clicked on document d ∈ ℛ_q or
not. Based on this historical information, we can compute the observed frequency of clicks of d for q as:
f̂_{q,d} = (1/|U_q|) Σ_{u ∈ U_q} c_{u,d}. In other terms, f̂_{q,d} describes the proportion of clicks received by a document d ∈ ℛ_q
retrieved in response to q. We can expect this frequency to be somewhat correlated with the relevance of
the top n documents retrieved in response to q, but also with the position at which the document is shown
in ℛ_q. Akin to CoRocchio, we debias click frequencies using the Inverse Propensity Score (IPS) [
        <xref ref-type="bibr" rid="ref6">6, 9</xref>
        ].
The debiased click frequency is thus defined as: f_{q,d} = f̂_{q,d} · (1/k)^{-η}, where k is the position at which
the document d was observed in ℛ_q and η is the propensity parameter. The debiased click frequencies
describe how likely it is that a document is clicked, regardless of where it is placed in the SERP. Finally,
we define (f_1, ..., f_n) as the list of debiased click frequencies of the top n documents for q, included in ℛ_q.
      </p>
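      <p>The observed and debiased click frequencies described above can be sketched in a few lines of Python. This is a minimal illustration under our assumptions; the array shapes and variable names are ours, and the paper sets the propensity parameter η to 1:</p>

```python
import numpy as np

def debiased_click_frequencies(clicks, positions, eta=1.0):
    """Debias observed click frequencies with the Inverse Propensity Score.

    clicks:    (num_users, n) binary matrix; clicks[u, i] = 1 iff user u
               clicked the i-th document retrieved for the query.
    positions: (n,) 1-based rank at which each document was shown.
    eta:       propensity parameter (the paper sets eta = 1).
    """
    clicks = np.asarray(clicks, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Observed frequency: proportion of users who clicked each document.
    f_hat = clicks.mean(axis=0)
    # IPS correction: divide by the propensity (1/k)**eta,
    # i.e. multiply by (1/k)**(-eta).
    return f_hat * (1.0 / positions) ** (-eta)
```

For example, with two users and three ranked documents, a document clicked by half the users at rank 2 ends up with the same debiased frequency as one clicked by everybody at rank 1.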
      <p>Given a query q and a document d and their respective dense representations q and d, we can define
the interaction i between them as the Hadamard product between their representations, i = q ⊙ d. Assuming
each dimension corresponds to a latent concept, observing a strong interaction i_j between the query and
the document on the j-th dimension indicates that the concept is prominent for both the query and the document.</p>
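      <p>For instance, with toy 4-dimensional representations (illustrative values only, not from any real encoder), the interaction is simply the element-wise product:</p>

```python
import numpy as np

# Toy dense representations of a query and a document (illustrative values).
q = np.array([0.5, -0.2, 0.8, 0.0])
d = np.array([0.4,  0.3, 0.9, 0.7])

# Interaction vector: the Hadamard (element-wise) product of q and d.
i = q * d   # array([ 0.2 , -0.06,  0.72,  0.  ])
# A large positive i[j] signals that query and document agree strongly
# on the latent concept associated with dimension j (here, dimension 2).
```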
      <p>
        Linear CoDIMEs estimate the importance of a dimension for a query by examining the linear correlation
between the dimension’s interaction (i_j) and the debiased click frequency (f) on the documents.
Correlation CoDIME The first linear CoDIME is inspired by the Oracle DIME as proposed by Faggioli
et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. More in detail, we define f̄ and ī_j as the mean debiased click frequency and the mean interaction
on the j-th component for a given query and the corresponding retrieved documents, respectively.
Called ρ the Pearson’s correlation, the Correlation CoDIME, or CoDIME_ρ, is defined as follows:
I_ρ(j) = ρ((f_1, ..., f_n), (i_{1,j}, ..., i_{n,j})).
      </p>
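      <p>The Correlation CoDIME can be sketched as follows, assuming an n × m matrix of query-document interactions and the n debiased click frequencies; the Pearson correlation is written out explicitly via centred sums (dimensions are assumed to have non-zero variance):</p>

```python
import numpy as np

def correlation_codime(interactions, freqs):
    """Correlation CoDIME: the importance of each dimension is the Pearson
    correlation between the per-document interactions on that dimension
    and the debiased click frequencies.

    interactions: (n, m) matrix; interactions[i, j] = q[j] * d_i[j].
    freqs:        (n,) debiased click frequencies f_1 ... f_n.
    Returns an (m,) vector of importances in [-1, 1].
    """
    interactions = np.asarray(interactions, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    ic = interactions - interactions.mean(axis=0)   # centre each dimension
    fc = freqs - freqs.mean()                       # centre the frequencies
    cov = ic.T @ fc
    denom = np.sqrt((ic ** 2).sum(axis=0) * (fc ** 2).sum())
    return cov / denom
```

A dimension whose interactions grow with the click frequencies gets importance close to 1; one whose interactions move in the opposite direction gets importance close to -1.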
      <p>This CoDIME quantifies the linear correlation between the interactions on a given dimension and the
debiased click frequencies as their Pearson’s ρ correlation. If the interaction on a dimension between the
query and the documents aligns with the debiased click frequencies on such documents, the importance
will be 1, and the dimension likely belongs to the optimal subspace. If the interaction and the debiased
click probability are uncorrelated, the importance of the dimension will be zero. Finally, when the
interaction and the debiased click probability have a negative relation, the importance is negative, and
the dimension will likely be discarded.</p>
      <p>Slope CoDIME One of the major limitations of the Correlation CoDIME is that it cannot consider
how fast the interactions and the click frequencies tend to vary. In fact, the linear model that best fits
the points might be more or less steep. A steeper linear model indicates that the dimension is better
at separating the good and the bad documents. Vice-versa, if the linear model grows slowly, it is harder to
separate documents clicked often from rarely clicked documents. The value of the Correlation CoDIME
does not depend on such steepness, but only on how well a linear model fits the data. Therefore, we
propose a second linear CoDIME that explicitly quantifies the dimension’s importance based on the
slope of the linear model that best fits the data according to the Ordinary Least Squares (OLS) approach.</p>
      <p>In more detail, let us call H ∈ ℝ^{n×2} a matrix such that its first column contains n ones and the second
column contains the values i_{1,j}, ..., i_{n,j}. This is the regressor matrix, while we treat f = [f_1, ..., f_n]^T
as the response variable. We fit a linear model using the OLS approach by computing b ∈ ℝ^2:
b = (H^T H)^{-1} H^T f. Since we added a column of ones to the regressor matrix, the first element b_{j,1}
of b is the intercept of the OLS linear model, while the second element b_{j,2} of b is the slope.1 The Slope
CoDIME, or CoDIME_β, is defined as: I_β(j) = b_{j,2}.</p>
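      <p>The Slope CoDIME can be sketched by solving the normal equations dimension by dimension (a minimal illustration under the definitions above; a production implementation would vectorize across dimensions):</p>

```python
import numpy as np

def slope_codime(interactions, freqs):
    """Slope CoDIME: for each dimension j, fit f ~ b1 + b2 * i_j by OLS
    and use the slope b2 as the dimension's importance."""
    interactions = np.asarray(interactions, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    n, m = interactions.shape
    slopes = np.empty(m)
    for j in range(m):
        # Regressor matrix H: a column of ones and the interactions i_{.,j}.
        H = np.column_stack([np.ones(n), interactions[:, j]])
        # b = (H^T H)^{-1} H^T f ; b[0] is the intercept, b[1] the slope.
        b = np.linalg.solve(H.T @ H, H.T @ freqs)
        slopes[j] = b[1]
    return slopes
```

A steep positive slope means small changes in the interaction on that dimension translate into large changes in click frequency, i.e., the dimension separates frequently clicked from rarely clicked documents well.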
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Evaluation</title>
      <p>
        Experimental Setup and Query Logs Simulation To assess the proposed counterfactual strategy we
consider three well-known state-of-the-art dense encoders: Contriever [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], TAS-B [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and Dragon [12],
fine-tuned on MS MARCO.2 Experiments are conducted on three well-known TREC collections: TREC
Robust 2004 (Robust ‘04) [19], TREC Deep Learning 2019 (DL ‘19) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and TREC Deep Learning 2020 (DL
‘20) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]3. The parameter η, describing the user’s patience in clicking a document of the SERP, is set to 1,
while the maximum depth of inspection is set to 20 documents unless specified differently. Furthermore, the
experiments are conducted by repeating 1000 times for each topic the simulation of the click log. Differently
from Faggioli et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], our CoDIME strategies choose the fraction of representation dimensions retained
by applying a 5-fold cross-validation on the validation set. The code and the data are publicly released.4
      </p>
      <p>
        The CoDIME approach is based on historical user feedback needed to instantiate the counterfactual
framework and estimate the click probabilities. In a real-world deployment with a consistent user base,
click logs are easy to collect and use for our purposes. Following the previous literature [
        <xref ref-type="bibr" rid="ref8">22, 8, 13, 15, 23, 21,
14, 9, 20</xref>
        ] on counterfactual implicit feedback and learning to rank, we simulate the interaction of the users
with the documents to generate a set of synthetic click logs. To do so, we need to simulate i) the selection
bias, ii) the position bias, and iii) the relevance bias. The selection bias is implemented by assuming that
every user interacts with and inspects the SERP only up to the maximum inspection depth. To simulate the click propensity,
we model p̂(k), the probability of examination, as inversely proportional to the position, i.e., p̂(k) = (1/k)^η.
To simulate the relevance bias, we model p̂(d, q), the probability that the user will click on a document d,
given its relevance to q. More in detail, we consider three ideal user models: the perfect user (P), whose click
probability is directly proportional to the relevance of the document; the binarized user (B), that clicks on a
non-relevant or partially relevant document with probability 0.1 and clicks on a relevant or highly relevant
document with probability 1; the near-random user (R), that clicks on a non-relevant document with
probability 0.4 and clicks on a highly relevant document with probability 0.6. For the perfect and near-random
users, the probabilities are a linear spacing between the minimum and maximum probabilities, with as
many steps as the relevance grades. For the binarized user, the click probability of a document with
relevance within the lowest half of the grades is set to 0.1; otherwise, it is set to 1. The simulated click probability
is computed as: p̂(d, q, k) = p̂(d, q) · p̂(k). In other terms, to simulate the click of a user on a document d
retrieved in position k in response to the query q, we combine, by multiplying, the probability that the user
will click on such document given its relevance to the query (i.e., the relevance bias) and the probability that
the user will click on a document in position k, regardless of its relevance (i.e., the position bias). We
consider the following baselines: Vector PRF (VPRF) [11], LLM DIME [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], PRF DIME [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and CoRocchio [22].
Performance We report the comparison in terms of effectiveness between the different approaches.
Table 1 reports the effectiveness of our solution and the competitors on DL ‘19. In bold, we report the best results.
1We also experimented with a linear model without the intercept, obtaining slightly inferior empirical results.
2We use the model weights publicly available on https://huggingface.co/
3For space reasons, we report here only the results for DL ‘19. The interested reader can find the results for other collections, for
which we observe substantially similar patterns, in the original paper.
4https://github.com/guglielmof/25-SIGIR-FFPT
      </p>
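      <p>The simulation protocol above can be sketched as follows. This is our own illustrative code: the grade-to-probability mappings for the perfect, binarized, and near-random users follow the description in the text, while the function name and the maximum relevance grade are assumptions:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_click_log(relevance, eta=1.0, user="perfect", max_grade=3):
    """Simulate one user's clicks over a ranked list.

    relevance: graded relevance labels of the documents, in ranked order.
    eta:       propensity parameter of the position bias (paper: eta = 1).
    user:      'perfect', 'binarized', or 'near-random'.
    """
    relevance = np.asarray(relevance, dtype=float)
    k = np.arange(1, relevance.size + 1)       # 1-based SERP positions
    p_exam = (1.0 / k) ** eta                  # position bias
    if user == "perfect":
        # click probability linearly spaced from 0 to 1 across the grades
        p_rel = relevance / max_grade
    elif user == "binarized":
        # lowest half of the grades: 0.1; otherwise: 1
        p_rel = np.where(relevance * 2 > max_grade, 1.0, 0.1)
    else:
        # near-random: linearly spaced between 0.4 and 0.6
        p_rel = 0.4 + 0.2 * relevance / max_grade
    # relevance bias and position bias combine multiplicatively
    return rng.binomial(1, p_rel * p_exam)
```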
      <p>Table 1 reports, for each encoder (TAS-B, Contriever, and Dragon) and each simulated user model (perfect P, binarized B, and near-random R), the effectiveness of each approach on DL ‘19:</p>
      <p>retrieval only .674 .674 .674 .740 .740 .740 .717 .717 .717 .655 .655 .655 .726 .726 .726 .679 .679 .679
VPRF-5 .664 .664 .664 .752 .752 .752 .721 .721 .721 .656 .656 .656 .743 .743 .743˚ .701 .701 .701
VPRF-20 .636 .636 .636 .732 .732 .732 .667 .667 .667 .643 .643 .643 .717 .717 .717 .651 .651 .651
LLM DIME .742 .742 .742˚ .767 .767 .767 .749 .749 .749 .710 .710 .710 .746 .746 .746˚ .724 .724 .724˚
PRF DIME .668 .668 .668 .740 .740 .740 .717 .717 .717 .655 .655 .655 .726 .726 .726 .682 .682 .682
CoRocchio .804˚ .766˚ .632 .830 .824˚ .724 .810˚ .780˚ .665 .761˚ .729˚ .634 .805˚ .796˚ .721 .771˚ .746˚ .642
Correlation CoDIME .851˚ .828˚ .810˚ .891˚ .854˚ .831˚ .856˚ .835˚ .804˚ .796˚ .786˚ .777˚ .830˚ .815˚ .792˚ .807˚ .781˚ .760˚
Slope CoDIME .855˚ .829˚ .809˚ .897˚ .854˚ .842˚ .863˚ .839˚ .821˚ .796˚ .774˚ .770˚ .838˚ .817˚ .793˚ .821˚ .785˚ .775˚</p>
      <p>Moving from the perfect to the near-random user, the Linear CoDIMEs lose effectiveness only in the 1-3 points range, up to 5 points in the worst scenarios. This is a desirable property: in a real-world scenario, where the clicks are far more affected by noise than in a simulated environment, a more stable solution such as the Linear CoDIMEs offers better guarantees of good performance.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this work, we introduced CoDIME, a novel counterfactual framework for dimension importance
estimation in dense text representations, leveraging implicit user feedback to address challenges
in existing DIME approaches. By incorporating counterfactual modelling of click probabilities in
various dimension importance estimation strategies, our CoDIME approaches achieved state-of-the-art
performance in multiple dense IR testbeds. Compared to CoRocchio [22], a state-of-the-art counterfactual
approach, the CoDIME framework achieves up to +0.235 nDCG@10 points, moving from 0.404 to 0.639
(+58%) (Dragon and Robust ‘04) and +0.117 nDCG@100 points, moving from 0.356 to 0.473 (+33%)
(Dragon and Robust ‘04). These findings highlight the efficacy of counterfactual techniques and DIME
approaches in adapting dense representations and improving retrieval effectiveness.</p>
      <p>Acknowledgments This work is supported, in part, by the Spoke “FutureHPC &amp; BigData” of the ICSC – Centro Nazionale
di Ricerca in High-Performance Computing, Big Data and Quantum Computing, the Spoke
“Human-centered AI” of the M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - "FAIR - Future Artificial
Intelligence Research", the “Extreme Food Risk Analytics” (EFRA) project, Grant no. 101093026, funded
by European Union – NextGenerationEU, the FoReLab project (Departments of Excellence), the NEREO
PRIN project funded by the Italian Ministry of Education and Research Grant no. 2022AEFHAZ, and
the CAMEO PRIN 2022 Project Grant no. 2022ZLL7MW.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors did not use any AI tool.</p>
      <p>[9] Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. Unbiased learning-to-rank with
biased feedback. In Maarten de Rijke, Milad Shokouhi, Andrew Tomkins, and Min Zhang,
editors, Proceedings of the Tenth ACM International Conference on Web Search and Data Mining,
WSDM 2017, Cambridge, United Kingdom, February 6-10, 2017, pages 781–789. ACM, 2017. doi:
10.1145/3018661.3018699. URL https://doi.org/10.1145/3018661.3018699.
[10] Karen Spärck Jones. A statistical interpretation of term specificity and its application in
retrieval. J. Documentation, 28(1):11–21, 1972. doi: 10.1108/00220410410560573. URL
https://doi.org/10.1108/00220410410560573.
[11] Hang Li, Ahmed Mourad, Shengyao Zhuang, Bevan Koopman, and Guido Zuccon. Pseudo relevance
feedback with deep language models and dense retrievers: Successes and pitfalls. ACM Trans. Inf.</p>
      <p>Syst., 41(3):62:1–62:40, 2023. doi: 10.1145/3570724. URL https://doi.org/10.1145/3570724.
[12] Sheng-Chieh Lin, Akari Asai, Minghan Li, Barlas Oguz, Jimmy Lin, Yashar Mehdad, Wen-tau
Yih, and Xilun Chen. How to train your dragon: Diverse augmentation towards generalizable
dense retrieval. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Findings of the Association
for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 6385–6400.
Association for Computational Linguistics, 2023. doi: 10.18653/V1/2023.FINDINGS-EMNLP.423.</p>
      <p>URL https://doi.org/10.18653/v1/2023.findings-emnlp.423.
[13] Harrie Oosterhuis and Maarten de Rijke. Differentiable unbiased online learning to rank. In
Alfredo Cuzzocrea, James Allan, Norman W. Paton, Divesh Srivastava, Rakesh Agrawal, Andrei Z.
Broder, Mohammed J. Zaki, K. Selçuk Candan, Alexandros Labrinidis, Assaf Schuster, and Haixun
Wang, editors, Proceedings of the 27th ACM International Conference on Information and Knowledge
Management, CIKM 2018, Torino, Italy, October 22-26, 2018, pages 1293–1302. ACM, 2018. doi:
10.1145/3269206.3271686. URL https://doi.org/10.1145/3269206.3271686.
[14] Harrie Oosterhuis and Maarten de Rijke. Unifying online and counterfactual learning to
rank: A novel counterfactual estimator that effectively utilizes online interventions. In Liane
Lewin-Eytan, David Carmel, Elad Yom-Tov, Eugene Agichtein, and Evgeniy Gabrilovich, editors,
WSDM ’21, The Fourteenth ACM International Conference on Web Search and Data Mining, Virtual
Event, Israel, March 8-12, 2021, pages 463–471. ACM, 2021. doi: 10.1145/3437963.3441794. URL
https://doi.org/10.1145/3437963.3441794.
[15] Zohreh Ovaisi, Ragib Ahsan, Yifan Zhang, Kathryn Vasilaky, and Elena Zheleva. Correcting for
selection bias in learning-to-rank systems. In Yennun Huang, Irwin King, Tie-Yan Liu, and Maarten van
Steen, editors, WWW ’20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020, pages 1863–1873.</p>
      <p>ACM / IW3C2, 2020. doi: 10.1145/3366423.3380255. URL https://doi.org/10.1145/3366423.3380255.
[16] Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford.</p>
      <p>Okapi at TREC-3. In Donna K. Harman, editor, Proceedings of The Third Text REtrieval Conference,
TREC 1994, Gaithersburg, Maryland, USA, November 2-4, 1994, volume 500-225 of NIST Special
Publication, pages 109–126. National Institute of Standards and Technology (NIST), 1994. URL
http://trec.nist.gov/pubs/trec3/papers/city.ps.gz.
[17] Henry Scheffé. The analysis of variance. John Wiley &amp; Sons, 1959.
[18] John W. Tukey. Comparing individual means in the analysis of variance. Biometrics, 5(2):99–114,
1949. ISSN 0006341X, 15410420.
[19] Ellen M. Voorhees. Overview of the TREC 2004 robust track. In Ellen M. Voorhees
and Lori P. Buckland, editors, Proceedings of the Thirteenth Text REtrieval Conference,
TREC 2004, Gaithersburg, Maryland, USA, November 16-19, 2004, volume 500-261 of NIST
Special Publication. National Institute of Standards and Technology (NIST), 2004. URL
http://trec.nist.gov/pubs/trec13/papers/ROBUST.OVERVIEW.pdf.
[20] Xuanhui Wang, Michael Bendersky, Donald Metzler, and Marc Najork. Learning to rank with
selection bias in personal search. In Raffaele Perego, Fabrizio Sebastiani, Javed A. Aslam, Ian
Ruthven, and Justin Zobel, editors, Proceedings of the 39th International ACM SIGIR conference on
Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy, July 17-21, 2016, pages
115–124. ACM, 2016. doi: 10.1145/2911451.2911537. URL https://doi.org/10.1145/2911451.2911537.
[21] Shengyao Zhuang and Guido Zuccon. Counterfactual online learning to rank. In Joemon M.</p>
      <p>Jose, Emine Yilmaz, João Magalhães, Pablo Castells, Nicola Ferro, Mário J. Silva, and Flávio
Martins, editors, Advances in Information Retrieval - 42nd European Conference on IR Research,
ECIR 2020, Lisbon, Portugal, April 14-17, 2020, Proceedings, Part I, volume 12035 of Lecture Notes
in Computer Science, pages 415–430. Springer, 2020. doi: 10.1007/978-3-030-45439-5\_28. URL
https://doi.org/10.1007/978-3-030-45439-5_28.
[22] Shengyao Zhuang, Hang Li, and Guido Zuccon. Implicit feedback for dense passage retrieval: A
counterfactual approach. In Enrique Amigó, Pablo Castells, Julio Gonzalo, Ben Carterette, J. Shane
Culpepper, and Gabriella Kazai, editors, SIGIR ’22: The 45th International ACM SIGIR Conference
on Research and Development in Information Retrieval, Madrid, Spain, July 11 - 15, 2022, pages 18–28.</p>
      <p>ACM, 2022. doi: 10.1145/3477495.3531994. URL https://doi.org/10.1145/3477495.3531994.
[23] Shengyao Zhuang, Zhihao Qiao, and Guido Zuccon. Reinforcement online learning to rank with
unbiased reward shaping. Inf. Retr. J., 25(4):386–413, 2022. doi: 10.1007/S10791-022-09413-Y. URL
https://doi.org/10.1007/s10791-022-09413-y.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Nick Craswell, Bhaskar Mitra, Emine Yilmaz, and Daniel Campos. Overview of the TREC 2020 deep learning track. In Ellen M. Voorhees and Angela Ellis, editors, Proceedings of the Twenty-Ninth Text REtrieval Conference, TREC 2020, Virtual Event [Gaithersburg, Maryland, USA], November 16-20, 2020, volume 1266 of NIST Special Publication. National Institute of Standards and Technology (NIST), 2020. URL https://trec.nist.gov/pubs/trec29/papers/OVERVIEW.DL.pdf.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, and Ellen M. Voorhees. Overview of the TREC 2019 deep learning track. CoRR, abs/2003.07820, 2020. URL https://arxiv.org/abs/2003.07820.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Guglielmo Faggioli, Nicola Ferro, Raffaele Perego, and Nicola Tonellotto. Dimension importance estimation for dense information retrieval. In Grace Hui Yang, Hongning Wang, Sam Han, Claudia Hauff, Guido Zuccon, and Yi Zhang, editors, Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024, Washington DC, USA, July 14-18, 2024, pages 1318-1328. ACM, 2024. doi: 10.1145/3626772.3657691. URL https://doi.org/10.1145/3626772.3657691.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Guglielmo Faggioli, Nicola Ferro, Raffaele Perego, and Nicola Tonellotto. CoDIME: a counterfactual approach for dimension importance estimation through click logs. In SIGIR '25: The 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 13-18, 2025. ACM, 2025.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, and Allan Hanbury. Efficiently teaching an effective dense retriever with balanced topic aware sampling. In Fernando Diaz, Chirag Shah, Torsten Suel, Pablo Castells, Rosie Jones, and Tetsuya Sakai, editors, SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021, pages 113-122. ACM, 2021. doi: 10.1145/3404835.3462891. URL https://doi.org/10.1145/3404835.3462891.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] D. G. Horvitz and D. J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663-685, 1952. ISSN 01621459, 1537274X. URL http://www.jstor.org/stable/2280784.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave. Towards unsupervised dense information retrieval with contrastive learning. CoRR, abs/2112.09118, 2021. URL https://arxiv.org/abs/2112.09118.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Rolf Jagerman, Harrie Oosterhuis, and Maarten de Rijke. To model or to intervene: A comparison of counterfactual and online learning to rank from user interactions. In Benjamin Piwowarski, Max Chevalier, Éric Gaussier, Yoelle Maarek, Jian-Yun Nie, and Falk Scholer, editors, Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019, Paris, France, July 21-25, 2019, pages 15-24. ACM, 2019. doi: 10.1145/3331184.3331269. URL https://doi.org/10.1145/3331184.3331269.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>