<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Eclipse: Contrastive Dimension Importance Estimation with Pseudo-Irrelevance Feedback for Dense Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giulio D'Erasmo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Trappolini</string-name>
          <email>trappolini@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Tonellotto</string-name>
          <email>nicola.tonellotto@unipi.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabrizio Silvestri</string-name>
          <email>fsilvestri@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Sapienza University of Rome</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Pisa</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent advances in Information Retrieval (IR) have utilized high-dimensional embedding spaces to enhance the retrieval of relevant documents. The Manifold Clustering Hypothesis suggests that, although document embeddings are high-dimensional, the documents relevant to a specific query lie on a lower-dimensional manifold that depends on the query. This idea has motivated new retrieval methods, but current approaches still struggle to clearly separate relevant signals from irrelevant noise. To address this issue, we present a new method called Eclipse, which uses information from both relevant and non-relevant documents. Our method computes a centroid from the non-relevant documents and uses it as a reference to detect and discount noisy dimensions in the relevant ones, leading to better retrieval results. Extensive experiments on three in-domain benchmarks and one out-of-domain benchmark demonstrate an average improvement of up to 21.03% (resp. 22.88%) in AP and 12.04% (resp. 14.18%) in nDCG@10 w.r.t. the DIME-based baseline (resp. the baseline using all dimensions). Our results pave the way for more robust, pseudo-irrelevance-based retrieval systems in future IR research. We make the code available on GitHub.</p>
      </abstract>
      <kwd-group>
        <kwd>Dimension Importance Estimation</kwd>
        <kwd>Relevance Feedback</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Dense retrieval models [
        <xref ref-type="bibr" rid="ref12">17, 12, 18</xref>
        ] embed queries and documents into a latent space with many
dimensions, where vector similarities capture nuanced semantic relationships [19, 20]. However, while
some dimensions encode meaningful semantic distinctions, others may introduce noise or contain
non-discriminative information [
        <xref ref-type="bibr" rid="ref1 ref4 ref7">7, 1, 4</xref>
        ]. To address this issue, Dimension Importance Estimation
(DIME) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] was developed to identify and retain only the most informative dimensions, aiming to
enhance retrieval performance by filtering out those that either contribute little or mostly capture noise
[
        <xref ref-type="bibr" rid="ref2 ref8">2, 8, 21</xref>
        ]. Although DIME emphasizes relevant dimensions, the impact of irrelevant dimensions, those
that add noise or non-discriminative information, remains largely unexplored. Classical methods such as
Rocchio’s algorithm [26] show that improving a query involves pulling it toward the centroid of the
relevant documents while pushing it as far as possible from the irrelevant ones. We argue
that explicitly modeling both relevant and irrelevant feedback can significantly improve dimension
selection, and thus dense retrieval performance. We introduce Eclipse, a novel method that
utilizes representations of both relevant and irrelevant documents to more accurately identify important
dimensions. In this paper, we explore how leveraging non-relevant documents through irrelevant
feedback can improve state-of-the-art DIME approaches. We evaluate ECLIPSE across state-of-the-art
TREC collections (Deep Learning 2019 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], 2020 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], DL-HARD 2021 [22], and Robust 2004 [28]),
demonstrating improvements of up to 21.03% (resp. 22.88%) in AP and 12.04% (resp. 14.18%)
in nDCG@10 w.r.t. the DIME-based baseline (resp. the baseline using all dimensions).
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Preliminaries</title>
      <p>In this section, we begin by outlining the classical Relevance Feedback model introduced by Rocchio
[26], followed by a comprehensive overview of the Dimension Importance Estimation paradigm.</p>
      <p>Rocchio. Rocchio’s algorithm is a foundational method in information retrieval, refining query
vectors by pulling them toward relevant documents and pushing them away from irrelevant ones. As
modern IR systems rely on high-dimensional embeddings, moving beyond traditional vector space
models requires exploring how to identify an optimal subset of query dimensions, rather than solely
optimizing entire query vectors.</p>
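The classic update can be sketched in a few lines of NumPy (an illustrative sketch, not the paper's code; alpha, beta, and gamma are the usual Rocchio weights, not parameters from this work):

```python
import numpy as np

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio update: pull the query toward the centroid of the
    relevant documents and push it away from the non-relevant centroid."""
    q_new = alpha * query
    if len(relevant) > 0:
        q_new = q_new + beta * np.mean(relevant, axis=0)
    if len(non_relevant) > 0:
        q_new = q_new - gamma * np.mean(non_relevant, axis=0)
    return q_new
```

Note that the update rewrites every coordinate of the query vector; the DIME paradigm below instead asks which coordinates to keep at all.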
      <p>
        Dimension Importance Estimation (DIME). Faggioli et al. suggest that queries and documents
exist in a lower-dimensional, query-dependent subspace of their high-dimensional latent space ℝ^d. By
projecting embeddings onto this subspace, a dense IR system can retain only the most informative
dimensions for distinguishing relevance. DIMEs assign importance scores to dimensions using a
query-dependent function. These scores allow the system to rank the dimensions, retaining those with higher
scores and discarding the less important ones. The selected dimensions thus form a low-dimensional,
query-dependent subspace of ℝ^d. Two methods for estimating the importance of dimensions are PRF
DIME and LLM DIME. The PRF DIME method utilizes pseudo-relevance feedback by assuming that the
top-k documents retrieved by a similarity measure, such as BM25 [25], are likely relevant to the query
[26, 30]. These documents are combined into a centroid vector p that captures the alignment to the
query q, helping to rank and select the most relevant dimensions. LLM DIME, on the other hand, uses
a synthetic document a ∈ ℝ^d, generated by an LLM [
        <xref ref-type="bibr" rid="ref12 ref3 ref5">12, 24, 16, 23, 3, 5, 27</xref>
        ], assumed to be relevant to the
query.
      </p>
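The selection step can be sketched as follows, assuming, as an illustration, that the importance of dimension i is the product q_i · p_i between the query and the feedback centroid (a minimal sketch, not the authors' implementation):

```python
import numpy as np

def dime_mask(query, centroid, keep_fraction=0.6):
    """Score each dimension by q_i * p_i and zero out the lowest-scoring ones."""
    scores = query * centroid                # per-dimension importance
    d = query.shape[0]
    k = int(np.ceil(keep_fraction * d))      # number of dimensions to retain
    keep = np.argsort(scores)[::-1][:k]      # indices of the top-k scores
    mask = np.zeros(d)
    mask[keep] = 1.0
    return query * mask                      # query projected onto the subspace
```

Scoring with the retained dimensions then amounts to an inner product between the masked query and the unmodified document embeddings.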
    </sec>
    <sec id="sec-3">
      <title>3. Our Method: Eclipse</title>
      <p>In this section we introduce Eclipse, a novel framework designed to improve dense vector retrieval by
including non-relevant documents in the decision-making of dimension importance estimation.</p>
      <p>Formally, for a given query q ∈ ℝ^d, embedded in a latent space using a bi-encoder, we follow
the same procedure as in DIME to retrieve a set of N documents from the corpus. These documents are
ranked using similarity measures such as cosine similarity or inner product. This set of documents,
denoted as D = {d_1, d_2, . . . , d_N}, contains pseudo-relevant documents, whose content captures
mainly relevant information and which are typically found at the top positions, and potentially pseudo-irrelevant
documents at the bottom positions, whose content captures mainly irrelevant information. Now, fixing
a parameter 0 &lt; k⁻ &lt; N, we can define pseudo-irrelevant feedback by aggregating the embeddings of
the bottom k⁻ documents in D into an irrelevant representative embedding p⁻ as:
p⁻ = (1/k⁻) ∑_{i=0}^{k⁻−1} d_{N−i}.</p>
      <p>We define Eclipse as a weighted difference between a pseudo-relevant representative embedding p*
and the irrelevant representative embedding p⁻. For each dimension i, the importance score is:
u*(i) = α (q_i · p*_i) − β (q_i · p⁻_i).
(1)</p>
      <p>In Eq. (1), the embedding p* depends on the original DIME used to compute the relevant signal. This
formulation allows Eclipse to extend any DIME framework. Using pseudo-relevant feedback we
can instantiate the vector p* by aggregating the top 0 &lt; k⁺ &lt; N − k⁻ document embeddings from
D as: p* = (1/k⁺) ∑_{i=1}^{k⁺} d_i.</p>
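Putting the pieces together, the per-dimension score of Eq. (1) can be sketched as follows (an illustrative NumPy sketch; k_pos and k_neg stand for the numbers of top and bottom documents, alpha and beta for the two weights, and ranked_docs is assumed to be sorted by decreasing similarity to the query):

```python
import numpy as np

def eclipse_scores(query, ranked_docs, k_pos, k_neg, alpha, beta):
    """Per-dimension Eclipse importance:
    alpha * q_i * p_plus_i - beta * q_i * p_minus_i,
    where p_plus is the centroid of the top k_pos documents (pseudo-relevant)
    and p_minus the centroid of the bottom k_neg documents (pseudo-irrelevant)."""
    p_plus = np.mean(ranked_docs[:k_pos], axis=0)
    p_minus = np.mean(ranked_docs[-k_neg:], axis=0)
    return alpha * query * p_plus - beta * query * p_minus
```

Dimensions whose score is high agree with the pseudo-relevant centroid and disagree with the pseudo-irrelevant one; those are the dimensions Eclipse retains.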
      <p>We can also instantiate an LLM-based approach using the following pipeline: (1) zero-shot
prompt an LLM with the query q; (2) use an encoder to embed the generated text into a latent vector
representation a ∈ ℝ^d; (3) set p* = a.</p>
      <p>Table 1: Effectiveness metrics of our methods Eclipse(α, β) and baselines on different query sets and
bi-encoders. In bold, the best performance observed for each triple of IR system, test collection, and evaluation
measure. Superscripts a and b indicate that the result is statistically significantly (p &lt; 0.05) better than Baseline
or standard DIMEs, respectively.</p>
      <p>[Table 1 body not recoverable from the extraction: AP and nDCG@10 for ANCE and TAS-B at retained-dimension fractions 0.2, 0.4, 0.6, 0.8, and 1.0 across the test collections, including DL ’20 and RB ’04.]</p>
      <sec id="sec-3-1">
        <title>Lastly, the parameters ,</title>
        <p>∈ R control the balance between the relevant and irrelevant document
signals. Rather than using a convex combination, we apply independent weighting to each term. This
method provides greater flexibility and demonstrates superior performance in our experiments.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Setup</title>
      <p>
        In our experiments, we compare our proposed Eclipse against the state-of-the-art DIMEs for dense IR
systems. We experiment with three dense retrieval models: ANCE [29], Contriever [16], and TAS-B
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], all of which have been fine-tuned using the MS MARCO [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] passage dataset.
      </p>
      <p>
        Datasets. We evaluate our methodology on three widely used benchmark collections for in-domain
evaluation: TREC Deep Learning 2019 (DL ’19) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], TREC Deep Learning 2020 (DL ’20) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and Deep
Learning Hard (DL HD) [22]. To assess the robustness we further evaluate Eclipse on out-of-domain
data based on the TREC Robust ’04 (RB ’04) collection [28]. We evaluate the systems using standard
metrics such as mean Average Precision (AP) and nDCG@10.
      </p>
      <p>Hyperparameters. We define four primary hyperparameters that influence different aspects of
the model’s decision-making process: k⁺, k⁻, α, and β. The parameter k⁺ ∈ {1, . . . , 10} (resp. k⁻ ∈
{1, . . . , 14}) determines the number of relevant (resp. irrelevant) documents used to build our
pseudo-relevance embeddings. The hyperparameter α controls the strength of the relevant representative
embedding, while β modulates the denoising effect of the irrelevant representative embedding. Both
take positive values increasing linearly from 0.1 up to 1. For combinations where α = β we test the base
case of α = β = 1.</p>
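The resulting search space is a plain grid over the four hyperparameters; an exhaustive sweep can be sketched as follows (the evaluate callback is hypothetical, standing in for running retrieval with a given configuration and computing a metric such as AP):

```python
import itertools
import numpy as np

# Grid matching the paper's ranges: k_pos in 1..10, k_neg in 1..14,
# alpha and beta increasing linearly from 0.1 to 1.0.
k_pos_grid = range(1, 11)
k_neg_grid = range(1, 15)
weight_grid = np.linspace(0.1, 1.0, 10)

def grid_search(evaluate):
    """Return the configuration maximizing the (hypothetical) evaluate() callback."""
    best, best_score = None, -np.inf
    for k_pos, k_neg, alpha, beta in itertools.product(
            k_pos_grid, k_neg_grid, weight_grid, weight_grid):
        score = evaluate(k_pos, k_neg, alpha, beta)
        if score > best_score:
            best, best_score = (k_pos, k_neg, alpha, beta), score
    return best, best_score
```

This amounts to 10 × 14 × 10 × 10 = 14,000 configurations per collection and retriever, small enough for an exhaustive sweep.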
      <p>
        Baselines. We compare our method to the standard DIMEs, PRF DIME and LLM DIME. We use GPT-4
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] as the LLM in our experiments. We will refer to the dense IR system at full dimensionality as Baseline.
All the DIMEs, including the Eclipse versions, use a retrieved collection of documents D of size 1,000.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>In our experiments, we investigate the following research questions: RQ1: Can non-relevant documents
be leveraged through irrelevant feedback to improve state-of-the-art DIME approaches? RQ2: Are metrics
of the retrieval pipeline impacted differently by non-relevant results when used for dimension importance
estimation?
Results for RQ1: Table 1 compares both versions of Eclipse with standard DIMEs (PRF and LLM) on
the TREC DL ’19, DL ’20, DL HD, and RB ’04 datasets, using the ANCE, Contriever, and TAS-B models.
We report the performance using the best configuration for all the DIMEs (standard and Eclipse) in
the table. The most interesting result is on ANCE, where Eclipse reduces the percentage of retained
dimensions needed to surpass the full-dimensionality baseline to just 40–60%, demonstrating
that explicitly modeling both positive and negative feedback in the DIME framework yields a robust
improvement. The gains are especially notable, with improvements of 21.03% in AP and 12.04% in
nDCG@10 relative to DIMEs, and even higher margins over the standard baseline: 22.88% (AP) and
14.18% (nDCG@10).</p>
      <p>Eclipse exhibits superior performance in the traditional evaluation protocol, improving performance
by up to 21.03% (resp. 22.88%) in AP and 12.04% (resp. 14.18%) in nDCG@10 w.r.t. the DIME-based
baseline (resp. the baseline using all dimensions). In particular, both PRF Eclipse and LLM Eclipse
show statistically significant improvements with respect to their DIME counterparts and Baseline.</p>
      <p>Results for RQ2: To understand how the presence of non-relevant documents in the dimension
importance estimation pipeline affects different aspects of the retrieval pipeline, we analyzed the recall
performance of LLM Eclipse compared to the standard LLM DIME. Table 2 demonstrates that LLM Eclipse
achieves consistent recall improvements over LLM DIME across multiple datasets and bi-encoders, with
the most notable gains observed for documents of low and medium relevance. This effect is especially
pronounced in the DL collections, where recall increases of up to 16.91% are observed for marginally
relevant documents. This explains why LLM Eclipse yields a larger boost in AP, which is
sensitive to recall across all relevance levels. In contrast, improvements in nDCG@10 are more modest,
reflecting the smaller gains for highly relevant documents that dominate the top-ranked results.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>We present Eclipse, a novel method designed to enhance dense retrieval by exploiting pseudo-irrelevant
feedback. This approach offers improved separation between relevant and non-relevant dimensions
within document embeddings. Unlike conventional DIME methods that rely solely on relevance signals,
Eclipse introduces a contrastive perspective by utilizing irrelevant documents.</p>
      <p>Eclipse achieves an average improvement of up to 21.03% (resp. 22.88%) in AP and 12.04% (resp. 14.18%)
in nDCG@10 compared to the DIME-based baseline (resp. the baseline using all dimensions).</p>
      <p>By emphasizing relevant embedding dimensions, Eclipse promotes moderately relevant documents
within the ranking, leading to marked gains in AP. Future research should focus on predicting a distinct
percentage of retained dimensions for each query. Another unexplored direction is the use of irrelevant
documents generated by LLMs as a substitute for human-generated documents.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <sec id="sec-7-1">
        <title>During the preparation of this work, the author did not use any AI tool.</title>
        <p>[16] Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand
Joulin, and Edouard Grave. Unsupervised dense information retrieval with contrastive learning.
arXiv preprint arXiv:2112.09118, 2021.
[17] Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi
Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Bonnie
Webber, Trevor Cohn, Yulan He, and Yang Liu, editors, Proceedings of the 2020 Conference on
Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, Online, November
2020. Association for Computational Linguistics.
[18] Omar Khattab and Matei Zaharia. ColBERT: Efficient and effective passage search via contextualized
late interaction over bert. In Proceedings of the 43rd International ACM SIGIR Conference on
Research and Development in Information Retrieval, SIGIR ’20, page 39–48, New York, NY, USA,
2020. Association for Computing Machinery.
[19] Wen Li, Ying Zhang, Yifang Sun, Wei Wang, Mingjie Li, Wenjie Zhang, and Xuemin Lin.
Approximate nearest neighbor search on high dimensional data — experiments, analyses, and improvement.</p>
        <p>IEEE Transactions on Knowledge and Data Engineering, 32(8):1475–1488, 2020.
[20] Yi Luan, Jacob Eisenstein, Kristina Toutanova, and Michael Collins. Sparse, Dense, and Attentional
Representations for Text Retrieval. Transactions of the Association for Computational Linguistics,
9:329–345, 04 2021.
[21] Xueguang Ma, Minghan Li, Kai Sun, Ji Xin, and Jimmy Lin. Simple and effective unsupervised
redundancy elimination to compress dense vectors for passage retrieval. In Marie-Francine Moens,
Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih, editors, Proceedings of the 2021 Conference
on Empirical Methods in Natural Language Processing, pages 2854–2859, Online and Punta Cana,
Dominican Republic, November 2021. Association for Computational Linguistics.
[22] Iain Mackie, Jeffrey Dalton, and Andrew Yates. How deep is your learning: the dl-hard annotated
deep learning dataset. In Proceedings of the 44th International ACM SIGIR Conference on Research
and Development in Information Retrieval, SIGIR ’21, page 2335–2341, New York, NY, USA, 2021.</p>
        <p>Association for Computing Machinery.
[23] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language
understanding by generative pre-training. Technical report, 2018.
[24] N Reimers. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint
arXiv:1908.10084, 2019.
[25] Stephen Robertson and Hugo Zaragoza. The probabilistic relevance framework: Bm25 and beyond.</p>
        <p>Found. Trends Inf. Retr., 3(4):333–389, April 2009.
[26] J.J. Rocchio. Relevance Feedback in Information Retrieval. Prentice Hall, Englewood Cliffs, New
        <p>Jersey, 1971.
[27] Gabriele Tolomei, Cesare Campagnano, Fabrizio Silvestri, and Giovanni Trappolini. Prompt-to-os
(p2os): revolutionizing operating systems and human-computer interaction with integrated ai
generative models. In 2023 IEEE 5th International Conference on Cognitive Machine Intelligence
(CogMI), pages 128–134. IEEE, 2023.
[28] Ellen M. Voorhees. Overview of the trec 2004 robust track. In Proceedings of the Thirteenth
Text REtrieval Conference (TREC 2004), Gaithersburg, MD, 2004. NIST Special Publication 500-261,
National Institute of Standards and Technology (NIST).
[29] Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul N. Bennett, Junaid Ahmed,
and Arnold Overwijk. Approximate nearest neighbor negative contrastive learning for dense text
retrieval. In International Conference on Learning Representations, 2021.
[30] Jinxi Xu and W. Bruce Croft. Improving the effectiveness of information retrieval with local
context analysis. ACM Trans. Inf. Syst., 18(1):79–112, January 2000.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Khetam</given-names>
            <surname>Al</surname>
          </string-name>
          <string-name>
            <surname>Sharou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhenhao</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Lucia</given-names>
            <surname>Specia</surname>
          </string-name>
          .
          <article-title>Towards a better understanding of noise in natural language processing</article-title>
          .
          <source>In Ruslan Mitkov and Galia Angelova</source>
          , editors,
          <source>Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP</source>
          <year>2021</year>
          ), pages
          <fpage>53</fpage>
          -
          <lpage>62</lpage>
          ,
          <string-name>
            <surname>Held</surname>
            <given-names>Online</given-names>
          </string-name>
          ,
          <year>September 2021</year>
          . INCOMA Ltd.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>[2] Sileye O</article-title>
          .
          <string-name>
            <surname>Ba</surname>
          </string-name>
          .
          <article-title>Discovering topics with neural topic models built from plsa assumptions</article-title>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Bacciu</surname>
          </string-name>
          , Cesare Campagnano, Giovanni Trappolini, and
          <string-name>
            <given-names>Fabrizio</given-names>
            <surname>Silvestri</surname>
          </string-name>
          .
          <article-title>Dantellm: Let's push italian llm research forward! In Proceedings of the 2024 Joint international conference on computational linguistics, language resources and evaluation (LREC-COLING</article-title>
          <year>2024</year>
          ), pages
          <fpage>4343</fpage>
          -
          <lpage>4355</lpage>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Bacciu</surname>
          </string-name>
          , Florin Cuconasu, Federico Siciliano, Fabrizio Silvestri, Nicola Tonellotto, and
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Trappolini</surname>
          </string-name>
          . Rraml:
          <article-title>Reinforced retrieval augmented machine learning</article-title>
          . volume
          <volume>3537</volume>
          , page 29 -
          <fpage>37</fpage>
          ,
          <year>2023</year>
          . Cited by:
          <fpage>7</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Bacciu</surname>
          </string-name>
          , Giovanni Trappolini, Andrea Santilli, Emanuele Rodolà, and
          <string-name>
            <given-names>Fabrizio</given-names>
            <surname>Silvestri</surname>
          </string-name>
          .
          <article-title>Fauno: The italian large language model that will leave you senza parole</article-title>
          ! volume
          <volume>3448</volume>
          , page 9 -
          <fpage>17</fpage>
          ,
          <year>2023</year>
          . Cited by:
          <fpage>7</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Payal</given-names>
            <surname>Bajaj</surname>
          </string-name>
          , Daniel Campos, Nick Craswell, Li Deng,
          <string-name>
            <given-names>Jianfeng</given-names>
            <surname>Gao</surname>
          </string-name>
          , and Xiaodong Liu et al.
          <article-title>Ms marco: A human generated machine reading comprehension dataset</article-title>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          , Aaron Courville, and
          <string-name>
            <given-names>Pascal</given-names>
            <surname>Vincent</surname>
          </string-name>
          .
          <article-title>Representation learning: A review and new perspectives</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>35</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1798</fpage>
          -
          <lpage>1828</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Happy</given-names>
            <surname>Buzaaba</surname>
          </string-name>
          and
          <string-name>
            <given-names>Toshiyuki</given-names>
            <surname>Amagasa</surname>
          </string-name>
          .
          <article-title>A scheme for efficient question answering with low dimension reconstructed embeddings</article-title>
          .
          <source>In The 23rd International Conference on Information Integration and Web Intelligence</source>
          , iiWAS2021, page
          <fpage>303</fpage>
          -
          <lpage>310</lpage>
          , New York, NY, USA,
          <year>2022</year>
          .
          <article-title>Association for Computing Machinery</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Nick</given-names>
            <surname>Craswell</surname>
          </string-name>
          , Bhaskar Mitra, Emine Yilmaz, and Daniel Campos.
          <article-title>Overview of the trec 2020 deep learning track</article-title>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Nick</surname>
            <given-names>Craswell</given-names>
          </string-name>
          , Bhaskar Mitra, Emine Yilmaz, Daniel Campos, and
          <string-name>
            <surname>Ellen</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Voorhees</surname>
          </string-name>
          .
          <article-title>Overview of the trec 2019 deep learning track</article-title>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Giulio D'Erasmo</surname>
            , Giovanni Trappolini, Fabrizio Silvestri, and
            <given-names>Nicola</given-names>
          </string-name>
          <string-name>
            <surname>Tonellotto</surname>
          </string-name>
          . Eclipse:
          <article-title>Contrastive dimension importance estimation with pseudo-irrelevance feedback for dense retrieval</article-title>
          .
          <source>In Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts</source>
          and
          <article-title>Theories in Information Retrieval (ICTIR)</article-title>
          ,
          <source>ICTIR '25, page 147-154</source>
          , New York, NY, USA,
          <year>2025</year>
          .
          <article-title>Association for Computing Machinery</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Jacob</surname>
            <given-names>Devlin</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ming-Wei</surname>
            <given-names>Chang</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          . BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Jill Burstein</source>
          , Christy Doran, and Thamar Solorio, editors,
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          , Minneapolis, Minnesota,
          <year>June 2019</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Josh</given-names>
            <surname>Achiam</surname>
          </string-name>
          et al.
          <source>Gpt-4 technical report</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Guglielmo</surname>
            <given-names>Faggioli</given-names>
          </string-name>
          , Nicola Ferro, Raffaele Perego, and
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          .
          <article-title>Dimension importance estimation for dense information retrieval</article-title>
          .
          <source>In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '24, page 1318-1328</source>
          , New York, NY, USA,
          <year>2024</year>
          .
          <article-title>Association for Computing Machinery</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Hofstätter</surname>
          </string-name>
          ,
          <string-name>
            <surname>Sheng-Chieh</surname>
            <given-names>Lin</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jheng-Hong</surname>
            <given-names>Yang</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Lin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Allan</given-names>
            <surname>Hanbury</surname>
          </string-name>
          .
          <article-title>Efficiently teaching an effective dense retriever with balanced topic aware sampling</article-title>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>