<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Enhancing Fusion-in-Decoder for Multi-Granularity Ranking</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Haeju</forename><surname>Park</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">LG AI Research</orgName>
								<address>
									<country key="KR">Republic of Korea</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Kyungjae</forename><surname>Lee</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">LG AI Research</orgName>
								<address>
									<country key="KR">Republic of Korea</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sunghyun</forename><surname>Park</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">LG AI Research</orgName>
								<address>
									<country key="KR">Republic of Korea</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Moontae</forename><surname>Lee</surname></persName>
							<affiliation key="aff3">
								<orgName type="institution">LG AI Research</orgName>
								<address>
									<country key="KR">Republic of Korea</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Enhancing Fusion-in-Decoder for Multi-Granularity Ranking</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">AEC23638FD7028479DE81C6D21979ECF</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:09+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Information Systems</term>
					<term>Retrieval Augmented Generation</term>
					<term>Large Language Model</term>
					<term>Information Retrieval&apos;s Role in RAG Systems (IR-RAG) 2024</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Large Language Models (LLMs) have demonstrated exceptional performance across various natural language tasks, leveraging extensive knowledge from massive datasets. However, their reliance solely on parametric knowledge often leads to the generation of inaccurate or outdated content, particularly in domain-specific tasks. Retrieval Augmented Generation (RAG) has emerged as a promising approach to address this limitation by incorporating external knowledge without necessitating re-training. While RAG enhances the accuracy of LLM-generated content, effectively retrieving external knowledge remains a challenge due to potential noise and computational costs. To address this, traditional information retrieval systems adopt two-stage approaches, utilizing efficient retrievers followed by reranking mechanisms. Recently, transformer-based architectures, including BERT and T5 models, have shown promise as effective rerankers. However, such models have limited context size and only perform single-granularity ranking at a time, hindering their effectiveness and efficiency. In this paper, we first explore the existing rerankers such as RankT5 and RFiD, highlighting challenges in multi-granularity ranking. Subsequently, we introduce PFiD (Passage Fusion-in-Decoder), a simple yet efficient approach aimed at effectively ranking both document and passage simultaneously. Through empirical evaluation, we demonstrate the efficacy of PFiD in improving effectiveness and efficiency, offering a promising direction for further research in this domain.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Despite their remarkable capabilities and growth, Large Language Models (LLMs) <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4]</ref> still tend to generate factually incorrect or outdated content, as they rely solely on their parametric knowledge, especially in domain-specific or knowledge-intensive tasks <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref>. Retrieval Augmented Generation (RAG) approaches <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11]</ref> have gained significant attention; they improve the quality of LLM-generated output by grounding it in external knowledge that supplements the LLMs' parametric knowledge, without having to re-train the LLMs. RAG leverages a powerful information retrieval model designed to search large datasets or knowledge bases. The retrieved information is then incorporated into LLMs, enabling them to generate more accurate and contextually relevant content. By incorporating external knowledge, RAG can effectively reduce the problem of generating factually incorrect or outdated content in LLMs <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13]</ref>.</p><p>However, current RAG frameworks face major challenges in the effectiveness and efficiency of their information retrieval systems. First, LLMs tend to generate inaccurate responses given distracting (or noisy) contexts, so the performance of retrieval models has a significant impact on the quality of RAG's responses <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b10">11]</ref>. 
Second, the retrieval component of RAG requires searching through large-scale knowledge bases or the web, which can be computationally expensive and slow <ref type="bibr" target="#b10">[11]</ref>. Due to the above challenges, existing retrieval systems adopt two-stage approaches: an efficient first-stage retriever such as BM25 <ref type="bibr" target="#b15">[16]</ref> or DPR <ref type="bibr" target="#b16">[17]</ref> retrieves a set of documents from a larger dataset, and a second-stage reranker then reranks the retrieved documents for precise ranking. Recently, with the advent of transformer-based models such as BERT <ref type="bibr" target="#b17">[18]</ref> and T5 <ref type="bibr" target="#b18">[19]</ref>, more architectures including bi-encoder <ref type="bibr" target="#b16">[17]</ref>, cross-encoder <ref type="bibr" target="#b19">[20]</ref>, encoder-decoder <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b21">22]</ref>, and decoder-only models <ref type="bibr" target="#b22">[23]</ref> have gradually shown their effectiveness as rerankers. However, these models have limited context size and only perform single-granularity ranking during inference, which hinders their effectiveness and efficiency in real-world RAG scenarios.</p><p>To this end, in this paper, we focus on the multi-granularity ranking task, which ranks both document and passage simultaneously. Specifically, we first investigate single-passage cross-encoder models such as MonoT5 <ref type="bibr" target="#b21">[22]</ref> and RankT5 <ref type="bibr" target="#b20">[21]</ref>. These models achieve superior performance across various ranking tasks, but due to the input token constraint, their efficiency is limited in real-world RAG scenarios. Next, we examine multi-passage cross-encoders, such as FiD <ref type="bibr" target="#b8">[9]</ref> and RFiD <ref type="bibr" target="#b23">[24]</ref>. 
These models alleviate the input token limit by processing multiple passages, but they directly use the decoder's cross-attention scores, which are learned only implicitly, as passage relevance, and thus struggle to distinguish relative differences between passages. We then propose PFiD (Passage Fusion-in-Decoder), a simple and effective model for multi-granularity ranking. PFiD extends the FiD model by generating a document-level relevance token, enabling both document retrieval and passage ranking. Furthermore, PFiD adopts an inter-passage attention mechanism to learn relative passage relevance explicitly, using the special tokens at the beginning of the input text to represent the entire context.</p><p>Experiments on the MIRACL passage ranking dataset <ref type="bibr" target="#b24">[25]</ref> demonstrate that PFiD improves effectiveness and efficiency compared to existing approaches, especially in RAG scenarios.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Preliminaries</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Task definition</head><p>Given a user query 𝑞 and a document (or passage) corpus 𝐶 = {𝐷1, 𝐷2, ..., 𝐷𝑛}, the goal of document retrieval is to find the 𝑘 documents that are most relevant to the query 𝑞. In our multi-granularity ranking setting, which consists of document retrieval and passage ranking tasks, the document retrieval task is to perform reranking on the BM25-retrieved top-𝑘 documents. While traditional passage ranking tasks typically involve ranking over the entire passage collection, in this paper the passage ranking task focuses solely on ranking passages within the retrieved document itself, which aligns more closely with real-world RAG scenarios and is thus more feasible.</p></div>
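The two-stage setting above can be sketched in code. The function names, toy corpus, and word-overlap scorer below are illustrative assumptions for exposition only, not part of the paper:

```python
# Sketch of the multi-granularity ranking pipeline: a first-stage retriever
# supplies top-k documents, a reranker reorders them, and passages are then
# ranked only *within* each retrieved document.

def rerank_documents(query, bm25_top_k, score_fn):
    """Reorder first-stage retrieved documents by a relevance score."""
    return sorted(bm25_top_k, key=lambda d: score_fn(query, d), reverse=True)

def rank_passages(query, document, score_fn):
    """Rank only the passages inside one retrieved document."""
    return sorted(document["passages"], key=lambda p: score_fn(query, p), reverse=True)

# Toy corpus and word-overlap scorer, purely for illustration.
corpus = [
    {"id": "D1", "passages": ["paris is the capital of france", "france is in europe"]},
    {"id": "D2", "passages": ["berlin is the capital of germany"]},
]
overlap = lambda q, text: len(set(q.split()) & set(text.split()))
doc_score = lambda q, d: max(overlap(q, p) for p in d["passages"])

query = "capital of france"
docs = rerank_documents(query, corpus, doc_score)
passages = rank_passages(query, docs[0], overlap)
print(docs[0]["id"], passages[0])
```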
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Ranking models</head><p>Pre-trained Language Models (PLMs) are currently the most effective ranking models, and can be categorized into two types: bi-encoders and cross-encoders. Bi-encoders encode a query and a passage separately to obtain semantic representations <ref type="bibr" target="#b16">[17]</ref>, emerging as powerful first-stage retrievers by pre-computing the passage representations offline. In contrast, cross-encoders take the concatenation of the query and a passage and perform query-passage interactions <ref type="bibr" target="#b19">[20]</ref>; they have been conceived as second-stage rerankers, designed to explicitly refine the results provided by the first-stage retrieval. In this paper, for brevity, we also refer to other PLMs that perform query-passage interactions simultaneously, such as encoder-only <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b19">20]</ref>, decoder-only <ref type="bibr" target="#b22">[23]</ref>, and encoder-decoder <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b21">22]</ref> models, as cross-encoders.</p><p>There are several PLM-based cross-encoders, including sequence-to-sequence language models such as MonoT5 <ref type="bibr" target="#b21">[22]</ref> and RankT5 <ref type="bibr" target="#b20">[21]</ref> for ranking tasks, as well as multi-passage reader models like FiD <ref type="bibr" target="#b8">[9]</ref> and RFiD <ref type="bibr" target="#b23">[24]</ref> for RAG tasks, which have demonstrated superior effectiveness.</p><p>MonoT5. MonoT5 <ref type="bibr" target="#b21">[22]</ref> is the first work to define a ranking task as a text generation task by leveraging the T5 <ref type="bibr" target="#b18">[19]</ref> encoder-decoder model. A query-document pair is concatenated into an input sequence Query: 𝑞 Document: 𝐷𝑛 Relevant:, and true and false are used as target tokens to represent their relevance. The model is then fine-tuned on a text generation task. After training, the ranking scores are derived from the logits of the true token, based on a softmax applied only over the logits of the true and false tokens.</p><p>RankT5. Following MonoT5 <ref type="bibr" target="#b21">[22]</ref>, the input sequence is similar except that RankT5 does not include the Relevant: postfix. The model then uses &lt;extra_id_10&gt; as the target token to learn an unnormalized ranking score, and is trained with a list-wise ranking loss directly, instead of the text generation loss used in MonoT5 <ref type="bibr" target="#b21">[22]</ref>. However, these models cannot be directly used for long document retrieval due to the maximum input length constraint shared by most PLMs, which hinders their effectiveness in the document retrieval task.</p><p>FiD. The FiD model further extends the T5 <ref type="bibr" target="#b18">[19]</ref> encoder-decoder model, taking multiple 𝑘 passages as input, encoding them separately, and then feeding the concatenated 𝑘 encoder hidden states into a T5 decoder to generate the answer. Relevance scores for passages are computed using cross-attention scores, which entails averaging the attention score across all tokens within the passage and all layers and heads within the decoder <ref type="bibr" target="#b25">[26]</ref>.</p><p>RFiD. FiD <ref type="bibr" target="#b8">[9]</ref> treats all passages equally within its encoders, solely depending on the cross-attention mechanism to establish correlations between the decoder and encoders, which may identify an incorrect answer by referring to spurious passages. RFiD <ref type="bibr" target="#b23">[24]</ref> improves FiD by identifying potential answer-containing passages (or rationales) among the candidates and guiding the decoder with the identified rationales. Afterward, cross-attention scores are directly regarded as passage relevance scores, the same as in <ref type="bibr" target="#b8">[9]</ref>. However, even with the rationale, the cross-attention mechanism still struggles to distinguish relative differences between passages, as it is implicitly guided by a rationale classifier trained solely on a point-wise binary classification loss.</p></div>
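The MonoT5-style scoring step described above, a softmax restricted to the true and false logits, can be sketched as follows; the function name and the example logit values are illustrative assumptions:

```python
import math

# Sketch (not the authors' code) of MonoT5-style scoring: the model emits
# logits for the "true" and "false" target tokens, and the ranking score is
# the softmax probability of "true" computed over just those two logits.

def monot5_score(true_logit: float, false_logit: float) -> float:
    m = max(true_logit, false_logit)       # subtract max to stabilize exp()
    e_true = math.exp(true_logit - m)
    e_false = math.exp(false_logit - m)
    return e_true / (e_true + e_false)

# A more relevant pair yields a larger "true" logit, hence a higher score.
assert monot5_score(2.0, -1.0) > monot5_score(-1.0, 2.0)
print(monot5_score(2.0, -1.0))
```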
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Method</head><p>In this section, we briefly describe Passage Fusion-in-Decoder (PFiD), a simple but effective model for multi-granularity ranking. PFiD adopts the FiD <ref type="bibr" target="#b8">[9]</ref> architecture as its base model and further extends FiD by utilizing true and false as target tokens to model document-level relevance, enabling multi-granularity ranking simultaneously. Additionally, PFiD integrates inter-passage attention to learn relative passage relevance explicitly, which is similar to the list-wise training objective of RankT5 <ref type="bibr" target="#b20">[21]</ref>.</p><p>Fusion-in-Decoder for Document Retrieval. Formally, given a question 𝑞 and a set of 𝑘 passages within the document 𝐷𝑛 = {𝑃 𝑛 1 , 𝑃 𝑛 2 , ..., 𝑃 𝑛 𝑘 }, the FiD encoder outputs the 𝑘-th passage embeddings H k ∈ R 𝐿×𝑑 , where 𝐿 denotes the maximum token length and 𝑑 denotes the dimension of hidden states; these are then concatenated as the input of the fusion decoder [H1, H2, ..., H k ].</p><formula xml:id="formula_0">H k = FiD-Encoder(𝑞 + 𝑃 𝑛 𝑘 )<label>(1)</label></formula><p>The FiD decoder utilizes [H1, H2, ..., H k ] to generate the target token 𝑇 = true or false. Therefore, the loss function can be defined as follows:</p><formula xml:id="formula_1">ℒ𝐹 𝑖𝐷 = − 𝑇 ∑︁ 𝑖=1 log 𝑝(𝑦𝑖|𝑦1, 𝑦2, ..., 𝑦𝑖−1, [H1, H2, ..., H k ]) (2)</formula><p>Inter-passage Attention. Previous work <ref type="bibr" target="#b23">[24]</ref> tackled the issue of spurious passages by employing a binary classifier on the first token's encoder hidden states H k,1 to determine whether the passage is a rationale passage for the query. It then guides the decoder by appending additional embeddings to the end of the encoder's hidden states [H1, H2, ..., H k , H k+1 ], where H k+1 ∈ R 2×𝑑 is a trainable rationale embedding. 
However, as Table <ref type="table">2</ref> shows, it underperforms in passage ranking tasks by a large margin, as it does not explicitly model relative passage relevance.</p><p>To mitigate this, we instead utilize inter-passage attention to model interactions between passages explicitly. PFiD builds a set of input sequences by stacking the first-token hidden states of each query-passage pair as B = [H1,1, H2,1, ..., H k,1 ], where H𝑖,𝑗 denotes the 𝑗-th token embedding of the 𝑖-th passage. In a standard cross-encoder, the first token of the encoder aggregates query-passage information to compute a relevance score. We further use this token to capture the relative semantics via a self-attention mechanism. Inspired by <ref type="bibr" target="#b26">[27]</ref>, we use a single-layer transformer to model relative passage relevance as follows:</p><formula xml:id="formula_2">̃B = softmax(QK ⊺ / √ 𝑑)V, where Q = BW Q , K = BW K , V = BW V (3)</formula><p>in which the matrices W Q , W K , W V ∈ R 𝑑×𝑑 are learnable parameters. The information from different passages is fused and exchanged via the self-attention mechanism. The training loss used for inter-passage attention can be defined as follows:</p><formula xml:id="formula_3">𝑝 𝑘 = softmax( ̃B 𝑘 W𝐵) ∈ R 2 , ℒ𝑝𝑎𝑠𝑠𝑎𝑔𝑒 = −(𝑦 log(𝑝 𝑘 ) + (1 − 𝑦) log(1 − 𝑝 𝑘 ))<label>(4)</label></formula><p>where 𝑦 is the passage relevance label, and the overall training objective of PFiD is:</p><formula xml:id="formula_4">ℒ 𝑎𝑙𝑙 = ℒ𝐹 𝑖𝐷 + 𝜆ℒ𝑝𝑎𝑠𝑠𝑎𝑔𝑒,<label>(5)</label></formula><p>where 𝜆 is a hyperparameter balancing the two losses.</p></div>
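A minimal sketch of the inter-passage attention in Eq. (3) and the relevance head in Eq. (4): the first-token embedding of each query-passage pair is stacked into B (k × d), passed through one self-attention layer, and each refined row is projected to a two-way relevance distribution. The random weights and toy dimensions stand in for learned parameters and are assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 4, 8                                 # passages per document, hidden size
B = rng.normal(size=(k, d))                 # stacked first-token states H_{i,1}

W_Q, W_K, W_V = (rng.normal(size=(d, d)) for _ in range(3))
W_B = rng.normal(size=(d, 2))               # relevance projection head

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

Q, K, V = B @ W_Q, B @ W_K, B @ W_V
B_tilde = softmax(Q @ K.T / np.sqrt(d), axis=-1) @ V   # Eq. (3): fuse across passages
p = softmax(B_tilde @ W_B, axis=-1)                    # Eq. (4): per-passage relevance

print(p.shape)                              # one 2-way distribution per passage
assert np.allclose(p.sum(axis=-1), 1.0)
```

The self-attention step lets each passage's score depend on the other passages in the document, which is what makes the relevance relative rather than point-wise.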
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental setup</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Datasets</head><p>We use the MIRACL <ref type="bibr" target="#b24">[25]</ref> passage ranking dataset for our experiments. The MIRACL <ref type="bibr" target="#b24">[25]</ref> dataset is a large-scale, open-domain, human-generated multi-document ranking dataset similar to MS MARCO <ref type="bibr" target="#b27">[28]</ref>, but MIRACL has the advantage of providing a segmented document collection, enabling both document retrieval and passage ranking. 1 For the document retrieval task, we construct the document retrieval dataset by regarding a document with at least one positive passage as a positive document. Table <ref type="table" target="#tab_1">1</ref> shows the statistics of the datasets. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Baselines</head><p>We compare PFiD against the following three types of ranking baselines. The first is Single-Passage Cross-encoder (SPC) baselines, including MonoT5 <ref type="bibr" target="#b21">[22]</ref> and RankT5 <ref type="bibr" target="#b20">[21]</ref>.</p><p>Due to the constraint of input tokens, we only take the first 𝑘 tokens in the document retrieval task. An alternative approach is to score each passage independently and take the passage with the highest score as the representative for ranking the document, or to directly perform retrieval over the segmented passages. However, we omit these approaches, as the former is inefficient and the latter does not scale to real-world RAG scenarios. The model is then trained list-wise with randomly sampled negatives from the entire passage set; The second is Multi-Passage Cross-encoder (MPC) baselines, including FiD <ref type="bibr" target="#b8">[9]</ref> and RFiD <ref type="bibr" target="#b23">[24]</ref>. For comparison in our experimental setting, both FiD and RFiD models are trained with true or false as the target token, enabling both document retrieval and passage ranking. All SPC and MPC baselines in this experiment are initialized with the T5-base model; The third is the most frequently employed lexical ranker, BM25 <ref type="bibr" target="#b15">[16]</ref>. We use the Elasticsearch engine with the default parameters 𝑘1 = 1.2 and 𝑏 = 0.75.</p><p>1 MS MARCO also provides a segmented document collection, but the segmented corpus does not align with the passages in passage ranking tasks. </p></div>
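For reference, BM25 scoring with the stated defaults 𝑘1 = 1.2 and 𝑏 = 0.75 can be sketched as below. This is a generic textbook BM25 implementation, not Elasticsearch's, whose version differs in details such as the exact IDF formula; the toy documents and query are illustrative:

```python
import math
from collections import Counter

def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized doc against a tokenized query over a small corpus."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N        # average document length
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)   # document frequency
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        f = tf[term]
        # Term frequency saturation (k1) and length normalization (b).
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [["paris", "capital", "france"],
        ["berlin", "capital", "germany"],
        ["france", "europe"]]
q = ["capital", "france"]
scores = [bm25_score(q, d, docs) for d in docs]
print(scores)
```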
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Experimental Details</head><p>We adopt T5-base <ref type="bibr" target="#b18">[19]</ref> as our base model, using Adam <ref type="bibr" target="#b28">[29]</ref> with a learning rate of 10 −4 and a dropout rate of 0.1. For both training and inference, we use the top-100 passages and truncate them to a maximum length of 200 tokens. The hyperparameter 𝜆 is set to 0.5. For the document retrieval task, we perform ranking on the BM25 top-100 retrieved documents, whereas passage ranking ranks the passages within the given positive document. We also conduct experiments on real-world RAG scenarios, considering both document retrieval and passage ranking simultaneously. We use nDCG <ref type="bibr" target="#b29">[30]</ref>, Recall, and MRR scores to evaluate effectiveness. All experiments are conducted on a single NVIDIA A100 GPU (40GB). In this work, we do not consider other training approaches, including data augmentation, knowledge distillation, or negative sampling strategies, as delving into their effects falls outside the scope of our objectives.</p></div>
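The ranking metrics named above can be sketched under their usual binary-relevance definitions; the cutoffs and toy label list below are illustrative, not taken from the paper:

```python
import math

def mrr_at_k(ranked_labels, k=10):
    """Reciprocal rank of the first relevant item within the top k."""
    for i, rel in enumerate(ranked_labels[:k], start=1):
        if rel:
            return 1.0 / i
    return 0.0

def ndcg_at_k(ranked_labels, k=10):
    """Binary-gain nDCG: DCG of the ranking over DCG of the ideal ranking."""
    dcg = sum(rel / math.log2(i + 1)
              for i, rel in enumerate(ranked_labels[:k], start=1))
    ideal = sorted(ranked_labels, reverse=True)
    idcg = sum(rel / math.log2(i + 1)
               for i, rel in enumerate(ideal[:k], start=1))
    return dcg / idcg if idcg > 0 else 0.0

labels = [0, 1, 0, 1]          # relevance of ranked results, top to bottom
print(mrr_at_k(labels))        # first relevant item at rank 2 -> 0.5
print(ndcg_at_k(labels, k=5))
```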
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results and Analysis</head><p>Retrieval and Ranking. Table <ref type="table">2</ref> presents our evaluation results on the document retrieval and passage ranking tasks.</p><p>The key observations are as follows: (i) MPC significantly outperforms SPC in the document retrieval task by aggregating multiple 𝑘 passages, alleviating the limited context size of SPC. In particular, one can see that PFiD outperforms RFiD by a large margin on both the document ranking and passage ranking tasks. This indicates that by leveraging passage-wise context to guide the decoder, we can better identify relative passage relevance. Note that compared with the existing SPC baselines, our method achieves ranking efficiency by removing the need for a separate ranking pass per granularity: PFiD directly consumes the entire document and scores the relevance of all passages and the document simultaneously. (ii) RFiD, which implicitly guides the decoder with rationale embeddings, shows improvement over FiD by a large margin; however, it still performs worse than BM25. This suggests that implicit guidance indeed benefits the model's ranking ability to some extent. However, when ranking</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>The evaluation results of different baselines. For document retrieval, we rank the top 100 documents retrieved by BM25, while the passage ranking task ranks the passages within the retrieved document. 𝑁 denotes the number of documents to rank, whereas 𝑃 denotes the number of passages in the document. The best performances are marked with †. Latency indicates the total inference time from document retrieval to passage ranking, measured by averaging the time taken for each query with a single thread and a single batch on the GPU. Columns: Model, Category, Document Retrieval (top-𝑘, MRR@10, Recall@5, Recall@10), Passage Ranking (MRR@10, nDCG@5, nDCG@10), Complexity, Latency (s).</p><p>We first retrieve # documents from the candidates, and rerank # passages within the retrieved documents. Figure <ref type="figure" target="#fig_0">1</ref> represents the result of our evaluation. Notably, from Table <ref type="table">2</ref> we observed that MPC outperforms SPC in document retrieval tasks; however, the performance drastically drops in this setting, as cross-attention scores from the decoder are indistinguishable across passages from multiple documents. Additionally, despite RankT5 reaching the best effectiveness on the passage ranking task, it did not exhibit any improvement over our method in real-world RAG scenarios, suggesting the importance of multi-granularity ranking. In contrast, PFiD consistently outperforms all baselines by leveraging the complementary nature of SPC and MPC. PFiD retrieves documents and ranks passages more efficiently, and captures the relative semantic correlation between different passages, leading to superior performance.</p><p>Cross-attention vs PFiD. As discussed above, PFiD has the advantage of identifying relevant passages compared to previous models like RFiD, since it explicitly models relative passage relevance. We investigate the effects of the cross-attention scores of the decoder and our passage ranking scores on the passage ranking task. Figure <ref type="figure" target="#fig_1">2</ref> illustrates the distribution of the rank of positive passages. As depicted in Figure <ref type="figure" target="#fig_1">2</ref>, PFiD is more strongly correlated with passage relevance than cross-attention scores, suggesting that PFiD focuses more on positive passages by explicitly learning relative passage relevance. Our experimental results show that this enhanced ability to identify relevant passages contributes to the overall performance improvement. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Passage ranking results on the real-world RAG scenarios. We first retrieve # of documents and rerank # passages within the retrieved documents.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Distribution of the rank of positive passages.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Statistics of Datasets.</figDesc><table><row><cell>Task</cell><cell cols="3"># train # dev # avg judgement</cell><cell># corpus</cell></row><row><cell>Document Retrieval</cell><cell>22,548</cell><cell>6,404</cell><cell>2.22</cell><cell>5,758,285</cell></row><row><cell>Passage Ranking</cell><cell>29,416</cell><cell>8,350</cell><cell>2.75</cell><cell>32,893,221</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title/>
		<author>
			<persName><surname>Openai</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2303.08774</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">Gpt-4 technical report</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Stone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Albert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Almahairi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Babaei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bashlykov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Batra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bhargava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bhosale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bikel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">B</forename></persName>
		</author>
		<idno type="arXiv">arXiv:2307.09288</idno>
		<title level="m">Llama 2: Open foundation and fine-tuned chat models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Anil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Firat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lepikhin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shakeri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Taropa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bailey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">C</forename></persName>
		</author>
		<idno type="arXiv">arXiv:2305.10403</idno>
		<title level="m">Palm 2 technical report</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Language models are few-shot learners</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">B</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ryder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Subbiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dhariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Neelakantan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shyam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Herbert-Voss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Krueger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Henighan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Child</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Ziegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Winter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hesse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sigler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Litwin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chess</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Berner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>McCandlish</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amodei</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2005.14165</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">When not to trust language models: Investigating effectiveness of parametric and nonparametric memories</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mallen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Asai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Khashabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hajishirzi</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.acl-long.546</idno>
		<ptr target="https://aclanthology.org/2023.acl-long.546" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Rogers</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Boyd-Graber</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Okazaki</surname></persName>
		</editor>
		<meeting>the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)<address><addrLine>Toronto, Canada</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="9802" to="9822" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">On faithfulness and factuality in abstractive summarization</title>
		<author>
			<persName><forename type="first">J</forename><surname>Maynez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narayan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bohnet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>McDonald</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.173</idno>
		<ptr target="https://aclanthology.org/2020.acl-main.173" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Jurafsky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Chai</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Schluter</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Tetreault</surname></persName>
		</editor>
		<meeting>the 58th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1906" to="1919" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Liu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2311.05232</idno>
		<title level="m">A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Piktus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Petroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Karpukhin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Küttler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rocktäschel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Riedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kiela</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2005.11401</idno>
		<title level="m">Retrieval-augmented generation for knowledge-intensive NLP tasks</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Leveraging passage retrieval with generative models for open domain question answering</title>
		<author>
			<persName><forename type="first">G</forename><surname>Izacard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.eacl-main.74</idno>
		<ptr target="https://aclanthology.org/2021.eacl-main.74" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume</title>
				<editor>
			<persName><forename type="first">P</forename><surname>Merlo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Tiedemann</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Tsarfaty</surname></persName>
		</editor>
		<meeting>the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="874" to="880" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Atlas: Few-shot learning with retrieval augmented language models</title>
		<author>
			<persName><forename type="first">G</forename><surname>Izacard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lomeli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hosseini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Petroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Schick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dwivedi-Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Riedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<ptr target="http://jmlr.org/papers/v24/23-0037.html" />
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="1" to="43" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2312.10997</idno>
		<title level="m">Retrieval-augmented generation for large language models: A survey</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Rethinking with retrieval: Faithful large language model inference</title>
		<author>
			<persName><forename type="first">H</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Roth</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2301.00303</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><surname>Thakur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bonifacio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Ogundepo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kamalloo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Alfonso-Hermelo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rezagholizadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2312.11361</idno>
		<title level="m">NoMIRACL: Knowing when you don&apos;t know for robust multilingual retrieval-augmented generation</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Large language models can be easily distracted by irrelevant context</title>
		<author>
			<persName><forename type="first">F</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Misra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Scales</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dohan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">H</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Schärli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
		<ptr target="https://proceedings.mlr.press/v202/shi23a.html" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th International Conference on Machine Learning</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Krause</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Brunskill</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Engelhardt</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Sabato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Scarlett</surname></persName>
		</editor>
		<meeting>the 40th International Conference on Machine Learning<address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">202</biblScope>
			<biblScope unit="page" from="31210" to="31227" />
		</imprint>
	</monogr>
	<note>Proceedings of Machine Learning Research</note>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Asai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hajishirzi</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2310.11511</idno>
		<title level="m">Self-rag: Learning to retrieve, generate, and critique through self-reflection</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">The probabilistic relevance framework: BM25 and beyond</title>
		<author>
			<persName><forename type="first">S</forename><surname>Robertson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zaragoza</surname></persName>
		</author>
		<idno type="DOI">10.1561/1500000019</idno>
	</analytic>
	<monogr>
		<title level="j">Foundations and Trends in Information Retrieval</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="333" to="389" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Karpukhin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Oğuz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Min</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Edunov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yih</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2004.04906</idno>
		<title level="m">Dense passage retrieval for open-domain question answering</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Matena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Liu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1910.10683</idno>
		<title level="m">Exploring the limits of transfer learning with a unified text-to-text transformer</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Nogueira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1901.04085</idno>
		<title level="m">Passage re-ranking with BERT</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Zhuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Jagerman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bendersky</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2210.10634</idno>
		<title level="m">RankT5: Fine-tuning T5 for text ranking with ranking losses</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Nogueira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2003.06713</idno>
		<title level="m">Document ranking with a pretrained sequence-to-sequence model</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2310.08319</idno>
		<title level="m">Fine-tuning LLaMA for multi-stage text retrieval</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.17041</idno>
		<title level="m">RFiD: Towards rational fusion-in-decoder for open-domain question answering</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages</title>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Thakur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Ogundepo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kamalloo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Alfonso-Hermelo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rezagholizadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="1114" to="1131" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Izacard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2012.04584</idno>
		<title level="m">Distilling knowledge from reader to retriever for question answering</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Longtriever: a pre-trained long text encoder for dense document retrieval</title>
		<author>
			<persName><forename type="first">J</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xie</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.emnlp-main.223</idno>
		<ptr target="https://aclanthology.org/2023.emnlp-main.223" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Bouamor</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Pino</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Bali</surname></persName>
		</editor>
		<meeting>the 2023 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="3655" to="3665" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Bajaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Campos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Craswell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Majumder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>McNamara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mitra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rosenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Stoica</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tiwary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1611.09268</idno>
		<title level="m">MS MARCO: A human generated machine reading comprehension dataset</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6980</idno>
		<title level="m">Adam: A method for stochastic optimization</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Cumulated gain-based evaluation of IR techniques</title>
		<author>
			<persName><forename type="first">K</forename><surname>Järvelin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kekäläinen</surname></persName>
		</author>
		<idno type="DOI">10.1145/582415.582418</idno>
		<ptr target="https://doi.org/10.1145/582415.582418" />
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Information Systems</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="422" to="446" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
