<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards Incorporating Personalized Context for Conversational Information Seeking</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Haitao</forename><surname>Yu</surname></persName>
							<email>yuhaitao@slis.tsukuba.ac.jp</email>
							<affiliation key="aff0">
								<orgName type="department">Institute of Library, Information and Media Science</orgName>
								<orgName type="institution">University of Tsukuba</orgName>
								<address>
									<settlement>Tsukuba City</settlement>
									<region>Ibaraki</region>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lingzhen</forename><surname>Zheng</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Graduate School of Comprehensive Human Sciences</orgName>
								<orgName type="institution">University of Tsukuba</orgName>
								<address>
									<settlement>Tsukuba City</settlement>
									<region>Ibaraki</region>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Kaiyu</forename><surname>Yang</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Graduate School of Comprehensive Human Sciences</orgName>
								<orgName type="institution">University of Tsukuba</orgName>
								<address>
									<settlement>Tsukuba City</settlement>
									<region>Ibaraki</region>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sumio</forename><surname>Fujita</surname></persName>
							<email>sufujita@lycorp.co.jp</email>
							<affiliation key="aff2">
								<orgName type="department">LY Research</orgName>
								<orgName type="institution">LY Corporation</orgName>
								<address>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hideo</forename><surname>Joho</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Library, Information and Media Science</orgName>
								<orgName type="institution">University of Tsukuba</orgName>
								<address>
									<settlement>Tsukuba City</settlement>
									<region>Ibaraki</region>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="laboratory">Information Retrieval&apos;s Role in RAG Systems (IR-RAG)</orgName>
								<address>
									<addrLine>18 July</addrLine>
									<postCode>2024</postCode>
									<settlement>Washington</settlement>
									<region>DC</region>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards Incorporating Personalized Context for Conversational Information Seeking</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">5DB0423593A0DE9EBAEE285CB0CA2F32</idno>
	<idno type="arXiv">arXiv:2310.07712</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:09+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Conversational Information Seeking</term>
					<term>Personalized Context</term>
					<term>LLM</term>
					<term>ORCID: 0000-0002-1569-8507 (H. Yu), 0009-0004-5783-7079 (L. Zheng), 0009-0002-4491-7235 (K. Yang), 0000-0002-1282-386X (S. Fujita), 0000-0002-6611-652X (H. Joho)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Conversational information seeking (CIS) extends classic search with a conversational style of interaction, and has attracted significant attention in recent years. Yet one size does not fit all: users often need high-quality personalized responses because of their different personas. For example, in a search about alternatives to cow's milk, the desired responses may differ considerably from user to user. In this work, we focus on CIS that accounts for personalized retrieval and response generation. Specifically, we follow the CIS paradigm presented in the TREC iKAT track, which consists of three core tasks, namely personal textual knowledge base (PTKB) statement ranking, passage ranking, and response generation. For PTKB statement ranking, we propose to fuse multiple large language models (LLMs). For passage ranking, we propose four different strategies for personalized retrieval. For response generation, we resort to zero-shot LLM-based answer generation that incorporates personalized context. The experimental results show that: (1) For PTKB statement ranking, our method achieves the best performance in terms of MRR on the set of iKAT organizers' assessments, and outperforms the GPT-4-based baseline. This indicates that a fusion of multiple LLMs is a promising choice when tackling problems of this kind. (2) For passage ranking, one of our proposed strategies achieves performance comparable to the Llama2-based baseline; moreover, our analysis indicates that the way PTKB statements are incorporated for personalized retrieval matters, and a direct concatenation is not recommended. (3) For response generation, our proposed method generates grounded and natural personalized responses, and is comparable to the top-tier LLM-based baseline.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In recent years, conversational systems have attracted considerable attention from both academic researchers and industrial practitioners. In the field of information retrieval (IR), conversational information seeking (CIS) has been identified as one of the most important research directions. Remarkable efforts have been made from different aspects, including, but not limited to, conversational search conceptualization <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b2">3]</ref>, conversational query re-writing <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6]</ref>, generating and selecting clarifying questions <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10]</ref>, and conversational response generation <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13]</ref>.</p><p>Despite the successes achieved by the aforementioned studies, fundamental research questions remain open. For example, providing high-quality user-specific responses is still a challenging problem. Take the case described by Aliannejadi et al. <ref type="bibr" target="#b13">[14]</ref> as an example: for a search about alternatives to cow's milk, two personas can be: (A) Alice is a vegan who is deeply concerned about the environment; and (B) Bob has been recently diagnosed with diabetes, has a nut allergy, and is lactose intolerant. Given Alice's and Bob's personas, their corresponding conversations with the system would evolve and develop in very different ways. Put another way, the responses that are helpful to Alice are not necessarily useful to Bob, and vice versa. 
In fact, information needs of this kind are prevalent in daily information searches, including, but not limited to, job finding, healthcare search, and online shopping. Given information needs expressed as a sequence of search queries (or questions) and different personas, it is of great importance that the CIS system can effectively incorporate the personalized context and provide relevant responses to users. Motivated by this observation, we focus on developing a unified CIS system that is able to incorporate personalized context during the interactive search process. The main contributions of this work are as follows:</p><p>• By following the CIS paradigm presented in the TREC iKAT track, we propose different methods for tackling the core tasks, namely personal textual knowledge base (PTKB) statement ranking, passage ranking, and response generation. For PTKB statement ranking, we explore how to fuse multiple large language models (LLMs). The experimental results show that our method achieves the best performance in terms of MRR on the set of iKAT organizers' assessments, which relies on a larger assessment pool. Moreover, our method also shows superior performance over the GPT-4-based baseline. This highlights that it is not straightforward to solve a component task by merely tailoring a powerful LLM, whereas a fusion of multiple LLMs can be a promising choice when tackling problems of this kind. • For passage ranking, we propose four different strategies for personalized retrieval, which enable us to investigate the impact of utterance rewriting and of the way personalized context is incorporated. Through result analysis and comparison, we found that although our proposed method for selecting PTKB statements is relatively reliable, how the selected PTKB statements are used to formulate the input for personalized retrieval matters considerably. 
A direct concatenation is not recommended, given the inferior performance of our strategies that use it. • For response generation, we resort to zero-shot LLM-based answer generation by incorporating personalized context. Our method is able to generate grounded and natural personalized responses, and is comparable to the top-tier LLM-based baseline.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Preliminaries</head><p>Figure <ref type="figure" target="#fig_0">1</ref> describes our focused framework for CIS that accounts for users' personas. It assumes that there is a personal text knowledge base (PTKB), which consists of narrative sentences providing personal information about the user.</p><p>A system following this framework consists of the following key modules. (1) Statement ranking: given the context of the conversation and the current user utterance, this module returns a list of PTKB statements ranked by their relevance, which reflects the user's persona. (2) Passage ranking: given the context of the conversation, the current user utterance, and the PTKB statements, this module is responsible for retrieving a ranked list of passages from the document collection. (3) Response generation: this module returns the answer text as a response to the user. In particular, the response should be a generative or abstractive summary of the relevant passages. We recognize that a gap exists between our focused framework for CIS and real-world search scenarios. Since this topic is still in its infancy, we leave the exploration of more complex frameworks to future work.</p></div>
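The three-module pipeline above can be sketched end to end as follows. This is an illustrative sketch, not the authors' code: a toy token-overlap score stands in for the LLM-based scorers, and all names (`Turn`, `rank_statements`, etc.) are our own.

```python
import re
from dataclasses import dataclass
from typing import List

def overlap(a: str, b: str) -> float:
    # Toy relevance score: Jaccard overlap of lowercased word sets.
    ta = set(re.findall(r"[a-z]+", a.lower()))
    tb = set(re.findall(r"[a-z]+", b.lower()))
    return len(ta & tb) / max(1, len(ta | tb))

@dataclass
class Turn:
    utterance: str
    context: List[str]  # previous turns' utterances
    ptkb: List[str]     # personal text knowledge base statements

def rank_statements(turn: Turn) -> List[str]:
    # Module 1: order PTKB statements by relevance to the current utterance.
    return sorted(turn.ptkb, key=lambda s: overlap(s, turn.utterance), reverse=True)

def rank_passages(turn: Turn, statements: List[str], collection: List[str]) -> List[str]:
    # Module 2: rank passages against the utterance plus top statements.
    query = turn.utterance + " " + " ".join(statements[:2])
    return sorted(collection, key=lambda p: overlap(p, query), reverse=True)

def generate_response(passages: List[str]) -> str:
    # Module 3: a real system would summarize the relevant passages;
    # here we simply return the top-ranked passage.
    return passages[0]

turn = Turn("What are safe milk alternatives given my nut allergy?",
            [], ["I have a nut allergy", "I like cycling"])
stmts = rank_statements(turn)
docs = ["Oat milk is a nut-free milk alternative.", "Cycling improves stamina."]
resp = generate_response(rank_passages(turn, stmts, docs))
```

The persona statement about the nut allergy is ranked first, and it steers passage ranking toward the nut-free passage rather than the cycling one.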
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head><p>Given the target paradigm for CIS in section 2, we elaborate on the proposed methods for addressing each key module below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Statement Ranking by Fusing Multiple LLMs</head><p>The key idea of our method (denoted as SR_FML) for tackling statement ranking is to effectively fuse multiple LLMs through a cascade of four steps. At the first step, we rewrite each conversation turn's utterance. Specifically, the T5-CANARD model <ref type="bibr" target="#b14">[15]</ref> fine-tuned with the testing topics of TREC CAsT 2022 <ref type="bibr" target="#b15">[16]</ref> is used, and the preceding turns' conversations (3 turns at most) are used as the context. At the second step, given the candidate PTKB statements, we perform binary logistic regression based on the BERT <ref type="bibr" target="#b16">[17]</ref> model. The candidate PTKB statements with a true label are kept for later steps, and the statements with a false label are filtered out. At the third step, we perform binary logistic regression again over the remaining PTKB statements based on MonoT5 <ref type="bibr" target="#b17">[18]</ref> in the same way as in the second step. In addition, we use RankGPT <ref type="bibr" target="#b18">[19]</ref> to sort the PTKB statements, assigning the top half of the statements a true label and the remaining bottom statements a false label. At the fourth step, we unify the ranking information and binary classification results of the previous steps via a scoring function and an indicator function. The scoring function assigns a weight to each statement remaining after the second step as follows:</p><formula xml:id="formula_0">w(s) = 1 − (Ind_MonoT5(s) + Ind_RankGPT(s)) / (2 · |S|)<label>(1)</label></formula><p>where Ind_MonoT5(s) and Ind_RankGPT(s) represent the rank positions of s according to the scores produced by MonoT5 and RankGPT, respectively, and |S| represents the number of PTKB statements remaining after the second step. 
The indicator function builds upon w(s) and a voting mechanism as follows:</p><formula xml:id="formula_1">I(s) = 1 if (lab_BERT(s) + lab_MonoT5(s) + lab_RankGPT(s)) ≥ 2 and w(s) &gt; 0.65; I(s) = 0 otherwise<label>(2)</label></formula><p>lab_BERT(s), lab_MonoT5(s), and lab_RankGPT(s) respectively represent the binary classification result by each adopted LLM, where an output of 1 denotes a true label and 0 a false label.</p><p>The final list of PTKB statements is generated by selecting the statements with a positive output of the indicator function and ranking them by the scoring function in decreasing order.</p></div>
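As a minimal sketch of the fusion step in Eqs. (1) and (2), assuming 1-based rank positions and per-model binary labels supplied as dictionaries (in the actual pipeline these come from BERT, MonoT5, and RankGPT; the function name and argument names are our own):

```python
def fuse_statements(statements, ranks_monot5, ranks_rankgpt,
                    lab_bert, lab_monot5, lab_rankgpt, threshold=0.65):
    # statements: PTKB statements remaining after the BERT filter (|S| = len).
    # ranks_*: statement -> 1-based rank position from that ranker.
    # lab_*: statement -> binary relevance label (1 = true, 0 = false).
    n = len(statements)

    def w(s):
        # Eq. (1): higher weight for statements ranked near the top by both.
        return 1 - (ranks_monot5[s] + ranks_rankgpt[s]) / (2 * n)

    def indicator(s):
        # Eq. (2): majority vote across the three models plus a score threshold.
        votes = lab_bert[s] + lab_monot5[s] + lab_rankgpt[s]
        return 1 if votes >= 2 and w(s) > threshold else 0

    kept = [s for s in statements if indicator(s) == 1]
    return sorted(kept, key=w, reverse=True)  # final list in decreasing order

ranked = fuse_statements(
    ["s1", "s2", "s3"],
    ranks_monot5={"s1": 1, "s2": 2, "s3": 3},
    ranks_rankgpt={"s1": 1, "s2": 2, "s3": 3},
    lab_bert={"s1": 1, "s2": 1, "s3": 0},
    lab_monot5={"s1": 1, "s2": 1, "s3": 0},
    lab_rankgpt={"s1": 1, "s2": 1, "s3": 0},
)
```

In this toy example s2 passes the vote but fails the w(s) &gt; 0.65 threshold, and s3 fails the vote, so only s1 survives.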
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Zero-shot LLM-based Passage Ranking</head><p>To cope with passage ranking, we resort to the typical retrieve-then-rank pipeline. First, we use BM25 with the default setting in Pyserini to retrieve the top 5 passages. Then we design 4 strategies (denoted as PR_S1, PR_S2, PR_S3, and PR_S4, respectively) to re-rank the top 5 passages using multiple specifically selected LLMs in a zero-shot manner.</p><p>To formulate the input, PR_S1, PR_S3, and PR_S4 concatenate the rewritten utterance and the top 2 relevant PTKB statements returned by the statement ranking module. PR_S2 directly uses the rewritten utterance as the input.</p><p>During the ranking process, the differences among the four strategies are as follows: (1) PR_S1 and PR_S2 assemble the results of multiple LLMs (i.e., "stabilityai/stablelm-tuned-alpha-7b", "eachadea/vicuna-13b-1.1", "jondurbin/airoboros-7b", "TheBloke/koala-13B-HF") <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b23">24]</ref> in a voting manner. Specifically, given the information need represented by the input, we ask each LLM to compare the candidate passages in a pairwise manner. The passage identified as more relevant than the other gets a vote. Finally, we rank the passages by their cumulative number of votes in decreasing order. (2) PR_S3 merely relies on MonoT5 with the default setting in PyGaggle to rank the passages. (3) PR_S4 relies on the idea of RankGPT to rank the passages, where the GPT-3.5 API is used.</p></div>
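The voting scheme used by PR_S1 and PR_S2 can be sketched as follows. Here `judges` stands in for the zero-shot LLMs, each returning the passage it deems more relevant in a pairwise comparison; the toy length-based judge is purely illustrative, not one of the actual models.

```python
from itertools import combinations

def vote_rerank(passages, judges):
    # Each judge compares every pair of candidate passages; the passage
    # judged more relevant receives one vote. Passages are finally ranked
    # by their cumulative number of votes, in decreasing order.
    votes = {p: 0 for p in passages}
    for a, b in combinations(passages, 2):
        for judge in judges:
            votes[judge(a, b)] += 1  # one vote for the preferred passage
    return sorted(passages, key=lambda p: votes[p], reverse=True)

# Toy judge (an assumption, not an LLM): prefers the longer passage.
longer = lambda a, b: a if len(a) >= len(b) else b
ranked = vote_rerank(["short", "a much longer passage", "medium text"], [longer])
```

With several judges, disagreements are absorbed by the vote totals, which is the point of fusing multiple LLMs rather than trusting any single one.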
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Personalized Response Generation</head><p>For tackling response generation, we aim to generate personalized response. Specifically, for each conversation turn, the top-1 passage and the top-2 PTKB statements representing the personalized context are used as the input. For the base LLM, we resort to T5 <ref type="bibr" target="#b24">[25]</ref>, which is specifically fine-tuned for the summarization task.</p></div>
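A minimal sketch of how the summarizer's input might be assembled from the personalized context described above. The `summarize:` prefix follows the common T5 convention, but the exact template is our assumption, not the paper's.

```python
def build_summarizer_input(top_passage: str, ptkb_statements: list) -> str:
    # Combine the top-2 PTKB statements (the persona context) with the
    # top-1 retrieved passage into a single summarization input for T5.
    persona = " ".join(ptkb_statements[:2])
    return f"summarize: {persona} {top_passage}"

text = build_summarizer_input(
    "Oat milk is nut-free.",
    ["I have a nut allergy", "I am lactose intolerant", "I like tea"],
)
```

The resulting string would then be fed to the fine-tuned T5 model, whose summary serves as the personalized response.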
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Setup</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Dataset</head><p>We use the dataset released by TREC iKAT 2023, with 25 testing topics, to evaluate effectiveness. Each topic has 1∼3 subtree conversations that represent different personas. For each personalized conversation, there is a list of around 10 PTKB statements. Moreover, the passage collection has 116,838,987 passages, derived from a subset of ClueWeb22-B <ref type="bibr" target="#b25">[26]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Baselines</head><p>In order to make a fair and thorough analysis, we perform a module-specific comparison by selecting the most competitive and representative baseline methods from TREC iKAT 2023's participants. We add the prefix BS to each baseline method for clarity.</p><p>For statement ranking, BS_zs_Llama and BS_ft_Llama use zero-shot and fine-tuned Llama-2-7b-chat <ref type="bibr" target="#b26">[27]</ref>, respectively, to rewrite the utterance. They then use MiniLM12 <ref type="bibr" target="#b27">[28]</ref> to rank PTKB statements based on the rewritten utterance.</p><p>For passage ranking, BS_Llama2 first instructs Llama-2-7b-chat to reformulate the current utterance considering the context of previous conversation turns. Then the revised conversation, along with a specific passage, is provided to the model to assess the passage's relevance.</p><p>For response generation, BS_FastChatT5andLlama creates a summary of each of the top passages retrieved by BM25 using FastChatT5 <ref type="bibr" target="#b28">[29]</ref>, then generates the response to the current utterance based on the summaries in a retrieval-generation loop. BS_DenseMonoT5 summarizes a final response from the top passages using different engines, including conventional language models and Llama2.</p><p>Besides the above module-specific baseline methods, BS_GPT-4, which uses the most powerful LLM (i.e., GPT-4 <ref type="bibr" target="#b29">[30]</ref>), is compared across all three modules. For statement ranking, BS_GPT-4 casts the task as a binary classification problem. The prompt includes the instruction, the context of the conversation, the user's PTKB statements, and the current user utterance. The output is a ranked list of relevant statements. For passage ranking, BS_GPT-4 initially generates an answer for each turn. Subsequently, GPT-4 is employed to produce five queries for each answer. 
These generated queries are then used with BM25 to retrieve passages, and the pre-trained MiniLM12 is deployed to rank them. For response generation, GPT-4 is prompted to generate the answer using the top-10 retrieved passages, the top-3 PTKB statements, the context of the conversation, and the user utterance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Implementation Details</head><p>All experiments were conducted on a server with two A100 (40GB) GPUs. The CUDA version is 12.2. For fine-tuning T5-CANARD, the configuration is: training epochs: 5, batch size: 4, learning rate: 1e-5. For SR_FML, bert-base-uncased with default parameter settings is used as the backbone model, which comes from the transformers library provided by HuggingFace <ref type="bibr" target="#b30">[31]</ref>. We iterate its predictions five times and compute the average relevance score for each statement. For RankGPT, the configuration is: window size: 4, step size: 1. MonoT5 with the default parameter settings in PyGaggle is used. In PR_S3, the window size of RankGPT is adjusted to 3. In PR_S1 and PR_S2, we set the prompt_max_length of the four zero-shot LLMs to 2048. Additionally, we set the decoding method to beam_search, output_max_length to 512, and temperature to 1.0 by default <ref type="bibr" target="#b31">[32]</ref>. For RG_SumT5, t5-base-finetuned-summarize-news is employed with the configuration: input max_length: 512, output min_length: 50, output max_length: 150, length_penalty: 2.0, num_beams: 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results and Analysis</head><p>In Table <ref type="table" target="#tab_0">1</ref>, Table <ref type="table">2</ref> and Table <ref type="table" target="#tab_1">3</ref>, we show the overall performance of the baseline approaches and the proposed methods for statement ranking, passage ranking, and response generation, respectively. Within each table, the best result in terms of each metric is indicated in bold, and the second-best result is underlined.</p><p>For statement ranking, we note that there are two sets of assessments, created by the iKAT organizers and the NIST assessors, respectively. The key differences are as follows. During topic generation, the organizers annotated each turn in terms of its provenance to PTKB statements and included their labels in the released topic files. During the assessment of passage relevance, the NIST assessors were also asked to judge the relevance of PTKB statements to each turn; this assessment pool is smaller than the one created by the organizers. The organizers judged all of the turns, while the NIST assessors only judged the turns that were selected for passage relevance <ref type="bibr" target="#b13">[14]</ref>. From Table <ref type="table" target="#tab_0">1</ref>, we can observe that BS_zs_Llama outperforms the other methods in terms of nDCG@3, P@3, and Recall@3. Though BS_ft_Llama relies on the same LLM, its performance suffers from the utterances rewritten in the fine-tuned setting. In contrast, BS_GPT-4, relying on the powerful GPT-4, shows inferior performance across the two sets of assessments. This indicates that using GPT-4 for statement ranking is not straightforward, and further exploration is needed for better performance. Over the set of iKAT organizers' assessments, our proposed method (i.e., SR_FML) shows performance competitive with BS_zs_Llama, and achieves the best performance in terms of MRR. 
This indicates the benefit of fusing multiple LLMs, which enables us to leverage the advantages of different LLMs. In view of the fact that the set of iKAT organizers' assessments is based on a larger assessment pool, it is reasonable to say that the evaluation over this set is more reliable.</p><p>For passage ranking, the results in Table <ref type="table">2</ref> show that BS_GPT-4 outperforms BS_Llama2 and our proposed methods by a large margin. This echoes the findings in prior studies <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b32">33,</ref><ref type="bibr" target="#b33">34,</ref><ref type="bibr" target="#b34">35]</ref>, which have shown the leading capability of GPT-4 in the passage ranking task. One probable reason is that the generate-retrieve-generate pipeline adopted by BS_GPT-4 is more suitable for passage ranking than our retrieve-generate pipeline. Among our proposed strategies for passage ranking, PR_S2 shows the best performance and also outperforms BS_Llama2. Compared with BS_Llama2, a possible reason for the inferior performance of the other three strategies is the way the input is formulated: we directly concatenate the utterance and the related PTKB statements, while BS_Llama2 rewrites the utterance with the statements using an LLM. Another possible reason is that we focus on the earlier positions and only re-rank the top-5 passages returned by BM25; this setting becomes a bottleneck for obtaining relevant passages given the limited retrieval ability of BM25.</p><p>For response generation, the results are evaluated in terms of groundedness and naturalness. Groundedness measures whether the generated response can be attributed to the passages that it is supposed to be generated from. Naturalness measures the extent to which the response sounds human-like, such as the general fluency and understandability of the generated response. 
GPT-4 is used to evaluate both the groundedness and naturalness of the responses in each turn. Finally, the mean groundedness and naturalness over all turns is reported. From Table <ref type="table" target="#tab_1">3</ref>, we can observe that BS_GPT-4 again outperforms the other methods by a large margin. Our proposed method (i.e., RG_SumT5) outperforms BS_DenseMonoT5 and shows performance competitive with BS_FastChatT5andLlama.</p><p>It is noticeable that the evaluation results are likely to be somewhat biased towards BS_GPT-4, since the evaluation is conducted by GPT-4. We leave it to future work to further test the effectiveness of these methods for response generation through human evaluation.</p><p>A joint look across Table <ref type="table" target="#tab_0">1</ref>, Table <ref type="table">2</ref> and Table <ref type="table" target="#tab_1">3</ref> reveals the following. First, we do not observe a clear correlation between statement ranking and passage ranking, which seems counterintuitive. For instance, though BS_GPT-4 shows inferior performance in statement ranking, it outperforms the other methods by a large margin in passage ranking. This counterintuitive result may arise from a number of possible reasons, such as the strong zero-shot capability of GPT-4 and its precise understanding of the persona information underlying the selected PTKB statements. This is also worth investigating in future work. Second, for both personalized retrieval and response generation in the context of CIS, there is still large room for performance improvement.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In this study, we focus on CIS that accounts for personalized retrieval and response generation. By following the CIS paradigm presented in the TREC iKAT track, we propose different methods to tackle three core tasks, namely personal textual knowledge base (PTKB) statement ranking, passage ranking and response generation. We have shown that fusing multiple LLMs is a promising way for addressing PTKB statement ranking. Also, our analysis indicates that an effective way of injecting the selected PTKB statements is quite important for personalized retrieval. Since conversational systems arise in a variety of applications, such as recommender systems and question answering, we believe that our work provides insights for developing conversational systems that account for personalized retrieval and response generation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Acknowledgments</head><p>This research has been supported by JSPS KAKENHI Grant Number 19H04215.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Our focused framework for conversational information seeking that incorporates personalized context.</figDesc><graphic coords="2,94.57,65.61,406.16,95.61" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>The performance comparison on statement ranking.</figDesc><table><row><cell>Ground Truth</cell><cell>Method</cell><cell>MRR</cell><cell>nDCG@3</cell><cell>P@3</cell><cell>Recall@3</cell></row><row><cell>iKAT organizers' assessment</cell><cell>BS_zs_Llama</cell><cell>0.6707</cell><cell>0.6394</cell><cell>0.3810</cell><cell>0.7375</cell></row><row><cell></cell><cell>BS_GPT-4</cell><cell>0.6618</cell><cell>0.6288</cell><cell>0.3423</cell><cell>0.6888</cell></row><row><cell></cell><cell>BS_ft_Llama</cell><cell>0.6617</cell><cell>0.6149</cell><cell>0.3542</cell><cell>0.6918</cell></row><row><cell></cell><cell>SR_FML</cell><cell>0.6890</cell><cell>0.6370</cell><cell>0.3512</cell><cell>0.6903</cell></row><row><cell>NIST assessment</cell><cell>BS_zs_Llama</cell><cell>0.7950</cell><cell>0.7254</cell><cell>0.4626</cell><cell>0.6964</cell></row><row><cell></cell><cell>BS_ft_Llama</cell><cell>0.7795</cell><cell>0.7102</cell><cell>0.4490</cell><cell>0.6796</cell></row><row><cell></cell><cell>BS_GPT-4</cell><cell>0.7027</cell><cell>0.6174</cell><cell>0.3605</cell><cell>0.5833</cell></row><row><cell></cell><cell>SR_FML</cell><cell>0.7112</cell><cell>0.6594</cell><cell>0.4184</cell><cell>0.6213</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 2</head><label>2</label><figDesc>The performance comparison on passage ranking.</figDesc><table><row><cell>Method</cell><cell>nDCG@3</cell><cell>nDCG@5</cell><cell>mAP</cell></row><row><cell>BS_GPT-4</cell><cell>0.4382</cell><cell>0.4396</cell><cell>0.1759</cell></row><row><cell>BS_Llama2</cell><cell>0.1389</cell><cell>0.1466</cell><cell>0.0376</cell></row><row><cell>PR_S2</cell><cell>0.1433</cell><cell>0.1469</cell><cell>0.0350</cell></row><row><cell>PR_S4</cell><cell>0.1130</cell><cell>0.1070</cell><cell>0.0224</cell></row><row><cell>PR_S3</cell><cell>0.1107</cell><cell>0.1062</cell><cell>0.0223</cell></row><row><cell>PR_S1</cell><cell>0.1086</cell><cell>0.1049</cell><cell>0.0222</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3</head><label>3</label><figDesc>The result comparison on response generation.</figDesc><table><row><cell>Method</cell><cell cols="2">Groundedness Naturalness</cell></row><row><cell>BS_GPT-4</cell><cell>0.89 (65/8)</cell><cell>4.0</cell></row><row><cell>BS_FastChatT5andLlama</cell><cell>0.67 (47/23)</cell><cell>3.684</cell></row><row><cell>BS_DenseMonoT5</cell><cell>0.51 (37/36)</cell><cell>2.808</cell></row><row><cell>RG_SumT5</cell><cell>0.67 (49/24)</cell><cell>2.9178</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Azzopardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dubiel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Halvey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dalton</surname></persName>
		</author>
		<title level="m">The second international workshop on conversational approaches to information retrieval</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note>Conceptualizing agent-human interactions during the conversational search process</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Towards multimodal conversational information seeking</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Deldjoo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Trippas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zamani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 44th International ACM SIGIR conference on research and development in Information Retrieval</title>
				<meeting>the 44th International ACM SIGIR conference on research and development in Information Retrieval</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="1577" to="1587" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A theoretical framework for conversational search</title>
		<author>
			<persName><forename type="first">F</forename><surname>Radlinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Craswell</surname></persName>
		</author>
		<idno type="DOI">10.1145/3020165.3020183</idno>
		<ptr target="https://doi.org/10.1145/3020165.3020183" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2017 Conference on Human Information Interaction and Retrieval, CHIIR &apos;17</title>
				<meeting>the 2017 Conference on Human Information Interaction and Retrieval, CHIIR &apos;17<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="117" to="126" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Few-shot generative conversational query rewriting</title>
		<author>
			<persName><forename type="first">S</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bennett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval</title>
				<meeting>the 43rd International ACM SIGIR conference on research and development in Information Retrieval</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1933" to="1936" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Question rewriting for conversational question answering</title>
		<author>
			<persName><forename type="first">S</forename><surname>Vakulenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Longpre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Tu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Anantha</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 14th ACM international conference on web search and data mining</title>
				<meeting>the 14th ACM international conference on web search and data mining</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="355" to="363" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Multi-stage conversational passage retrieval: An approach to fusing term importance estimation and neural query rewriting</title>
		<author>
			<persName><forename type="first">S.-C</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-H</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Nogueira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-F</forename><surname>Tsai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<idno type="DOI">10.1145/3446426</idno>
		<ptr target="https://doi.org/10.1145/3446426" />
	</analytic>
	<monogr>
		<title level="j">ACM Trans. Inf. Syst</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Asking clarifying questions in open-domain information-seeking conversations</title>
		<author>
			<persName><forename type="first">M</forename><surname>Aliannejadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zamani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">B</forename><surname>Croft</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 42nd international acm sigir conference on research and development in information retrieval</title>
				<meeting>the 42nd international acm sigir conference on research and development in information retrieval</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="475" to="484" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Generating clarifying questions for information retrieval</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zamani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dumais</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Craswell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bennett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lueck</surname></persName>
		</author>
		<idno type="DOI">10.1145/3366423.3380126</idno>
		<ptr target="https://doi.org/10.1145/3366423.3380126" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of The Web Conference 2020, WWW &apos;20</title>
				<meeting>The Web Conference 2020, WWW &apos;20<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="418" to="428" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Towards facet-driven generation of clarifying questions for conversational search</title>
		<author>
			<persName><forename type="first">I</forename><surname>Sekulić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Aliannejadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 ACM SIGIR international conference on theory of information retrieval</title>
				<meeting>the 2021 ACM SIGIR international conference on theory of information retrieval</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="167" to="175" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Zamani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mitra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lueck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Diaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Bennett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Craswell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>Dumais</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2006.00166</idno>
		<title level="m">Analyzing and learning from user interactions for search clarification</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Quan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2004.12363</idno>
		<title level="m">Multidomain dialogue acts and response co-generation</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Structured and natural responses co-generation for conversational search</title>
		<author>
			<persName><forename type="first">C</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Liao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-S</forename><surname>Chua</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</title>
				<meeting>the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="155" to="164" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">DialogBERT: Discourse-aware response generation via learning to recover and rank utterances</title>
		<author>
			<persName><forename type="first">X</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Yoo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-W</forename><surname>Ha</surname></persName>
		</author>
		<idno type="DOI">10.1609/aaai.v35i14.17527</idno>
		<ptr target="https://ojs.aaai.org/index.php/AAAI/article/view/17527" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="12911" to="12919" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">TREC iKAT 2023: The Interactive Knowledge Assistance Track overview</title>
		<author>
			<persName><forename type="first">M</forename><surname>Aliannejadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Abbasiantaeb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chatterjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dalton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Azzopardi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Thirty-Second Text REtrieval Conference (TREC 2023)</title>
				<meeting>the Thirty-Second Text REtrieval Conference (TREC 2023)</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">S.-C</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-H</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Nogueira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-F</forename><surname>Tsai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2004.01909</idno>
		<title level="m">Conversational question reformulation via sequence-to-sequence architectures and pretrained language models</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">TREC CAsT 2022: Going beyond user ask and system retrieve with initiative and response generation</title>
		<author>
			<persName><forename type="first">P</forename><surname>Owoicho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dalton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Aliannejadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Azzopardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Trippas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Vakulenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">NIST Special Publication</title>
		<imprint>
			<biblScope unit="volume">500-338</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Document ranking with a pretrained sequence-to-sequence model</title>
		<author>
			<persName><forename type="first">R</forename><surname>Nogueira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pradeep</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.findings-emnlp.63</idno>
		<ptr target="https://aclanthology.org/2020.findings-emnlp.63" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">T</forename><surname>Cohn</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="708" to="718" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">W</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Ren</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2304.09542</idno>
		<title level="m">Is ChatGPT good at search? Investigating large language models as re-ranking agents</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Geng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gudibande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wallace</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Abbeel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Levine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Song</surname></persName>
		</author>
		<ptr target="https://bair.berkeley.edu/blog/2023/04/03/koala/" />
		<title level="m">Koala: A dialogue model for academic research</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>Blog post</note>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Anand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Nussbaum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Duderstadt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mulyar</surname></persName>
		</author>
		<ptr target="https://github.com/nomic-ai/gpt4all" />
		<title level="m">GPT4All: Training an assistant-style chatbot with large-scale data distillation from GPT-3.5-Turbo</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Taori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gulrajani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Dubois</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Guestrin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">B</forename><surname>Hashimoto</surname></persName>
		</author>
		<ptr target="https://github.com/tatsu-lab/stanford_alpaca" />
		<title level="m">Stanford Alpaca: An instruction-following LLaMA model</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">W.-L</forename><surname>Chiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Gonzalez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Stoica</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">P</forename><surname>Xing</surname></persName>
		</author>
		<ptr target="https://lmsys.org/blog/2023-03-30-vicuna/" />
		<title level="m">Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lavril</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Izacard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Martinet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-A</forename><surname>Lachaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lacroix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Rozière</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hambro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Azhar</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2302.13971</idno>
		<title level="m">LLaMA: Open and efficient foundation language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Exploring the limits of transfer learning with a unified text-to-text transformer</title>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Matena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="5485" to="5551" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">ClueWeb22: 10 billion web documents with rich information</title>
		<author>
			<persName><forename type="first">A</forename><surname>Overwijk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Callan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</title>
				<meeting>the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="3360" to="3362" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Stone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Albert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Almahairi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Babaei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bashlykov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Batra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bhargava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bhosale</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2307.09288</idno>
		<title level="m">Llama 2: Open foundation and fine-tuned chat models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Sentence-BERT: Sentence embeddings using Siamese BERT-networks</title>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/1908.10084" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</title>
				<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-L</forename><surname>Chiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">P</forename><surname>Xing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Gonzalez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Stoica</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2306.05685</idno>
		<title level="m">Judging LLM-as-a-judge with MT-Bench and Chatbot Arena</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><surname>OpenAI</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2303.08774</idno>
		<title level="m">GPT-4 technical report</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">Gpt-4 technical report</note>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Transformers: State-of-the-art natural language processing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Louf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Funtowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Davison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shleifer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Von Platen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jernite</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Plu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">Le</forename><surname>Scao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gugger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Drame</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Lhoest</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rush</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-demos.6</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-demos.6" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</title>
		<editor>
			<persName><forename type="first">Q</forename><surname>Liu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Schlangen</surname></persName>
		</editor>
		<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="38" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">Y</forename><surname>Lin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2306.02561</idno>
		<title level="m">LLM-Blender: Ensembling large language models with pairwise ranking and generative fusion</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Pradeep</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sharifymoghaddam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2309.15088</idno>
		<title level="m">RankVicuna: Zero-shot listwise document reranking with open-source large language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Dou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-R</forename><surname>Wen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2308.07107</idno>
		<title level="m">Large language models for information retrieval: A survey</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ture</surname></persName>
		</author>
		<title level="m">Found in the middle: Permutation self-consistency improves listwise ranking in large language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
