<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Performance Prediction for Conversational Search Using Perplexities of Query Rewrites</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Chuan</forename><surname>Meng</surname></persName>
							<email>c.meng@uva.nl</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Amsterdam</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mohammad</forename><surname>Aliannejadi</surname></persName>
							<email>m.aliannejadi@uva.nl</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Amsterdam</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Maarten</forename><surname>De Rijke</surname></persName>
							<email>m.derijke@uva.nl</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Amsterdam</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Performance Prediction for Conversational Search Using Perplexities of Query Rewrites</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">C6CE2F0ACEEBE60880152B06DAEF2C12</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-04-29T07:04+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Query performance prediction</term>
					<term>conversational search</term>
					<term>perplexity</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We consider the query performance prediction (QPP) task for conversational search (CS), i.e., estimating the retrieval quality for queries in multi-turn conversations. We reuse QPP methods from ad-hoc search for CS by feeding them self-contained query rewrites generated by T5. Our experiments on three CS datasets show that (i) lower query rewriting quality may lead to worse QPP performance, and (ii) incorporating query rewriting quality (as measured by perplexity) improves the effectiveness of QPP methods for CS when the query rewriting quality is limited. Our implementation is publicly available at https://github.com/ChuanMeng/QPP4CS.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>We consider the task of query performance prediction (QPP) <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref> for conversational search (CS) <ref type="bibr" target="#b2">[3]</ref>, i.e., estimating the retrieval quality for a query in a multi-turn conversation. Little research has been done on QPP for CS. A unique aspect of CS is that a conversational query may contain omissions or coreferences, making it hard for ad-hoc search systems and QPP methods to capture the underlying information need. A popular two-stage CS pipeline <ref type="bibr" target="#b2">[3]</ref> addresses this issue by (i) rewriting a conversational query into a self-contained query, and (ii) feeding the query rewrite to an ad-hoc search system.</p><p>Inspired by the two-stage pipeline, we model QPP for CS by feeding query rewrites to QPP methods designed for ad-hoc search. However, our experiments on CS datasets show that low-quality query rewrites reduce the effectiveness of QPP methods. Because lower query rewriting quality tends to result in lower retrieval quality, we argue that query rewriting quality provides evidence for estimating retrieval quality. To incorporate query rewriting quality into QPP methods, we propose a perplexity-based pre-retrieval QPP framework (PPL-QPP) for CS. PPL-QPP first evaluates the quality of a query rewrite by its perplexity, as measured by a pre-trained language model, and then combines the perplexity with a state-of-the-art pre-retrieval QPP method <ref type="bibr" target="#b1">[2]</ref>. Experiments show that PPL-QPP improves the effectiveness of QPP methods in the context of CS when the query rewriting quality is limited.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Performance of QPP methods on three CS datasets, in terms of Pearson's 𝜌, Kendall's 𝜏, and Spearman's 𝜌 correlation coefficients. IDF, PMI, SCQ, and VAR are defined for a single query term, so an aggregation function over terms is needed. We report the performance of each method using the optimal aggregation function on each dataset; the aggregation functions used by each method on CAsT-19, CAsT-20, and OR-QuAC are listed sequentially in brackets. All values are statistically significant (t-test, 𝑝 &lt; 0.05) except those in italics. The best value in each column is marked in bold.</p></div>
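<div xmlns="http://www.tei-c.org/ns/1.0"><p>Term-level predictors such as IDF have to be aggregated over the terms of a query rewrite before they can be correlated with retrieval quality. The following Python snippet is a minimal sketch of that pipeline: it aggregates a smoothed IDF score with avg, max, or sum and computes Kendall's 𝜏 between predicted scores and per-query NDCG@3 values. The smoothing variant, the whitespace tokenization, the helper names (idf, predict, kendall_tau), and the tau-a formulation without tie correction are assumptions made for illustration and are not taken from the released implementation.</p>
<p>
```python
import math
from itertools import combinations


def idf(term, doc_freq, num_docs):
    """Smoothed inverse document frequency of one term (one common variant)."""
    return math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))


# Aggregation functions that turn term-level scores into a query-level score.
AGGREGATORS = {
    "avg": lambda scores: sum(scores) / len(scores),
    "max": max,
    "sum": sum,
}


def predict(query, doc_freq, num_docs, agg="avg"):
    """Aggregate term-level IDF over a non-empty, whitespace-tokenized query."""
    scores = [idf(term, doc_freq, num_docs) for term in query.lower().split()]
    return AGGREGATORS[agg](scores)


def kendall_tau(predicted, actual):
    """Kendall's tau-a over at least two queries: (concordant - discordant) / pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(predicted)), 2):
        sign = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        if sign > 0:
            concordant += 1
        elif sign != 0:  # tied pairs count as neither concordant nor discordant
            discordant += 1
    pairs = len(predicted) * (len(predicted) - 1) // 2
    return (concordant - discordant) / pairs
```
</p>
<p>Under these assumptions, a predictor is evaluated by calling predict once per query rewrite and passing the resulting list, together with the list of NDCG@3 scores for the same queries, to kendall_tau.</p></div>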
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Experiments</head><p>Experimental setup. We use seven widely used pre-retrieval QPP methods <ref type="bibr" target="#b1">[2]</ref> on three CS datasets: CAsT-19 <ref type="bibr" target="#b3">[4]</ref>, CAsT-20 <ref type="bibr" target="#b3">[4]</ref>, and OR-QuAC <ref type="bibr" target="#b4">[5]</ref>. The retriever evaluated by the QPP methods is a T5-based query rewriter<ref type="foot" target="#foot_0">1</ref> + BM25, a widely used CS method <ref type="bibr" target="#b2">[3]</ref>. The T5-generated query rewrites used by BM25 are fed into all QPP methods. We evaluate QPP methods by calculating the correlation between the NDCG@3 scores of the queries in the test set and the estimated retrieval quality. Note that NDCG@3 is the primary metric in CAsT <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b5">6]</ref>.</p><p>Performance of QPP methods for CS. Experimental results are presented in Table <ref type="table">1</ref>. Our main observation is that the overall performance of QPP methods on CAsT-19 and OR-QuAC is better than on CAsT-20. This difference appears to stem from differences in query rewriting quality across the three datasets. We measure query rewriting quality in two ways: (i) the similarity between manual and T5-generated query rewrites in terms of ROUGE, and (ii) the gap in BM25 retrieval quality between manual and T5-generated query rewrites. Fig. <ref type="figure" target="#fig_0">1a</ref> shows that the ROUGE scores on CAsT-20 are lower than those on CAsT-19 and OR-QuAC; Fig. <ref type="figure" target="#fig_0">1b</ref> shows that the retrieval quality gap is larger on CAsT-20 than on CAsT-19. We conclude that the quality of T5-generated query rewrites is lower on CAsT-20 than on the other datasets and that lower query rewriting quality may lead to worse QPP effectiveness.</p><p>Incorporating query rewriting quality into QPP for CS. Based on our observation that lower query rewriting quality tends to result in lower retrieval quality, we argue that query rewriting quality can provide evidence for estimating retrieval quality. We propose PPL-QPP, which incorporates query rewriting quality into QPP methods. Since we cannot obtain manual query rewrites during estimation, we regard the perplexity of generated query rewrites as a measure of quality. PPL-QPP first uses GPT-2 XL to measure the perplexity of a T5-generated query rewrite and then combines the perplexity with a pre-retrieval QPP method through linear interpolation:</p><formula xml:id="formula_0">\alpha \cdot \frac{1}{\mathrm{PPL}} + (1 - \alpha) \cdot \mathrm{QPP}.</formula><p>Here, 𝛼 is a trade-off parameter; the perplexity and QPP values are normalized prior to fusion. For the QPP method to be combined, we use the state-of-the-art VAR (sum) on CAsT-19 and OR-QuAC, and SCQ (avg) on CAsT-20. The performance of PPL-QPP is presented in Table <ref type="table">1</ref>. The results show that PPL-QPP improves the effectiveness of QPP methods in the context of CS on CAsT-19 and, in particular, on CAsT-20, where the query rewriting quality is limited. Interestingly, unlike on CAsT-19 and CAsT-20, PPL-QPP does not bring improvements on the OR-QuAC dataset; we plan to investigate this in future work.</p></div>
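<div xmlns="http://www.tei-c.org/ns/1.0"><p>The interpolation above can be sketched in a few lines of Python. The snippet assumes that per-token natural-log probabilities of each query rewrite are already available from a language model such as GPT-2 XL, that min-max normalization is applied to both the inverse perplexities and the QPP scores over the set of test queries, and that higher QPP scores indicate better predicted retrieval quality. The normalization scheme and the function names (perplexity, min_max, ppl_qpp) are illustrative assumptions; the text above only states that the values are normalized prior to fusion.</p>
<p>
```python
import math


def perplexity(token_log_probs):
    """Perplexity as exp of the negative mean natural-log token probability."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def ppl_qpp(perplexities, qpp_scores, alpha):
    """Compute alpha * 1/PPL + (1 - alpha) * QPP on normalized inputs.

    Both lists hold one value per query rewrite, in the same order.
    """
    inverse_ppl = min_max([1.0 / p for p in perplexities])
    qpp = min_max(qpp_scores)
    return [alpha * a + (1 - alpha) * b for a, b in zip(inverse_ppl, qpp)]
```
</p>
<p>With 𝛼 = 0 the sketch falls back to the normalized QPP score alone, and with 𝛼 = 1 it ranks query rewrites purely by inverse perplexity, so 𝛼 controls how much weight the query rewriting quality receives.</p></div>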
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Conclusion</head><p>In this paper, we have studied QPP for CS. We have reused QPP methods for ad-hoc search in the context of CS by feeding them self-contained query rewrites generated by T5. Our experiments on three CS datasets show that (i) lower query rewriting quality may lead to worse QPP performance, and (ii) incorporating query rewriting quality into QPP methods improves their effectiveness in the context of CS when query rewriting quality is limited.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: The similarity between manual and T5-generated query rewrites in terms of ROUGE (a) and the retrieval quality of BM25 for manual/T5-generated query rewrites in terms of NDCG@3 (b).</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://huggingface.co/castorini/t5-base-canard</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgement. We want to thank our reviewers for their feedback. This research was partially supported by the China Scholarship Council (CSC).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">An analysis of variations in the effectiveness of query performance prediction</title>
		<author>
			<persName><forename type="first">D</forename><surname>Ganguly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Datta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mitra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Greene</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECIR</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="215" to="229" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Estimating the query difficulty for information retrieval</title>
		<author>
			<persName><forename type="first">D</forename><surname>Carmel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Yom-Tov</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2010">2010</date>
			<publisher>Morgan &amp; Claypool Publishers</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Multi-stage conversational passage retrieval: An approach to fusing term importance estimation and neural query rewriting</title>
		<author>
			<persName><forename type="first">S.-C</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-H</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Nogueira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-F</forename><surname>Tsai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">TOIS</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="page" from="1" to="29" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">The conversational assistance track overview</title>
		<author>
			<persName><forename type="first">J</forename><surname>Dalton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Callan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Text Retrieval Conference</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note>CAsT</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Open-retrieval conversational question answering</title>
		<author>
			<persName><forename type="first">C</forename><surname>Qu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">B</forename><surname>Croft</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Iyyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="539" to="548" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Dalton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Callan</surname></persName>
		</author>
		<title level="m">CAsT-19: A dataset for conversational information seeking</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1985" to="1988" />
		</imprint>
	</monogr>
	<note>SIGIR</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
