<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Multi-Aspect Reviewed-Item Retrieval via LLM Query Decomposition and Aspect Fusion</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Anton</forename><surname>Korikov</surname></persName>
							<email>anton.korikov@mail.utoronto.ca</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Toronto</orgName>
								<address>
									<settlement>Toronto</settlement>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">George</forename><surname>Saad</surname></persName>
							<email>g.saad@mail.utoronto.ca</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Toronto</orgName>
								<address>
									<settlement>Toronto</settlement>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ethan</forename><surname>Baron</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Toronto</orgName>
								<address>
									<settlement>Toronto</settlement>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mustafa</forename><surname>Khan</surname></persName>
							<email>mr.khan@mail.utoronto.ca</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Toronto</orgName>
								<address>
									<settlement>Toronto</settlement>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Manav</forename><surname>Shah</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Toronto</orgName>
								<address>
									<settlement>Toronto</settlement>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Scott</forename><surname>Sanner</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Toronto</orgName>
								<address>
									<settlement>Toronto</settlement>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">SIGIR&apos;24 Workshop on Information Retrieval&apos;s Role in RAG Systems</orgName>
								<address>
									<addrLine>July 18</addrLine>
									<postCode>2024</postCode>
									<region>Washington D.C</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Multi-Aspect Reviewed-Item Retrieval via LLM Query Decomposition and Aspect Fusion</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">A60CA5A355565E17848060CA4EA0A276</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:09+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Dense retrieval</term>
					<term>query decomposition</term>
					<term>multi-aspect retrieval</term>
					<term>LLM reranking</term>
					<term>late fusion</term>
					<term>Orcid 0009-0003-4487-9504 (A. Korikov)</term>
					<term>0009-0000-3549-9874 (G. Saad)</term>
					<term>0009-0004-2461-5760 (E. Baron)</term>
					<term>0009-0008-3622-7270 (M. Khan)</term>
					<term>0009-0008-4728-0771 (M. Shah)</term>
					<term>0000-0001-7984-8394 (S. Sanner)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>While user-generated product reviews often contain large quantities of information, their utility in addressing natural language product queries has been limited, with a key challenge being the need to aggregate information from multiple low-level sources (reviews) to a higher item level during retrieval. Existing methods for reviewed-item retrieval (RIR) typically take a late fusion (LF) approach which computes query-item scores by simply averaging the top-K query-review similarity scores for an item. However, we demonstrate that for multi-aspect queries and multi-aspect items, LF is highly sensitive to the distribution of aspects covered by reviews in terms of aspect frequency and the degree of aspect separation across reviews. To address these LF failures, we propose several novel aspect fusion (AF) strategies which include Large Language Model (LLM) query extraction and generative reranking. Our experiments show that for imbalanced review corpora, AF can improve over LF by a MAP@10 increase from 0.36 ± 0.04 to 0.52 ± 0.04, while achieving equivalent performance for balanced review corpora.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>User-generated reviews are an abundant and rich source of data with the potential to improve the retrieval of reviewed items such as products, services, or destinations. However, a challenge of using review data for retrieval is that information has to be aggregated across multiple (low-level) reviews to a (higher) item-level during retrieval. Recent work <ref type="bibr" target="#b0">[1]</ref>, which defines this Reviewed-Item Retrieval setting as RIR, showed that state-of-the-art results could be achieved by using a bi-encoder to aggregate review information to an item-level in a process called late fusion (LF). As opposed to aggregating review information to an item-level before query-scoring (early fusion), LF first computes query-review similarity to avoid losing information before scoring, and then averages the top-𝐾 query-review similarity scores to get a query-item similarity score. Recently, LF has been adopted in retrieval-augmented generation (RAG) driven conversational recommendation (ConvRec) systems for generative recommendation, explanation, and interactive question answering <ref type="bibr" target="#b1">[2]</ref>.</p><p>In this paper, we extend RIR to a multi-aspect retrieval setting, formulating what we call multi-aspect RIR (MA-RIR). In this problem, our goal is to retrieve relevant items for a multi-aspect query by using the reviews of multi-aspect items. Specifically, for an item with multiple aspects, we assume that each review describes at least one, and up to all, of the item's aspects.</p><p>As our primary contributions:</p><p>• We formulate the MA-RIR problem and identify failure modes of LF under imbalanced review-aspect distributions, considering imbalances due to both aspect frequency and the degree of aspect separation across reviews. 
• We propose several novel aspect fusion strategies, which include LLM query extraction and reranking, to address failures of LF review-score aggregation on imbalanced multi-aspect review distributions. • We leverage a recently released multi-aspect retrieval dataset, Recipe-MPR <ref type="bibr" target="#b2">[3]</ref>, with ground-truth query- and item-aspect labels to generate four multi-aspect review distributions with various aspect balance properties, and numerically evaluate the effect of review-aspect balance on MA-RIR. • Our simulations show that for imbalanced data, Aspect Fusion can improve over LF by a MAP@10 increase from 0.36 ± 0.04 to 0.52 ± 0.04, while achieving equivalent performance for balanced data. • We show that LLM reranking in both cross-encoder and zero-shot (ZS) listwise reranking settings can provide some improvements when given a large enough number of reviews, but risks decreasing performance when not enough reviews are provided.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Neural IR</head><p>Given a set of documents 𝒟 and a query 𝑞 ∈ 𝒬, an IR task 𝐼 𝑅⟨𝒟 , 𝑞⟩ is to assign a similarity score 𝑆 𝑞,𝑑 ∈ ℝ between the query and each document 𝑑 ∈ 𝒟 and return a ranked list of top scoring documents. The standard first-stage neural-IR method <ref type="bibr" target="#b3">[4]</ref> for a large corpus is to first use a bi-encoder 𝑔(⋅) ∶ 𝒬 ∪𝒟 → ℝ 𝑚 to map a query 𝑞 and document 𝑑 to their respective embeddings 𝑔(𝑞) = z 𝑞 and 𝑔(𝑑) = z 𝑑 . A similarity function 𝑓 (⋅, ⋅) ∶ ℝ 𝑚 × ℝ 𝑚 → ℝ, such as the dot product, is then used to compute a query-document score 𝑆 𝑞,𝑑 = 𝑓 (z 𝑞 , z 𝑑 ). For web-scale corpora, exact similarity search for the top query-document scores is typically impractical, so approximate similarity search algorithms <ref type="bibr" target="#b4">[5]</ref> are used instead.</p></div>
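As a minimal illustrative sketch of this first-stage scoring, the following assumes pre-computed bi-encoder embeddings and performs exact dot-product search; the toy vectors are hypothetical stand-ins, not outputs of any particular encoder.

```python
def dot(u, v):
    """Similarity function f(z_q, z_d): the dot product."""
    return sum(a * b for a, b in zip(u, v))

def top_k_documents(z_q, doc_embeddings, k):
    """Exact first-stage dense retrieval: score every document against the
    query and return the k highest-scoring (doc index, score) pairs."""
    scores = [dot(z_q, z_d) for z_d in doc_embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    return [(i, scores[i]) for i in ranked]

# Toy 4-dimensional embeddings standing in for g(d) and g(q).
docs = [[0.1, 0.9, 0.0, 0.0],
        [0.8, 0.1, 0.1, 0.0],
        [0.7, 0.2, 0.1, 0.0]]
query = [1.0, 0.0, 0.0, 0.0]
print(top_k_documents(query, docs, k=2))  # [(1, 0.8), (2, 0.7)]
```

For web-scale corpora this exact scan would be replaced by an approximate nearest-neighbor index, as the section notes.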
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Reviewed-Item Retrieval</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1.">Problem Formulation</head><p>Information retrieval across two-level data structures was previously studied by Zhang and Balog <ref type="bibr" target="#b5">[6]</ref>. Specifically, Zhang and Balog define the Object Retrieval problem, where (high-level) objects are described by multiple (low-level) documents. Given a query, the task is to retrieve high-level objects by using information in the low-level documents.</p><p>To investigate a special case of object retrieval where the goal is retrieving items (e.g., products, destinations) based on their reviews, Abdollah Pour et al. <ref type="bibr" target="#b0">[1]</ref> recently proposed the Reviewed-Item Retrieval (RIR) problem. In the 𝑅𝐼 𝑅⟨ℐ , 𝒟 , 𝑞⟩ problem, there is a set of items ℐ, where every item 𝑖 is a high-level object. Each item is described by a set of reviews (i.e., "low-level documents") 𝒟 𝑖 ⊂ 𝒟, and the 𝑟-th review of item 𝑖 is 𝑑 𝑖,𝑟 ∈ 𝒟 𝑖 . The main difference between RIR and Object Retrieval is that in RIR a low-level document 𝑑 𝑖,𝑟 cannot describe more than one high-level object 𝑖, while Object Retrieval allows for more general two-level structures. Given a query 𝑞 ∈ 𝒬 and a score 𝑆 𝑞,𝑖 between 𝑞 and each item 𝑖, the goal of RIR is to retrieve a ranked list 𝐿 𝑞 of top-𝐾 𝐼 scoring items:</p><formula xml:id="formula_0">𝐿 𝑞 = (𝑖 1 , ..., 𝑖 𝐾 𝐼 ) s.t. 𝑖 1 ∈ arg max 𝑖 {𝑆 𝑞,𝑖 }, 𝑆 𝑞,𝑖 𝑘 ≥ 𝑆 𝑞,𝑖 𝑘+1 , ∀𝑖 𝑘 ∈ 𝐿 𝑞 .</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2.">Fusion</head><p>To get a query-item score 𝑆 𝑞,𝑖 using an item's review set 𝒟 𝑖 , review information needs to be aggregated to an item level: this process is called fusion. Two alternatives exist for fusion <ref type="bibr" target="#b5">[6]</ref>: if low-level information is aggregated before a query is used for scoring, it is called Early Fusion (EF) -in contrast, if the aggregation occurs after query-scoring, it is called Late Fusion (LF).</p><p>For EF in RIR, Abdollah Pour et al. <ref type="bibr" target="#b0">[1]</ref> experiment with mean-pooling and contrastive learning methods to create an item embedding z 𝑖 ∈ ℝ 𝑚 from review embeddings {z 𝑑 } 𝑑∈𝒟 𝑖 . They then directly compute the similarity between z 𝑖 and a query embedding z 𝑞 as the query-item score 𝑆 𝑞,𝑖 = 𝑓 (z 𝑞 , z 𝑖 ).</p><p>For LF in RIR, these authors first compute query-review similarity scores 𝑆 𝑞,𝑑 𝑖,𝑟 = 𝑓 (z 𝑞 , z 𝑑 𝑖,𝑟 ). They then aggregate these scores into a query-item score 𝑆 𝑞,𝑖 by averaging the top 𝐾 𝑅 query-review scores for each item:</p><formula xml:id="formula_1">𝑆 𝑞,𝑖 = 1 𝐾 𝑅 𝐾 𝑅 ∑ 𝑟=1 𝑆 𝑞,𝑑 𝑖,𝑟 .<label>(1)</label></formula><p>Numerical evaluations of EF and LF for RIR demonstrate that EF has significantly worse performance than LF <ref type="bibr" target="#b0">[1]</ref>, and Abdollah Pour et al. conjecture that EF performs worse because it loses fine-grained review information before query-scoring. In contrast, by delaying fusion, LF preserves review-level information during query-scoring. Due to these findings, we do not study EF for MA-RIR; rather, we focus on developing Aspect Fusion as an extension of LF, discussed next.</p></div>
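The LF aggregation of Equation (1) can be sketched in a few lines; the scores below are hypothetical query-review similarities for a single item.

```python
def late_fusion_score(query_review_scores, k_r):
    """Monolithic LF, Eq. (1): average the top-K_R query-review
    similarity scores for a single item."""
    top = sorted(query_review_scores, reverse=True)[:k_r]
    return sum(top) / len(top)

# One item with four reviews; K_R = 2 keeps only the two best matches.
print(late_fusion_score([0.9, 0.2, 0.7, 0.1], k_r=2))  # (0.9 + 0.7) / 2
```

Repeating this per item and sorting by the fused score yields the ranked list 𝐿_𝑞.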
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Multi-Aspect Reviewed Item Retrieval</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Multi-Aspect Queries</head><p>This paper focuses on retrieving relevant items using their reviews for a multi-aspect query, such as "Can I have a meatball recipe that doesn't take too long?". We define a query aspect to be a sub-span of a multi-aspect query that represents a distinct topic (or facet) in the query, for instance the sub-spans "meatball" and "doesn't take too long" in the previous sentence. In this work, multi-aspect queries are assumed to be logical AND queries over all aspects, though an aspect itself can represent other logical operators such as XOR (e.g., a query aspect may be "chicken or beef"). Finally, we assume all query aspects are equally important -a further discussion of weighted multi-aspect retrieval can be found in Section 7.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Multi-Aspect Reviewed-Items</head><p>In addition to considering multi-aspect queries, we also consider multi-aspect items described by reviews. For instance, a multi-aspect item that is relevant to the multi-aspect query example above might be a recipe titled "Beef meatballs cooked in canned soup, ready in 25 minutes". However, since our goal is to isolate the properties of review-based retrieval, we assume that no such natural language (NL) item-level description is available. Instead, we assume that the item's aspects are described in reviews. Obviously, item-level descriptions (e.g., titles) are often available in practice, so a prime direction for future work is fusion across multiple levels of NL data during reviewed-item retrieval.</p><p>Figure <ref type="figure">1</ref>: Two extremes of item aspect distributions, showing reviews for an item with aspects "meatballs" and "ready in 25 minutes": a) Fully overlapping (top) -each review mentions all item aspects. b) Fully disjoint with imbalanced aspect frequency (bottom) -no review mentions more than one aspect, and some aspects are mentioned much more frequently than others.</p><p>Examples of reviews describing the item in the previous paragraph, which has aspects "meatballs" and "ready in 25 minutes", are shown in Figure <ref type="figure">1</ref>. In this paper, we assume that a review 𝑑 𝑖,𝑟 must mention at least one item aspect 𝑎 item 𝑖,𝑗 ∈ 𝒜 item 𝑖 and could mention up to all item aspects. Formally, the distribution of item aspects across reviews can be defined with a bipartite aspect distribution graph 𝒢 = {𝒟 , 𝒜 item , ℰ }, where an edge (𝑑 𝑖,𝑟 , 𝑎 item 𝑖,𝑗 ) ∈ ℰ indicates that review 𝑑 𝑖,𝑟 mentions aspect 𝑎 item 𝑖,𝑗 . We let 𝒜 rel,𝑞 𝑖 ⊆ 𝒜 item 𝑖 represent the set of item-aspects that are relevant to a query and should be considered during retrieval. We define the 𝑀𝐴 − 𝑅𝐼 𝑅⟨𝒜 , ℰ , 𝒟 , 𝑞⟩ problem as the task of retrieving a ranked list of relevant multi-aspect items 𝐿 𝑞 for a multi-aspect query 𝑞, where 𝒜 = 𝒜 item ∪ 𝒜 query .</p></div>
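The bipartite aspect distribution graph 𝒢 can be represented simply as an edge list; the toy review/aspect data below is hypothetical, mirroring the "meatballs" example.

```python
from collections import Counter

# Hypothetical edge set E of G = {D, A_item, E} for one item: each edge
# (review id, item aspect) means that review mentions that aspect.
edges = [("d1", "meatballs"), ("d2", "meatballs"), ("d3", "meatballs"),
         ("d4", "ready in 25 minutes")]

def aspect_frequency(edges):
    """How many reviews mention each item aspect (degree of each aspect node)."""
    return Counter(aspect for _, aspect in edges)

def fully_disjoint(edges):
    """True when no review mentions more than one aspect
    (maximal degree of aspect separation)."""
    reviews = [review for review, _ in edges]
    return len(reviews) == len(set(reviews))

print(aspect_frequency(edges))  # "meatballs" mentioned 3x, cooking time 1x
print(fully_disjoint(edges))    # True: each review covers exactly one aspect
```

The two quantities computed here are exactly the imbalance dimensions studied in Section 3.3: aspect frequency and degree of separation.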
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Multi-Aspect Review Distributions</head><p>As we will demonstrate with numerical simulations on LLM-generated review data, understanding review distributions in terms of aspect frequency and degree of aspect separation between reviews is key to designing successful MA-RIR techniques. Figure <ref type="figure">1</ref> shows two extremes of aspect distributions that are among the distributions we explore in our experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.1.">Fully Overlapping Distributions</head><p>Figure <ref type="figure">1a</ref>) shows a fully overlapping aspect distribution where each review mentions all aspects -in this case, the bipartite graph 𝒢 (see the RHS of Figure <ref type="figure">1</ref>) is fully connected for item 𝑖 1 . This is the most balanced review aspect distribution possible for an item, and, because of this "perfect" aspect balance, we postulate that aspect-agnostic retrieval approaches such as standard LF will perform competitively on such distributions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.2.">Degree of Separation and Aspect Frequency</head><p>In contrast to the case of perfect review-aspect balance, Figure <ref type="figure">1b</ref>) shows an extreme case of aspect imbalance. Firstly, one aspect is mentioned much more frequently than another -this is an aspect frequency imbalance. Secondly, each review mentions only one aspect -this is a maximal degree of separation of aspects across reviews (fully disjoint). Mathematically, 𝒢 has |𝒜 item 𝑖 1 | (disjoint) star components where some stars have a significantly higher degree than others. In the next section, we discuss the negative effects of imbalanced review-aspect distributions on LF performance on MA-RIR, and propose aspect fusion as a method for mitigating these negative effects.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Aspect Fusion for MA-RIR</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Desiderata of Aspect Fusion</head><p>Recall that LF computes a query-item similarity score by averaging the top 𝐾 𝑅 query-review similarity scores using Equation (<ref type="formula" target="#formula_1">1</ref>). For MA-RIR, we propose two desiderata for the aspect distribution in the top 𝐾 𝑅 reviews during fusion.</p><p>Figure <ref type="figure" target="#fig_0">2</ref>: Aspect Fusion extracts aspects (i.e., query sub-spans) from a query, performs LF with each aspect, and aggregates the resulting top-𝐾 𝐼 item lists (i.e., one list per extracted aspect) into a final list.</p><p>Desideratum 1: Since we assume multi-aspect queries are AND queries, if an item 𝑖 has a set 𝒜 rel,𝑞 𝑖 of relevant aspects for query 𝑞, the 𝐾 𝑅 reviews used for LF should mention all of those relevant aspects.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Desideratum 2:</head><p>As mentioned in Section 3.1, we also assume all query aspects are equally important, which implies that aspect frequency should be identical across all aspects in 𝒜 rel,𝑞 𝑖 within the top 𝐾 𝑅 retrieved reviews.</p><p>In a fully overlapping distribution (Figure <ref type="figure">1a</ref>) where each review mentions each aspect, both Desiderata 1 and 2 are guaranteed to be satisfied by any subset of item reviews. We thus argue that standard LF should be sufficient when reviews fully overlap in aspects, and focus on developing Aspect Fusion methods that address the failures of LF for imbalanced review-aspect distributions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Failures of LF under Review-Aspect Imbalance</head><p>Standard LF will fail to achieve Desiderata 1 and 2 for review-aspect distributions with at least some degree of disjointedness and aspect frequency imbalance, under the following assumptions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Aspect Popularity Bias</head><p>Aspects that are reviewed more frequently are more likely to be mentioned in the top 𝐾 𝑅 reviews.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Embedding Bias</head><p>The non-isotropic nature of the embedding space <ref type="bibr" target="#b6">[7]</ref> can place the query embedding closer to the reviews of some aspects than others. Let 𝒟 𝑗 𝑖 and 𝒟 𝑘 𝑖 denote the sets of reviews describing aspects 𝑎 rel 𝑖,𝑗 and 𝑎 rel 𝑖,𝑘 , respectively, for some item 𝑖. If query-review similarity scores tend to be higher when a review describes aspect 𝑎 rel 𝑖,𝑗 as opposed to aspect 𝑎 rel 𝑖,𝑘 , LF will be more likely to select reviews from review set 𝒟 𝑗 𝑖 for the top 𝐾 𝑅 fused reviews. For example, in Figure <ref type="figure">1b</ref>), the reviews describing cooking time might be more likely to score higher with the full query than reviews describing "meatballs".</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Aspect Fusion</head><p>To address these failures of LF on imbalanced data, we introduce several methods for Aspect Fusion, which explicitly utilizes the multi-aspect nature of reviews during fusion to address multi-aspect queries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.1.">Aspect Extraction</head><p>To extract aspects from queries, we propose to use few-shot (FS) prompting with an LLM. Though the number of query aspects is typically not known a priori, since we study multi-aspect queries, our proposed prompt (Figure <ref type="figure">10</ref> in the Appendix) asks that at least two non-overlapping sub-spans of the query be extracted as aspects. We represent the set of extracted query aspects for query 𝑞 as 𝒜 ext 𝑞 and let 𝐴 e 𝑞 = |𝒜 ext 𝑞 |.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.2.">Aspect-Item Scoring</head><p>The key to Aspect Fusion is directly computing aspect-review similarity scores 𝑆 𝑎,𝑑 𝑖,𝑟 , as opposed to similarity scores between reviews and a monolithic query, since the latter can be negatively impacted by review-aspect distribution imbalance. Aspect similarity scores are computed by separately embedding each extracted aspect 𝑎 ∈ 𝒜 ext 𝑞 as z 𝑎 = 𝑔(𝑎) and calculating 𝑆 𝑎,𝑑 𝑖,𝑟 = 𝑓 (z 𝑎 , z 𝑑 𝑖,𝑟 ). Then, aspect-item scores 𝑆 𝑎,𝑖 ∈ ℝ are obtained by aggregating the top 𝐾 𝑅 aspect-review scores via Eq. ( <ref type="formula" target="#formula_1">1</ref>) with aspect-review scores instead of query-review scores. For each extracted aspect 𝑎, the top-𝐾 𝐼 scoring items are ordered into a list</p><formula xml:id="formula_2">𝐿 𝑎 = (𝑖 1 , ..., 𝑖 𝐾 𝐼 ) s.t. 𝑖 1 ∈ arg max 𝑖 {𝑆 𝑎,𝑖 }, 𝑆 𝑎,𝑖 𝑘 ≥ 𝑆 𝑎,𝑖 𝑘+1 , ∀𝑖 𝑘 ∈ 𝐿 𝑎 .</formula><p>Figure <ref type="figure" target="#fig_0">2b</ref>) demonstrates aspect-item scoring and how it can alleviate the biases of standard LF. In this figure, the red and green points are embeddings of the reviews of item 𝑖 describing aspects 𝑎 item 𝑖,1 and 𝑎 item 𝑖,2 , respectively -both these aspects are assumed to be relevant to the query. Though the former aspect is more frequent, an equal number (𝐾 𝑅 ) of reviews for each aspect will be used during score fusion -as long as the aspect review embeddings are similar enough to the relevant query aspect embedding, and the total number of reviews for an aspect is at least 𝐾 𝑅 . In contrast, Figure <ref type="figure" target="#fig_0">2a</ref>) shows how standard (monolithic) LF will take a biased review sample of the first aspect since it is more frequently mentioned by reviews and z 𝑞 happens to be closer to those review embeddings. To differentiate the LF for RIR proposed by Abdollah Pour et al. from Aspect Fusion, we will refer to LF as Monolithic LF since it uses the full query.</p></div>
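A minimal sketch of aspect-item scoring and the per-aspect list 𝐿_𝑎, assuming pre-computed embeddings; the 2-d vectors and item names are hypothetical.

```python
def dot(u, v):
    """Similarity function f(z_a, z_d): the dot product."""
    return sum(a * b for a, b in zip(u, v))

def aspect_item_score(z_a, review_embeddings, k_r):
    """Aspect-item score: Eq. (1) applied to aspect-review similarities
    S_{a,d} = f(z_a, z_d) instead of query-review similarities."""
    scores = sorted((dot(z_a, z_d) for z_d in review_embeddings), reverse=True)
    top = scores[:k_r]
    return sum(top) / len(top)

def rank_items_for_aspect(z_a, item_reviews, k_r, k_i):
    """Build the per-aspect list L_a: the top-K_I items by aspect-item score."""
    scored = {i: aspect_item_score(z_a, embs, k_r)
              for i, embs in item_reviews.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k_i]

# Toy 2-d review embeddings for two hypothetical items.
item_reviews = {"i1": [[1.0, 0.0], [0.9, 0.1]],
                "i2": [[0.1, 0.9]]}
print(rank_items_for_aspect([1.0, 0.0], item_reviews, k_r=2, k_i=2))
```

Running this once per extracted aspect yields the lists {𝐿_𝑎} that the next subsection fuses into a single ranking.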
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.3.">Aspect-Item Score Fusion</head><p>After aspect-item scoring, we must aggregate the 𝐴 𝑒 𝑞 top-𝐾 𝐼 item lists for each aspect, {𝐿 𝑎 } 𝑎 ∈ 𝒜 ext 𝑞 , into a single ranked list of top-𝐾 𝐼 items for the query, 𝐿 𝑞 . We examine six aggregation strategies, which can be categorized as four score aggregation methods and two rank aggregation methods. The score-based variants convert the 𝐴 𝑒 𝑞 aspect-item scores into a query-item score 𝑆 𝑞,𝑖 using 1. AMean: arithmetic mean, 2. GMean: geometric mean, 3. HMean: harmonic mean, or 4. Min: minimum, to return the final ranked list 𝐿 𝑞 . The two rank-based list aggregation methods are:</p><formula xml:id="formula_3">1. Borda: Borda count 2. R-R: Round-robin (interleaved) merge.</formula><p>In Borda count, the score for a given item 𝑖 is calculated as</p><formula xml:id="formula_4">∑ 𝐴 𝑒 𝑞 𝑗=1 (𝐾 𝐼 − rank 𝐿 𝑎 𝑗 𝑖 + 1),</formula><p>where rank 𝐿 𝑎 𝑗 𝑖 is the rank of item 𝑖 in list 𝐿 𝑎 𝑗 . In a round-robin merge of 𝐴 𝑒 𝑞 lists, elements from each list are merged in a cyclic order, and when a conflict arises with a particular item, that item is skipped and the merge continues from the same list.</p></div>
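The six aggregation strategies can be sketched as follows; input data (per-aspect scores and ranked lists) is hypothetical, and tie-breaking falls back to insertion order rather than anything specified in the paper.

```python
from statistics import geometric_mean, harmonic_mean

def fuse_scores(aspect_scores, method):
    """Score aggregation: collapse each item's per-aspect scores into one
    query-item score S_{q,i}, then rank items by it.
    aspect_scores: {item: [one score per extracted aspect]}."""
    agg = {"amean": lambda s: sum(s) / len(s),
           "gmean": geometric_mean,
           "hmean": harmonic_mean,
           "min": min}[method]
    fused = {i: agg(s) for i, s in aspect_scores.items()}
    return sorted(fused, key=fused.get, reverse=True)

def borda(lists, k_i):
    """Borda count: an item at 1-based rank r in a top-K_I list earns
    K_I - r + 1 points; items are ranked by total points."""
    points = {}
    for L in lists:
        for r, item in enumerate(L, start=1):
            points[item] = points.get(item, 0) + (k_i - r + 1)
    return sorted(points, key=points.get, reverse=True)[:k_i]

def round_robin(lists, k_i):
    """Cyclic interleave; an already-merged item is skipped and the merge
    continues from the same list."""
    merged, idx = [], [0] * len(lists)
    while len(merged) < k_i and any(i < len(L) for i, L in zip(idx, lists)):
        for j, L in enumerate(lists):
            while idx[j] < len(L) and L[idx[j]] in merged:
                idx[j] += 1
            if idx[j] < len(L):
                merged.append(L[idx[j]])
                idx[j] += 1
                if len(merged) == k_i:
                    break
    return merged

print(fuse_scores({"i1": [0.8, 0.2], "i2": [0.4, 0.4]}, "min"))  # ['i2', 'i1']
```

Note how Min most directly encodes AND semantics: an item scoring poorly on any single aspect is pushed down regardless of its other scores.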
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">LLM Reranking</head><p>In addition to Aspect Fusion, we also introduce an LLM reranking step for MA-RIR -to the best of our knowledge, LLM reranking has not previously been studied in a reviewed-item setting. Our goal is to understand whether LLMs in cross-encoder (CE) or ZS listwise <ref type="bibr" target="#b7">[8]</ref> settings can fuse reviews of multi-aspect items for effective reranking.</p><p>After a list 𝐿 𝑞 of top 𝐾 𝐼 items is returned from the first stage, 𝐾 𝑅 reviews for each item need to be given to the LLM for what we call fusion-during-reranking. For Monolithic LF, these 𝐾 𝑅 reviews are simply the 𝐾 𝑅 reviews used for LF. For Aspect Fusion, since 𝐾 𝑅 reviews were used for fusion with each aspect, we propose to perform a round-robin merge of the top 𝐾 𝑅 review lists for each aspect in order to preserve a balanced distribution of reviews across aspects.</p><p>For a CE, reviews are simply concatenated and cross-encoded with the query. For listwise reranking, our prompt provides the LLM with the query, the initial ranked list of item IDs, the reviews for each item, and instructions to order the items based on relevance to the query -the full listwise reranking prompt is in Figure <ref type="figure">11</ref> in the Appendix.</p></div>
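A sketch of how the listwise-reranking input described above might be assembled; the function name and instruction wording are hypothetical stand-ins, not the paper's actual prompt (which appears in their Figure 11).

```python
def build_listwise_input(query, ranked_items, reviews_per_item):
    """Assemble the listwise-reranking input: the query, the stage-1 item
    ranking, and the K_R fused reviews per item. The wording here is an
    illustrative placeholder, not the authors' prompt."""
    lines = [f"Query: {query}", "Candidate items (current order):"]
    for item_id in ranked_items:
        lines.append(f"[{item_id}]")
        lines.extend(f"  review: {r}" for r in reviews_per_item[item_id])
    lines.append("Return the item IDs ordered by relevance to the query.")
    return "\n".join(lines)

prompt = build_listwise_input(
    "meatball recipe that doesn't take too long",
    ["i3", "i1"],
    {"i3": ["Tasty meatballs", "Done in 25 minutes"],
     "i1": ["Slow-cooked brisket"]})
print(prompt)
```

The string returned here would be sent to the reranking LLM, whose output ordering replaces the stage-1 ranking.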
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experimental Method</head><p>We perform simulations on generated review data to study the effect of aspect balance across reviews and test our hypothesis that Aspect Fusion is more robust to aspect imbalances than Monolithic LF. While using synthetic data exposes our results to biases from the data generation process, we are able to generate synthetic review distributions with far greater control than would have been possible several years ago, before the advent of LLMs. We specifically design experiments to study the performance of Aspect Fusion vs. Monolithic LF in the presence of aspect imbalance, both in the form of disjointedness of aspects across reviews and imbalanced aspect frequencies.</p><p>In order to perform our experiments, we need a dataset that has (a) multi-aspect queries and items, (b) ground-truth (GT) aspect labels, and (c) item reviews. To the best of our knowledge, there is no existing dataset with all of these properties. However, the recently-released Recipe-MPR dataset <ref type="bibr" target="#b2">[3]</ref> includes properties (a) and (b). We leverage this dataset and generate item reviews using GPT-4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Distribution of ground truth (GT) and LLM-extracted aspects for Recipe-MPR queries and items </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Data Generation</head><p>We create four datasets for our experiments based on the Recipe-MPR dataset and our new LLM-generated reviews. Firstly, the fully overlapping dataset includes 20 reviews per item, which each mention all of the aspects of the item. Secondly, the fully disjoint dataset includes 10 reviews for each aspect of a given item. We also modify the fully disjoint dataset to create two datasets with imbalanced aspect frequencies. In the one rare aspect dataset, we remove all but one of the reviews for a randomly-selected aspect of each item. In the one popular aspect dataset, we keep all ten reviews for only one randomly-selected aspect of each item, and keep only one review for the other aspects.</p><p>In order to generate reviews, the GT aspects for each correct item in Recipe-MPR were used to prompt GPT-4. The total number of items for which there were GT aspects is 473. The distribution of the number of aspects per query and item is shown in Table <ref type="table">1</ref>. On average, each item has 2.2 aspects. The prompts we used to generate the reviews are included in the Appendix.</p><p>Recipe-MPR contains logical AND queries with ground truth (GT) labels for the query aspects. Refer to subsection 3.1 for an example of a query 𝑞 and its GT aspects, 𝒜 query 𝑞 . Since the focus of this paper is on MA-RIR, we only included the 411 queries whose associated correct item had at least two aspects. For each of these queries, we used two-shot examples to have GPT-4 extract "at least two non-overlapping spans" representing the relevant aspects in the query.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Experimental Details</head><p>For our query and review embeddings, we used TAS-B <ref type="bibr" target="#b8">[9]</ref>. For the listwise reranking experiments, we used the gpt-3.5-turbo-16k model. For the CE reranking experiments, the model used was ms-marco-MiniLM-L-12-v2<ref type="foot" target="#foot_0">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Experimental Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RQ1: Is Aspect Fusion helpful when item aspects are discussed disjointly across reviews?</head><p>Table 2 lists the mean average precision at 10 (MAP@10) and recall@10 (Re@10) of the stage 1 dense retrieval for various settings of 𝐾 𝑅 . The table is broken up according to whether the disjoint or overlapping reviews are used. Throughout this paper, we show results for 𝐾 𝐼 = 10. In our experiments we noticed that varying 𝐾 𝐼 led to minor changes in the results. For completeness, we report results for 𝐾 𝐼 = 5 in the Appendix.</p><p>Both methods perform similarly on the fully overlapping dataset, but Aspect Fusion performs significantly better than Monolithic LF for the fully disjoint dataset and 𝐾 𝑅 &lt; 30. For the fully disjoint dataset, Aspect Fusion drops in performance for 𝐾 𝑅 &gt; 10 because when 𝐾 𝑅 exceeds the number of reviews per aspect, scoring is based on reviews that are irrelevant to the given aspect. This decline in performance does not apply in the fully overlapping case.</p><p>We see that for the fully overlapping dataset, Aspect Fusion is approximately equivalent to the Monolithic LF approach, while for the fully disjoint dataset, Aspect Fusion score aggregation approaches (arithmetic mean, harmonic mean, and geometric mean) offer a significant improvement in performance compared to the Monolithic LF approach. This pattern offers empirical evidence that Aspect Fusion is better suited to disjoint aspect distributions than Monolithic LF. More specifically, this suggests that Monolithic LF is not symmetrical across aspects, and fails to consider information from each of the aspects in a balanced way.</p><p>Additionally, for the fully disjoint dataset, the performance of the aspect-based approach suffers for 𝐾 𝑅 &gt; 10. 
This can be explained by the fact that when 𝐾 𝑅 exceeds the number of disjoint reviews available for a given aspect (10 in this data), the aspect-based methods will score items based on reviews that are irrelevant to a given aspect. This could result in correct items receiving low scores for some aspects. We conclude that Aspect Fusion should use 𝐾 𝑅 ≤ 𝑅 𝑎,min 𝑖 , where 𝑅 𝑎,min 𝑖 is the smallest number of reviews covering any single aspect of item 𝑖, in order to avoid this performance drop.</p><p>Furthermore, the fact that the score aggregation methods outperform the rank-based aggregation methods (R-R and Borda) offers evidence that the embedding similarity scores contain significant information about how well an item's reviews align with a given query aspect, above and beyond that item's rank relative to the other candidate items. Considering the simplicity and strong performance of AMean score aggregation, we focus on this Aspect Fusion method in the remaining results below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RQ2: How does review aspect frequency imbalance affect Monolithic LF and Aspect Fusion?</head><p>Table <ref type="table" target="#tab_4">3</ref> shows the performance of the stage 1 dense retrieval for the balanced frequency (fully disjoint) dataset and the two datasets with imbalance in the review aspect frequency. These results are also presented visually in Figure <ref type="figure" target="#fig_3">4</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Monolithic LF versus Aspect Fusion with six aggregation functions for both the Fully Disjoint and Fully Overlapping datasets, with 95% error margins in parentheses.</p><p>Note that this imbalance can only be analyzed for the case where the reviews cover disjoint, rather than overlapping, aspects.</p><p>Based on our conclusion above, we focus on the results for 𝐾 𝑅 = 1 in this section, since for the datasets with imbalanced review aspect frequency, 𝑅 𝑎,min 𝑖 = 1. We see a significant decrease in performance for all methods when aspect frequency imbalance is introduced. This result suggests that balance in reviews across aspects is helpful for both Monolithic LF and Aspect Fusion.</p><p>Furthermore, for 𝐾 𝑅 = 1, the performance of Monolithic LF decreases more when aspect frequency imbalance is introduced than that of the Aspect Fusion methods. For example, the MAP@10 of Monolithic LF decreased from 0.41 to 0.36 on the one popular aspect dataset, a 12% drop, compared to a 7% drop for the Aspect Fusion approach. This suggests Aspect Fusion methods may be more robust to aspect frequency imbalance.</p><p>Lastly, we note that the performance of Monolithic LF decreases as 𝐾 𝑅 grows large, because any relevant item aspects that are infrequently reviewed (rare aspects have only one review in these datasets) contribute progressively less to the query-item score as 𝐾 𝑅 increases.</p></div>
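The dilution effect described above can be made concrete with a toy numerical sketch. All similarity values below are assumed for illustration only, not taken from the paper's data: an item with one review on a rare but relevant aspect and nine reviews on a popular aspect sees the rare review's influence on the Monolithic LF score shrink as 𝐾 𝑅 grows.

```python
def monolithic_lf_score(review_sims, k_r):
    """Monolithic LF: average the item's top-K_R query-review similarities."""
    top = sorted(review_sims, reverse=True)[:k_r]
    return sum(top) / len(top)

# Hypothetical item: one review on a rare relevant aspect (similarity 0.9)
# and nine reviews on a popular aspect (similarity 0.8 each).
sims = [0.9] + [0.8] * 9

score_k1 = monolithic_lf_score(sims, 1)    # rare review determines the score
score_k10 = monolithic_lf_score(sims, 10)  # rare review is only 1/10 of the mean
```

At 𝐾 𝑅 = 1 the score is 0.90, driven entirely by the rare-aspect review; at 𝐾 𝑅 = 10 it falls to 0.81, with the rare review contributing only a tenth of the mean.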
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 4</head><p>Effect of using LLM extracted query aspects vs. GT query aspects on Monolithic LF and Aspect Fusion. The values in parentheses indicate the 95% error margin.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RQ3: How does the use of extracted query aspects instead of GT query aspects affect Aspect Fusion?</head><p>Table <ref type="table">4</ref> shows the same results as Table <ref type="table">2</ref>, except using extracted rather than GT query aspects. These results are also presented visually in Figure <ref type="figure" target="#fig_4">5</ref>. At 𝐾 𝑅 = 1, while the MAP@10 of Aspect Fusion drops from 0.56 with GT aspects to 0.46 with extracted aspects, it remains higher than the 0.41 MAP@10 of Monolithic LF. This result implies that Aspect Fusion is useful even when GT query aspects are unknown. </p></div>
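Query aspect extraction is done with few-shot LLM prompting (the exact prompts appear in the Appendix figures). As a rough sketch of how such a prompt can be assembled, the snippet below builds a few-shot extraction prompt; the example queries and the instruction wording are hypothetical, not the paper's actual prompt.

```python
# Hypothetical few-shot examples: (query, gold aspect list) pairs.
FEW_SHOT_EXAMPLES = [
    ("A protein-rich vegetarian dish that is easy to make",
     ["protein-rich", "vegetarian", "easy to make"]),
    ("A spicy soup without dairy",
     ["spicy", "soup", "without dairy"]),
]

def build_aspect_extraction_prompt(query):
    """Assemble a few-shot prompt asking an LLM to split a query into
    its distinct preference aspects (query subspans)."""
    lines = ["Split the query into its distinct preference aspects."]
    for q, aspects in FEW_SHOT_EXAMPLES:
        lines.append(f"Query: {q}")
        lines.append("Aspects: " + "; ".join(aspects))
    lines.append(f"Query: {query}")
    lines.append("Aspects:")  # the LLM completes this line
    return "\n".join(lines)
```

The returned string ends at "Aspects:", so the LLM's completion is the aspect list for the new query, which can then be split on the delimiter.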
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RQ4: Are LLMs effective MA-RIR rerankers?</head><p>Table <ref type="table" target="#tab_8">5</ref> summarizes the performance of the listwise<ref type="foot" target="#foot_1">2</ref> and cross-encoder rerankers. We see a beneficial effect from increasing the number of reviews 𝐾 𝑅 given to the language model for both CE and listwise reranking. Specifically, for reranking Monolithic LF on the fully disjoint dataset, listwise MAP@10 improves from 0.33 at 𝐾 𝑅 = 1 to 0.46 at 𝐾 𝑅 = 30. Similarly, CE MAP@10 improves from 0.35 at 𝐾 𝑅 = 1 to 0.47 at 𝐾 𝑅 = 30. We conjecture this large increase in MAP@10 with 𝐾 𝑅 is due to the quadratic nature of cross-attention across input text.</p><p>Since Aspect Fusion did best with low 𝐾 𝑅 values, a possible reason that we did not observe any benefits of LLM reranking for Aspect Fusion is that 𝐾 𝑅 was not high enough. Also, while some reranking settings showed 2nd stage MAP@10 increases over 1st stage values (such as 𝐾 𝑅 = 30 reranking of Monolithic LF on the fully disjoint data), when too few reviews were given to the reranker, the second stage sometimes made performance worse, such as at 𝐾 𝑅 = 1.</p><p>Figure <ref type="figure" target="#fig_6">7</ref> shows a heatmap of the ranks assigned to the correct items by the stage 1 retriever and stage 2 reranker. An effective reranker would consistently improve the ranks of the correct items, which would place the center of mass below the diagonal. We see that this is indeed the case for a high value of 𝐾 𝑅 , but not for a low value of 𝐾 𝑅 . The raw values underlying this figure are provided in the Appendix.</p></div>
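The MAP@10 and Re@10 figures reported throughout can be computed with standard rank-cutoff formulas. Below is a small self-contained sketch; in the special case of a single relevant item per query, as in the datasets here, AP@k reduces to 1/rank when the item appears in the top k and 0 otherwise.

```python
def ap_at_k(ranked_items, relevant, k=10):
    """Average precision at k: precision is accumulated at each rank where
    a relevant item appears, normalized by min(|relevant|, k)."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for i, item in enumerate(ranked_items[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k)

def recall_at_k(ranked_items, relevant, k=10):
    """Fraction of relevant items retrieved in the top k."""
    return len(set(ranked_items[:k]) & set(relevant)) / len(relevant)
```

MAP@10 is then the mean of `ap_at_k` over all queries; with one relevant item per query, a correct item at rank 2 contributes 0.5, while one outside the top 10 contributes 0.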
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Related Work</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.1.">Multi-level Retrieval</head><p>The most relevant work to ours is that on RIR by Abdollah Pour et al. <ref type="bibr" target="#b0">[1]</ref>, which formulates the RIR problem and studies EF and LF approaches. In addition to LF with an off-the-shelf bi-encoder such as TAS-B, the authors also contrastively fine-tune an encoder for LF and show performance improvements over off-the-shelf LF. Extending their contrastive learning approach to MA-RIR Aspect Fusion is a natural direction for future work. As mentioned in Section 2.2.1, Zhang and Balog <ref type="bibr" target="#b5">[6]</ref> have previously studied the Object Fusion problem, which allows for more general two-level structures than RIR (in which a low-level document cannot describe more than one high-level object). However, they did not study neural techniques or multi-aspect retrieval, which are key to our work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2.">Multi-aspect Retrieval</head><p>In addition to releasing Recipe-MPR, which was used to generate review distributions in this work, Zhang et al. <ref type="bibr" target="#b2">[3]</ref> use the queries and items in Recipe-MPR in a multi-aspect question-answering setting, and find that FS GPT-3 listwise prompting achieves far superior accuracy to all other methods. However, it is computationally infeasible to use such listwise prompting methods for first stage retrieval. Kong et al. <ref type="bibr" target="#b9">[10]</ref> consider multiple aspects when calculating relevance scores in dense retrieval, but assume documents and queries contain a fixed number of aspects from known categories. Similarly, the label aggregation method of Kang et al. <ref type="bibr" target="#b10">[11]</ref> explicitly deals with multiple query aspects, but assumes a fixed number of known categories.</p><p>Another method, Multi-Aspect Dense Retrieval (MADRM) <ref type="bibr" target="#b9">[10]</ref>, learns early fusion embeddings of documents and queries by extracting and then aggregating their aspects, and reports improvements over Monolithic LF baselines. DORIS-MAE <ref type="bibr" target="#b11">[12]</ref> presents a dataset that deconstructs complex queries into hierarchies of aspects and sub-aspects. Unlike our aspect extraction approach, which extracts aspects from queries using few-shot prompting with an LLM, DORIS-MAE predefines these aspects and their corresponding topic hierarchy for both queries and document corpora.</p><p>Finally, some recent works study multi-aspect, LLM-driven conversational recommendation <ref type="bibr" target="#b12">[13]</ref>, including work on preference elicitation over multiple aspects <ref type="bibr" target="#b13">[14]</ref> and knowledge-graph-based topic-guided chatbots <ref type="bibr" target="#b14">[15]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">Conclusions</head><p>By extending reviewed-item retrieval (RIR) to a setting with multi-aspect queries and items, we were able to both theoretically and empirically demonstrate the failure modes of Monolithic Late Fusion (LF) when there is an imbalance in how aspects are distributed across reviews. Specifically, since Monolithic LF is aspect-agnostic, it is subject to a frequency bias in its review selection towards more popular aspects. Furthermore, the disjointness of aspects across reviews can induce a selection bias towards certain aspects if monolithic multi-aspect query embeddings are closer to review embeddings for those aspects.</p><p>To address these failure modes, we propose Aspect Fusion as a robust MA-RIR method for imbalanced review distributions. Using the recently released Recipe-MPR dataset, specifically designed to study multi-aspect retrieval, we design four generated datasets that allow us to empirically test the effects of review imbalance in aspect frequency and disjointness. Our experiments show that Aspect Fusion is much more robust to non-uniform review variations than Monolithic LF, outperforming the latter with a 44% MAP@10 increase on some distributions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.1. LLM Prompts</head><p>We provide the prompts used for overlapping review generation, disjoint review generation, query aspect extraction, and listwise reranking in Figures <ref type="figure">8, 9</ref>, 10, and 11, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2. Results for 𝐾 𝐼 = 5</head><p>In the main body we showed various results of experiments where 𝐾 𝐼 was set to 10. We found that varying 𝐾 𝐼 within this order of magnitude had a very small effect on the results, and therefore did not include findings for any other settings of 𝐾 𝐼 above. For completeness, in this section we duplicate the preceding tables but use 𝐾 𝐼 = 5 instead of 𝐾 𝐼 = 10. See Tables 6, 7, 8, and 9 for these results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.3. Data for Figure 7</head><p>In Figure <ref type="figure" target="#fig_6">7</ref>, we show the number of queries for which the correct item was ranked in a certain position by the stage 1 retriever and stage 2 reranker. The underlying data for this figure is shown in Table <ref type="table" target="#tab_15">10</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 8</head><p>Stage 1 retriever performance by whether labelled GT or extracted query aspects are used, with 𝐾 𝐼 = 5. The values in parentheses indicate the 95% error margin.</p><p>𝐾 𝑅 1 2 5 10 15 30 MAP@5 Re@5 MAP@5 Re@5 MAP@5 Re@5 MAP@5 Re@5 MAP@5 Re@5 MAP@5 Re@5 Extracted query aspects AMean .45 (.04) .60 (.05) .46 (.04) .63 (.05) .46 (.04) .62 (.05) .45 (.04) .61 (.05) .44 (.04) .62 (.05) .14 (.03) .20 (.04) Mono LF .39 (.04) .56 (.05) .39 (.04) .57 (.05) .40 (.04) .57 (.05) .40 (.04) .58 (.05) .42 (.04) .61 (.05) .39 (.04) .58 (.05) GT query aspects AMean .55 (.04) .71 (.04) .55 (.04) .70 (.04) .55 (.04) .70 (.04) .55 (.04) .69 (.04) .50 (.04) .67 (.05) .16 (.03) .21 (.04) Mono LF .39 (.04) .56 (.05) .39 (.04) .57 (.05) .40 (.04) .57 (.05) .40 (.04) .58 (.05) .42 (.04) .61 (.05) .39 (.04) .58 (.05)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 9</head><p>Stage 2 reranker performance by reranking method and setting of 𝐾 𝑅 , with 𝐾 𝐼 = 5. "No" refers to the case where no reranking is applied, and is equivalent to the stage 1 results. The values in parentheses indicate the 95% error margin.</p><p>𝐾 𝑅 1 2 5 10 15 30 MAP@5 Re@5 MAP@5 Re@5 MAP@5 Re@5 MAP@5 Re@5 MAP@5 Re@5 MAP@5 Re@5   </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: a) Top. In (Monolithic) LF, the full query is scored against all reviews, and the top 𝐾 𝑅 query-review scores are averaged for each item to produce a query-item score. b) Bottom.Aspect Fusion extracts aspects (i.e., query subspans) from a query, performs LF with each aspect, and aggregates the resulting top 𝐾 𝐼 item lists (i.e., one list per extracted aspect) to a final list.</figDesc><graphic coords="3,309.59,65.61,213.68,246.44" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Monolithic LF versus Aspect Fusion with AMean aggregation.Both methods perform similarly on the fully overlapping dataset, but Aspect Fusion performs significantly better than Monolithic LF for the fully disjoint dataset and 𝐾 𝑅 &lt; 30. For the fully disjoint dataset, Aspect Fusion drops in performance for 𝐾 𝑅 &gt; 10 because when 𝐾 𝑅 exceeds the number of reviews per aspect, scoring is based on reviews that are irrelevant to the given aspect. This decline in performance does not apply in the fully overlapping case.</figDesc><graphic coords="5,309.59,65.61,213.68,106.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Effect of Aspect Frequency. Aspect Fusion performs better than Monolithic LF for low values of 𝐾 𝑅 , but suffers for higher values of 𝐾 𝑅 . This pattern is explained in the discussion of RQ1.</figDesc><graphic coords="6,72.00,431.02,213.68,106.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Aspect Fusion with GT vs extracted query aspects with fully disjoint reviews. Although GT query aspects perform better, Aspect Fusion still offers an improvement over Monolithic LF with extracted query aspects.</figDesc><graphic coords="6,309.59,431.02,213.68,106.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Comparison of reranking methods. Performance generally increases as more reviews are included in the LLM input; using too few reviews can hurt performance.</figDesc><graphic coords="7,72.00,387.58,213.68,160.26" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Ranks of correct items after Stage 1 Monolithic LF (x axis) and Stage 2 Cross-Encoder reranking (y axis) on the Fully Disjoint dataset. Circle size is proportional to position frequency, and the center of mass is shown in red. For 𝐾 𝑅 = 1, most of the mass lies above the diagonal line, meaning the reranker has worsened performance. On the other hand, for 𝐾 𝑅 = 30, most of the mass lies below the diagonal line, meaning that the reranker has improved the performance.</figDesc><graphic coords="7,309.59,387.58,213.68,106.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: Overlapping Review Generation Prompt Used with GPT-4</figDesc><graphic coords="9,329.94,149.27,170.95,73.46" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>item 𝑖,𝑗 ) ∈ ℰ exists if review 𝑑 𝑖,𝑟 ∈ 𝒟 𝑖 mentions aspect 𝑎 item</figDesc><table><row><cell></cell><cell>𝑖,𝑗</cell><cell>∈ 𝒜 item 𝑖</cell><cell>. We also</cell></row><row><cell>let 𝒜 𝑖 rel,𝑞</cell><cell>⊆ 𝒜 item 𝑖</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc>biases retrieval towards one aspect. Consider two equally sized and fully disjoint review subsets 𝒟 𝑗 𝑖 ⊂ 𝒟 𝑖 and 𝒟 𝑘 𝑖 ⊂ 𝒟 𝑖 in which reviews mention only a single aspect 𝑎 rel 𝑖,𝑗 ∈ 𝒜</figDesc><table><row><cell>rel,𝑞 𝑖</cell><cell>or 𝑎 rel 𝑖,𝑘 ∈ 𝒜</cell><cell>rel,𝑞 𝑖</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3</head><label>3</label><figDesc>Effect of aspect frequency imbalance on Monolithic LF and Aspect Fusion. "Balanced frequency" refers to the fully disjoint dataset where all item aspects have the same number of reviews. The values in parentheses indicate the 95% error margin.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>.04) .70 (.04) .47 (.04) .72 (.04) .47 (.04) .72 (.04) .46 (.04) .71 (.04) .46 (.04) .71 (.04)</head><label></label><figDesc>.15 (.03) .23 (.04) Mono LF .41 (.04) .67 (.05) .41 (.04) .69 (.04) .42 (.04) .69 (.04) .41 (.04) .70 (.04) .43 (.04) .70 (.04) .</figDesc><table><row><cell>Extracted</cell><cell>AMean .46 (</cell></row><row><cell>query aspects</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>41 (.04) .68 (.05)</head><label></label><figDesc></figDesc><table><row><cell>GT query</cell><cell>AMean .56 (</cell></row><row><cell>aspects</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>.04) .77 (.04) .56 (.04) .77 (.04) .56 (.04) .78 (.04) .56 (.04) .76 (.04) .51 (.04) .75 (.04</head><label></label><figDesc></figDesc><table><row><cell>) .16 (.03) .24 (.04)</cell></row><row><cell>Mono LF .41 (.04) .67 (.05) .41 (.04) .69 (.04) .42 (.04) .69 (.04) .41 (.04) .70 (.04) .43 (.04) .70 (.04) .41 (.04) .68 (.05)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_8"><head>Table 5</head><label>5</label><figDesc>Reranker performance of CE and LW LLMs for various 𝐾 𝑅 values. "No" refers to the case where no reranking is applied, and is equivalent to the stage 1 results. The values in parentheses indicate the 95% error margin.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_9"><head>50 (.04)</head><label></label><figDesc>(.04) .67 (.05) .38 (.04) .69 (.04) .40 (.04) .69 (.04) .44 (.04) .70 (.04) .47 (.04) .70 (.04) .47 (.04) .68 (.05) LW .33 (.04) .67 (.05) .34 (.04) .69 (.04) .38 (.04) .69 (.04) .40 (.04) .70 (.04) .42 (.04) .70 (.04) .46 (.04) .68 (.05) No .41 (.04) .67 (.05) .41 (.04) .69 (.04) .42 (.04) .69 (.04) .41 (.04) .70 (.04) .43 (.04) .70 (.04) .41 (.04) .68 (.05) Fully overlapping AMean CE .51 (.04) .74 (.04) .52 (.04) .76 (.04) .52 (.04) .77 (.04) .51 (.04) .75 (.04) .48 (.04) .75 (.04) .50 (.04) .74 (.04) LW .43 (.04) .74 (.04) .48 (.04) .76 (.04) .55 (.04) .77 (.04) .53 (.04) .75 (.04) .53 (.04) .75 (.04) .52 (.04) .74 (.04) No .52 (.04) .74 (.04) .52 (.04) .76 (.04) .52 (.04) .77 (.04) .52 (.04) .75 (.04) .52 (.04) .75 (.04) .53 (.04) .74 (.04) .73 (.04) .50 (.04) .75 (.04) .50 (.04) .74 (.04) .50 (.04) .75 (.04) .48 (.04) .75 (.04) .50 (.04) .75 (.04)</figDesc><table><row><cell></cell><cell></cell><cell>CE .36 (.04) .77 (.04) .51 (.04) .77 (.04) .53 (.04) .78 (.04) .53 (.04) .76 (.04) .53 (.04) .75 (.04) .16 (.03) .24 (.04)</cell></row><row><cell></cell><cell>AMean</cell><cell>LW .40 (.04) .77 (.04) .45 (.04) .77 (.04) .53 (.04) .78 (.04) .55 (.04) .76 (.04) .52 (.04) .75 (.04) .16 (.03) .24 (.04)</cell></row><row><cell>Fully</cell><cell></cell><cell>No .56 (.04) .77 (.04) .56 (.04) .77 (.04) .56 (.04) .78 (.04) .56 (.04) .76 (.04) .51 (.04) .75 (.04) .16 (.03) .24 (.04)</cell></row><row><cell>disjoint</cell><cell cols="2">Mono LF CE .35 Mono LF CE .</cell></row></table><note>LW .45 (.04) .73 (.04) .47 (.04) .75 (.04) .52 (.04) .74 (.04) .53 (.04) .75 (.04) .53 (.04) .75 (.04) .52 (.04) .75 (.04) No .50 (.04) .73 (.04) .51 (.04) .75 (.04) .51 (.04) .74 (.04) .51 (.04) .75 (.04) .50 (.04) .75 (.04) .50 (.04) .75 (.04)</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_10"><head>Table 6</head><label>6</label><figDesc>Stage 1 retriever performance for various aggregation functions and settings of 𝐾 𝑅 , with 𝐾 𝐼 = 5. All methods except Mono LF include Aspect Fusion. The values in parentheses indicate the 95% error margin.</figDesc><table><row><cell>𝐾 𝑅</cell><cell>1</cell><cell>2</cell><cell>5</cell><cell>10</cell><cell>15</cell><cell>30</cell></row><row><cell>Dataset</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_12"><head>Table 7</head><label>7</label><figDesc>Stage 1 retriever performance by review aspect frequency and settings of 𝐾 𝑅 , with 𝐾 𝑖 = 5. "Balanced" refers to the fully disjoint dataset where all item aspects have the same number of reviews. The values in parentheses indicate the 95% error margin.</figDesc><table><row><cell>𝐾 𝑅</cell><cell>1</cell><cell>2</cell><cell>5</cell><cell>10</cell><cell>15</cell><cell>30</cell></row><row><cell>Dataset</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_13"><head></head><label></label><figDesc>5 Balanced Frequency AMean .55 (.04) .71 (.04) .55 (.04) .70 (.04) .55 (.04) .70 (.04) .55 (.04) .69 (.04) .50 (.04) .67 (.05) .16 (.03) .21 (.04) Mono LF .39 (.04) .56 (.05) .39 (.04) .57 (.05) .40 (.04) .57 (.05) .40 (.04) .58 (.05) .42 (.04) .61 (.05) .39 (.04) .58 (.05) One Popular Aspect AMean .51 (.04) .65 (.05) .42 (.04) .58 (.05) .31 (.04) .45 (.05) .25 (.04) .39 (.05) .01 (.01) .02 (.01) .01 (.01) .02 (.01) Mono LF .35 (.04) .50 (.05) .30 (.04) .46 (.05) .26 (.04) .40 (.05) .24 (.04) .38 (.05) .25 (.04) .38 (.05) .25 (.04) .38 (.05) One Rare Aspect AMean .51 (.04) .65 (.05) .43 (.04) .59 (.05) .35 (.04) .48 (.05) .33 (.04) .45 (.05) .15 (.03) .20 (.04) .06 (.02) .08 (.03) Mono LF .37 (.04) .53 (.05) .34 (.04) .51 (.05) .30 (.04) .45 (.05) .28 (.04) .43 (.05) .30 (.04) .45 (.05) .28 (.04) .40 (.05)</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_14"><head></head><label></label><figDesc>Fully disjoint AMean CE .41 (.04) .71 (.04) .52 (.04) .70 (.04) .53 (.04) .70 (.04) .53 (.04) .69 (.04) .53 (.04) .67 (.05) .17 (.03) .21 (.04) LW .44 (.04) .71 (.04) .48 (.04) .70 (.04) .53 (.04) .70 (.04) .55 (.04) .69 (.04) .53 (.04) .67 (.05) .16 (.03) .21 (.04) No .55 (.04) .71 (.04) .55 (.04) .70 (.04) .55 (.04) .70 (.04) .55 (.04) .69 (.04) .50 (.04) .67 (.05) .16 (.03) .21 (.04) Mono LF CE .35 (.04) .56 (.05) .37 (.04) .57 (.05) .38 (.04) .57 (.05) .41 (.04) .58 (.05) .46 (.04) .61 (.05) .46 (.04) .58 (.05) LW .33 (.04) .56 (.05) .34 (.04) .57 (.05) .37 (.04) .57 (.05) .37 (.04) .58 (.05) .44 (.04) .61 (.05) .43 (.04) .58 (.05) No .39 (.04) .56 (.05) .39 (.04) .57 (.05) .40 (.04) .57 (.05) .40 (.04) .58 (.05) .42 (.04) .61 (.05) .39 (.04) .58 (.05) Fully overlapping AMean CE .51 (.04) .67 (.05) .51 (.04) .68 (.05) .51 (.04) .67 (.05) .50 (.04) .66 (.05) .49 (.04) .66 (.05) .51 (.04) .66 (.05) LW .48 (.04) .67 (.05) .50 (.04) .68 (.05) .52 (.04) .67 (.05) .51 (.04) .66 (.05) .53 (.04) .66 (.05) .53 (.04) .66 (.05) No .51 (.04) .67 (.05) .51 (.04) .68 (.05) .51 (.04) .67 (.05) .50 (.04) .66 (.05) .51 (.04) .66 (.05) .52 (.04) .66 (.05) Mono LF CE .50 (.04) .66 (.05) .50 (.04) .66 (.05) .50 (.04) .66 (.05) .50 (.04) .67 (.05) .48 (.04) .65 (.05) .49 (.04) .65 (.05) LW .47 (.04) .66 (.05) .48 (.04) .66 (.05) .52 (.04) .66 (.05) .53 (.04) .67 (.05) .51 (.04) .65 (.05) .51 (.04) .65 (.05) No .49 (.04) .66 (.05) .50 (.04) .66 (.05) .50 (.04) .66 (.05) .50 (.04) .67 (.05) .49 (.04) .65 (.05) .48 (.04) .65 (.05)</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_15"><head>Table 10</head><label>10</label><figDesc>Ranks assigned to the correct items for stage 1 retriever and stage 2 CE reranker with AMean aggregation Aspect Fusion.</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Approximately 1% of queries had only 9 items returned by the listwise reranker instead of 10; this was an error in generative retrieval.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Self-supervised contrastive BERT fine-tuning for fusion-based reviewed-item retrieval</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Abdollah Pour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Farinneya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Toroghi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Korikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pesaranghader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Sajed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bharadwaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mavrin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sanner</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-28244-7_1</idno>
	</analytic>
	<monogr>
		<title level="m">European Conference on Information Retrieval</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="3" to="17" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Retrieval-augmented conversational recommendation with prompt-based semistructured natural language state tracking</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kemper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Dicarlantonio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Korikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sanner</surname></persName>
		</author>
		<idno type="DOI">10.1145/3626772.3657670</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;24</title>
				<meeting>the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;24<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Recipe-MPR: A test collection for evaluating multi-aspect preference-based natural language retrieval</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Korikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Farinneya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Abdollah Pour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bharadwaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pesaranghader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">Y</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">X</forename><surname>Lok</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sanner</surname></persName>
		</author>
		<idno type="DOI">10.1145/3539618.3591880</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;23</title>
				<meeting>the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;23<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="2744" to="2753" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Sentence-BERT: Sentence embeddings using Siamese BERT-networks</title>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-1410</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Inui</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Jiang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Ng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Wan</surname></persName>
		</editor>
		<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3982" to="3992" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Billion-scale similarity search with GPUs</title>
		<author>
			<persName><forename type="first">J</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Douze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jégou</surname></persName>
		</author>
		<idno type="DOI">10.1109/TBDATA.2019.2921572</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Big Data</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="535" to="547" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Design patterns for fusionbased object retrieval</title>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Balog</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-56608-5_66</idno>
	</analytic>
	<monogr>
		<title level="m">European Conference on Information Retrieval</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="684" to="690" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings</title>
		<author>
			<persName><forename type="first">K</forename><surname>Ethayarajh</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-1006</idno>
		<ptr target="https://aclanthology.org/D19-1006" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Inui</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Jiang</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Ng</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">X</forename><surname>Wan</surname></persName>
		</editor>
		<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="55" to="65" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pradeep</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.02156</idno>
		<title level="m">Zero-shot listwise document reranking with a large language model</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Efficiently teaching an effective dense retriever with balanced topic aware sampling</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hofstätter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-C</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-H</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hanbury</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval</title>
				<meeting>the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="113" to="122" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Multi-aspect dense retrieval</title>
		<author>
			<persName><forename type="first">W</forename><surname>Kong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Khadanga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bendersky</surname></persName>
		</author>
		<idno type="DOI">10.1145/3534678.3539137</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD &apos;22</title>
				<meeting>the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD &apos;22<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="3178" to="3186" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Learning to rank with multi-aspect relevance for vertical search</title>
		<author>
			<persName><forename type="first">C</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Tseng</surname></persName>
		</author>
		<idno type="DOI">10.1145/2124295.2124350</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, WSDM &apos;12</title>
				<meeting>the Fifth ACM International Conference on Web Search and Data Mining, WSDM &apos;12<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="453" to="462" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">DORIS-MAE: Scientific document retrieval using multi-level aspect-based queries</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Naidu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bergen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Paturi</surname></persName>
		</author>
		<idno type="DOI">10.5555/3666122.3667790</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS &apos;23</title>
				<meeting>the 37th International Conference on Neural Information Processing Systems, NIPS &apos;23<address><addrLine>Red Hook, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Curran Associates Inc</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A review of modern recommender systems using generative models (gen-recsys)</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Deldjoo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>McAuley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Korikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sanner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramisa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Vidal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sathiamoorthy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kasirzadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Milano</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD &apos;24)</title>
				<meeting>the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD &apos;24)<address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024">August 25-29, 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Bayesian optimization with LLM-based acquisition functions for natural language preference elicitation</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Austin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Korikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Toroghi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sanner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 18th ACM Conference on Recommender Systems (RecSys &apos;24)</title>
				<meeting>the 18th ACM Conference on Recommender Systems (RecSys &apos;24)</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">X</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-R</forename><surname>Wen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2010.04125</idno>
		<title level="m">Towards topic-guided conversational recommender system</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
