<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Multi-Aspect Reviewed-Item Retrieval via LLM Query Decomposition and Aspect Fusion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anton Korikov</string-name>
          <email>anton.korikov@mail.utoronto.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Saad</string-name>
          <email>g.saad@mail.utoronto.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ethan Baron</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mustafa Khan</string-name>
          <email>mr.khan@mail.utoronto.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manav Shah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Scott Sanner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>SIGIR'24 Workshop on Information Retrieval's Role in RAG Systems</institution>
          ,
          <addr-line>July</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Toronto</institution>
          ,
          <addr-line>Toronto</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>pect Fusion can improve over LF by</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>While user-generated product reviews often contain large quantities of information, their utility in addressing natural language product queries has been limited, with a key challenge being the need to aggregate information from multiple low-level sources (reviews) to a higher item level during retrieval. Existing methods for reviewed-item retrieval (RIR) typically take a late fusion (LF) approach which computes query-item scores by simply averaging the top-K query-review similarity scores for an item. However, we demonstrate that for multi-aspect queries and multi-aspect items, LF is highly sensitive to the distribution of aspects covered by reviews in terms of aspect frequency and the degree of aspect separation across reviews. To address these LF failures, we propose several novel aspect fusion (AF) strategies which include Large Language Model (LLM) query extraction and generative reranking. Our experiments show that for imbalanced review corpora, AF can improve over LF by a MAP@10 increase from 0.36 ± 0.04 to 0.52 ± 0.04, while achieving equivalent performance for balanced review corpora.</p>
      </abstract>
      <kwd-group>
        <kwd>Dense retrieval</kwd>
        <kwd>query decomposition</kwd>
        <kwd>multi-aspect retrieval</kwd>
        <kwd>LLM reranking</kwd>
        <kwd>late fusion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        User-generated reviews are an abundant and rich source
of data that has the potential to be used to improve the
retrieval of reviewed-items such as products, services, or
destinations. However, a challenge of using review data
for retrieval is that information has to be aggregated across
multiple (low-level) reviews to a (higher) item-level during
retrieval. Recent work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], defining this Reviewed-Item
Retrieval setting as RIR, showed that state-of-the-art results
could be achieved by using a bi-encoder to aggregate review
information to an item-level in a process called late fusion
(LF). As opposed to aggregating review information to an
item-level before query-scoring (early fusion), LF first
computes query-review similarity to avoid losing information
before scoring, and then averages the top- query-review
similarity scores to get a query-item similarity score.
Recently, LF has been implemented by retrieval augmented
generation (RAG) driven conversational recommendation
(ConvRec) systems for generative recommendation,
explanation, and interactive question answering [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>In this paper, we extend RIR to a multi-aspect retrieval
setting, formulating what we call multi-aspect RIR (MA-RIR).
In this problem, our goal is to retrieve relevant items for
a multi-aspect query by using the reviews of multi-aspect
items. Specifically, for an item with multiple aspects, we
assume that each review describes at least one, and up to
all, of the item’s aspects.</p>
      <sec id="sec-1-1">
        <title>As our primary contributions:</title>
        <p>• We formulate the MA-RIR problem and identify failure modes of LF under imbalanced review-aspect distributions, considering imbalances due to both aspect frequency and the degree of aspect separation across reviews.</p>
        <p>• We propose several novel aspect fusion strategies, which include LLM query extraction and reranking, to address failures of LF review-score aggregation on imbalanced multi-aspect review distributions.</p>
        <p>
          • We leverage a recently released multi-aspect retrieval dataset, Recipe-MPR [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], with ground-truth query- and item-aspect labels to generate four multi-aspect review distributions with various aspect balance properties, and numerically evaluate the effect of review-aspect balance on MA-RIR.
        </p>
        <p>• Our simulations show that for imbalanced data, Aspect Fusion can improve over LF by a MAP@10 increase from 0.36 ± 0.04 to 0.52 ± 0.04, while achieving equivalent performance for balanced data.</p>
        <p>• We show that LLM reranking in both cross-encoder and zero-shot (ZS) listwise reranking settings can provide some improvements when given a large enough number of reviews, but risks decreasing performance when not enough reviews are provided.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>2.1. Neural IR</title>
        <p>
          Given a set of documents $\mathcal{D}$ and a query $q \in \mathcal{Q}$, an IR task $\langle \mathcal{Q}, \mathcal{D} \rangle$ is to assign a similarity score $s_{q,d} \in \mathbb{R}$ between the query and each document $d \in \mathcal{D}$ and return a ranked list of top scoring documents. The standard first-stage neural IR method [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for a large corpus is to first use a bi-encoder $f(\cdot) : \mathcal{Q} \cup \mathcal{D} \rightarrow \mathbb{R}^n$ to map a query $q$ and document $d$ to their respective embeddings $f(q) = \mathbf{z}_q$ and $f(d) = \mathbf{z}_d$. A similarity function $\mathrm{sim}(\cdot, \cdot) : \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}$, such as the dot product, is then used to compute a query-document score $s_{q,d} = \mathrm{sim}(\mathbf{z}_q, \mathbf{z}_d)$. For web-scale corpora, exact similarity search for the top query-document scores is typically impractical, so approximate similarity search algorithms [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] are used instead.
        </p>
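        <p>As a concrete illustration of this first stage, the minimal Python sketch below scores documents against a query with an off-the-shelf bi-encoder and the dot product. It is not the authors' code; the TAS-B checkpoint name is simply one plausible choice, matching the retriever used later in this paper.</p>
        <preformat>
# Minimal bi-encoder scoring sketch (illustrative, not the paper's code).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-tas-b")

query = "meatball recipe that doesn't take too long"
docs = [
    "These meatballs were tender and flavorful.",
    "Ready in 25 minutes, perfect for a weeknight.",
]

z_q = model.encode(query, convert_to_tensor=True)    # query embedding z_q
z_d = model.encode(docs, convert_to_tensor=True)     # document embeddings z_d
scores = util.dot_score(z_q, z_d)                    # s_{q,d} = sim(z_q, z_d)
ranked = scores.squeeze(0).argsort(descending=True)  # indices of top documents
        </preformat>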
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Reviewed-Item Retrieval</title>
        <sec id="sec-2-2-1">
          <title>2.2.1. Problem Formulation</title>
          <p>
            Information retrieval across two-level data structures was previously studied by Zhang and Balog [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]. Specifically, Zhang and Balog define the Object Retrieval problem, where (high-level) objects are described by multiple (low-level) documents. Given a query, the task is to retrieve high-level objects by using information in the low-level documents.
          </p>
          <p>
            To investigate a special case of object retrieval where the goal is retrieving items (e.g., products, destinations) based on their reviews, Abdollah Pour et al. [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] recently proposed the Reviewed-Item Retrieval (RIR) problem. In the RIR $\langle \mathcal{I}, \mathcal{R}, \mathcal{Q} \rangle$ problem, there is a set of items $\mathcal{I}$, where every item $i$ is a high-level object. Each item is described by a set of reviews (i.e., "low-level documents") $\mathcal{R}_i \subset \mathcal{R}$, and the $j$'th review of item $i$ is $r_{i,j} \in \mathcal{R}_i$. The main difference between RIR and Object Retrieval is that in RIR a low-level document $r_{i,j}$ cannot describe more than one high-level object $i$, while Object Retrieval allows for more general two-level structures. Given a query $q \in \mathcal{Q}$ and a score $s_{q,i}$ between $q$ and each item $i$, the goal of RIR is to retrieve a ranked list of top-$K$ scoring items: $L_q = (i_1, ..., i_K)$ s.t. $i_1 \in \arg\max_i \{s_{q,i}\}$ and $s_{q,i_k} \geq s_{q,i_{k+1}}$.
          </p>
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. Fusion</title>
          <p>
            To get a query-item score $s_{q,i}$ using an item's review set $\mathcal{R}_i$, review information needs to be aggregated to an item level: this process is called fusion. Two alternatives exist for fusion [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]: if low-level information is aggregated before a query is used for scoring, it is called Early Fusion (EF); in contrast, if the aggregation occurs after query-scoring, it is called Late Fusion (LF).
          </p>
          <p>
            For EF in RIR, Abdollah Pour et al. [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] experiment with mean-pooling and contrastive learning methods to create an item embedding $\mathbf{z}_i \in \mathbb{R}^n$ from the review embeddings $\{\mathbf{z}_{r_{i,j}}\}_{r_{i,j} \in \mathcal{R}_i}$. They then directly compute the similarity between $\mathbf{z}_i$ and a query embedding $\mathbf{z}_q$ as the query-item score $s_{q,i} = \mathrm{sim}(\mathbf{z}_q, \mathbf{z}_i)$.
          </p>
          <p>For LF in RIR, these authors first compute query-review similarity scores $s_{q,r_{i,j}} = \mathrm{sim}(\mathbf{z}_q, \mathbf{z}_{r_{i,j}})$. They then aggregate these scores into a query-item score $s_{q,i}$ by averaging the top $K_r$ query-review scores for each item:</p>
          <p>$$s_{q,i} = \frac{1}{K_r} \sum_{j=1}^{K_r} s_{q,r_{i,j}} \qquad (1)$$</p>
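          <p>A minimal sketch of the LF computation in Equation (1), assuming review embeddings and a query embedding have already been produced by the bi-encoder (the data layout and names here are illustrative):</p>
          <preformat>
import numpy as np

def late_fusion_score(query_emb, review_embs, k_r=10):
    """Eq. (1): average the top-k_r query-review dot products for one item."""
    scores = review_embs @ query_emb   # s_{q,r} for every review of the item
    top_k = np.sort(scores)[-k_r:]     # keep the k_r highest scores
    return float(top_k.mean())         # query-item score s_{q,i}

# Usage: rank items by fused score, given a dict mapping each item id to
# an array of its review embeddings (hypothetical data layout).
# fused = {i: late_fusion_score(z_q, embs) for i, embs in item_reviews.items()}
          </preformat>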
          <p>
            Numerical evaluations performed for EF and LF for RIR demonstrate that EF has significantly worse performance than LF [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], and Abdollah Pour et al. conjecture that EF performs worse because it loses fine-grained review information before query-scoring. In contrast, by delaying fusion, LF preserves review-level information during query-scoring. Due to these findings, we do not study EF for MA-RIR; rather, we focus on developing Aspect Fusion as an extension of LF, discussed next.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Multi-Aspect Reviewed Item</title>
    </sec>
    <sec id="sec-4">
      <title>Retrieval</title>
      <sec id="sec-4-1">
        <title>3.1. Multi-Aspect Queries</title>
        <p>This paper focuses on retrieving relevant items using their reviews for a multi-aspect query, such as "Can I have a meatball recipe that doesn't take too long?". We define a query aspect to be a sub-span of a multi-aspect query that represents a distinct topic (or facet) in the query, for instance the sub-spans "meatball" and "doesn't take too long" in the previous sentence. While there is ambiguity in identifying which sub-spans, if any, in a query should be considered aspects, this sub-span based definition is a simple way to represent aspects and is conducive to overlap-based evaluations of aspect extraction such as intersection-over-union (IOU). Formally, we denote the set of aspects in query $q$ as $\mathcal{A}_q^{query}$, where the $l$'th query aspect is $a_{q,l}^{query} \in \mathcal{A}_q^{query}$.</p>
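        <p>For instance, a simple token-level IOU between an extracted sub-span and a ground-truth sub-span can be computed as in the sketch below; this is an illustrative metric implementation under the stated token-set assumption, not necessarily the paper's exact evaluation code:</p>
        <preformat>
def span_iou(pred_span, gold_span):
    """Token-level intersection-over-union between two aspect sub-spans."""
    pred = set(pred_span.lower().split())
    gold = set(gold_span.lower().split())
    union = pred.union(gold)
    if len(union) == 0:
        return 0.0
    return len(pred.intersection(gold)) / len(union)

print(span_iou("doesn't take too long", "take too long"))  # 3/4 = 0.75
        </preformat>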
        <p>In this work, multi-aspect queries are assumed to be logical
AND queries for all aspects, though an aspect itself can
represent other logical operators such as XOR (e.g. a query
aspect may be “chicken or beef ”). Finally, we assume all
query aspects are equally important — a further discussion
of weighted multi-aspect retrieval can be found in Section
7.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Multi-Aspect Reviewed-Items</title>
        <p>In addition to considering multi-aspect queries, we also consider multi-aspect items described by reviews. For instance, a multi-aspect item that is relevant to the multi-aspect query example above might be a recipe titled "Beef meatballs cooked in canned soup, ready in 25 minutes". However, since our goal is to isolate the properties of review-based retrieval, we assume that no such natural language (NL) item-level description is available. Instead, we assume that the item's aspects are described in reviews. Obviously, item-level descriptions (e.g. titles) are often available in practice, so a prime direction for future work is fusion across multiple levels of NL data during reviewed-item retrieval.</p>
        <p>Examples of reviews describing the item in the previous paragraph, which has aspects "meatballs" and "ready in 25 minutes", are shown in Figure 1. In this paper, we assume that a review $r_{i,j}$ must mention at least one item aspect $a_{i,k}^{item} \in \mathcal{A}_i^{item}$ and could mention up to all item aspects. Formally, the distribution of item aspects across reviews can be defined with a bipartite aspect distribution graph $G = \{\mathcal{R}, \mathcal{A}^{item}, \mathcal{E}\}$ where an edge $(r_{i,j}, a_{i,k}^{item}) \in \mathcal{E}$ exists if review $r_{i,j} \in \mathcal{R}_i$ mentions aspect $a_{i,k}^{item} \in \mathcal{A}_i^{item}$. We also let $\mathcal{A}^{rel,q} \subseteq \mathcal{A}^{item}$ represent the set of item-aspects that are relevant to a query and should be considered during retrieval. We define the MA-RIR $\langle \mathcal{I}, \mathcal{E}, \mathcal{Q}, \mathcal{A} \rangle$ problem as the task of retrieving a ranked list of relevant multi-aspect items $L_q$ for a multi-aspect query $q$, where $\mathcal{A} = \mathcal{A}^{item} \cup \mathcal{A}^{query}$.</p>
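        <p>To make the aspect distribution graph concrete, the short sketch below represents $G$ as a set of edges and computes the two imbalance statistics discussed next, aspect frequency and degree of separation; the toy data and names are illustrative assumptions.</p>
        <preformat>
from collections import Counter

# Edges of the bipartite graph G: (review id, item aspect) pairs meaning
# that the review mentions the aspect (toy data for one item).
edges = [
    ("r1", "meatballs"), ("r2", "ready in 25 minutes"),
    ("r3", "ready in 25 minutes"), ("r4", "ready in 25 minutes"),
]

# Aspect frequency: how often each aspect is mentioned across reviews.
aspect_freq = Counter(aspect for _, aspect in edges)

# Degree of separation: mean number of aspects mentioned per review;
# a value of 1.0 means aspects are fully disjoint across reviews.
aspects_per_review = Counter(review for review, _ in edges)
mean_aspects = sum(aspects_per_review.values()) / len(aspects_per_review)

print(aspect_freq)   # Counter({'ready in 25 minutes': 3, 'meatballs': 1})
print(mean_aspects)  # 1.0: fully disjoint, with frequency imbalance
        </preformat>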
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Multi-Aspect Review Distributions</title>
        <p>As we will demonstrate with numerical simulations on LLM-generated review data, understanding review distributions in terms of aspect frequency and degree of aspect separation between reviews is key to designing successful MA-RIR techniques. Figure 1 shows two extremes of aspect distributions that are among the distributions we explore in our experiments.</p>
        <sec id="sec-4-3-1">
          <title>3.3.1. Fully Overlapping Distributions</title>
          <p>Figure 1a) shows a fully overlapping distribution, in which every review mentions every item aspect: all aspects appear with identical frequency, and there is no separation of aspects across reviews.</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>3.3.2. Degree of Separation and Aspect Frequency</title>
          <p>In contrast to the case of perfect review-aspect balance, Figure 1b) shows an extreme case of aspect imbalance. Firstly, one aspect is mentioned much more frequently than another: this is an aspect frequency imbalance. Secondly, each review mentions only one aspect: this is a maximal degree of separation of aspects across reviews (fully disjoint). Mathematically, $G$ has $|\mathcal{A}_i^{item}|$ (disjoint) star components, where some stars have a significantly higher degree than others. In the next section, we discuss the negative effects of imbalanced review-aspect distributions on LF performance on MA-RIR, and propose aspect fusion as a method for mitigating these negative effects.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Aspect Fusion for MA-RIR</title>
      <sec id="sec-5-1">
        <title>4.1. Desiderata of Aspect Fusion</title>
        <p>Recall that LF computes a query-item similarity score by averaging the top $K_r$ query-review similarity scores using Equation (1). For MA-RIR, we propose two desiderata for the aspect distribution in the top $K_r$ reviews during fusion.</p>
        <p>Desideratum 1: Since we assume multi-aspect queries are AND queries, if an item contains $n_{rel,q}$ relevant aspects for query $q$, the $K_r$ reviews used for LF should mention all $n_{rel,q}$ of those relevant aspects.</p>
        <p>Desideratum 2: As mentioned in Section 3.1, we also assume all query aspects are equally important, which implies that aspect frequency should be identical for all $n_{rel,q}$ aspects in the top $K_r$ retrieved reviews.</p>
        <p>In a fully overlapping distribution (Figure 1a) where each review mentions each aspect, both Desiderata 1 and 2 are guaranteed to be satisfied by any subset of item reviews. We thus argue that standard LF should be sufficient when reviews fully overlap in aspects, and focus on developing Aspect Fusion methods that address the failures of LF for imbalanced review-aspect distributions.</p>
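        <p>Given ground-truth review-aspect labels, both desiderata can be checked programmatically, as in this illustrative sketch (the data layout and names are assumptions, not the paper's code):</p>
        <preformat>
from collections import Counter

def check_desiderata(top_reviews, review_aspects, relevant_aspects):
    """Desideratum 1: every relevant aspect appears among the fused reviews.
    Desideratum 2: all relevant aspects appear with identical frequency."""
    counts = Counter(a for r in top_reviews
                     for a in review_aspects[r] if a in relevant_aspects)
    covers_all = set(counts) == set(relevant_aspects)          # Desideratum 1
    balanced = covers_all and len(set(counts.values())) == 1   # Desideratum 2
    return covers_all, balanced
        </preformat>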
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Failures of LF under Review-Aspect</title>
      </sec>
      <sec id="sec-5-3">
        <title>Imbalance</title>
        <p>Standard LF will fail to achieve Desiderata 1 and 2 for
reviewaspect distributions with at least some degree of
disjointedness and aspect frequency imbalance under the following
assumptions.</p>
        <p>Aspect Popularity Bias Aspects that are reviewed more
frequently are more likely to be mentioned in the top  
reviews.</p>
        <p>
          The non-isotropic nature of the embedding space [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] biases retrieval towards one aspect. Consider two equally sized and fully disjoint review subsets $\mathcal{R}_i^a \subset \mathcal{R}_i$ and $\mathcal{R}_i^b \subset \mathcal{R}_i$, in which reviews mention only a single aspect, $a^{rel} \in \mathcal{A}^{rel,q}$ or $b^{rel} \in \mathcal{A}^{rel,q}$, respectively, for some item $i$. If query-review similarity scores tend to be higher when a review describes aspect $a^{rel}$ as opposed to aspect $b^{rel}$, LF will be more likely to select reviews from review set $\mathcal{R}_i^a$ for the top $K_r$ fused reviews. For example, in Figure 1b), the reviews describing cooking time might be more likely to score higher with the full query than reviews describing "meatballs".
        </p>
      </sec>
      <sec id="sec-5-4">
        <title>4.3. Aspect Fusion</title>
        <p>To address these failures of LF on imbalanced data, we introduce several methods for Aspect Fusion, which explicitly utilize the multi-aspect nature of reviews during fusion to address multi-aspect queries.</p>
        <sec id="sec-5-4-1">
          <title>4.3.1. Aspect Extraction</title>
          <p>To extract aspects from queries, we propose to use few-shot (FS) prompting with an LLM. Though the number of query aspects is typically not known a priori, since we study multi-aspect queries, our proposed prompt (Figure 10 in the Appendix) asks that at least two non-overlapping sub-spans of the query be extracted as aspects. We represent the set of extracted query aspects for query $q$ as $\mathcal{A}_q^{ext}$ and let $n_e = |\mathcal{A}_q^{ext}|$.</p>
        </sec>
        <sec id="sec-5-4-2">
          <title>4.3.2. Aspect-Item Scoring</title>
          <p>The key to Aspect Fusion is directly computing aspect-review similarity scores $s_{a,r_{i,j}}$, as opposed to similarity scores between reviews and a monolithic query, since the latter can be negatively impacted by review-aspect distribution imbalance. Aspect similarity scores are computed by separately embedding each extracted aspect $a \in \mathcal{A}_q^{ext}$ as $\mathbf{z}_a = f(a)$ and calculating $s_{a,r_{i,j}} = \mathrm{sim}(\mathbf{z}_a, \mathbf{z}_{r_{i,j}})$. Then, aspect-item scores $s_{a,i} \in \mathbb{R}$ are obtained by aggregating the top $K_r$ aspect-review scores via Eq. (1), with aspect-review scores in place of query-review scores. For each extracted aspect $a$, the top-$K$ scoring items are ordered into a list $L_a = (i_1, ..., i_K)$ s.t. $i_1 \in \arg\max_i \{s_{a,i}\}$ and $s_{a,i_k} \geq s_{a,i_{k+1}}$. Figure 2a) shows how standard (monolithic) LF will take a biased review sample of the first aspect, since it is more frequently mentioned by reviews and $\mathbf{z}_q$ happens to be closer to those review embeddings. To differentiate between LF for RIR as proposed by Abdollah Pour et al. and Aspect Fusion, we will refer to LF as Monolithic LF, since it uses the full query.</p>
          <p>After aspect-item scoring, we must aggregate the $n_e$ top-$K$ item lists for each aspect, $\{L_a\}_{a \in \mathcal{A}_q^{ext}}$, into a single ranked list of top-$K$ items for the query, $L_q$. We examine six aggregation strategies, which can be categorized as four score aggregation methods and two rank aggregation methods, all illustrated in the sketch after this list. The score-based variants convert the $n_e$ aspect-item scores into a query-item score $s_{q,i}$ using:</p>
          <p>1. AMean: Arithmetic mean
2. GMean: Geometric mean
3. HMean: Harmonic mean
4. Min: Minimum</p>
          <p>to return the final ranked list $L_q$. The two rank-based list aggregation methods are:</p>
          <p>1. Borda: Borda count
2. R-R: Round-robin (interleaved) merge</p>
          <p>In Borda count, the score for a given item $i$ is calculated as $\sum_{a=1}^{n_e} (K - \mathrm{rank}_a^i + 1)$, where $\mathrm{rank}_a^i$ is the rank of item $i$ in list $L_a$. In a round-robin merge of $n_e$ lists, elements from each list are merged in a cyclic order, and when a conflict arises with a particular item, that item is skipped and the merge continues from the same list.</p>
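          <p>The following Python sketch illustrates these six aggregation strategies under stated assumptions: aspect_scores maps each extracted aspect to a per-item score dict, all names are illustrative, and GMean assumes positive scores.</p>
          <preformat>
import numpy as np

def aggregate_scores(aspect_scores, method="amean"):
    """Convert the n_e aspect-item scores into one query-item score per item.
    aspect_scores: dict aspect -> dict item -> score (every item scored)."""
    items = next(iter(aspect_scores.values())).keys()
    fused = {}
    for i in items:
        s = np.array([aspect_scores[a][i] for a in aspect_scores])
        fused[i] = {"amean": s.mean(),
                    "gmean": np.exp(np.log(s).mean()),  # assumes s positive
                    "hmean": len(s) / (1.0 / s).sum(),
                    "min": s.min()}[method]
    return sorted(fused, key=fused.get, reverse=True)   # ranked item list L_q

def borda(lists, k):
    """Borda count over the n_e per-aspect top-k item lists."""
    points = {}
    for ranked in lists:
        for rank, i in enumerate(ranked):              # rank is 0-indexed, so
            points[i] = points.get(i, 0) + (k - rank)  # this is k - rank + 1
    return sorted(points, key=points.get, reverse=True)

def round_robin(lists):
    """Interleave the per-aspect lists cyclically; a duplicate item is
    skipped and the merge continues from the same list."""
    iters = [iter(l) for l in lists]
    merged, seen = [], set()
    while iters:
        for it in list(iters):
            for i in it:
                if i not in seen:   # skip items already merged
                    seen.add(i)
                    merged.append(i)
                    break           # took one new item; move to the next list
            else:
                iters.remove(it)    # this list is exhausted
    return merged
          </preformat>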
        </sec>
      </sec>
      <sec id="sec-5-5">
        <title>4.4. LLM Reranking</title>
        <p>
          In addition to Aspect Fusion, we also introduce an LLM reranking step for MA-RIR; to the best of our knowledge, LLM reranking has not been previously studied in a reviewed-item setting. Our goal is to understand whether LLMs in cross-encoder (CE) or ZS listwise [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] settings can fuse reviews of multi-aspect items for effective reranking.
        </p>
        <sec id="sec-5-5-1">
          <title>After a list</title>
          <p>of top   items is returned from the first
stage,   reviews for each item need to be given to the LLM
for what we call fusion-during-reranking. For Monolithic LF,
these   reviews are simply the   reviews used for LF. For
Aspect Fusion, since   reviews were used for fusion with
each aspect, we propose to perform a round-robin merge of
a balanced distribution of reviews across aspects.
the top   review lists for each aspect in order to preserve</p>
          <p>For a CE, reviews are simply concatenated and
crossencoded with the query. For listwise reranking, our prompt
provides the LLM with the query, initial ranked list of item
IDs, reviews for each item, and instructions to order the
items based on relevance to the query — the full listwise
reranking prompt is in Figure 11 in the Appendix.</p>
        </sec>
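        <p>A hedged sketch of how such a listwise prompt could be assembled is shown below; the wording is an illustrative assumption, not the paper's verbatim prompt (which is in Figure 11), and all names are hypothetical.</p>
        <preformat>
def build_listwise_prompt(query, ranked_item_ids, reviews_by_item, k_r=10):
    """Assemble a ZS listwise reranking prompt (illustrative wording)."""
    lines = [f"Query: {query}",
             f"Initial ranking: {ranked_item_ids}",
             "Order the following items by relevance to the query."]
    for item_id in ranked_item_ids:
        lines.append(f"Item {item_id} reviews:")
        # For Aspect Fusion, reviews_by_item[item_id] holds a round-robin
        # merge of the per-aspect review lists, keeping aspects balanced.
        for review in reviews_by_item[item_id][:k_r]:
            lines.append(f"- {review}")
    lines.append("Return the item IDs, most relevant first.")
    return "\n".join(lines)
        </preformat>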
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Experimental Method</title>
      <p>We perform simulations on generated review data to study the effect of aspect balance across reviews and test our hypothesis that Aspect Fusion is more robust to aspect imbalances than Monolithic LF. While using synthetic data exposes our results to biases from the data generation process, we are able to generate synthetic review distributions with far greater control than would have been possible several years ago, before the advent of LLMs. We specifically design experiments to study the performance of Aspect Fusion vs. Monolithic LF under the presence of aspect imbalance, both in the form of disjointedness of aspects across reviews and imbalanced aspect frequencies.</p>
      <p>
        In order to perform our experimentation, we need a dataset that has (a) multi-aspect queries and items, (b) ground-truth (GT) aspect labels, and (c) item reviews. To the best of our knowledge, there is no existing dataset with all of these properties. However, the recently-released Recipe-MPR dataset [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] includes properties (a) and (b). We leverage this dataset and generate item reviews using GPT-4. We create four datasets for our experiments based on the Recipe-MPR dataset and our new LLM-generated reviews.
      </p>
      <p>Firstly, the fully overlapping dataset includes 20 reviews
per item, which each mention all of the aspects of the item.
Secondly, the fully disjoint dataset includes 10 reviews for
each aspect of a given item. We also modify the fully
disjoint dataset to create two datasets with imbalanced aspect
frequencies. In the one rare aspect dataset, we remove all
but one of the reviews for a randomly-selected aspect of
each item. In the one popular aspect dataset, we keep all ten
reviews for only one randomly-selected aspect of each item,
and keep only one review for the other aspects.</p>
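      <p>A hedged sketch of how the four distributions can be derived for one item from its generated reviews; the data layout, variable names, and random selection below are illustrative assumptions, not the paper's code:</p>
      <preformat>
import random

def build_distributions(per_aspect_reviews, overlapping_reviews):
    """per_aspect_reviews: dict aspect -> list of 10 generated reviews;
    overlapping_reviews: 20 reviews that each mention all item aspects."""
    fully_overlapping = list(overlapping_reviews)   # 20 reviews per item
    fully_disjoint = [r for revs in per_aspect_reviews.values() for r in revs]

    aspects = list(per_aspect_reviews)
    rare = random.choice(aspects)        # keep only 1 review for this aspect
    one_rare = [r for a in aspects
                for r in (per_aspect_reviews[a][:1] if a == rare
                          else per_aspect_reviews[a])]

    popular = random.choice(aspects)     # keep all 10 only for this aspect
    one_popular = [r for a in aspects
                   for r in (per_aspect_reviews[a] if a == popular
                             else per_aspect_reviews[a][:1])]
    return fully_overlapping, fully_disjoint, one_rare, one_popular
      </preformat>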
      <p>In order to generate reviews, the GT aspects for each
correct item in Recipe-MPR were used to prompt GPT-4.
The total number of items for which there were GT aspects
is 473. The distribution of the number of aspects per query
and item is shown in Table 1. On average, each item has 2.2
aspects. The prompts we used to generate the reviews are
included in the Appendix.</p>
      <p>Recipe-MPR contains logical AND queries with ground
truth (GT) labels for the query aspects. Refer to subsection
3.1 for an example of a query  and its GT aspects,   query.
Since the focus of this paper is on MA-RIR, we only included
the 411 queries whose associated correct item had at least
two aspects. For each of these queries, we used two-shot
examples to have GPT-4 extract “at least two non-overlapping
spans” representing the relevant aspects in the query.</p>
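      <p>The sketch below shows what such a two-shot extraction call might look like; the prompt wording and example queries are illustrative assumptions (the paper's actual prompt is in Figure 10), while the OpenAI client usage is the library's standard chat completion call.</p>
      <preformat>
from openai import OpenAI

client = OpenAI()
prompt = (
    "Extract at least two non-overlapping spans representing the relevant "
    "aspects in the query.\n"
    "Query: vegetarian pasta that kids will like\n"
    "Aspects: vegetarian; pasta; kids will like\n"
    "Query: a spicy soup I can freeze\n"
    "Aspects: spicy; soup; can freeze\n"
    "Query: Can I have a meatball recipe that doesn't take too long?\n"
    "Aspects:"
)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
aspects = [a.strip() for a in response.choices[0].message.content.split(";")]
      </preformat>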
      <sec id="sec-6-1">
        <title>5.2. Experimental Details</title>
        <p>
          For our query and review embeddings, we used TAS-B [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. For the listwise reranking experiments, we used the gpt-3.5-turbo-16k model. For the CE reranking experiments, the model used was ms-marco-MiniLM-L-12-v2 (https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2).
        </p>
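        <p>A minimal sketch of the CE reranking step with this checkpoint, assuming fused reviews per item are already available (the function and variable names are illustrative):</p>
        <preformat>
from sentence_transformers import CrossEncoder

ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2")

def ce_rerank(query, candidate_items, reviews_by_item):
    """Score each candidate by cross-encoding the query with the
    concatenation of that item's fused reviews (Section 4.4)."""
    pairs = [(query, " ".join(reviews_by_item[i])) for i in candidate_items]
    scores = ce.predict(pairs)
    order = sorted(range(len(candidate_items)),
                   key=lambda j: scores[j], reverse=True)
    return [candidate_items[j] for j in order]
        </preformat>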
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Experimental Results</title>
      <p>RQ1: Is Aspect Fusion helpful when item aspects are
discussed disjointly across reviews?</p>
      <p>Table 2 lists the mean average precision at 10 (MAP@10) and recall@10 (Re@10) of the stage 1 dense retrieval for various settings of $K_r$. The table is broken up according to whether the disjoint or overlapping reviews are used. Throughout this paper, we show results for $K = 10$. In our experiments we noticed that varying $K$ led to minor changes in the results. For completeness, we report results for $K = 5$ in the Appendix.</p>
      <p>We see that for the fully overlapping dataset, Aspect Fusion is approximately equivalent to the Monolithic LF approach, while for the fully disjoint dataset, the Aspect Fusion score aggregation approaches (arithmetic mean, harmonic mean, and geometric mean) offer a significant improvement in performance compared to the Monolithic LF approach. This pattern offers empirical evidence that Aspect Fusion is better suited to disjoint aspect distributions than Monolithic LF. More specifically, this suggests that Monolithic LF is not symmetrical across aspects, and fails to consider information from each of the aspects in a balanced way.</p>
      <p>Additionally, for the fully disjoint dataset, the performance of the aspect-based approach suffers for $K_r > 10$. This can be explained by the fact that when $K_r$ exceeds the number of disjoint reviews available for a given aspect (10 in this data), the aspect-based methods will score items based on reviews that are irrelevant to a given aspect. This could result in correct items receiving low scores for some aspects. We conclude that Aspect Fusion should use $K_r \leq K_{i,min}$, where $K_{i,min}$ is the smallest number of reviews an item $i$ has for an aspect, in order to avoid this performance drop.</p>
      <p>Furthermore, the fact that the score aggregation methods outperform the rank-based aggregation methods (R-R and Borda) offers evidence that the embedding similarity scores contain significant information about how well an item's reviews align with a given query aspect, above and beyond that item's rank relative to the other candidate items. Considering the simplicity and strong performance of AMean score aggregation, we focus on this Aspect Fusion method in the remaining results below.</p>
      <p>RQ2: How does review aspect frequency imbalance affect Monolithic LF and Aspect Fusion?</p>
      <p>Table 3 shows the performance of the stage 1 dense
retrieval for the balanced frequency (fully disjoint) dataset and
the two datasets with imbalance in the review aspect
frequency. These results are also presented visually in Figure 4.</p>
      <p>Note that this imbalance can only be analyzed for the case
where the reviews cover disjoint, rather than overlapping,
aspects.</p>
      <p>Based on our conclusion above, we focus on the results for $K_r = 1$ in this section, since for the datasets with imbalanced review aspect frequency, $K_{i,min} = 1$. We see that there is a significant decrease in performance for all methods when aspect frequency imbalance is introduced. This result suggests that balance in reviews across aspects is helpful for both Monolithic LF and Aspect Fusion.</p>
      <p>Furthermore, for $K_r = 1$, the performance of Monolithic LF decreases more when aspect frequency imbalance is introduced, compared to that of the Aspect Fusion methods. For example, the MAP@10 of Monolithic LF decreased from 0.41 to 0.36 on the one popular aspect dataset, representing a 12% drop, compared to a 7% drop for the Aspect Fusion approach. This suggests Aspect Fusion methods may be more robust to aspect frequency imbalance.</p>
      <p>Lastly, we note that the performance of Monolithic LF decreases as $K_r$ grows large, which occurs because any relevant item aspects that are infrequently reviewed (there is only 1 review for rare aspects in these datasets) will contribute less and less to the query-item score as $K_r$ increases.</p>
      <p>Table 5 summarizes the performance of the listwise and cross-encoder rerankers. (Approximately 1% of queries had only 9 items returned by the listwise reranker instead of 10; this was an error in generative retrieval.) We see there is a beneficial effect to increasing the number of reviews $K_r$ given to the language model for both CE and listwise reranking. Specifically, for reranking Monolithic LF on the fully disjoint dataset, listwise MAP@10 improves from 0.33 to 0.46, for $K_r = 1$ and $K_r = 30$, respectively. Similarly, CE MAP@10 improves from 0.35 at $K_r = 1$ to 0.47 at $K_r = 30$. We conjecture this large increase in MAP@10 with $K_r$ is due to the quadratic nature of cross-attention across input text.</p>
      <p>Since Aspect Fusion did best with low $K_r$ values, a possible reason that we did not observe any benefits of LLM reranking for Aspect Fusion is that $K_r$ was not high enough. Also, while some reranking settings showed 2nd stage MAP@10 increases over 1st stage values (such as $K_r = 30$ reranking of Monolithic LF for fully disjoint data), when too few reviews were given to the reranker, the second stage sometimes made performance worse, such as at $K_r = 1$.</p>
      <p>Figure 7 shows a heatmap of the ranks assigned to the correct items by the stage 1 retriever and stage 2 reranker. An effective reranker would consistently improve the ranks of the correct item, and this would result in the center of mass lying below the anti-diagonal. We see that this is indeed the case for a high value of $K_r$, but is not the case for a low value of $K_r$. The raw values underlying this figure are provided in the Appendix.</p>
    </sec>
    <sec id="sec-8">
      <title>7. Related Work</title>
      <sec id="sec-8-1">
        <title>7.1. Multi-level Retrieval</title>
        <p>
          The most relevant work to ours is that on RIR by Abdollah Pour et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], which formulates the RIR problem and studies EF and LF approaches. In addition to LF with an off-the-shelf bi-encoder such as TAS-B, the authors also contrastively fine-tune an encoder for LF and show performance improvements over off-the-shelf LF. Extending their contrastive learning approach to MA-RIR Aspect Fusion is a natural direction for future work. As mentioned in Section 2.2.1, Zhang and Balog [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] have previously studied the Object Retrieval problem, which allows for more general two-level structures than RIR (in which a low-level document cannot describe more than one high-level object). However, they did not study neural techniques or multi-aspect retrieval, which are key to our work.
        </p>
      </sec>
      <sec id="sec-8-2">
        <title>7.2. Multi-aspect Retrieval</title>
        <p>
          In addition to releasing Recipe-MPR, which was used to generate review distributions in this work, Zhang et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] use the queries and items in Recipe-MPR in a multi-aspect question-answering setting, and find that FS GPT-3 listwise prompting achieves far superior accuracy to all other methods. However, it is computationally infeasible to use such listwise prompting methods for first stage retrieval. Kong et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] consider multiple aspects when calculating relevance scores in dense retrieval, but assume documents and queries contain a fixed number of aspects from known categories. Similarly, the label aggregation method of Kang et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] explicitly deals with multiple query aspects, but assumes a fixed number of known categories.
        </p>
        <p>
          Another method, called Multi-Aspect Dense Retrieval (MADRM) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], learns early fusion embeddings of documents and queries by extracting and then aggregating their aspects, and reports improvements over Monolithic LF baselines. DORIS-MAE [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] presents a dataset that deconstructs complex queries into hierarchies of aspects and sub-aspects. Unlike our aspect extraction approach, which extracts aspects from queries using few-shot prompting with an LLM, DORIS-MAE predefines these aspects and their corresponding topic hierarchy for both queries and document corpora.
        </p>
        <p>
          Finally, some recent works study multi-aspect LLM-driven conversational recommendation [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], including work on preference elicitation over multiple aspects [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and knowledge graph based topic-guided chatbots [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>8. Conclusions</title>
      <p>By extending reviewed-item retrieval (RIR) to a setting with multi-aspect queries and items, we were able to both theoretically and empirically demonstrate the failure modes of Monolithic Late Fusion (LF) when there is an imbalance in how aspects are distributed across reviews. Specifically, since Monolithic LF is aspect-agnostic, it is subject to a frequency bias in its review selection towards more popular aspects. Furthermore, the disjointedness of aspects across reviews can induce a selection bias towards certain aspects if monolithic multi-aspect query embeddings are closer to review embeddings for those aspects.</p>
      <p>To address these failure modes, we propose Aspect Fusion as a robust MA-RIR method for imbalanced review distributions. Using the recently released Recipe-MPR dataset, specifically designed to study multi-aspect retrieval, we design four generated datasets that allow us to empirically test the effects of review imbalances from aspect frequency and disjointness. Our experiments show that Aspect Fusion is much more robust to non-uniform review variations than Monolithic LF, outperforming the latter with a 44% MAP@10 increase on some distributions.</p>
    </sec>
    <sec id="sec-10">
      <title>A. Appendix A</title>
      <sec id="sec-10-1">
        <title>A.1. LLM Prompts</title>
        <p>We provide the prompts userd for overlapping review
generation, disjoint review generation, query aspect extraction,
and listwise reranking in Figures 8, 9, 10, and 11 respectively.
A.2. Results for   = 5
In the main body we showed various results of experiments
where   was set to 10. We found that varying   within this
order of magnitude had a very small efect on the results,
and therefore did not include findings for any other settings
of   above. For completeness, in this section we duplicate
the preceding tables but use   = 5 instead of   = 10. See
Tables 6, 7, 8, and 9 for these results.</p>
      </sec>
      <sec id="sec-10-2">
        <title>A.3. Data for Figure 7</title>
        <p>In Figure 7, we show the number of queries for which the correct item was ranked in a certain position by the stage 1 retriever and stage 2 reranker. The underlying data for this figure is shown in Table 10.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>M. M. Abdollah Pour</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Farinneya</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Toroghi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Korikov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Pesaranghader</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Sajed</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Bharadwaj</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Mavrin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Sanner</surname>
          </string-name>
          ,
          <article-title>Self-supervised contrastive BERT fine-tuning for fusion-based reviewed-item retrieval</article-title>
          ,
          <source>in: European Conference on Information Retrieval</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>17</lpage>
          . doi:
          <volume>10</volume>
          .1007/ 978- 3-
          <fpage>031</fpage>
          - 28244-
          <issue>7</issue>
          _
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kemper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dicarlantonio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Korikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sanner</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented conversational recommendation with prompt-based semistructured natural language state tracking</article-title>
          ,
          <source>in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '24,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2024</year>
          . doi:
          <volume>10</volume>
          .1145/3626772. 3657670.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Korikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Farinneya</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. M. Abdollah Pour</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Bharadwaj</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Pesaranghader</surname>
            ,
            <given-names>X. Y.</given-names>
          </string-name>
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y. X.</given-names>
          </string-name>
          <string-name>
            <surname>Lok</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Jones</surname>
          </string-name>
          , S. Sanner,
          <article-title>RecipeMPR: A test collection for evaluating multi-aspect preference-based natural language retrieval</article-title>
          ,
          <source>in: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '23,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2023</year>
          , p.
          <fpage>2744</fpage>
          -
          <lpage>2753</lpage>
          . doi:
          <volume>10</volume>
          .1145/3539618.3591880.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          , Sentence-BERT:
          <article-title>Sentence embeddings using Siamese BERT-networks</article-title>
          , in: K. Inui,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <surname>X.</surname>
          </string-name>
          Wan (Eds.),
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>3982</fpage>
          -
          <lpage>3992</lpage>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>D19</fpage>
          - 1410.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , M. Douze,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jégou</surname>
          </string-name>
          ,
          <article-title>Billion-scale similarity search with GPUs</article-title>
          ,
          <source>IEEE Transactions on Big Data</source>
          <volume>7</volume>
          (
          <year>2021</year>
          )
          <fpage>535</fpage>
          -
          <lpage>547</lpage>
          . doi:
          <volume>10</volume>
          .1109/TBDATA.
          <year>2019</year>
          .
          <volume>2921572</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , K. Balog,
          <article-title>Design patterns for fusionbased object retrieval</article-title>
          ,
          <source>in: European Conference on Information Retrieval</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>684</fpage>
          -
          <lpage>690</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>319</fpage>
          -56608-5_
          <fpage>66</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ethayarajh</surname>
          </string-name>
          ,
          <article-title>How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings</article-title>
          , in: K. Inui,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <surname>X.</surname>
          </string-name>
          Wan (Eds.),
          <source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>65</lpage>
          . URL: https://aclanthology.org/D19-1006. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>D19</fpage>
          -1006.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pradeep</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Zero-shot listwise document reranking with a large language model</article-title>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2305</volume>
          .
          <fpage>02156</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hofstätter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-C.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <article-title>Eficiently teaching an efective dense retriever with balanced topic aware sampling</article-title>
          ,
          <source>in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>113</fpage>
          -
          <lpage>122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khadanga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Xu,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Bendersky, Multi-aspect dense retrieval</article-title>
          ,
          <source>in: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</source>
          , KDD '22,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2022</year>
          , p.
          <fpage>3178</fpage>
          -
          <lpage>3186</lpage>
          . doi:
          <volume>10</volume>
          .1145/3534678. 3539137.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tseng</surname>
          </string-name>
          ,
          <article-title>Learning to rank with multi-aspect relevance for vertical search</article-title>
          ,
          <source>in: Proceedings of the Fifth ACM International Conference on Web Search and Data Mining</source>
          , WSDM '12,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2012</year>
          , p.
          <fpage>453</fpage>
          -
          <lpage>462</lpage>
          . doi:
          <volume>10</volume>
          .1145/2124295.2124350.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Naidu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bergen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Paturi</surname>
          </string-name>
          , DORIS-MAE:
          <article-title>Scientific document retrieval using multi-level aspect-based queries</article-title>
          ,
          <source>in: Proceedings of the 37th International Conference on Neural Information Processing Systems</source>
          , NIPS '23, Curran Associates Inc.,
          <string-name>
            <surname>Red</surname>
            <given-names>Hook</given-names>
          </string-name>
          ,
          <string-name>
            <surname>NY</surname>
          </string-name>
          , USA,
          <year>2024</year>
          . doi:
          <volume>10</volume>
          .5555/3666122. 3667790.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deldjoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. McAuley</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Korikov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Sanner</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Ramisa</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Vidal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Sathiamoorthy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kasirzadeh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Milano</surname>
          </string-name>
          ,
          <article-title>A review of modern recommender systems using generative models (gen-recsys)</article-title>
          ,
          <source>in: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24)</source>
          ,
          <source>August 25-29</source>
          ,
          <year>2024</year>
          , Barcelona, Spain,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Austin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Korikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Toroghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sanner</surname>
          </string-name>
          ,
          <article-title>Bayesian optimization with LLM-based acquisition functions for natural language preference elicitation</article-title>
          ,
          <source>in: Proceedings of the 18th ACM Conference on Recommender Systems (RecSys'24)</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>Towards topic-guided conversational recommender system</article-title>
          , arXiv preprint arXiv:
          <year>2010</year>
          .
          <volume>04125</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>