<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluating User Intent Classification and Hybrid Retrieval in a RAG-based Conversational Tourism Recommender System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Akshat Tandon</string-name>
          <email>akshat.tandon@tum.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ashmi Banerjee</string-name>
          <email>ashmi.banerjee@tum.de</email>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Traditional tourism recommender systems struggle with cold-start problems and lack the natural interaction capabilities of conversational agents. This paper introduces a modular Hybrid Retrieval-Augmented Generation (RAG)-based Conversational Tourism Recommender System (C-TRS) for European cities. Our architecture combines LLMs with a hybrid retrieval pipeline (dense and sparse vector search) and a structured dialogue state tracker. We use a curated knowledge base from Wikivoyage and Tripadvisor, encompassing over 100 European cities. User utterances are parsed for multi-label intent classification, triggering the retrieval of relevant city-level knowledge chunks to augment LLM prompts for actions like answering queries or providing recommendations. Our evaluation focuses on user intent classification (comparing traditional models with few-shot LLM prompting) and retrieval quality (using the RAGAS framework). Findings demonstrate the efficacy of our hybrid retrieval approach and the power of few-shot learning with LLMs in handling the complexities of conversational recommendation. Overall, our hybrid retrieval strategy balances recall and precision by combining dense (semantic) and sparse (lexical) embeddings, conditioned on conversational intent. This dynamic selection addresses limitations of static or purely lexical/semantic retrieval in tourism RAG systems. It represents a novel integration of intent-driven hybrid retrieval in a conversational tourism framework. Our code and artifacts are available at https://github.com/Akshat125/conversational-trs.</p>
      </abstract>
      <kwd-group>
        <kwd>City Tourism Recommendations</kwd>
        <kwd>Conversational Recommender Systems</kwd>
        <kwd>Retrieval-Augmented Generation</kwd>
        <kwd>Hybrid Vector Search</kwd>
        <kwd>Large Language Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recommender systems are pivotal in helping users discover new destinations, attractions, and points of
interest (POIs). In tourism, this assistance is crucial for enabling users to explore and plan enriching
travel experiences. However, traditional recommendation approaches often grapple with the "cold-start"
problem, which stems from insufficient user information and limited interaction data. This problem
hinders their ability to provide personalized suggestions, especially for new users or in novel contexts,
as the system lacks the necessary data to make accurate recommendations [
        <xref ref-type="bibr" rid="ref32 ref35">30, 33</xref>
        ]. Furthermore, as the
desire for personalized and contextually relevant recommendations grows, conversational interfaces
have emerged as a natural and intuitive way for users to express their needs and preferences [8].
      </p>
      <p>
        The emergence of LLMs pretrained on extensive textual corpora introduces a paradigm shift by
leveraging implicit world knowledge, advanced natural language understanding, and few-shot in-context
learning. This term refers to the ability of the model to learn from a small amount of data in a specific
context, allowing it to generate recommendations with limited explicit training data [
        <xref ref-type="bibr" rid="ref15 ref7">6, 14</xref>
        ]. Nonetheless,
LLMs are prone to hallucinations and factual inaccuracies when generating responses without grounded
external knowledge, necessitating retrieval-augmented approaches. Retrieval-Augmented Generation
(RAG) frameworks combine information retrieval with conditional text generation, allowing grounding
of LLM outputs in external, domain-specific corpora. This mitigates hallucination while reducing
fine-tuning overhead by decoupling retrieval and generation stages [
        <xref ref-type="bibr" rid="ref33 ref40">38, 31</xref>
        ]. Existing RAG-based tourism RS
typically utilize single retrieval strategies and static pipelines, which limit adaptability and incur high
computational costs [
        <xref ref-type="bibr" rid="ref6">5</xref>
        ].
      </p>
      <p>This paper presents a modular Hybrid Retrieval-Augmented Generation (RAG)-based Conversational
Tourism Recommender System (C-TRS) tailored to recommend European city trips. Our architecture
integrates LLMs with a hybrid retrieval pipeline, which combines dense and sparse vector search, and
a structured dialogue state tracker. This design allows the system to detect user intent, maintain
context across turns, and retrieve and generate grounded, contextually appropriate recommendations
or answers.</p>
      <p>To support this system, we use a knowledge base combining structured and unstructured travel
information from Wikivoyage and Tripadvisor, covering over 100 European cities [4]. Each user
utterance, such as "Recommend relaxing destinations in Spain for early spring", is parsed for multi-label
intent classification, and depending on the system action (e.g., Answer or Recommend and Explain),
relevant city-level knowledge chunks are retrieved from a vector database and used to augment the
LLM prompt.</p>
      <p>
        We focus our evaluation on two key components: (1) user intent classification using both traditional
models (BERT and BART) and few-shot prompting with LLMs, and (2) the retrieval quality of different
vector search strategies using the RAGAS evaluation framework [
        <xref ref-type="bibr" rid="ref17">16</xref>
        ]. Our experiments show that
LLM-based few-shot classification outperforms traditional methods on nuanced, multi-intent
classification. Furthermore, the hybrid retrieval strategy balances recall and precision by switching between
dense embeddings (capturing semantic similarity) and sparse retrieval (capturing exact term matches),
addressing the limitations of purely dense or sparse retrieval in domain-specific recommendation. This
intent-driven hybrid retrieval mitigates LLM hallucination while reducing computational overhead,
advancing prior RAG-based tourism systems that lacked dynamic retrieval or conversational
modeling [
        <xref ref-type="bibr" rid="ref6">5</xref>
        ]. To our knowledge, this is the first C-TRS to embed hybrid retrieval selection directly within a
conversational framework.
      </p>
      <p>This paper is structured as follows: Section 2 reviews the relevant literature and identifies the
research gaps. Section 3 outlines our proposed system architecture and core components. In Section 4,
we evaluate two critical aspects of the C-TRS: user intent classification and the RAG pipeline, with
a key focus on the retrieval strategies. Finally, Section 5 concludes the paper by summarizing our
contributions and findings, and discusses directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>This section surveys the existing research in CRS, user intent classification, and RAG for recommender
systems in the tourism domain.</p>
      <sec id="sec-2-1">
        <title>2.1. Conversational Tourism Recommender Systems (C-TRS) using LLMs</title>
        <p>
          Conversational recommender systems (CRS) have emerged as a promising approach to address
challenges such as cold-start problems and information asymmetry in recommendation scenarios [
          <xref ref-type="bibr" rid="ref20">19</xref>
          ].
By enabling users to express preferences, ask questions, and provide feedback in natural language,
CRS offer a more interactive and personalized recommendation experience. The recent surge in LLM
capabilities has significantly enhanced the reasoning and dialog management capacities of CRS. This
has led to the development of various LLM-enhanced architectures that incorporate fine-tuning, RAG,
and hybrid frameworks integrating LLMs with traditional recommender systems [
          <xref ref-type="bibr" rid="ref19 ref22 ref23 ref25">22, 23, 18, 21</xref>
          ].
        </p>
        <p>
          In the tourism domain, several LLM-based conversational systems have been proposed to assist users
with trip planning and itinerary generation. For instance, zIA [9] is a persona-driven assistant that
offers localized destination suggestions and supports itinerary planning. TravelAgent [10] combines
recommendation, planning, memory, and tool-use components to deliver personalized travel itineraries
through conversational interaction. Similarly, Vaiage [
          <xref ref-type="bibr" rid="ref28">26</xref>
          ] introduces a graph-structured multi-agent
architecture that recommends points of interest (POIs) and dynamically builds adaptive itineraries
based on user preferences and contextual factors.
        </p>
        <p>While these systems demonstrate the potential of LLMs for conversational tourism assistance, they
primarily focus on POI recommendation and detailed itinerary planning, often assuming that a
destination or city has already been selected. In contrast, our work targets an earlier and less explored
stage in the travel planning pipeline: recommending sustainable cities or destinations. We design a
conversational framework that supports natural language interaction while prioritizing sustainability, a
critical but underrepresented objective in existing LLM-based tourism CRS.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. User Intent Classification in CRS</title>
        <p>
          Several studies have addressed user intent classification in conversational recommender systems (CRS).
Early work by Cai et al. [8] proposed a taxonomy of user intents in movie recommendation. It evaluated
traditional machine learning models such as logistic regression, XGBoost, and SVM, highlighting the
benefit of contextual features. With the rise of transformers, Moradizeyveh [
          <xref ref-type="bibr" rid="ref34">32</xref>
          ] developed a pipeline
using a fine-tuned BERT model for intent recognition, while Kemper et al. [
          <xref ref-type="bibr" rid="ref25">23</xref>
          ] applied few-shot
prompt-based classification in restaurant CRS. Other advances include Sauer et al. [
          <xref ref-type="bibr" rid="ref41">39</xref>
          ], who leveraged
knowledge distillation for few-shot intent classification, and H. Liu et al. [
          <xref ref-type="bibr" rid="ref29">27</xref>
          ], who utilized
label-enhanced graph neural networks to capture relationships among intent classes. Techniques for dynamic
label refinement were proposed by Park et al. [
          <xref ref-type="bibr" rid="ref38">36</xref>
          ] to improve semantic separability in few-shot settings,
while Hou et al. [
          <xref ref-type="bibr" rid="ref21">20</xref>
          ] addressed multi-label intent detection with adaptive thresholding. Complementary
approaches that jointly model user preferences and intents in CRS have been introduced by Li et al.
[
          <xref ref-type="bibr" rid="ref27">25</xref>
          ] and Park et al. [
          <xref ref-type="bibr" rid="ref38">36</xref>
          ], focusing on multi-aspect preference modeling and explainability. Despite
these contributions, intent classification specifically tailored to tourism CRS remains underexplored,
motivating our evaluation of both supervised and LLM-based zero- and few-shot methods in this
domain.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. RAG for Tourism Recommender Systems</title>
        <p>
          Retrieval-Augmented Generation (RAG) techniques have recently gained traction in recommender
systems for their ability to combine external knowledge retrieval with powerful language models,
improving both accuracy and contextual relevance while reducing hallucination [
          <xref ref-type="bibr" rid="ref2 ref26">2, 24</xref>
          ]. In the tourism
domain, Qi et al. [
          <xref ref-type="bibr" rid="ref39">37</xref>
          ] demonstrate the effectiveness of RAG by optimizing a Tibet-focused conversational
tourism recommender system grounded in a vector database of tourist viewpoints, which reduces
hallucinations and enhances personalization. Similarly, Song et al. [
          <xref ref-type="bibr" rid="ref43">41</xref>
          ] propose TravelRAG, a framework
that augments RAG with knowledge graphs for tourist attractions, yielding improvements in both
efficiency and accuracy.
        </p>
        <p>
          More recently, Banerjee et al. [
          <xref ref-type="bibr" rid="ref6">5</xref>
          ] introduced a single-shot RAG-driven recommendation pipeline
incorporating sustainability-aware reranking to generate eco-friendly city recommendations. However,
their approach is not conversational and focuses on a single retrieval step. We extend this work by
developing a fully conversational system that integrates hybrid retrieval strategies, improves
computational efficiency, and incorporates user intent classification to better understand and respond to user
preferences. This enables iterative and personalized interactions tailored to sustainable tourism.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Approach: RAG-driven System Design</title>
      <p>To enable context-aware and dynamically adaptive recommendations within our CRS, we propose an
architecture centered around a RAG pipeline. This design enhances the traditional CRS interaction loop
by integrating vector-based semantic search and conditional prompt augmentation, both triggered by
the conversational context. The RAG pipeline primarily supports two recommender actions:
(1) Answer and (2) Recommend and Explain, both of which require external knowledge to generate informative
and grounded responses. A high-level overview of the system architecture is depicted in Figure 1.</p>
      <p>Our architecture follows a modular design and is composed of three primary stages: Retrieval,
Augmentation, and Generation. Each user query is processed through the augmented pipeline and,
depending on intent and dialogue state, may trigger the retrieval mechanism.</p>
      <sec id="sec-3-1">
        <title>3.1. User Interaction Scenario</title>
        <p>The system operates in a multi-turn conversational setting where users engage in natural dialogue to
discover travel destinations. A typical session starts with an open-ended query such as "Can you suggest
a relaxing destination in Europe for early spring?" or a factual question like "What is the best time to visit
Ljubljana?". As the conversation evolves, users may provide preferences (e.g., "I prefer less crowded places
with good public transport") or respond to system recommendations (e.g., "That sounds interesting—tell
me more about local cuisine there"). These utterances are parsed to infer intents, preferences, and context
cues, which are stored in a structured dialogue state and used to guide retrieval and generation.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset</title>
        <p>
          The underlying knowledge base consists of structured and unstructured city-level travel information
for 160 European cities from Wikivoyage, parsed and processed using the method described in Banerjee
et al. [4]. Each article is hierarchically chunked by section headings (e.g., Get Around, Do, Eat) to
preserve semantic organization similar to Banerjee et al. [
          <xref ref-type="bibr" rid="ref6">5</xref>
          ]. The resulting corpus covers over 100
European cities, enriched with metadata such as month-wise seasonality, sustainability indicators, and
geolocation. Additionally, for the Recommend and Explain action, this knowledge base is augmented
with structured metadata from Tripadvisor, including Points of Interest (POIs), green accommodations
flagged under Tripadvisor’s sustainability program<sup>1</sup>, and user ratings. The evaluation dataset used
in this study contains 50 single-hop queries, where answers are explicitly derivable from one or more
knowledge chunks. These queries cover a diverse range of intents, destinations, and seasons, and are
used to benchmark the retrieval component of the system.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Retrieval Stage</title>
        <p>The retrieval stage is responsible for identifying relevant external knowledge chunks to support system
responses. This stage is invoked only when deemed necessary by a lightweight routing mechanism
(described in the augmentation stage). This selective invocation ensures computational efficiency,
avoiding retrieval during trivial turns such as acknowledgments. This stage includes document indexing,
query embedding, vector similarity search, optional reranking, and strategy-specific chunk filtering.</p>
        <sec id="sec-3-3-1">
          <title>3.3.1. Embedding and Hybrid Vector Search</title>
          <p>
            We construct a vector database from city-level Wikivoyage travel data, using the knowledge base
developed in Banerjee et al. [4]. Each document is parsed and chunked hierarchically using markdown
headers to preserve semantic structure. Large chunks are recursively split by subheaders, subsubsections,
and sentences, resulting in an average size of 600 characters, similar to Banerjee et al. [
            <xref ref-type="bibr" rid="ref6">5</xref>
            ].
          </p>
          <p>
            Two types of embeddings are generated:
• Dense embeddings: using the all-MiniLM-L6-v2 model [
            <xref ref-type="bibr" rid="ref48">44</xref>
            ] to capture semantic similarity.
• Sparse embeddings: using the splade-cocondenser-selfdistil model [
            <xref ref-type="bibr" rid="ref18 ref4">17</xref>
            ] to capture
contextual keyword-based relevance.
          </p>
          <p>Embeddings are stored in a Milvus Lite instance<sup>2</sup>, using FLAT indexing for dense vectors with cosine
similarity and SPARSE_INVERTED_INDEX for sparse vectors with inner product similarity.</p>
          <p>At runtime, the user query is embedded using the same model(s) as during indexing. Three retrieval
modes are supported:
• Dense Retrieval: Semantic similarity using cosine distance.
• Sparse Retrieval: Lexical matching based on token overlap.
• Hybrid Retrieval: Combines both using Reciprocal Rank Fusion (RRF) [
            <xref ref-type="bibr" rid="ref13">12</xref>
            ], as shown in Figure 1.
          </p>
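          <p>The fusion step can be illustrated with a minimal sketch of Reciprocal Rank Fusion. It assumes that each retriever returns an ordered list of chunk identifiers; the function name, the chunk identifiers, and the smoothing constant k = 60 (a value commonly used with RRF) are illustrative assumptions and not necessarily the configuration of the deployed system.</p>
          <preformat><![CDATA[
```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk IDs; each list is ordered best-first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk earns 1 / (k + rank) from every list in which it appears.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


dense_hits = ["c3", "c1", "c7"]   # hypothetical dense-retriever output
sparse_hits = ["c1", "c9", "c3"]  # hypothetical sparse-retriever output
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```
]]></preformat>
          <p>Because the fusion uses ranks only, it needs no score normalization between the cosine-based dense scores and the inner-product sparse scores.</p>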
        </sec>
        <sec id="sec-3-3-2">
          <title>3.3.2. Retrieval Strategies</title>
          <p>We employ intent-aware retrieval workflows that utilize both the dialogue state and metadata for more
targeted filtering. The underlying intent classification approach is detailed in Section 4.1.</p>
        </sec>
        <sec id="sec-3-3-3">
          <title>For Answer Action</title>
          <p>When the user issues a factual query about a known destination:
1. Metadata filtering narrows search to chunks matching the current city and (optionally) relevant
subheadings.
2. Vector search retrieves the top-k matching chunks using dense, sparse, or hybrid methods.</p>
          <p>This enables precise, city-specific responses, even when city references are implicit, by using an
entity extractor agent to update the dialogue state.</p>
          <p><bold>For Recommend and Explain Action.</bold> This strategy involves multi-stage retrieval with
sustainability-aware reranking to support better explanations:
1. Query generation: An LLM transforms dialogue constraints into a pseudo-natural language
query.
2. Initial search: The top-k chunks are retrieved using dense, sparse, or hybrid methods.
3. SFI reranking: Candidate cities are reranked for sustainability using the Societal Fairness
Indicator (SFI) [
            <xref ref-type="bibr" rid="ref6">3, 5</xref>
            ], which combines CO2e emission trade-offs, destination popularity (ratings
and reviews), and seasonality indices (monthly footfall by destination and travel month). Although
sustainability is not the primary focus of this work, incorporating SFI helps elevate eco-friendly
destinations.
4. City-specific search: The top-ranked city is re-searched in the vector database to extract
context-specific chunks.
5. Tripadvisor augmentation: Retrieved chunks are enriched with structured POI and green hotel
metadata from Tripadvisor.
          </p>
          <p>This hierarchical approach balances user constraints and contextual relevance while prioritizing
sustainability to generate high-quality recommendations with meaningful explanations, improving the
transparency and interpretability of our system.</p>
        </sec>
        <sec id="sec-3-3-4">
          <title>3.3.3. Reranking and Candidate Selection</title>
          <p>
            To improve context quality, we apply a cross-encoder-based reranker<sup>3</sup> after the initial search. This model
encodes each query-chunk pair jointly to obtain a relevance score, making it slower yet
more accurate than a dense retriever [
            <xref ref-type="bibr" rid="ref14">13</xref>
            ]. While RRF is an effective algorithm for aggregating ranked
results from dense and sparse retrieval, it operates on ranks alone and does not account for the semantic
alignment between query and chunk. Applying a
cross-encoder reranker refines the results with semantic relevance scoring, a strategy shown to improve
performance in hybrid retrieval [
            <xref ref-type="bibr" rid="ref42">40</xref>
            ].
          </p>
          <p>
            Algorithm 1 abstracts the retrieval workflow used in our RAG pipeline, unifying the retrieval strategies
discussed earlier. SearchWikivoyageDocs retrieves the top-k chunks for a given query, retriever
(dense, sparse, or hybrid), and optional filter. When the is_recommendation flag is set, as in the
Recommend and Explain action, the candidate pool undergoes SFI reranking and a city-specific search
to yield sustainable, context-rich recommendations. Building on N. F. Liu et al. [
            <xref ref-type="bibr" rid="ref30">28</xref>
            ], who highlight
the role of reranking in mitigating position bias, we further enlarge the candidate pool (e.g., top-10)
before reranking to ensure broader coverage and select the final top-5 chunks.
          </p>
          <p>Algorithm 1: Multi-stage retrieval function with cross-encoder reranking.</p>
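          <p>The workflow described above can be approximated by the following sketch. It is not a reproduction of Algorithm 1: the search function, the pair scorer, and the city ranker are hypothetical stand-ins (a word-overlap score replaces both the vector search and the cross-encoder, and a toy score table replaces SFI), chosen only so that the control flow can be executed end to end.</p>
          <preformat><![CDATA[
```python
def retrieve(query, search, score_pair, pool_size=10, final_k=5,
             is_recommendation=False, rank_cities=None):
    """Sketch of the multi-stage retrieval flow.

    search(query, k, city=None) -> list of chunk dicts (stand-in for vector search)
    score_pair(query, text)     -> float (stand-in for the cross-encoder reranker)
    rank_cities(cities)         -> cities ordered best-first (stand-in for SFI)
    """
    candidates = search(query, pool_size)
    if is_recommendation:
        # Pick the best city among the candidates, then search within that city.
        cities = list(dict.fromkeys(c["city"] for c in candidates))
        top_city = rank_cities(cities)[0]
        candidates = search(query, pool_size, city=top_city)
    # Rerank the enlarged pool and keep only the final top chunks.
    reranked = sorted(candidates, key=lambda c: score_pair(query, c["text"]),
                      reverse=True)
    return reranked[:final_k]


# Toy corpus and scores, invented purely for illustration.
CHUNKS = [
    {"city": "Ljubljana", "text": "quiet green city with good public transport"},
    {"city": "Ljubljana", "text": "local cuisine and riverside cafes"},
    {"city": "Barcelona", "text": "busy beaches and crowded nightlife"},
    {"city": "Graz", "text": "quiet old town with green parks"},
]
TOY_CITY_SCORE = {"Ljubljana": 0.9, "Graz": 0.7, "Barcelona": 0.2}


def overlap(query, text):
    # Toy relevance score: number of shared lowercase words.
    return len(set(query.lower().split()) & set(text.lower().split()))


def toy_search(query, k, city=None):
    pool = [c for c in CHUNKS if city is None or c["city"] == city]
    return sorted(pool, key=lambda c: overlap(query, c["text"]), reverse=True)[:k]


def toy_rank_cities(cities):
    return sorted(cities, key=TOY_CITY_SCORE.get, reverse=True)


answer_chunks = retrieve("quiet green city", toy_search, overlap, final_k=2)
recommend_chunks = retrieve("quiet green city", toy_search, overlap, final_k=2,
                            is_recommendation=True, rank_cities=toy_rank_cities)
```
]]></preformat>
          <p>In this sketch the recommendation branch returns chunks from a single city, whereas the plain branch may mix cities, which mirrors the difference between the two retrieval strategies.</p>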
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Augmentation Stage</title>
        <p>
          The augmentation stage governs intent interpretation, dialogue state tracking, context routing, and
action selection. It is the core logic engine of the system and builds upon previous works in conversational
recommendation [
          <xref ref-type="bibr" rid="ref25 ref31 ref8">29, 7, 23</xref>
          ], introducing optimizations for modularity, efficiency, and interpretability.
          <fn><label>3</label><p>https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2</p></fn>
        </p>
        <sec id="sec-3-4-1">
          <title>3.4.1. Intent Classification</title>
          <p>
            Each user utterance is passed through a multi-label intent classifier to determine one or more
conversational intents. We adopt a taxonomy comprising five categories [
            <xref ref-type="bibr" rid="ref25">23</xref>
            ]:
• Ask Recommendation: Direct request for a travel recommendation.
• Provide Preference: Statement of preferences or constraints (e.g., travel style, destination type).
• Inquire: Factual or exploratory questions about destinations.
• Accept Recommendation: Positive feedback confirming interest in a suggested destination.
• Reject Recommendation: Negative feedback dismissing a prior suggestion.
          </p>
          <p>Multiple intents can co-occur in a single turn, provided they are semantically compatible. For instance,
the utterance "Amsterdam looks great to me, can you tell me more about the local cuisines there?" would be
classified with the intents Accept Recommendation, Inquire, and Provide Preference, which is a valid combination.
In contrast, a contradictory statement such as "Looks great. I don’t like it." would be flagged for intent
conflict (Accept + Reject), and corrective heuristics would be applied. The corrective heuristics, such as
constraint validation and priority-based resolution, act as guardrails to handle diverse user queries. For
example, when processing feedback on a recommendation, we first verify that a recommendation was
issued in the previous turn.</p>
          <p>Intent classification is performed using autoregressive LLMs with few-shot prompting for robustness.</p>
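          <p>The guardrails can be pictured with a small sketch. The two checks follow the description above (feedback intents require a preceding recommendation, and Accept and Reject must not co-occur), but the function name and the specific priority rule that keeps Reject on conflict are assumptions made for illustration only.</p>
          <preformat><![CDATA[
```python
ACCEPT, REJECT = "Accept Recommendation", "Reject Recommendation"


def apply_guardrails(intents, recommended_last_turn):
    """Return a cleaned intent set (hypothetical heuristics, for illustration)."""
    cleaned = set(intents)
    if not recommended_last_turn:
        # Feedback intents only make sense right after a recommendation.
        cleaned -= {ACCEPT, REJECT}
    if ACCEPT in cleaned and REJECT in cleaned:
        # Assumed priority rule: on conflict, keep the more cautious Reject.
        cleaned.discard(ACCEPT)
    return cleaned


conflict = apply_guardrails({ACCEPT, REJECT}, recommended_last_turn=True)
no_prior = apply_guardrails({ACCEPT, "Inquire"}, recommended_last_turn=False)
```
]]></preformat>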
        </sec>
        <sec id="sec-3-4-2">
          <title>3.4.2. Dialogue State Update</title>
          <p>
            Following intent classification, the system updates the dialogue state, which is maintained as a
semi-structured JSON-like object. Inspired by Kemper et al. [
            <xref ref-type="bibr" rid="ref25">23</xref>
            ], this state representation balances
transparency and flexibility, capturing:
• Recent user and system exchanges (conversation_history).
• Required preferences (hard_constraints) and optional ones (soft_constraints).
• Accepted and rejected destinations (recommendation_feedback).
• Metadata-driven query focus keys (current_destination_of_interest,
current_subheadings_of_interest).
• Current user intents and selected system action.
          </p>
          <p>This structure supports dynamic adaptation of the CRS over multiple dialogue turns and prevents
premature recommendations when essential constraints are missing—unless overridden by a strong
Ask Recommendation intent.</p>
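          <p>One possible shape for such a state object is sketched below. The first six field names are taken from the list above; the names current_intents and system_action, the structure of the feedback dictionary, and the example constraint names (month, budget) are assumptions introduced for illustration.</p>
          <preformat><![CDATA[
```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DialogueState:
    conversation_history: list = field(default_factory=list)
    hard_constraints: dict = field(default_factory=dict)
    soft_constraints: dict = field(default_factory=dict)
    recommendation_feedback: dict = field(
        default_factory=lambda: {"accepted": [], "rejected": []}
    )
    current_destination_of_interest: str = ""
    current_subheadings_of_interest: list = field(default_factory=list)
    current_intents: list = field(default_factory=list)  # hypothetical field name
    system_action: str = ""                              # hypothetical field name

    def missing_hard_constraints(self, required):
        """Names of required constraints that the user has not supplied yet."""
        return [name for name in required if name not in self.hard_constraints]


state = DialogueState()
state.hard_constraints["month"] = "March"
state.current_destination_of_interest = "Ljubljana"
missing = state.missing_hard_constraints(["month", "budget"])
serialized = json.dumps(asdict(state))  # the state serializes to plain JSON
```
]]></preformat>
          <p>A non-empty list of missing constraints is the kind of signal that could hold back a recommendation until the user has supplied the required information.</p>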
        </sec>
        <sec id="sec-3-4-3">
          <title>3.4.3. Context Router and Query Transformation</title>
          <p>A lightweight routing mechanism determines whether external context retrieval is necessary for the
current turn. This decision is based on the identified recommender action: only Answer and Recommend
and Explain actions trigger the retrieval stage. For simpler actions (e.g., Acknowledge Acceptance), a
pre-defined response is returned. This selective routing reduces system latency and computational cost.</p>
          <p>When external context is needed, the system may reformulate the query using relevant elements
from the dialogue state (e.g., inserting a specific destination name or subtopic). This improves semantic
alignment between the user query and vector database content, enhancing retrieval precision.</p>
        </sec>
        <sec id="sec-3-4-4">
          <title>3.4.4. Action Selection and Final Prompt Construction</title>
          <p>Based on the classified intents and current dialogue state, the system selects a recommender action
from a predefined set:
• Recommend and Explain: Generate a recommendation and justify it using user preferences.
• Answer: Provide information in response to a user query.
• Request Information: Ask for missing hard constraints.
• Acknowledge Acceptance / Rejection: Respond to user feedback.</p>
          <p>
            Unlike scoring-based approaches [
            <xref ref-type="bibr" rid="ref25">23</xref>
            ], we adopt a rule-based mapping strategy, prioritizing explicit
intents (Inquire, Ask Recommendation) and using dialogue completeness to guide fallback actions. This
approach enhances responsiveness and reduces interpretive ambiguity.
          </p>
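          <p>A rule-based mapping of this kind, together with the routing decision from Section 3.4.3, might look as follows. The intent and action names come from the text, while the exact order of the rules and the function names are assumptions; the sketch only shows that explicit intents are checked first and that dialogue completeness decides the fallback.</p>
          <preformat><![CDATA[
```python
RETRIEVAL_ACTIONS = {"Answer", "Recommend and Explain"}


def select_action(intents, missing_hard_constraints):
    """Map classified intents to one recommender action (illustrative rule order)."""
    if "Inquire" in intents:
        return "Answer"
    if "Ask Recommendation" in intents:
        # A direct request overrides missing constraints.
        return "Recommend and Explain"
    if "Accept Recommendation" in intents:
        return "Acknowledge Acceptance"
    if "Reject Recommendation" in intents:
        return "Acknowledge Rejection"
    if missing_hard_constraints:
        # Fallback guided by dialogue completeness.
        return "Request Information"
    return "Recommend and Explain"


def needs_retrieval(action):
    """Only knowledge-dependent actions trigger the retrieval stage."""
    return action in RETRIEVAL_ACTIONS


action = select_action({"Inquire", "Accept Recommendation"}, [])
```
]]></preformat>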
          <p>The final LLM prompt is constructed by integrating four key components: (1) the transformed
user query, (2) any retrieved context passages (if applicable), (3) the serialized dialogue state, and (4)
instructional framing tailored to the selected recommender action. This structured composition ensures
that the LLM is guided by both the ongoing dialogue context and relevant external information, resulting
in coherent, context-aware, and grounded responses. Listing 1 illustrates the combined system-user
prompt template used for the Recommend and Explain action.</p>
          <preformat>You are a sustainable tourism recommendation system. A city has been pre-selected for the user
after considering both user preferences and sustainability factors. Using the provided context for
the selected city, your task is to:
1. Summarize the key highlights and explain why the city is recommended, focusing on
sustainability factors.
2. Explain how the city matches the user’s preferences based on the query and context.
3. Highlight the top 3 attractions and green hotels that align with the user’s preferences.
4. Recommend the most sustainable mode of travel from the user’s starting location, and encourage
off-peak/shoulder season travel.
**Instructions**:
1. Begin your response with a bold heading for the city and country (**City, Country**).
2. Follow with: "I recommend [city_name]" and explain why this destination is recommended.
3. Use only the provided context to craft your response. If insufficient, reply: "Sorry, I am
unable to provide a recommendation for the given preferences."
4. Ensure your response is accurate and professional.
**Query:**
{{ query }}. Which city do you recommend and why?
**Context:**
{{ context }}
**Response:**</preformat>
          <p>Listing 1: Prompt template for Recommend and Explain action</p>
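          <p>Filling the two placeholders of Listing 1 can be sketched with plain string substitution. Only the tail of the template is reproduced here, and the render function, the way chunks are joined, and the example inputs are assumptions; a templating library would serve equally well.</p>
          <preformat><![CDATA[
```python
TEMPLATE_TAIL = (
    "**Query:**\n"
    "{{ query }}. Which city do you recommend and why?\n"
    "**Context:**\n"
    "{{ context }}\n"
    "**Response:**"
)


def render(template, query, chunks):
    """Fill the two placeholders; retrieved chunks are joined with blank lines."""
    context = "\n\n".join(chunks)
    return template.replace("{{ query }}", query).replace("{{ context }}", context)


prompt = render(
    TEMPLATE_TAIL,
    "Relaxing destinations in Spain for early spring",  # hypothetical query
    ["Chunk about beaches.", "Chunk about green hotels."],  # hypothetical chunks
)
```
]]></preformat>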
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Response Generation</title>
        <p>In the final stage, a response is generated using an LLM:
• For Answer and Recommend and Explain actions, a context-augmented prompt is passed to the
LLM for free-form response generation.
• For simpler actions (e.g., acknowledgments or information requests), a lightweight template or
hard-coded message is returned to the user.</p>
        <p>This selective generation mechanism ensures that expensive model inference is performed only when
warranted, optimizing both user experience and system efficiency. We use Llama-3.1-8B-Instruct as our
LLM for response generation.</p>
        <p>In summary, our RAG-driven architecture tightly integrates semantic retrieval, structured dialogue
modeling, and LLM-based generation. By decoupling the intent classification, context routing, and
generation processes, the system maintains high modularity, interpretability, and extensibility. This
design facilitates seamless adaptation to evolving use cases, including the integration of new intents
and retrieval strategies.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>Due to the multifaceted nature of our CRS, evaluating every component is challenging. For the scope of
this paper, we evaluate the performance of our CRS through offline evaluation of two critical components:
user intent classification and the RAG pipeline for European city tourism recommendations. To gain a
meaningful understanding of the system’s performance, we use a combination of current state-of-the-art
metrics, including standard classification metrics 4 for user intent classification, and metrics from a
model-based evaluation framework RAGAS5 to evaluate the RAG pipeline.</p>
      <p>Section 4.1 evaluates the system’s ability to classify diverse user intents and sentiments. Section 4.2
evaluates the RAG pipeline, specifically the information retrieval and generation stages for the Answer
action. In both sections, we first discuss the experimental setup and metrics, followed by the results.</p>
      <sec id="sec-4-1">
        <title>4.1. User Intent Classification Evaluation</title>
        <p>
          User intent classification is a multi-label classification problem: a single user utterance can express
several intents at once. Classifying the user sentiment is one of the most critical components of the
system because of its direct impact on the recommender action selected and the response returned
by the system. Moreover, the task’s lower complexity provides a good opportunity to benchmark the
results against smaller models. To evaluate our user intent classifier effectively, we compare the
performance of four different approaches:
1. BERT (Fine-Tuned Sequence Classification): A fine-tuned BERT model trained on a
synthetically generated dataset [
          <xref ref-type="bibr" rid="ref16">15</xref>
          ].
2. BART-large-MNLI (Zero-Shot): BART model pre-trained on MultiNLI (MNLI) for zero-shot
sequence classification [46].
3. LLMs (Zero-Shot): Classification using LLMs with the prompt containing only the user query
and label names, providing no descriptions or examples.
4. LLMs (Few-Shot): Classification using LLMs, where each user intent is classified individually
with a clear description and few-shot examples. The results are then aggregated to form the final
prediction.
        </p>
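        <p>The few-shot variant (approach 4) classifies each intent separately and then aggregates the per-intent decisions into one multi-label prediction. A minimal sketch of that loop follows; the prompt wording, the placeholder descriptions, and the keyword_stub function that stands in for a real LLM call are assumptions made for illustration, not the prompts used in our experiments.</p>

```python
from typing import Callable, Dict, List, Tuple

INTENTS = [
    "Ask_Recommendation", "Provide_Preference", "Inquire",
    "Accept_Recommendation", "Reject_Recommendation",
]

# Hypothetical per-intent (description, few-shot examples); real prompts would be richer.
SPECS: Dict[str, Tuple[str, List[str]]] = {
    name: (f"The user expresses {name}.", []) for name in INTENTS
}


def build_prompt(intent: str, description: str, examples: List[str], utterance: str) -> str:
    """Assemble one binary yes/no prompt for a single intent."""
    shots = "\n".join(f"- {example}" for example in examples)
    return (
        f"Intent: {intent}\nDescription: {description}\n"
        f"Examples:\n{shots}\nUtterance: {utterance}\n"
        "Does the utterance express this intent? Answer yes or no."
    )


def classify(utterance: str, specs: Dict[str, Tuple[str, List[str]]],
             ask_llm: Callable[[str], str]) -> Dict[str, int]:
    """Query each intent individually and aggregate the answers into a binary label map."""
    labels = {}
    for intent in INTENTS:
        description, examples = specs[intent]
        answer = ask_llm(build_prompt(intent, description, examples, utterance))
        labels[intent] = int(answer.strip().lower().startswith("yes"))
    return labels


def keyword_stub(prompt: str) -> str:
    """Toy stand-in for an LLM call: says yes only for an obvious acceptance."""
    first_line = prompt.splitlines()[0]
    utterance = prompt.split("Utterance: ")[1].splitlines()[0].lower()
    accepted = "Accept_Recommendation" in first_line and "perfect" in utterance
    return "yes" if accepted else "no"


if __name__ == "__main__":
    print(classify("Looks perfect to me!", SPECS, keyword_stub))
```

        <p>Because every intent is decided independently, no explicit threshold is needed and an utterance can receive zero, one, or several labels.</p>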
        <p>
          We use GPT-4o-mini [
          <xref ref-type="bibr" rid="ref36">34</xref>
          ] as the primary LLM for evaluation. For the supervised baseline, BERT is
fine-tuned on a synthetically generated dataset, while BART-large-MNLI is used out of the box.
        </p>
        <p>
          Due to the lack of labeled training data for our use case, we manually created user examples covering
over ten intent combinations. To augment this dataset, we leveraged Gemini-1.5-pro-001 [
          <xref ref-type="bibr" rid="ref44">42</xref>
          ], following
a prompt-based intent description strategy inspired by Parikh et al. [
          <xref ref-type="bibr" rid="ref37">35</xref>
          ].
        </p>
        <p>Our prompt template includes: (1) natural language descriptions of each intent as described
in Section 3.4.1, (2) a few-shot set of manually crafted examples, and (3) specific instructions to
handle edge cases and encourage diverse phrasing. For instance, to generate utterances for the
Accept_Recommendation intent, the model is prompted to act as an expert sustainable travel
consultant and produce 20 distinct examples that clearly accept a recommendation. The instructions emphasize
clarity, intent exclusivity (avoiding preferences or inquiries), and variation in tone and persona. Each
output is structured in CSV format with binary labels across five intent categories: Ask_Recommendation,
Provide_Preference, Inquire, Accept_Recommendation, and Reject_Recommendation. For
example, the utterance “Looks perfect to me!” is labeled with Accept_Recommendation=1 and all
others=0. This structured generation approach ensures label consistency and seamless integration with
supervised intent classification models. The same template structure is adapted for other intents, with
corresponding examples and tailored instructions.</p>
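        <p>To make the CSV layout concrete, the sketch below parses generated rows into utterances and binary label vectors. The column name "utterance" and the second row are hypothetical and serve only as illustration; the first row mirrors the “Looks perfect to me!” example above.</p>

```python
import csv
import io
from typing import List, Tuple

INTENTS = [
    "Ask_Recommendation", "Provide_Preference", "Inquire",
    "Accept_Recommendation", "Reject_Recommendation",
]

# The header name "utterance" and the second row are illustrative assumptions.
SAMPLE_CSV = """utterance,Ask_Recommendation,Provide_Preference,Inquire,Accept_Recommendation,Reject_Recommendation
Looks perfect to me!,0,0,0,1,0
I love hiking. Which city would you suggest?,1,1,0,0,0
"""


def load_examples(text: str) -> List[Tuple[str, List[int]]]:
    """Parse CSV text into (utterance, binary label vector) pairs in INTENTS order."""
    rows = csv.DictReader(io.StringIO(text))
    return [(row["utterance"], [int(row[name]) for name in INTENTS]) for row in rows]


if __name__ == "__main__":
    for utterance, labels in load_examples(SAMPLE_CSV):
        print(utterance, labels)
```

        <p>A fixed column order gives every utterance the same label-vector layout, which is what a supervised multi-label classifier such as BERT expects as its target.</p>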
        <p>
          This results in a final dataset of 330+ labeled user inputs, where each label is a binary vector over
the intent classes. The distribution of label combinations is varied, with between 20 and 54 examples
for each of more than 10 unique intent combinations. The dataset is then split into 80% for training and 10% each for
validation and testing. We fine-tune BERT on a MacBook with Apple M2 and 16GB RAM, freezing
all but the last two layers to reduce overfitting given the limited dataset size. During inference, we
apply a decision threshold of 0.5 for BERT and BART-large-MNLI, while LLMs are not provided an
explicit threshold.
        </p>
        <p>4.1.1. Metrics</p>
        <p>
Four key metrics are used to evaluate the performance of multi-label classification models: Accuracy,
Precision, Recall, and F1-score [
          <xref ref-type="bibr" rid="ref1 ref12">11, 1</xref>
          ]. Accuracy (subset accuracy) measures the proportion of samples
where the predicted label set matches the ground truth label set. Precision reflects the proportion of
correctly predicted labels among all predicted labels, while recall measures the proportion of actual
labels that are correctly predicted. The F1-score combines precision and recall using the harmonic mean,
offering a balanced view of the model’s performance [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. To account for multiple labels, micro-averaging
is applied when calculating precision, recall, and F1-score.
        </p>
        <p>4.1.2. Results</p>
        <p>Our findings in Table 1 reveal that few-shot classification with LLMs outperforms zero-shot
classification, with GPT-4o-mini achieving the highest score across most metrics. For the F1-score
in particular, we observed an improvement of 34% for GPT-4o-mini, reinforcing the effectiveness of
clear intent descriptions and few-shot examples in guiding the models. This aligns with our
expectations, as few-shot examples cover several edge cases present in our dataset, which a pre-trained
model may not capture. Sequence classification using BERT also emerges as an effective approach for
a smaller model size, with the highest precision of 91% and an F1-score of 88%, trailing just behind
GPT-4o-mini (few-shot). This suggests that, with a sufficiently large and diverse training dataset, smaller
models like BERT can compete with, or potentially outperform, few-shot classification using LLMs. One
common source of inaccuracies we observed with LLMs is the misinterpretation of ambiguous
user utterances as preference elicitation.</p>
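        <p>For reference, the metrics defined in Section 4.1.1 can be computed directly from binary label vectors. The sketch below is a plain-Python illustration with made-up toy labels; it does not reproduce the numbers in Table 1.</p>

```python
from typing import List, Tuple

Labels = List[List[int]]


def subset_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Share of samples whose predicted label vector matches the ground truth exactly."""
    matches = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return matches / len(y_true)


def micro_prf(y_true: Labels, y_pred: Labels) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1: counts are pooled over all labels."""
    tp = fp = fn = 0
    for truth_row, pred_row in zip(y_true, y_pred):
        for truth, pred in zip(truth_row, pred_row):
            tp += int(truth == 1 and pred == 1)
            fp += int(truth == 0 and pred == 1)
            fn += int(truth == 1 and pred == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels over five intents; purely illustrative.
    y_true = [[1, 0, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 1, 0]]
    y_pred = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 1, 0, 1, 0]]
    print(subset_accuracy(y_true, y_pred))
    print(micro_prf(y_true, y_pred))
```

        <p>Subset accuracy is the strictest of the four, since a single wrong label makes the whole sample count as incorrect, whereas micro-averaged scores give partial credit per label.</p>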
      </sec>
      <sec id="sec-4-2">
        <title>4.2. RAG Pipeline Evaluation</title>
        <p>The RAG pipeline plays a central role in our C-TRS, both when responding to user inquiries and when
generating recommendations. In this section, we focus on evaluating the former, as recommendations
often lack a clear reference context for objective comparison. Since our primary contribution lies
in the retrieval stage, we restrict our evaluation primarily to this component rather than the entire
pipeline. Furthermore, the open-ended nature of the Recommend and Explain action is better suited to a
comprehensive user study, which we leave for future work.</p>
        <p>
          We design an offline experiment to compare different vector search approaches, evaluating both the
retrieval and generation quality. LLM-judge-based frameworks have emerged as a promising tool for
evaluating RAG pipelines, with research by Zheng et al. [
          <xref ref-type="bibr" rid="ref49">47</xref>
          ] reporting over 80% agreement between
human and LLM judgements for models like GPT-4. This approach enables scalable and
cost-effective evaluation. Therefore, we utilize the state-of-the-art RAGAS framework proposed by Es et al.
[
          <xref ref-type="bibr" rid="ref17">16</xref>
          ]. RAGAS employs a stronger LLM-judge to assess the response generated by a weaker model, while
taking the question, ground truth, and retrieved context into account.
        </p>
        <p>
          The evaluation dataset consists of 50 synthetically generated Q&amp;A pairs from the Wikivoyage corpus
using the RAGAS TestsetGenerator<sup>6</sup>. The questions span a diverse range of user personas and
question types, covering articles for the following 5 cities with 10 questions for each: Amsterdam,
Munich, Istanbul, Madrid, and Zurich. We restrict the query category to single-hop specific
queries, which can be answered in one retrieval step from a single document, as the metadata filtering
step may exclude relevant context for cross-city comparison questions [
          <xref ref-type="bibr" rid="ref45">43</xref>
          ].
        </p>
        <p>
          For our evaluation, we employ GPT-4o-mini as the evaluator LLM and set the corresponding city
names as metadata filters prior to vector search. Llama-3.1-8B-Instruct is used as the generator LLM
throughout the evaluation. Additionally, we fix the top_k value to 5 and apply a cross-encoder as the
reranker.
        </p>
        <p>4.2.1. Metrics</p>
        <p>
We use the following RAGAS metrics to evaluate the retrieval (context recall and precision) and
generation (faithfulness and answer relevancy) stages of the RAG pipeline [
          <xref ref-type="bibr" rid="ref17">16</xref>
          ]:
• Context Recall: Measures the proportion of relevant chunks needed to arrive at the ground
truth that are actually retrieved.
• Context Precision: Measures the proportion of the retrieved chunks that are relevant for arriving
at the ground truth.
• Faithfulness: Measures the extent to which statements generated by the LLM can be inferred
from the retrieved chunks. A lower score indicates hallucination in the generated response.
• Answer Relevancy: Measures how relevant the response generated by the LLM is to the user query.
        </p>
        <p>4.2.2. Results</p>
        <p>
Table 2 presents the evaluation of different retrieval strategies using four core metrics: Context Recall,
Context Precision, Faithfulness, and Answer Relevancy. Among all configurations, Sparse Search
+ Rerank achieves the highest Context Recall (0.77) and Context Precision (0.83), indicating its strong
ability to retrieve chunks that are both comprehensive and relevant to the ground truth. This makes it
particularly well-suited for scenarios where accurate and complete contextual grounding is essential.
        </p>
        <p>In contrast, Hybrid Search without reranking yields the highest Faithfulness score (0.81), suggesting
that this method is most effective in reducing hallucinations by ensuring that generated responses are
well-supported by the retrieved content. When it comes to user-centric evaluation, Hybrid Search +
Rerank achieves the best performance in Answer Relevancy (0.90), implying that it delivers the most
query-relevant responses, even though it shows a slight drop in recall. Notably, reranking improves
precision across all methods by filtering out less relevant chunks, though this sometimes comes at the
cost of reduced recall.</p>
        <p>Table 2: Evaluation of retrieval strategies by vector search type (Dense Search, Dense Search + Rerank, Sparse Search, Sparse Search + Rerank, Hybrid Search, Hybrid Search + Rerank).</p>
        <p><sup>6</sup>https://docs.ragas.io/en/stable/getstarted/rag_testset_generation/</p>
        <p>Finally, Dense Search underperforms relative to sparse and hybrid methods across all metrics,
highlighting its limitations in this task. Overall, we observe that sparse retrieval excels in key retrieval
metrics, while hybrid retrieval performs best for generation quality. Hybrid approaches offer a balanced
trade-off, making them particularly promising as we extend support for more complex, multi-hop, or
abstract queries in future work.</p>
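        <p>The recall and precision trade-off discussed above can be illustrated with a simplified, set-based reading of the two retrieval metrics from Section 4.2.1. This sketch assumes gold relevance labels per chunk and hypothetical chunk identifiers; RAGAS itself estimates these quantities with an LLM judge, so the functions below are an approximation of the definitions, not the framework's implementation.</p>

```python
from typing import List, Set


def context_recall(retrieved: List[str], relevant: Set[str]) -> float:
    """Share of the chunks needed for the ground truth that were actually retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved).intersection(relevant)) / len(relevant)


def context_precision(retrieved: List[str], relevant: Set[str]) -> float:
    """Share of the retrieved chunks that are relevant to the ground truth."""
    if not retrieved:
        return 0.0
    hits = sum(1 for chunk in retrieved if chunk in relevant)
    return hits / len(retrieved)


if __name__ == "__main__":
    # Hypothetical chunk ids for one query with top_k = 5.
    retrieved = ["c1", "c2", "c3", "c4", "c5"]
    relevant = {"c1", "c3", "c9"}
    print(context_recall(retrieved, relevant))
    print(context_precision(retrieved, relevant))
```

        <p>Under this reading, a reranker that drops irrelevant chunks raises precision, but any relevant chunk it discards lowers recall, which matches the pattern reported for Table 2.</p>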
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Work</title>
      <p>In this paper, we introduced a modular Hybrid RAG-based Conversational Tourism Recommender
System (C-TRS) for recommending European cities. Our system integrates LLMs with hybrid
dense-sparse retrieval and a structured dialogue state tracker to support real-time, multi-turn interactions. By
combining intent classification, dynamic retrieval strategies, and grounded response generation, the
system delivers contextually relevant recommendations and factual answers tailored to user preferences.</p>
      <p>Through offline evaluation, we demonstrated that few-shot prompting with LLMs significantly
improves multi-label user intent classification over traditional models. Moreover, our hybrid retrieval
strategy, conditioned on conversational intent, balances recall and precision, improving answer
relevance and reducing hallucinations compared to static or single-strategy retrieval methods.</p>
      <p>
        Despite promising initial results, several aspects of our approach can be further improved. First, our
evaluation revealed that few-shot prompting with LLMs for intent extraction can struggle with
ambiguous or complex user utterances. Future work could explore more robust prompting strategies, such
as chain-of-thought (CoT) prompting [
        <xref ref-type="bibr" rid="ref47">45</xref>
        ], or fine-tuning smaller models for domain-specific tasks [
        <xref ref-type="bibr" rid="ref37">35</xref>
        ].
Second, our current evaluation relies solely on model-based metrics. Conducting a comprehensive user
study would provide deeper insights into user satisfaction and recommendation quality, while also
helping refine the taxonomy of user intents and recommender actions. Moreover, our current evaluation
is limited by the use of a relatively small synthetic intent dataset and single-hop queries, which may not
fully reflect the complexity of real-world conversational recommendation. Future work could expand
evaluation to larger datasets and multi-hop queries, and consider ablation studies to better understand the
contribution of intent-driven retrieval. Lastly, the current system lacks guardrails for moderating input
and output, which are essential for ensuring safety, reliability, and responsible behavior in real-world
deployment. Addressing these limitations will be key to advancing this framework toward scalable,
trustworthy, and user-aligned conversational recommender systems.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT (OpenAI) and Grammarly to correct
grammar and spelling inconsistencies and to improve the clarity of the text. ChatGPT was also used
for code snippet suggestions during system development. We have critically reviewed and revised all
GenAI outputs to ensure that accuracy and originality are maintained, and we accept full responsibility
for the content presented in this draft.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <article-title>3.4. Metrics and Scoring: Quantifying the Quality of Predictions</article-title>
          .
          <source>scikit-learn</source>
          . url: https://scikit-learn.org/stable/modules/model_evaluation.html (visited on 01/28/
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Arslan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ghanem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Munawar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Cruz</surname>
          </string-name>
          .
          <article-title>“A Survey on RAG with LLMs”</article-title>
          .
          <source>In: Procedia computer science 246</source>
          (
          <year>2024</year>
          ), pp.
          <fpage>3781</fpage>
          -
          <lpage>3790</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mahmudov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Adler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. N.</given-names>
            <surname>Aisyah</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Wörndl</surname>
          </string-name>
          .
          <article-title>Modeling Sustainable City Trips: Integrating CO2e Emissions, Popularity, and Seasonality into Tourism Recommender Systems</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          Sept. 17,
          <year>2024</year>
          . arXiv: 2403.18604 [cs]. url: http://arxiv.org/abs/2403.18604 (visited on 10/01/2024).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [4]
          <source>In: Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          .
          <year>2025</year>
          , pp.
          <fpage>3743</fpage>
          -
          <lpage>3752</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Satish</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Wörndl</surname>
          </string-name>
          . “
          <article-title>Enhancing Tourism Recommender Systems for Sustainable City Trips Using Retrieval-Augmented Generation”</article-title>
          .
          <source>In: Recommender Systems for Sustainability and Social Good</source>
          . Ed. by
          <string-name>
            <given-names>L.</given-names>
            <surname>Boratto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>De Filippo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lex</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Ricci</surname>
          </string-name>
          . Cham: Springer Nature Switzerland,
          <year>2025</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>34</lpage>
          . isbn: 978-3-031-87654-7.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bhardwaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Shah</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Varma</surname>
          </string-name>
          . “
          <article-title>Pre-training LLMs using human-like development data corpus”</article-title>
          .
          <source>In: Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning</source>
          . Ed. by
          <string-name>
            <given-names>A.</given-names>
            <surname>Warstadt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Choshen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Wilcox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ciro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mosquera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Paranjabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Linzen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Cotterell</surname>
          </string-name>
          . Singapore: Association for Computational Linguistics, Dec.
          <year>2023</year>
          , pp.
          <fpage>339</fpage>
          -
          <lpage>345</lpage>
          . doi: 10.18653/v1/2023.conll-babylm.30.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W.</given-names>
            <surname>Cai</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          . “
          <article-title>Predicting User Intents and Satisfaction with Dialogue-based Conversational Recommendations”</article-title>
          .
          <source>In: Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization. UMAP '20: 28th ACM Conference on User Modeling, Adaptation and Personalization</source>
          . Genoa Italy: ACM, July 7,
          <year>2020</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>42</lpage>
          . isbn: 978-1-4503-6861-2. doi: 10.1145/3340631.3394856.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Cai</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          . “
          <article-title>Predicting user intents and satisfaction with dialogue-based conversational recommendations”</article-title>
          .
          <source>In: Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization</source>
          .
          <year>2020</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cassani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruberl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Salis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Giannese</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Boanelli</surname>
          </string-name>
          .
          <article-title>zIA: a GenAI-powered local auntie assists tourists in Italy</article-title>
          .
          <year>2024</year>
          . arXiv: 2407.11830 [cs.DC].
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>TravelAgent: An AI Assistant for Personalized Travel Planning</article-title>
          .
          <year>2024</year>
          . arXiv: 2409.08069 [cs.AI].
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11]
          <article-title>Classification: Accuracy, Recall, Precision, and Related Metrics | Machine Learning</article-title>
          .
          <source>Google for Developers</source>
          .
          <year>2025</year>
          . url: https://developers.google.com/machine-learning/crash-course/classification/accuracy-precision-recall (visited on 01/28/
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G. V.</given-names>
            <surname>Cormack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L. A.</given-names>
            <surname>Clarke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Buettcher</surname>
          </string-name>
          . “
          <article-title>Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods”</article-title>
          .
          <source>In: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '09: The 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . Boston MA USA: ACM, July 19,
          <year>2009</year>
          , pp.
          <fpage>758</fpage>
          -
          <lpage>759</lpage>
          . isbn: 978-1-60558-483-6. doi: 10.1145/1571941.1572114.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deldjoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McAuley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Korikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sanner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramisa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sathiamoorthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kasirzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Milano</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Ricci</surname>
          </string-name>
          . “
          <article-title>Recommendation with Generative Models”</article-title>
          .
          <source>In: Foundations and Trends® in Information Retrieval</source>
          (Sept.
          <year>2024</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deldjoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McAuley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Korikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sanner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramisa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sathiamoorthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kasirzadeh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Milano</surname>
          </string-name>
          .
          <article-title>“A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys)”</article-title>
          .
          <source>In: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. KDD '24</source>
          . Barcelona, Spain: Association for Computing Machinery,
          <year>2024</year>
          , pp.
          <fpage>6448</fpage>
          -
          <lpage>6458</lpage>
          . isbn: 9798400704901. doi: 10.1145/3637528.3671474.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          . “
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</source>
          . Ed. by
          <string-name>
            <given-names>J.</given-names>
            <surname>Burstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Doran</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Solorio</surname>
          </string-name>
          . Minneapolis, Minnesota: Association for Computational Linguistics,
          June <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.18653/v1/N19-1423</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Es</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>James</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Espinosa Anke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Schockaert</surname>
          </string-name>
          . “
          <article-title>RAGAs: Automated Evaluation of Retrieval Augmented Generation”</article-title>
          .
          <source>In: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations</source>
          . Ed. by
          <string-name>
            <given-names>N.</given-names>
            <surname>Aletras</surname>
          </string-name>
          and
          <string-name>
            <given-names>O.</given-names>
            <surname>De Clercq</surname>
          </string-name>
          . St. Julians, Malta: Association for Computational Linguistics, Mar.
          <year>2024</year>
          , pp.
          <fpage>150</fpage>
          -
          <lpage>158</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.18653/v1/2024.eacl-demo.16</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T.</given-names>
            <surname>Formal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lassance</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Piwowarski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Clinchant</surname>
          </string-name>
          . “
          <article-title>From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective”</article-title>
          .
          <source>In: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '22</source>
          . Madrid, Spain: Association for Computing Machinery,
          <year>2022</year>
          , pp.
          <fpage>2353</fpage>
          -
          <lpage>2359</lpage>
          . isbn:
          <isbn>9781450387323</isbn>
          . doi:
          <pub-id pub-id-type="doi">10.1145/3477495.3531857</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>L.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahuja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Allen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sidahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Schubiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          .
          <source>Leveraging Large Language Models in Conversational Recommender Systems</source>
          .
          <year>2023</year>
          . arXiv:
          <pub-id pub-id-type="arxiv">2305.07961</pub-id>
          [cs.IR].
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
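          <string-name>
            <given-names>M.</given-names>
            <surname>de Rijke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.-S.</given-names>
            <surname>Chua</surname>
          </string-name>
          . “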
          <article-title>Advances and challenges in conversational recommender systems: A survey”</article-title>
          .
          <source>In: AI Open</source>
          <volume>2</volume>
          (
          <year>2021</year>
          ), pp.
          <fpage>100</fpage>
          -
          <lpage>126</lpage>
          . issn:
          <issn>2666-6510</issn>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.aiopen.2021.06.002</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Che</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          . “
          <article-title>Few-shot learning for multi-label intent detection”</article-title>
          .
          <source>In: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          . Vol.
          <volume>35</volume>
          .
          <issue>14</issue>
          .
          <year>2021</year>
          , pp.
          <fpage>13036</fpage>
          -
          <lpage>13044</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          . “
          <article-title>Lending Interaction Wings to Recommender Systems with Conversational Agents”</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          . Ed. by
          <string-name>
            <given-names>A.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Naumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Globerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Saenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Levine</surname>
          </string-name>
          . Vol.
          <volume>36</volume>
          . Curran Associates, Inc.,
          <year>2023</year>
          , pp.
          <fpage>27951</fpage>
          -
          <lpage>27979</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>H.</given-names>
            <surname>Joko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramsay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>de Vries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dalton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Hasibi</surname>
          </string-name>
          . “
          <article-title>Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search”</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <source>In: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . July 10,
          <year>2024</year>
          , pp.
          <fpage>796</fpage>
          -
          <lpage>806</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1145/3626772.3657815</pub-id>
          . arXiv:
          <pub-id pub-id-type="arxiv">2405.03480</pub-id>
          [cs].
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kemper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dicarlantonio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Korikov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Sanner</surname>
          </string-name>
          . “
          <article-title>Retrieval-Augmented Conversational Recommendation with Prompt-based Semi-Structured Natural Language State Tracking”</article-title>
          .
          <source>In: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . Washington DC USA: ACM, July 10,
          <year>2024</year>
          , pp.
          <fpage>2786</fpage>
          -
          <lpage>2790</lpage>
          . isbn:
          <isbn>9798400704314</isbn>
          . doi:
          <pub-id pub-id-type="doi">10.1145/3626772.3657670</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-t.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          , et al. “
          <article-title>Retrieval-augmented generation for knowledge-intensive nlp tasks”</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          ), pp.
          <fpage>9459</fpage>
          -
          <lpage>9474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Q.</given-names>
            <surname>He</surname>
          </string-name>
          .
          <article-title>“User-centric conversational recommendation with multi-aspect user modeling”</article-title>
          .
          <source>In: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          .
          <year>2022</year>
          , pp.
          <fpage>223</fpage>
          -
          <lpage>233</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ge</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Vaiage: A Multi-Agent Solution to Personalized Travel Planning</article-title>
          .
          <year>2025</year>
          . arXiv:
          <pub-id pub-id-type="arxiv">2505.10922</pub-id>
          [cs.MA].
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          . “
          <article-title>Few-Shot Intent Detection with Label-Enhanced Hierarchical Feature Learning and Graph Neural Networks”</article-title>
          .
          <source>In: Proceedings of the ACM Turing Award Celebration Conference-China 2024</source>
          .
          <year>2024</year>
          , pp.
          <fpage>226</fpage>
          -
          <lpage>227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hewitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Paranjape</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bevilacqua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          . “
          <article-title>Lost in the Middle: How Language Models Use Long Contexts”</article-title>
          .
          <source>In: Transactions of the Association for Computational Linguistics</source>
          <volume>12</volume>
          (
          <year>2024</year>
          ), pp.
          <fpage>157</fpage>
          -
          <lpage>173</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1162/tacl_a_00638</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sanner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Bouadjenek</surname>
          </string-name>
          .
          <article-title>“A Workflow Analysis of Context-driven Conversational Recommendation”</article-title>
          .
          <source>In: Proceedings of the Web Conference 2021. WWW '21: The Web Conference 2021</source>
          . Ljubljana Slovenia: ACM, Apr. 19,
          <year>2021</year>
          , pp.
          <fpage>866</fpage>
          -
          <lpage>877</lpage>
          . isbn:
          <isbn>978-1-4503-8312-7</isbn>
          . doi:
          <pub-id pub-id-type="doi">10.1145/3442381.3450123</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>D.</given-names>
            <surname>Massimo</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Ricci</surname>
          </string-name>
          . “
          <article-title>Building Effective Recommender Systems for Tourists”</article-title>
          .
          <source>In: AI Magazine</source>
          <volume>43</volume>
          .
          <issue>2</issue>
          (June
          <year>2022</year>
          ), pp.
          <fpage>209</fpage>
          -
          <lpage>224</lpage>
          . issn:
          <issn>0738-4602</issn>
          ,
          <issn>2371-9621</issn>
          . doi:
          <pub-id pub-id-type="doi">10.1002/aaai.12057</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>S.</given-names>
            <surname>Meyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Ren</surname>
          </string-name>
          .
          <article-title>A Comparison of LLM Finetuning Methods &amp; Evaluation Metrics with Travel Chatbot Use Case</article-title>
          .
          <year>2024</year>
          . arXiv:
          <pub-id pub-id-type="arxiv">2408.03562</pub-id>
          [cs.CL].
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>S.</given-names>
            <surname>Moradizeyveh</surname>
          </string-name>
          .
          <source>Intent Recognition in Conversational Recommender Systems</source>
          .
          <year>2022</year>
          . arXiv:
          <pub-id pub-id-type="arxiv">2212.03721</pub-id>
          [cs.CL].
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>F.</given-names>
            <surname>Nagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Haroun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Abdel-Kader</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Keshk</surname>
          </string-name>
          .
          <article-title>“A Review for Recommender System Models and Deep Learning”</article-title>
          .
          <source>In: IJCI. International Journal of Computers and Information</source>
          <volume>8</volume>
          .
          <issue>2</issue>
          (Dec. 1,
          <year>2021</year>
          ), pp.
          <fpage>170</fpage>
          -
          <lpage>176</lpage>
          . issn:
          <issn>2735-3257</issn>
          . doi:
          <pub-id pub-id-type="doi">10.21608/ijci.2021.207864</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [34]
          <collab>OpenAI</collab> et al
          .
          <source>GPT-4 Technical Report</source>
          .
          <year>2024</year>
          . arXiv:
          <pub-id pub-id-type="arxiv">2303.08774</pub-id>
          [cs.CL].
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>S.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tumbade</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Q.</given-names>
            <surname>Vohra</surname>
          </string-name>
          . “
          <article-title>Exploring Zero and Few-shot Techniques for Intent Classification”</article-title>
          .
          <source>In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)</source>
          . Toronto, Canada: Association for Computational Linguistics,
          <year>2023</year>
          , pp.
          <fpage>744</fpage>
          -
          <lpage>751</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.18653/v1/2023.acl-industry.71</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>J.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>“A user preference and intent extraction framework for explainable conversational recommender systems”</article-title>
          .
          <source>In: Companion Proceedings of the 2023 ACM SIGCHI Symposium on Engineering Interactive Computing Systems</source>
          .
          <year>2023</year>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>J.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          . “
          <article-title>RAG-Optimized Tibetan Tourism LLMs: Enhancing Accuracy and Personalization”</article-title>
          .
          <source>In: Proceedings of the 2024 7th International Conference on Artificial Intelligence and Pattern Recognition. AIPR '24</source>
          . Association for Computing Machinery,
          <year>2025</year>
          , pp.
          <fpage>1185</fpage>
          -
          <lpage>1192</lpage>
          . isbn:
          <isbn>9798400717178</isbn>
          . doi:
          <pub-id pub-id-type="doi">10.1145/3703935.3704112</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rao</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          . “
          <article-title>RAMO: Retrieval-Augmented Generation for Enhancing MOOCs Recommendations”</article-title>
          .
          <source>In: Joint Proceedings of the Human-Centric eXplainable AI in Education and the Leveraging Large Language Models for Next Generation Educational Technologies Workshops (HEXED-L3MNGET 2024) co-located with 17th International Conference on Educational Data Mining (EDM 2024)</source>
          . Vol.
          <volume>3840</volume>
          . CEUR Workshop Proceedings. CEUR-WS.org,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Asaadi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Küch</surname>
          </string-name>
          . “
          <article-title>Knowledge distillation meets few-shot learning: An approach for few-shot intent classification within and across domains”</article-title>
          .
          <source>In: Proceedings of the 4th Workshop on NLP for Conversational AI</source>
          .
          <year>2022</year>
          , pp.
          <fpage>108</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>O.</given-names>
            <surname>Şerbetçi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. D.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Leser</surname>
          </string-name>
          . “
          <article-title>HU-WBI at BioASQ12B Phase A: Exploring Rank Fusion of Dense Retrievers and Re-rankers”</article-title>
          .
          <source>In: Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024)</source>
          . Vol.
          <volume>3740</volume>
          . CEUR Workshop Proceedings.
          <year>2024</year>
          , pp.
          <fpage>269</fpage>
          -
          <lpage>275</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>S.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          . “
          <article-title>TravelRAG: A Tourist Attraction Retrieval Framework Based on Multi-Layer Knowledge Graph”</article-title>
          .
          <source>In: ISPRS International Journal of Geo-Information</source>
          <volume>13</volume>.<issue>11</issue>
          (
          <year>2024</year>
          ). issn:
          <issn>2220-9964</issn>
          . doi:
          <pub-id pub-id-type="doi">10.3390/ijgi13110414</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [42]
          <collab>Gemini Team</collab>
          et al.
          <article-title>Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context</article-title>
          .
          <year>2024</year>
          . arXiv: 2403.05530 [cs.CL].
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [43]
          <article-title>Testset Generation for RAG - Ragas</article-title>
          . url: https://docs.ragas.io/en/stable/concepts/test_data_generation/rag/ (visited on 07/21/
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          . “
          <article-title>MiniLM: deep self-attention distillation for task-agnostic compression of pre-trained transformers”</article-title>
          .
          <source>In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS '20</source>
          . Vancouver, BC, Canada: Curran Associates Inc.,
          <year>2020</year>
          . isbn:
          <isbn>9781713829546</isbn>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ichter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          . “
          <article-title>Chain-of-thought prompting elicits reasoning in large language models”</article-title>
          .
          <source>In: Proceedings of the 36th International Conference on Neural Information Processing Systems</source>
          . NIPS '22. New Orleans, LA, USA: Curran Associates Inc.,
          <year>2022</year>
          . isbn:
          <isbn>9781713871088</isbn>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>W.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hay</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          . “
          <article-title>Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach”</article-title>
          .
          <source>In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          . Ed. by
          <string-name>
            <given-names>K.</given-names>
            <surname>Inui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Wan</surname>
          </string-name>
          . Hong Kong, China: Association for Computational Linguistics, Nov.
          <year>2019</year>
          , pp.
          <fpage>3914</fpage>
          -
          <lpage>3923</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.18653/v1/D19-1404</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. P.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Stoica</surname>
          </string-name>
          . “
          <article-title>Judging LLM-as-a-judge with MT-bench and Chatbot Arena”</article-title>
          .
          <source>In: Proceedings of the 37th International Conference on Neural Information Processing Systems</source>
          . NIPS '23. New Orleans, LA, USA: Curran Associates Inc.,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>