                         PersonaRAG: Enhancing Retrieval-Augmented Generation
                         Systems with User-Centric Agents
                         Saber Zerhoudi1 , Michael Granitzer1
                         1
                             University of Passau, Passau, Germany


                                           Abstract
                                           Large Language Models (LLMs) struggle with generating reliable outputs due to outdated knowledge and hallucinations. Retrieval-
                                           Augmented Generation (RAG) models address this by enhancing LLMs with external knowledge, but often fail to personalize the retrieval
                                           process. This paper introduces PersonaRAG, a novel framework incorporating user-centric agents to adapt retrieval and generation
                                           based on real-time user data and interactions. Evaluated across various question answering datasets, PersonaRAG demonstrates
                                           superiority over baseline models, providing tailored answers to user needs. The results suggest promising directions for user-adapted
                                           information retrieval systems. Findings and resources are available at https://github.com/padas-lab-de/ir-rag-sigir24-persona-rag.

                                           Keywords
                                           User interactions, Retrieval-Augmented Generation (RAG), Personalized Information Retrieval, Multi-Agent RAG



Information Retrieval's Role in RAG Systems (IR-RAG) workshop at SIGIR, 2024, Washington D.C., USA
saber.zerhoudi@uni-passau.de (S. Zerhoudi); michael.granitzer@uni-passau.de (M. Granitzer)
ORCID: 0000-0003-2259-0462 (S. Zerhoudi); 0000-0003-3566-5507 (M. Granitzer)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


1. Introduction

Large Language Models (LLMs) such as GPT-4 [2] and LLaMA 3 [3] have significantly advanced the field of natural language processing (NLP) by demonstrating impressive performance across various tasks and exhibiting emergent abilities that push the boundaries of artificial intelligence [4]. However, these models face challenges such as generating unreliable outputs due to issues like hallucination and outdated parametric memories [5].

Retrieval-Augmented Generation (RAG) models have shown promise in addressing these issues by integrating externally retrieved information to support more effective performance on complex, knowledge-intensive tasks [6]. Despite these advancements, the deployment of RAG systems within broader AI frameworks continues to face significant challenges, particularly in handling noise and irrelevance in retrieved data [7].

A key limitation of existing RAG systems is their inability to adapt outputs to users' specific informational and contextual needs. Personalized techniques in information retrieval, such as adaptive retrieval based on user interaction data and context-aware strategies, are increasingly recognized as essential for enhancing user interaction and satisfaction [8, 9]. These methods aim to refine the retrieval process dynamically, tailoring it more closely to individual user profiles and situational contexts [10].

The integration of agent-based systems with personalized RAG architectures presents a compelling avenue for research. Such systems utilize a multi-agent framework to simulate complex, adaptive interactions tailored to user-specific requirements [11]. By embedding intelligent, user-oriented agents within the RAG framework, these systems can evolve into more sophisticated tools that not only retrieve relevant information but also align it closely with the user's specific preferences and contexts in real time. Importantly, the personalization strategy employed in these systems is fully transparent to the user, ensuring that the user is aware of how their information is being used to tailor the results.

In this study, we present PersonaRAG, an innovative methodology that extends traditional RAG frameworks by incorporating user-centric agents into the retrieval process. This approach addresses the previously mentioned limitations by promoting active engagement with retrieved content and utilizing dynamic, real-time user data to continuously refine and personalize interactions. PersonaRAG aims to enhance the precision and relevance of LLM outputs, adapting dynamically to user-specific needs while maintaining full transparency regarding the personalization process.

Our experiments, conducted using GPT-3.5, develop the PersonaRAG model and evaluate its performance across various question answering datasets. The results indicate that PersonaRAG achieves an improvement of over 5% in accuracy compared to baseline models. Furthermore, PersonaRAG demonstrates an ability to adapt responses based on user profiles and information needs, enhancing the personalization of results. Additional analysis shows that the principles underlying PersonaRAG can be generalized to different LLM architectures, such as Llama 3 70b and Mixture of Experts (MoE) 8x7b [12]. These architectures benefit from the integration of external knowledge facilitated by PersonaRAG, with improvements exceeding 10% in some cases. This evidence indicates that PersonaRAG not only contributes to the progress of RAG systems but also provides notable advantages for various LLM applications, signifying a meaningful step forward in the development of more intelligent and user-adapted information retrieval systems.


2. Related Work

Retrieval-Augmented Generation (RAG) systems have emerged as a significant advancement in natural language processing and machine learning, enhancing language models by integrating external knowledge bases to improve performance across various tasks, such as question answering, dialog understanding, and code generation [6, 13]. These systems employ dense retrievers to pull relevant information, which the language model then uses to generate responses. However, the development of RAG systems and their integration within broader artificial intelligence frameworks is an ongoing area of research, with several challenges and opportunities for improvement.

Recent developments in RAG systems have focused on refining these models to better handle the noise and irrelevant information often retrieved during the process. Xu et al. [13] addressed this issue by employing natural language inference models to select pertinent sentences, thereby enhancing the RAG's robustness. Additionally, advancements have been made in adaptively retrieving information, with systems like those proposed by Jiang et al. [14] dynamically fetching passages that are most likely to improve generation accuracy.
Despite these improvements, RAG systems still face limitations, particularly in adapting their output to the user's specific profile, such as their information needs or intellectual knowledge. This limitation stems from the current design of most RAG systems, which do not typically incorporate user context or personalized information retrieval strategies [15]. Consequently, there exists a gap between the general effectiveness of RAG systems and their applicability in personalized user experiences, where context and individual user preferences play a crucial role.

Personalization in information retrieval is increasingly recognized as essential for enhancing user interaction and satisfaction [16]. Techniques such as user profiling, context-aware retrieval, and adaptive feedback mechanisms are commonly employed to tailor search results to individual users' needs. For instance, Jeong et al. [17] proposed adaptive retrieval strategies that dynamically adjust the retrieval process based on the complexity of the query and the user's historical interaction data. These personalized approaches not only improve user satisfaction but also increase the efficiency of information retrieval by reducing the time users spend sifting through irrelevant information.

The integration of personalized techniques with agent-based systems provides a promising pathway to augment the capabilities of RAG systems. Agent-based systems, particularly in the form of LLM-based multi-agent frameworks [18], enable the simulation of complex interactions that can lead to more nuanced and contextually appropriate outputs. By incorporating multi-agent systems into RAG frameworks, there is potential for developing more robust and adaptive retrieval mechanisms that can handle a broader range of queries and generate more accurate responses, closely tailored to the specific needs and contexts of individual users.

In conclusion, while significant progress has been made in enhancing the effectiveness and personalization of RAG systems, ongoing research is crucial to address their existing limitations and expand their applications. The integration of personalized information retrieval and agent-based enhancements represents a promising avenue for further enhancing the adaptability and accuracy of RAG systems, potentially leading to intelligent information retrieval tailored to the specific needs of users.


3. Methodology

In this section, we present the methodology underlying our PersonaRAG approach, which aims to enhance the ability of Large Language Models (LLMs) to actively engage with, understand, and leverage user profile information for personalized content generation. We begin by discussing the fundamental concepts of Retrieval-Augmented Generation (RAG) models (Section 3.1) and then introduce our PersonaRAG technique, which encourages LLMs to actively assimilate knowledge from live search sessions (Section 3.2).

Figure 1: Illustrations of Various RAG Models. Vanilla RAG and Chain-of-Thought [1] use passive learning, while PersonaRAG involves user-centric knowledge acquisition.

3.1. Fundamentals of Retrieval-Augmented Generation (RAG) Models

State-of-the-art RAG models, as described in previous studies [19, 20, 21], employ retrieval systems to identify a set of passages D = {d_1, ..., d_n} when given a query q. These passages are intended to enhance the generative capabilities of LLMs by providing them with contextually relevant information.

Early versions of RAG models typically employ a traditional retrieval-generation framework, in which the retrieved set D = {d_1, ..., d_n} is directly fed into LLMs to generate responses to the query q. However, these passages often contain irrelevant information, and this direct utilization approach has been shown to restrict the potential benefits of the RAG framework [22]. This limitation has sparked further discussion on how to improve LLMs by integrating retrieval results and outputs generated by the models themselves [23].
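To make the retrieve-then-generate loop of Section 3.1 concrete, the following minimal Python sketch wires a query q, a retriever returning passages D = {d_1, ..., d_n}, and an LLM together in a vanilla RAG setup. The retrieve and generate callables are hypothetical stand-ins, not the exact components used in our experiments.

```python
# Minimal sketch of the vanilla retrieve-then-generate RAG loop (Section 3.1).
# `retrieve` and `generate` are hypothetical stand-ins for a BM25-style
# retriever and an LLM completion call; they are not the paper's exact components.
from typing import Callable, List

def vanilla_rag(
    query: str,
    retrieve: Callable[[str, int], List[str]],  # returns top-k passages D = {d_1, ..., d_n}
    generate: Callable[[str], str],             # LLM completion for a single prompt
    top_k: int = 5,
) -> str:
    passages = retrieve(query, top_k)
    # The retrieved set D is fed to the LLM verbatim: no filtering and no
    # personalization -- exactly the limitation PersonaRAG addresses.
    context = "\n\n".join(f"Passage {i + 1}: {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using the passages below.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
    return generate(prompt)
```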
Figure 2: Overview of our PersonaRAG model, showcasing the dynamic interaction among specialized agents within the system, facilitated by a Global Message Pool for structured communication. The diagram illustrates the flow from user query input through the User Profile, Contextual Retrieval, Live Session, Document Ranking, and Feedback Agents, highlighting their contributions to real-time adaptation and personalized content generation by integrating live user data and feedback for continuous improvement and contextually relevant search experiences.



3.2. PersonaRAG: RAG with User-Centric Agents

Drawing from the principles of adaptive learning and user-centered design, we develop a new PersonaRAG architecture to enable IR systems to dynamically learn from and adapt to user behavior in real time. As shown in Figure 2, PersonaRAG introduces a three-step pipeline: retrieval, user interaction analysis, and cognitive dynamic adaptation. Unlike traditional IR models that statically respond to queries, PersonaRAG focuses on leveraging live user data to continually refine its understanding and responses without the need for manual retraining.

3.2.1. User Interaction Analysis

To understand user behavior from live interactions, PersonaRAG treats the IR system as a cognitive structure capable of receiving, interpreting, and acting upon user feedback [24]. Mimicking human learning behaviors, we establish distinct agents within the system dedicated to analyzing user interactions from different perspectives: engagement tracking, preference analysis, context understanding, and feedback integration. These agents' roles are detailed in Section 3.2.2.

3.2.2. Cognitive Dynamic Adaptation

Following adaptive learning principles, we employ a dynamic adaptation mechanism to assist the IR system in utilizing real-time user data for continuous improvement. This mechanism facilitates the integration of insights gained from User Interaction Analysis into the system's retrieval processes. Specifically, we prompt the system to adjust its query responses based on an initial understanding of the user's needs and refine these responses as more user data becomes available. This approach not only personalizes the search results but also helps in correcting any misalignments or errors in real time.

PersonaRAG employs a highly specialized agent architecture, with each agent focusing on a specific aspect of the information retrieval process. All agents utilize in-context learning, i.e., prompting, to perform their designated tasks. This role specialization allows for the efficient decomposition of complex user queries into manageable tasks [25]. To this end, we engage the IR system as five specialized agents that analyze user interactions based on retrieved data. At present, the focus is on the functionality and interaction of these agents rather than their individual performance metrics.

User Profile Agent. This component manages and updates user profile data, incorporating historical user interactions and preferences [26, 27]. It monitors how users interact with search results, such as click-through rates and navigation paths. The User Profile Agent helps the system understand what captures user interest and leads to deeper engagement, enabling personalized search experiences.

Contextual Retrieval Agent. This agent is responsible for the initial retrieval of documents based on the user's current query. It accesses both a traditional search index and a more dynamic context-aware system that can consider broader aspects of the query environment. It utilizes user profile data to modify and refine search queries or to prioritize search results. For instance, if a user consistently engages more with certain types of documents or topics, the retrieval agent can boost those document types in the search results, ensuring that the most relevant information is presented to the user.

Live Session Agent. This agent analyzes the current session in real time, observing user actions such as clicks, time spent on documents, modifications to the query, and any feedback provided. It creates a session-specific context model that captures the user's immediate needs and interests. The real-time data collected by this agent is used to adjust the ongoing session, potentially re-ranking search results or suggesting new queries based on the user's behavior and preferences. Additionally, the Live Session Agent updates the user profile with new insights gleaned from the session, allowing for a more personalized and efficient search experience in future interactions.

Document Ranking Agent. This agent is responsible for re-ranking the documents retrieved by the Contextual Retrieval Agent. It integrates insights from both the User Profile Agent and the Live Session Agent to score and order the documents more effectively. By considering the user's historical preferences and their current session behavior, the Document Ranking Agent ensures that the most relevant and valuable documents are presented to the user in a prioritized manner. This agent continuously adapts its ranking algorithms based on the feedback received from the user and the insights provided by the other agents in the system (a hedged sketch of such a ranking scheme follows this subsection).

Feedback Agent. This agent gathers implicit and explicit feedback during and after user interactions. Implicit feedback includes behavioral data like time spent on documents, click counts, and navigation patterns. Explicit feedback involves direct user input on document relevance and quality, collected through ratings, surveys, or comments. The agent uses this information to train and refine models for other agents, particularly the Document Ranking Agent. This process enhances the system's ability to anticipate user needs and deliver relevant documents based on accumulated feedback and insights.
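To ground the Document Ranking Agent's role, the sketch below shows one plausible scoring scheme that blends a retrieval score with profile and session signals. In the paper the agents operate purely via prompting, so the weights, field names, and signals here are illustrative assumptions rather than the actual mechanism.

```python
# Illustrative re-ranking heuristic for the Document Ranking Agent (Section 3.2.2).
# The real agent works via in-context prompting; this sketch only makes the idea
# of blending retrieval, profile, and session signals concrete. Weights and
# signal names are our assumptions, not the paper's implementation.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Document:
    doc_id: str
    topic: str
    retrieval_score: float  # e.g., normalized score from the Contextual Retrieval Agent

def rerank(docs: List[Document],
           profile_affinity: Dict[str, float],   # User Profile Agent: long-term topic interest
           session_affinity: Dict[str, float],   # Live Session Agent: current-session interest
           w_retrieval: float = 0.6,
           w_profile: float = 0.25,
           w_session: float = 0.15) -> List[Document]:
    def score(d: Document) -> float:
        return (w_retrieval * d.retrieval_score
                + w_profile * profile_affinity.get(d.topic, 0.0)
                + w_session * session_affinity.get(d.topic, 0.0))
    # Highest combined score first: documents matching both long-term and
    # in-session interests rise above purely lexical matches.
    return sorted(docs, key=score, reverse=True)
```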
By dynamically integrating insights from the User Profile Agent, Contextual Retrieval Agent, Live Session Agent, Document Ranking Agent, and Feedback Agent into the IR processes, PersonaRAG not only adapts to immediate user needs but also evolves over time to better anticipate and meet user expectations. This multi-agent approach enables PersonaRAG to embody a truly adaptive and user-focused information retrieval system, leveraging specialized agents to analyze user interactions from different behavioral perspectives and deliver highly personalized and contextually relevant search experiences. The inclusion of the Document Ranking Agent ensures that the most pertinent documents are identified and presented to users, further enhancing the system's ability to effectively satisfy user information needs.

3.3. PersonaRAG Operational Workflow

The PersonaRAG framework employs a structured workflow that allows for sequential and parallel processing of tasks, ensuring clarity and consistency in communication between agents through well-defined data structures and protocols [28]. The process involves the User Profile Agent, Contextual Retrieval Agent, Live Session Agent, Document Ranking Agent, and Feedback Agent working together to refine search queries, prioritize relevant results, and improve document scoring and re-ranking based on user profile, session-specific contexts, and feedback.

PersonaRAG's modular design allows for flexibility in the system setup, enabling researchers to focus on the most relevant aspects of the user's profile, session, and feedback data. Agents work collaboratively by utilizing content from the Global Message Pool, which serves as a central hub for inter-agent communication [28], eliminating inefficiencies and enabling agents to access or update information as required.

The Feedback Agent collects and analyzes implicit and explicit user feedback to generate insights into the effectiveness of retrieval strategies and document relevance. This feedback is used to make dynamic adjustments to the system, refining retrieval methods and altering the weighting of user profile factors. Through this iterative process, PersonaRAG continuously adapts and improves its performance, enhancing the accuracy and user satisfaction of the retrieval results [29].
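The following Python sketch illustrates this workflow: five prompt-driven agents communicating through a shared Global Message Pool, followed by the cognitive dynamic adaptation step. The llm callable stands in for a chat-completion call (e.g., GPT-3.5); the class and function names are ours and the Cognitive Agent step is simplified relative to the real prompt, which has one placeholder per agent answer.

```python
# Illustrative sketch of the PersonaRAG workflow (Section 3.3). `llm` is a
# hypothetical stand-in for a chat-completion call; names are ours, not the
# actual implementation.
from typing import Callable, Dict, List

class GlobalMessagePool:
    """Central hub for inter-agent communication."""
    def __init__(self) -> None:
        self._messages: Dict[str, str] = {}

    def write(self, agent: str, content: str) -> None:
        self._messages[agent] = content

    def read_all(self) -> str:
        return "\n".join(f"[{name}] {text}" for name, text in self._messages.items())

def run_agent(name: str, template: str, question: str, passages: List[str],
              pool: GlobalMessagePool, llm: Callable[[str], str]) -> str:
    # Each agent is pure in-context learning: fill its prompt template with the
    # query, the passages, and the current global memory, then call the LLM.
    prompt = template.format(question=question,
                             passages="\n".join(passages),
                             global_memory=pool.read_all())
    answer = llm(prompt)
    pool.write(name, answer)  # share findings with the other agents
    return answer

def persona_rag(question: str, passages: List[str],
                templates: Dict[str, str], llm: Callable[[str], str]) -> str:
    pool = GlobalMessagePool()
    # User interaction analysis: the specialized agents run in sequence here,
    # although the workflow also allows parallel execution.
    for name in ("User Profile", "Contextual Retrieval", "Live Session",
                 "Document Ranking", "Feedback"):
        run_agent(name, templates[name], question, passages, pool, llm)
    # Cognitive dynamic adaptation: refine an initial chain-of-thought answer
    # using the pooled agent insights.
    cot_answer = llm(templates["Chain-of-Thought"].format(
        question=question, passages="\n".join(passages)))
    final_prompt = (templates["Cognitive Agent"].format(
        question=question, cot_answer=cot_answer)
        + "\nUser Insights:\n" + pool.read_all())
    return llm(final_prompt)
```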
4. Experimental Setups

In this section, we present the experimental setup employed in our study, including the datasets, baseline models, evaluation metrics, and implementation details. We also provide an overview of the prompts used in our experiments.

4.1. Datasets

Our experiments are conducted on three widely used single-hop benchmark datasets in the field of Information Retrieval (IR): NaturalQuestions (NQ) [30], TriviaQA [31], and WebQuestions (WebQ) [32]. NQ is a well-known dataset in Natural Language Understanding (NLU), consisting of structured questions and corresponding Wikipedia pages annotated with long and short answers. TriviaQA comprises question-answer pairs collected from trivia and quiz-league websites, while WebQ consists of questions selected using the Google Suggest API, with answers being entities in Freebase.

Table 1 summarizes the datasets used in our initial study. Due to the high cost of using language models and the large number of API calls required, we randomly sampled 500 questions from each raw dataset to create more manageable subsets for our experiments. While this sampling approach limits the scope of our study, it allows us to conduct an initial investigation into the performance of different RAG systems on these datasets. We acknowledge that future work with larger sample sizes and more comprehensive experiments will be necessary to draw definitive conclusions. Nonetheless, we believe this preliminary study provides valuable insights into the relative strengths and weaknesses of the tested RAG approaches.

    Dataset     #Query    #Corpus    Sampling Rate
    NQ           8,757     79,168        5.7%
    TriviaQA     8,837     78,785        5.7%
    WebQ         2,032      3,417       24.6%

Table 1: Summary of datasets. Each dataset consists of 500 questions randomly sampled from the raw dataset.

4.2. Models

We compare PersonaRAG with several baseline models, including prompt learning and RAG models. The prompt templates used in user interaction analysis and dynamic adaptation are presented in Section 4.5. Initially, the question-answering (QA) instruction is fed to ChatGPT to conduct the vanilla answer generation model. Following the work of Wei et al. [33], the Chain-of-Thought model is implemented, which generates question rationales to produce the final results. Additionally, the Guideline model serves as a baseline, generating problem-solving steps and guiding Language Models (LLMs) to generate the answer.

For the RAG-based baselines, two models are implemented: vanilla RAG and Chain-of-Thought, which includes utilizing raw retrieved passages (CoT with Passage) and refining the passages as notes (CoT with Note). The vanilla RAG model directly feeds the top-ranked passages to the LLM. The Chain-of-Note model [1] is also implemented, which refines and summarizes the retrieved passages for generation. Inspired by Self-RAG of Asai et al. [34], the Self-Rerank model is implemented, which filters out unrelated content without fine-tuning LLMs.

4.3. Evaluation Metrics

When evaluating adaptive models, it is crucial to consider both task performance and user-centric adaptability simultaneously, along with their trade-offs. Therefore, the results are reported using different metrics, some of which measure effectiveness and others efficiency.

For effectiveness, accuracy is used, following the standard evaluation protocol in the field of Information Retrieval (IR) [35, 36, 34]. Accuracy assesses whether the predicted answer contains the ground-truth answer. Both the outputs of the Large Language Model (LLM) and the golden answers are converted to lowercase, and string matching (StringEM) is performed between each golden answer and the model prediction to calculate accuracy.

To evaluate user-centric adaptability, the BLEU-2 score is measured to assess the text similarity between different RAG and baseline setups, i.e., how closely the generated answers resemble each other. This metric provides insights into the system's ability to generate consistent and coherent responses across various configurations. Additionally, the average sentence length and the average number of syllables of the answers from different RAG setups are reported as a post-hoc analysis. These measures validate whether the RAG system effectively adjusts its responses based on user knowledge levels, ensuring that the generated answers are tailored to the user's understanding and expertise.

Combining these evaluation strategies provides a comprehensive view of both the effectiveness and user-centric adaptability of the RAG system. The accuracy metric ensures that the system generates correct answers, while the BLEU-2 score and the post-hoc analysis of sentence length and syllable count confirm the system's ability to adapt to user knowledge levels. As the understanding of user needs and system capabilities evolves, it is essential to continuously refine these metrics to maintain the RAG system's effectiveness in delivering personalized, context-aware responses that cater to the diverse requirements of users in the field of IR.
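To make this protocol concrete, the sketch below shows one plausible implementation of the three kinds of measurement: StringEM accuracy, a simplified BLEU-2 (geometric mean of unigram and bigram precision with a brevity penalty), and the post-hoc readability statistics. The syllable counter is a crude vowel-group heuristic, not the exact tooling used in our experiments.

```python
# Hedged sketch of the evaluation measures from Section 4.3: StringEM accuracy,
# a simplified BLEU-2, and post-hoc readability statistics. Not the paper's
# exact tooling; the syllable counter is a rough heuristic.
import math
import re
from collections import Counter
from typing import List, Tuple

def string_em(prediction: str, golden_answers: List[str]) -> bool:
    # Accuracy: does the lowercased prediction contain a golden answer?
    pred = prediction.lower()
    return any(g.lower() in pred for g in golden_answers)

def bleu2(candidate: str, reference: str) -> float:
    cand, ref = candidate.lower().split(), reference.lower().split()
    score = 1.0
    for n in (1, 2):  # geometric mean of unigram and bigram precision
        c_ngrams = Counter(zip(*[cand[i:] for i in range(n)]))
        r_ngrams = Counter(zip(*[ref[i:] for i in range(n)]))
        overlap = sum((c_ngrams & r_ngrams).values())
        total = max(sum(c_ngrams.values()), 1)
        score *= (overlap / total) or 1e-9  # floor avoids a hard zero
    score = score ** 0.5
    # Standard brevity penalty for candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * score

def readability_stats(answer: str) -> Tuple[float, float]:
    # Post-hoc adaptability proxies: average sentence length (in words)
    # and average syllables per word (vowel-group heuristic).
    sentences = [s for s in re.split(r"[.!?]+", answer) if s.strip()]
    words = answer.split()
    avg_sentence_len = len(words) / max(len(sentences), 1)
    syllables = sum(len(re.findall(r"[aeiouy]+", w.lower())) or 1 for w in words)
    return avg_sentence_len, syllables / max(len(words), 1)
```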
4.4. Implementation Details

For a fair comparison, and following the work of Mallen et al. [35] and Trivedi et al. [37], the same retriever, a term-based sparse retrieval model known as BM25 [38], is used across all models. The retrieval model is implemented using the OpenMatch toolkit [39]. For the external document corpus, the KILT-Wikipedia corpus preprocessed by Petroni et al. [40] is used, and the top-k relevant documents are retrieved.

Regarding the LLMs used to generate answers, the Llama 3 instruct model (ref) with 70b parameters, Mixture of Experts (MoE) 8x7b (ref), and the GPT-3.5 model (gpt-3.5-turbo-0125) are employed. For the retrieval-augmented LLM design, the implementation details from Trivedi et al. [37] are followed, which include input prompts, instructions, and the number of test samples for evaluation (e.g., 500 samples per dataset).

4.5. Prompts Used in PersonaRAG

This subsection presents the prompt templates employed in the construction of the PersonaRAG model. The prompts utilized in the User Interaction Analysis and Cognitive Dynamic Adaptation components are detailed below. The prompt templates used by the baseline models are available in the project repository (https://github.com/padas-lab-de/ir-rag-sigir24-persona-rag). In the templates, {question} represents the input question, {global_memory} the Global Message Pool, and {passages} the retrieved passages. Additionally, {cot_answer} is populated with the output generated by the Chain-of-Thought model.

The placeholder {user_profile_answer} is filled with the response produced by the User Profile agent model. Respectively, {contextual_answer} corresponds to the Contextual Retrieval agent model, {live_session_answer} to the Live Session agent model, {document_ranking_answer} to the Document Ranking agent model, and {feedback_answer} to the Feedback agent model.

4.5.1. Prompts Used in User Interaction Analysis

User Profile Agent

    Your task is to help the User Profile Agent improve its understanding of
    user preferences based on ranked document lists and the shared global
    memory pool.

    Question: {question}
    Passages: {passages}
    Global Memory: {global_memory}

    Task Description:
    From the provided passages and global memory pool, analyze clues about
    the user's search preferences. Look for themes, types of documents, and
    navigation behaviors that reveal user interest. Use these insights to
    recommend how the User Profile Agent can refine and expand the user
    profile to deliver better-personalized results.
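As a concrete illustration of the placeholder convention above, the snippet below fills an abbreviated version of the User Profile Agent template before it is sent to the model; the llm call is a hypothetical stand-in and the example values are ours.

```python
# Sketch of how a PersonaRAG prompt template is instantiated (Section 4.5).
# The template is abbreviated; `llm` is a hypothetical chat-completion call.
USER_PROFILE_TEMPLATE = (
    "Your task is to help the User Profile Agent improve its understanding "
    "of user preferences based on ranked document lists and the shared "
    "global memory pool.\n\n"
    "Question: {question}\n"
    "Passages: {passages}\n"
    "Global Memory: {global_memory}\n"
)

prompt = USER_PROFILE_TEMPLATE.format(
    question="Who stole the Mona Lisa from the Louvre in 1911?",
    passages="Passage 1: In 1911, Vincenzo Peruggia, a Louvre employee, ...",
    global_memory="[Live Session] The user reads about historic crimes.",
)
# user_profile_answer = llm(prompt)  # the response fills {user_profile_answer}
```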
Contextual Retrieval Agent

    You are a search technology expert guiding the Contextual Retrieval
    Agent to deliver context-aware document retrieval.

    Question: {question}
    Passages: {passages}
    Global Memory: {global_memory}

    Task Description:
    Using the global memory pool and the retrieved passages, identify
    strategies to refine document retrieval. Highlight how user preferences,
    immediate needs, and global insights can be leveraged to adjust search
    queries and prioritize results that align with the user's interests.
    Ensure the Contextual Retrieval Agent uses this shared information to
    deliver more relevant and valuable results.

Live Session Agent

    Your expertise in session analysis is required to assist the Live
    Session Agent in dynamically adjusting results.

    Question: {question}
    Passages: {passages}
    Global Memory: {global_memory}

    Task Description:
    Examine the retrieved passages and information in the global memory
    pool. Determine how the Live Session Agent can use this data to refine
    its understanding of the user's immediate needs. Suggest ways to
    dynamically adjust search results or recommend new queries in real-time,
    ensuring that session adjustments align with user preferences and goals.

Document Ranking Agent

    Your task is to help the Document Ranking Agent prioritize documents
    for better ranking.

    Question: {question}
    Passages: {passages}
    Global Memory: {global_memory}

    Task Description:
    Analyze the retrieved passages and global memory pool to identify ways
    to rank documents effectively. Focus on combining historical user
    preferences, immediate needs, and session behavior to refine ranking
    algorithms. Your insights should ensure that documents presented by the
    Document Ranking Agent are prioritized to match user interests and
    search context.

Feedback Agent

    You are an expert in feedback collection and analysis, guiding the
    Feedback Agent to gather and utilize user insights.

    Question: {question}
    Passages: {passages}
    Global Memory: {global_memory}

    Task Description:
    Using the retrieved passages and global memory pool, identify methods
    for collecting implicit and explicit user feedback. Suggest ways to
    refine feedback mechanisms to align with user preferences, such as
    ratings, surveys, or behavioral data. Your recommendations should guide
    the Feedback Agent in updating other agents' models for more
    personalized and relevant results.

Global Message Pool

    You are responsible for maintaining and enriching the Global Message
    Pool, serving as a central hub for inter-agent communication.

    Question: {question}
    Agent Responses: {agent_responses}
    Existing Global Memory: {global_memory}

    Task Description:
    Using the responses from individual agents and the existing global
    memory, consolidate key insights into a shared repository. Your goal is
    to organize a comprehensive message pool that includes agent-specific
    findings, historical user preferences, session-specific behaviors,
    search queries, and user feedback. This structure should provide all
    agents with meaningful data points and strategic recommendations,
    reducing redundant communication and improving the system's overall
    efficiency.

4.5.2. Prompts Used in Cognitive Dynamic Adaptation

Chain-of-Thought

    To solve the problem, please think and reason step by step, then answer.

    Question: {question}
    Passages: {passages}
    Reasoning process:
    1. Read the given question and passages to gather relevant information.
    2. Write reading notes summarizing the key points from these passages.
    3. Discuss the relevance of the given question and passages.
    4. If some passages are relevant to the given question, provide a brief
    answer based on the passages.
    5. If no passage is relevant, directly provide the answer without
    considering the passages.

    Answer:

Cognitive Agent

    Your task is to help the Cognitive Agent enhance its understanding of
    user insights to continuously improve the system's responses.

    Question: {question}
    Initial Response: {cot_answer}
    User Insights from Interaction Analysis:
    User Profile Agent: {user_profile_answer},
    Contextual Retrieval Agent: {contextual_answer},
    Live Session Agent: {live_session_answer},
    Document Ranking Agent: {document_ranking_answer},
    Feedback Agent: {feedback_answer}

    Task Description:
    Verify the reasoning process in the initial response for errors or
    misalignments. Use insights from user interaction analysis to refine
    this response, correcting any inaccuracies and enhancing the query
    answers based on the user profile. Ensure that your refined response
    aligns more closely with the user's immediate needs and incorporates
    foundational or advanced knowledge from other sources.

    Answer:

    Method         Setting                  |        Top-3          |        Top-5
                                            | WebQ   TriviaQA   NQ  | WebQ   TriviaQA   NQ
    w/o RAG        gpt-3.5-turbo-0125       | 59.61   97.36   43.90 | 62.43   97.36   41.46
    w/o RAG        Guideline                | 36.53   42.10   17.07 | 47.21   36.84   21.95
    vanillaRAG     --                       | 38.46   78.94   36.58 | 50.14   81.57   41.46
    Self-Refined   Chain-of-Thought (CoT)   | 57.69   89.47   39.02 | 67.51   89.47   41.46
    Self-Refined   Chain-of-Note (CoN)      | 57.17   81.57   48.78 | 65.15   92.10   48.78
    Self-Refined   Self-Rerank (SR)         | 32.63   81.57   43.90 | 40.26   84.21   51.21
    PersonaRAG     --                       | 63.46   94.73   49.02 | 67.50   89.47   48.78

Table 2: Overall Accuracy Performance Comparison Using Top-3 and Top-5 Passages.


5. Experimental Results and Analyses

In this section, we show the overall experimental results and offer in-depth analyses of our method.

5.1. Main Results

Table 2 summarizes the primary findings for PersonaRAG across various single-hop question answering datasets. The approach was evaluated against multiple baseline models, including large language models (LLMs) without retrieval-augmented generation (RAG), the conventional RAG model, and self-refined variants, such as utilizing raw retrieved passages (CoT with Passage) or refining passages into notes (CoT with Note).

PersonaRAG demonstrated superior performance compared to most of the baseline models, achieving significant improvements over the conventional RAG (i.e., vanillaRAG) of over 10%, particularly on the WebQ dataset. It also consistently outperformed the ChatGPT-3.5 model, except on TriviaQA, which we suspect is part of the model's training dataset. These results suggest PersonaRAG's capability to guide LLMs in extracting relevant information through active learning techniques.

Specifically, the performance of RAG models was assessed using the top 3 and top 5 ranked passages. While other RAG models generally benefited from more passages, PersonaRAG maintained consistent performance with either 3 or 5 passages, suggesting that 3 passages were adequate for generating accurate answers. PersonaRAG agents played a crucial role in efficiently extracting the necessary information regarding the user's information need to achieve these improvements.

Furthermore, on the WebQ dataset, PersonaRAG achieved accuracy scores of 63.46% and 67.50% using Top-3 and Top-5 passages, respectively, surpassing the vanillaRAG model by 25% and 17.36%, and nearly all other baseline models (except for Chain-of-Thought using Top-5, which performed equally). On the NQ dataset, PersonaRAG maintained similarly robust performance with scores of 49.02% and 48.78%, outperforming all baselines (except for Chain-of-Thought and Self-Rerank (SR) using Top-5). This pattern was further validated by experiments on other datasets, with results showing that PersonaRAG consistently outperforms conventional RAG models, with the capability of providing answers tailored to the user's interactions and information needs. The comprehensive understanding it provides contributes to the generation of accurate and user-centric answers across various question complexities.

5.2. Comparative Analysis of RAG Configurations

Further experiments explored PersonaRAG's adaptive capabilities (Figure 3). BLEU-2 scores compared outputs from Chain-of-Note (consistently the best method outside PersonaRAG) with other methods. PersonaRAG showed higher similarity scores, indicating its ability to generate responses that address user needs rather than just summarizing input. Additionally, PersonaRAG provides personalized answers tailored to user profiles, extending beyond mere information provision.

The Chain-of-Note approach demonstrated comparable performance to the Chain-of-Thought approach, implying that both techniques effectively extract pertinent information from the retrieved passages and adapt it to align with the user's information need.

In contrast, vanillaGPT and vanillaRAG outputs differed significantly from the Chain-of-Note approach, indicating that counterfactual cognition often leads to diverse outcomes rather than focusing solely on query-relevant content. This suggests LLMs can construct knowledge from multiple perspectives and customize responses based on user understanding.

Post-hoc analyses of average sentence length and syllable count across RAG configurations provided insights into the system's ability to adapt responses to user comprehension levels. These observations highlight PersonaRAG's capacity to synthesize knowledge from various perspectives and tailor responses to different levels of user expertise.
                                                                relevant knowledge due to limitations in their processing
                                                                capacities.
                                                                   In contrast, the PersonaRAG method provides notable
                                                                performance improvements: over 8% for LLaMA3-70B and
                                                                more than 10% for MoE-8x7b across all datasets, underscor-
                                                                ing its effectiveness. The PersonaRAG methodology distin-
                                                                guishes itself from the Chain-of-Note approach by offering a
                                                                cognitive framework that connects retrieved passages with
                                                                prior knowledge. This framework models the instructor’s
                                                                (GPT-3.5) reasoning process, guiding the student models
                                                                (LLaMA3-70B and MoE-8x7b) to better understand knowl-
                                                                edge retrieved from passages. The results demonstrate that
            (a) Text Similarity for Top-3 Passages              the LLMs are capable of selecting appropriate passages to
                                                                build more accurate responses, highlighting the benefits of
                                                                the PersonaRAG approach for improving generalization.

                                                                  Question: Who stole the Mona Lisa from the Louvre in
                                                                  1911?
                                                                  Answer: ”Vincenzo Peruggia”
                                                                  Chain-of-Thought:
                                                                  1. Read the given question and passages to gather relevant
                                                                  information.
                                                                  2. Write reading notes summarizing the key points.
                                                                  3. Discuss the relevance of the given question and passages.
                                                                  4. If some passages are relevant to the given question,
                                                                  provide a brief answer based on the passages.
                                                                  5. If no passage is relevant, directly provide the answer
            (b) Text Similarity for Top-5 Passages
                                                                  without considering the passages.
Figure 3: Text Similarity between Chain-of-Note (CoN) and         Step 1: Passage Retrieval
Other Methods Using BLEU-2 Score for Evaluation, with Normal-     1. The Mona Lisa was stolen from the Louvre Museum
ized Average Sentence Length and Average Syllable Count.          while the museum was closed for cleaning. Witnesses
Method                WebQ    TriviaQA      NQ
LLaMA3-70B
  w/o RAG             45.25     82.17     38.95
  vanillaRAG          55.14     85.02     40.37
  Chain-of-Thought    60.52     88.72     45.10
  Chain-of-Note       62.67     89.37     48.25
  Self-Rerank         54.25     84.50     47.77
  PersonaRAG          66.09     92.12     50.85
MoE-8x7b
  w/o RAG             38.24     75.82     34.26
  vanillaRAG          48.44     80.25     38.50
  Chain-of-Thought    54.12     85.46     42.37
  Chain-of-Note       55.98     87.55     45.14
  Self-Rerank         52.50     83.04     44.96
  PersonaRAG          61.35     90.30     49.27

Table 3
Overall Accuracy (%) Comparison for Top-5 Passages using LLaMA3-70B and MoE-8x7b.

5.3. Analysis on Generalization Ability

This experiment evaluates, across different large language models (LLMs), the quality of the knowledge PersonaRAG constructs. As illustrated in Table 3, PersonaRAG's outputs are used to prompt open-source LLMs, specifically LLaMA3-70B and MoE-8x7b, to generate the final answers.
   Although vanilla RAG and Chain-of-Note improve over the LLMs without retrieval-augmented generation (w/o RAG), both still trail PersonaRAG by a clear margin. This gap suggests that retrieved passages can act as noise, adversely affecting model performance even after refinement through note generation: left on their own, both LLaMA3-70B and MoE-8x7b struggle to efficiently analyze the retrieved passages and identify the information that is actually relevant to the question.
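To make this evaluation setup concrete, the following is a minimal sketch, in Python, of how PersonaRAG's refined passages and user-interaction insights could be fed to an open-source LLM served behind an OpenAI-compatible endpoint. The endpoint URL, model name, prompt template, and substring-match scoring are illustrative assumptions, not the authors' released evaluation code.

# Hypothetical evaluation sketch; the endpoint, model name, prompt template,
# and substring-match scoring are assumptions, not the paper's released code.
import requests

API_URL = "http://localhost:8000/v1/chat/completions"  # assumed OpenAI-compatible server

def answer_with_personarag(question, passages, insights, model="llama3-70b"):
    # Pack PersonaRAG's refined passages and agent insights into one prompt.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (f"Passages:\n{context}\n\n"
              f"User insights:\n{insights}\n\n"
              f"Question: {question}\nAnswer concisely:")
    resp = requests.post(API_URL, json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
    })
    return resp.json()["choices"][0]["message"]["content"].strip()

def accuracy(examples, model="llama3-70b"):
    # examples: list of (question, passages, insights, gold_answers) tuples;
    # a prediction counts as correct if it contains any gold answer string.
    hits = 0
    for question, passages, insights, gold_answers in examples:
        prediction = answer_with_personarag(question, passages, insights, model)
        hits += any(g.lower() in prediction.lower() for g in gold_answers)
    return 100.0 * hits / len(examples)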
5.4. Case Study

Finally, we randomly sample one case, shown in Table 4, to demonstrate the effectiveness of PersonaRAG.
   The user interaction analysis mechanism generates comprehensive results by integrating foundational and advanced insights from user data. Retrieved passages provide critical clues for answering questions, while the agent analyses summarize and illustrate how this external information applies to the user's query. The cognitive dynamic adaptation module then refines the initial chain-of-thought response using these insights, generating an accurate answer. For example, including knowledge about the "theft of the Mona Lisa in 1911," "Vincenzo Peruggia," and "Florence" makes the reasoning process more precise and detailed. This demonstrates PersonaRAG's effectiveness in helping IR agents combine external knowledge with intrinsic user data to produce well-informed responses.

1. ... reported that a tall, stout individual was carrying what appeared to be a large panel covered with a horse blanket. Two years later, Vincenzo Peruggia was arrested in Florence.
2. On August 22, 1911, the Mona Lisa was stolen from the Louvre. Vincenzo Peruggia claimed he stole the painting to return it to Italy.
3. In 1911, Vincenzo Peruggia, a Louvre employee, stole the Mona Lisa.

Step 2: User Interaction Analysis
User Insights from Interaction Analysis:
- User Profile Agent: The user is interested in art heists and historical mysteries.
- Contextual Retrieval Agent: The user frequently searches for art history topics.
- Live Session Agent: The user reads about unsolved mysteries and historic crimes.
- Document Ranking Agent: The user favors detailed, chronological accounts.
- Feedback Agent: The user interacts with articles about famous art heists.

Step 3: Cognitive Dynamic Adaptation
The user has a consistent interest in famous art heists and historical mysteries. They are drawn to unsolved mysteries and historic crimes. They favor detailed, chronological accounts, so the final response should highlight theft motives, the method, and the investigation leading to the arrest, providing comprehensive and authoritative sources to satisfy their engagement with articles on art heists.

Answer: Vincenzo Peruggia, a Louvre employee, stole the Mona Lisa from the Louvre Museum on August 21, 1911. He claimed he stole the painting to return it to Italy, and the theft remained unsolved for two years before he was arrested in Florence in December 1913. The painting was later recovered and returned to the Louvre.

Table 4
PersonaRAG Case Study.
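Read procedurally, Table 4 traces a three-step flow: retrieve passages, let the user-centric agents distill insights from interaction data, and let cognitive dynamic adaptation revise a chain-of-thought draft. The sketch below shows one plausible wiring of these steps; the agent instructions, the retrieve() and llm() callables, and all prompt wording are hypothetical stand-ins, not the paper's implementation.

# Hypothetical wiring of the three steps in Table 4; the agent instructions,
# the retrieve() and llm() callables, and all prompt wording are stand-ins.
AGENT_TASKS = {
    "User Profile Agent": "Summarize the user's long-term interests.",
    "Contextual Retrieval Agent": "Summarize the user's recurring search topics.",
    "Live Session Agent": "Summarize what the user is reading in this session.",
    "Document Ranking Agent": "Infer the answer style the user favors.",
    "Feedback Agent": "Summarize which kinds of results the user engages with.",
}

def persona_rag(question, retrieve, llm, interaction_log):
    # Step 1: retrieve candidate passages for the question.
    passages = retrieve(question, k=5)

    # Step 2: each user-centric agent distills one insight from the log.
    insights = {name: llm(f"{task}\nInteraction log:\n{interaction_log}")
                for name, task in AGENT_TASKS.items()}

    # Step 3: cognitive dynamic adaptation revises an initial
    # chain-of-thought draft in light of the agents' insights.
    draft = llm(f"Question: {question}\nPassages:\n{passages}\n"
                f"Think step by step, then answer.")
    profile = "\n".join(f"- {name}: {text}" for name, text in insights.items())
    return llm(f"Revise the draft answer for this specific user.\n"
               f"Draft:\n{draft}\nUser insights:\n{profile}\n"
               f"Passages:\n{passages}\nFinal answer:")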
6. Conclusion

This paper proposes PersonaRAG, a retrieval-augmentation architecture that incorporates user interaction analysis and cognitive dynamic adaptation. PersonaRAG builds user interaction agents and dynamic cognitive mechanisms to facilitate the understanding of user needs and interests, and to enhance the system's ability to deliver personalized, context-aware responses with the intrinsic cognition of LLMs.
   Furthermore, PersonaRAG demonstrates effectiveness in leveraging external knowledge and adapting responses based on user profiles, knowledge levels, and information needs, supporting LLMs in generation tasks without fine-tuning. However, the approach requires multiple calls to the LLM's API per question, which adds latency and increases API costs. The process involves constructing the initial Chain-of-Thought, processing the results of the User Interaction Agents, and executing the Cognitive Dynamic Adaptation to generate the final answer. Moreover, the inputs to the LLM tend to be lengthy, since they include extensive retrieved passages together with the constructed user needs, interests, and profile. These factors can limit the efficiency and cost-effectiveness of PersonaRAG in practical Information Retrieval (IR) applications.
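To illustrate this overhead concretely, a back-of-the-envelope accounting of the per-question cost is sketched below. Only the three-stage call structure follows the text; treating each of the five agents as a separate call, and the token counts and pricing, are placeholder assumptions.

# Back-of-the-envelope overhead per question. Only the three-stage call
# structure follows the text; treating each of the five agents as one call,
# and the token counts and price, are placeholder assumptions.
CALLS_PER_QUESTION = {
    "initial_chain_of_thought": 1,      # draft answer over retrieved passages
    "user_interaction_agents": 5,       # assumed: one call per agent
    "cognitive_dynamic_adaptation": 1,  # final refinement call
}

def estimate(avg_prompt_tokens=3000, avg_output_tokens=300, usd_per_1k_tokens=0.01):
    calls = sum(CALLS_PER_QUESTION.values())
    tokens = calls * (avg_prompt_tokens + avg_output_tokens)
    return calls, tokens, tokens * usd_per_1k_tokens / 1000

calls, tokens, usd = estimate()
print(f"{calls} LLM calls, ~{tokens} tokens, ~${usd:.3f} per question (assumed prices)")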
   Future research will aim to optimize the process by reducing API calls and by developing concise representations of user profiles and retrieved information without compromising response quality. We also plan to explore further user-centric agents that better capture the writing styles and characteristics of RAG users and searchers, enhancing the system's ability to understand and adapt to individual preferences and improving personalization and relevance in IR tasks.

Acknowledgments

This work has received funding from the European Union's Horizon Europe research and innovation program under grant agreement No 101070014 (OpenWebSearch.EU, https://doi.org/10.3030/101070014).

References

[1] W. Yu, H. Zhang, X. Pan, K. Ma, H. Wang, D. Yu, Chain-of-note: Enhancing robustness in retrieval-augmented language models, CoRR abs/2311.09210 (2023). URL: https://doi.org/10.48550/arXiv.2311.09210. doi:10.48550/ARXIV.2311.09210.
[2] OpenAI, GPT-4 technical report, CoRR abs/2303.08774 (2023). URL: https://doi.org/10.48550/arXiv.2303.08774. doi:10.48550/ARXIV.2303.08774. arXiv:2303.08774.
[3] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, G. Lample, Llama: Open and efficient foundation language models, CoRR abs/2302.13971 (2023). URL: https://doi.org/10.48550/arXiv.2302.13971. doi:10.48550/ARXIV.2302.13971. arXiv:2302.13971.
[4] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL: https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html.
[5] Y. Bang, S. Cahyawijaya, N. Lee, W. Dai, D. Su, B. Wilie, H. Lovenia, Z. Ji, T. Yu, W. Chung, Q. V. Do, Y. Xu, P. Fung, A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity, in: J. C. Park, Y. Arase, B. Hu, W. Lu, D. Wijaya, A. Purwarianti, A. A. Krisnadhi (Eds.), Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, IJCNLP 2023, Volume 1: Long Papers, Nusa Dua, Bali, November 1-4, 2023, Association for Computational Linguistics, 2023, pp. 675–718. URL: https://doi.org/10.18653/v1/2023.ijcnlp-main.45. doi:10.18653/V1/2023.IJCNLP-MAIN.45.
[6] P. S. H. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. Yih, T. Rocktäschel, S. Riedel, D. Kiela, Retrieval-augmented generation for knowledge-intensive NLP tasks, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, H. Lin (Eds.), Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020. URL: https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html.
[7] J. Chen, H. Lin, X. Han, L. Sun, Benchmarking large language models in retrieval-augmented generation, in: M. J. Wooldridge, J. G. Dy, S. Natarajan (Eds.), Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2024, February 20-27, 2024, Vancouver, Canada, AAAI Press, 2024, pp. 17754–17762. URL: https://doi.org/10.1609/aaai.v38i16.29728. doi:10.1609/AAAI.V38I16.29728.
[8] J. Teevan, S. T. Dumais, E. Horvitz, Personalizing search via automated analysis of interests and activities, SIGIR Forum 51 (2017) 10–17. URL: https://doi.org/10.1145/3190580.3190582. doi:10.1145/3190580.3190582.
[9] K. Sugiyama, K. Hatano, M. Yoshikawa, Adaptive web search based on user profile constructed without any effort from users, in: S. I. Feldman, M. Uretsky, M. Najork, C. E. Wills (Eds.), Proceedings of the 13th International Conference on World Wide Web, WWW 2004, New York, NY, USA, May 17-20, 2004, ACM, 2004, pp. 675–684. URL: https://doi.org/10.1145/988672.988764. doi:10.1145/988672.988764.
[10] G. Adomavicius, B. Mobasher, F. Ricci, A. Tuzhilin, Context-aware recommender systems, AI Mag. 32 (2011) 67–80. URL: https://doi.org/10.1609/aimag.v32i3.2364. doi:10.1609/AIMAG.V32I3.2364.
[11] M. J. Wooldridge, An Introduction to MultiAgent Systems, Second Edition, Wiley, 2009.
[12] A. Q. Jiang, A. Sablayrolles, A. Roux, A. Mensch, B. Savary, C. Bamford, D. S. Chaplot, D. de Las Casas, E. B. Hanna, F. Bressand, G. Lengyel, G. Bour, G. Lample, L. R. Lavaud, L. Saulnier, M. Lachaux, P. Stock, S. Subramanian, S. Yang, S. Antoniak, T. L. Scao, T. Gervet, T. Lavril, T. Wang, T. Lacroix, W. E. Sayed, Mixtral of experts, CoRR abs/2401.04088 (2024). URL: https://doi.org/10.48550/arXiv.2401.04088. doi:10.48550/ARXIV.2401.04088.
[13] F. Xu, W. Shi, E. Choi, RECOMP: Improving retrieval-augmented LMs with compression and selective augmentation, CoRR abs/2310.04408 (2023). URL: https://doi.org/10.48550/arXiv.2310.04408. doi:10.48550/ARXIV.2310.04408.
[14] Z. Jiang, F. F. Xu, L. Gao, Z. Sun, Q. Liu, J. Dwivedi-Yu, Y. Yang, J. Callan, G. Neubig, Active retrieval augmented generation, in: H. Bouamor, J. Pino, K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, Association for Computational Linguistics, 2023, pp. 7969–7992. URL: https://doi.org/10.18653/v1/2023.emnlp-main.495. doi:10.18653/V1/2023.EMNLP-MAIN.495.
[15] H. Zamani, W. B. Croft, Embedding-based query language models, in: B. Carterette, H. Fang, M. Lalmas, J. Nie (Eds.), Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, ICTIR 2016, Newark, DE, USA, September 12-16, 2016, ACM, 2016, pp. 147–156. URL: https://doi.org/10.1145/2970398.2970405. doi:10.1145/2970398.2970405.
[16] M. R. Ghorab, D. Zhou, A. O'Connor, V. Wade, Personalised information retrieval: survey and classification, User Model. User Adapt. Interact. 23 (2013) 381–443. URL: https://doi.org/10.1007/s11257-012-9124-1. doi:10.1007/S11257-012-9124-1.
[17] S. Jeong, J. Baek, S. Cho, S. J. Hwang, J. C. Park, Adaptive-RAG: Learning to adapt retrieval-augmented large language models through question complexity, CoRR abs/2403.14403 (2024). URL: https://doi.org/10.48550/arXiv.2403.14403. doi:10.48550/ARXIV.2403.14403.
[18] Y. Li, Y. Zhang, L. Sun, MetaAgents: Simulating interactions of human behaviors for LLM-based task-oriented coordination via collaborative generative agents, CoRR abs/2310.06500 (2023). URL: https://doi.org/10.48550/arXiv.2310.06500. doi:10.48550/ARXIV.2310.06500.
[19] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, Q. Guo, M. Wang, H. Wang, Retrieval-augmented generation for large language models: A survey, CoRR abs/2312.10997 (2023). URL: https://doi.org/10.48550/arXiv.2312.10997. doi:10.48550/ARXIV.2312.10997.
[20] Y. Huang, J. Huang, A survey on retrieval-augmented text generation for large language models, arXiv preprint arXiv:2404.10981 (2024).
[21] S. Siriwardhana, R. Weerasekera, T. Kaluarachchi, E. Wen, R. Rana, S. Nanayakkara, Improving the domain adaptation of retrieval augmented generation (RAG) models for open domain question answering, Trans. Assoc. Comput. Linguistics 11 (2023) 1–17. URL: https://transacl.org/ojs/index.php/tacl/article/view/4029.
[22] J. Chen, H. Lin, X. Han, L. Sun, Benchmarking large language models in retrieval-augmented generation, in: M. J. Wooldridge, J. G. Dy, S. Natarajan (Eds.), Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2024, February 20-27, 2024, Vancouver, Canada, AAAI Press, 2024, pp. 17754–17762. URL: https://doi.org/10.1609/aaai.v38i16.29728. doi:10.1609/AAAI.V38I16.29728.
[23] K. Wu, E. Wu, J. Zou, How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior, arXiv preprint arXiv:2404.10198 (2024).
[24] R. C. Atkinson, R. M. Shiffrin, Human memory: A proposed system and its control processes, in: K. W. Spence, J. T. Spence (Eds.), Psychology of Learning and Motivation, volume 2, Elsevier, 1968, pp. 89–195. URL: https://doi.org/10.1016/s0079-7421(08)60422-3. doi:10.1016/S0079-7421(08)60422-3.
[25] A. Sharma, S. Kumar, Semantic web-based information retrieval models: a systematic survey, in: Data Science and Analytics: 5th International Conference on Recent Developments in Science, Engineering and Technology, REDSET 2019, Gurugram, India, November 15-16, 2019, Revised Selected Papers, Part II, Springer, 2020, pp. 204–222.
[26] A. Kacem, Personalized Information Retrieval based on Time-Sensitive User Profile (Recherche d'Information Personnalisée basée sur un Profil Utilisateur Sensible au Temps), Ph.D. thesis, Paul Sabatier University, Toulouse, France, 2017. URL: https://tel.archives-ouvertes.fr/tel-01707423.
[27] A. Singh, A. Sharma, A multi-agent framework for context-aware dynamic user profiling for web personalization, in: Software Engineering: Proceedings of CSI 2015, Springer, 2019, pp. 1–16.
[28] S. Hong, X. Zheng, J. Chen, Y. Cheng, J. Wang, C. Zhang, Z. Wang, S. K. S. Yau, Z. Lin, L. Zhou, C. Ran, L. Xiao, C. Wu, MetaGPT: Meta programming for multi-agent collaborative framework, CoRR abs/2308.00352 (2023). URL: https://doi.org/10.48550/arXiv.2308.00352. doi:10.48550/ARXIV.2308.00352.
[29] D. K. Limbu, A. M. Connor, R. Pears, S. G. MacDonell, Contextual relevance feedback in web information retrieval, in: I. Ruthven (Ed.), Proceedings of the 1st International Conference on Information Interaction in Context, IIiX 2006, Copenhagen, Denmark, October 18-20, 2006, ACM, 2006, pp. 138–143. URL: https://doi.org/10.1145/1164820.1164848. doi:10.1145/1164820.1164848.
[30] T. Kwiatkowski, J. Palomaki, O. Redfield, M. Collins, A. P. Parikh, C. Alberti, D. Epstein, I. Polosukhin, J. Devlin, K. Lee, K. Toutanova, L. Jones, M. Kelcey, M. Chang, A. M. Dai, J. Uszkoreit, Q. Le, S. Petrov, Natural questions: a benchmark for question answering research, Trans. Assoc. Comput. Linguistics 7 (2019) 452–466. URL: https://doi.org/10.1162/tacl_a_00276. doi:10.1162/TACL_A_00276.
[31] M. Joshi, E. Choi, D. S. Weld, L. Zettlemoyer, TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension, in: R. Barzilay, M. Kan (Eds.), Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, Association for Computational Linguistics, 2017, pp. 1601–1611. URL: https://doi.org/10.18653/v1/P17-1147. doi:10.18653/V1/P17-1147.
[32] J. Berant, A. Chou, R. Frostig, P. Liang, Semantic parsing on Freebase from question-answer pairs, in: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, Grand Hyatt Seattle, Seattle, Washington, USA, A meeting of SIGDAT, a Special Interest Group of the ACL, ACL, 2013, pp. 1533–1544. URL: https://aclanthology.org/D13-1160/.
[33] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, D. Zhou, Chain-of-thought prompting elicits reasoning in large language models, in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh (Eds.), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022. URL: http://papers.nips.cc/paper_files/paper/2022/hash/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html.
[34] A. Asai, Z. Wu, Y. Wang, A. Sil, H. Hajishirzi, Self-RAG: Learning to retrieve, generate, and critique through self-reflection, CoRR abs/2310.11511 (2023). URL: https://doi.org/10.48550/arXiv.2310.11511. doi:10.48550/ARXIV.2310.11511.
[35] A. Mallen, A. Asai, V. Zhong, R. Das, D. Khashabi, H. Hajishirzi, When not to trust language models: Investigating effectiveness of parametric and non-parametric memories, in: A. Rogers, J. L. Boyd-Graber, N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, Association for Computational Linguistics, 2023, pp. 9802–9822. URL: https://doi.org/10.18653/v1/2023.acl-long.546. doi:10.18653/V1/2023.ACL-LONG.546.
[36] J. Baek, S. Jeong, M. Kang, J. C. Park, S. J. Hwang, Knowledge-augmented language model verification, in: H. Bouamor, J. Pino, K. Bali (Eds.), Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, Association for Computational Linguistics, 2023, pp. 1720–1736. URL: https://doi.org/10.18653/v1/2023.emnlp-main.107. doi:10.18653/V1/2023.EMNLP-MAIN.107.
[37] H. Trivedi, N. Balasubramanian, T. Khot, A. Sabharwal, Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions, in: A. Rogers, J. L. Boyd-Graber, N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, Association for Computational Linguistics, 2023, pp. 10014–10037. URL: https://doi.org/10.18653/v1/2023.acl-long.557. doi:10.18653/V1/2023.ACL-LONG.557.
[38] S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, M. Gatford, Okapi at TREC-3, in: D. K. Harman (Ed.), Proceedings of The Third Text REtrieval Conference, TREC 1994, Gaithersburg, Maryland, USA, November 2-4, 1994, volume 500-225 of NIST Special Publication, National Institute of Standards and Technology (NIST), 1994, pp. 109–126. URL: http://trec.nist.gov/pubs/trec3/papers/city.ps.gz.
[39] S. Yu, Z. Liu, C. Xiong, Z. Liu, OpenMatch-v2: An all-in-one multi-modality PLM-based information retrieval toolkit, in: H. Chen, W. E. Duh, H. Huang, M. P. Kato, J. Mothe, B. Poblete (Eds.), Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, Taipei, Taiwan, July 23-27, 2023, ACM, 2023, pp. 3160–3164. URL: https://doi.org/10.1145/3539618.3591813. doi:10.1145/3539618.3591813.
[40] F. Petroni, A. Piktus, A. Fan, P. S. H. Lewis, M. Yazdani, N. D. Cao, J. Thorne, Y. Jernite, V. Karpukhin, J. Maillard, V. Plachouras, T. Rocktäschel, S. Riedel, KILT: a benchmark for knowledge intensive language tasks, in: K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tür, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, Y. Zhou (Eds.), Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021, Association for Computational Linguistics, 2021, pp. 2523–2544. URL: https://doi.org/10.18653/v1/2021.naacl-main.200. doi:10.18653/V1/2021.NAACL-MAIN.200.