A Framework for Knowledge Integration in Conversational Information Retrieval

Praveen Acharya¹,∗, Noriko Kando² and Gareth J.F. Jones¹

¹ School of Computing, Dublin City University, Dublin, Ireland
² National Institute of Informatics, Tokyo, Japan

Abstract
In both traditional and conversational information retrieval, users can choose to engage in exploratory searches with varying levels of knowledge about the subject of their information needs. They formulate search queries to express these needs, which retrieval systems then use to find relevant information. Users with substantial prior domain knowledge about the topic of the information need can create sufficiently rich queries that include appropriate domain-specific vocabulary, leading to the retrieval of relevant results, while those with limited domain knowledge struggle to formulate effective queries. The latter must refine their queries over multiple search passes as they learn more. This iterative search process imposes a high cognitive load and limits the effectiveness of traditional search systems. In contrast, conversational information retrieval (CIR) offers a multi-turn, iterative process in which the user and the system work collaboratively to help the user satisfy their information needs with reduced cognitive effort. With each interaction, information is progressively accumulated, helping users understand the topic better and extend their knowledge. By representing the user's knowledge and its continuous refinement, a CIR system can better comprehend and respond to the information need and support users in satisfying it, resulting in more effective search outcomes. However, existing CIR systems lack a framework for representing the user's knowledge during the current search dialogue. Leveraging a user's prior knowledge and the information gathered during each interaction can potentially enhance CIR system performance by guiding subsequent system actions.
To address this, we propose a framework for capturing and utilizing knowledge in CIR. This framework aims to improve the performance and adaptability of conversational search systems, making them more effective and responsive to users' evolving information needs.

Keywords
Knowledge Integration, Conversational Search, User Knowledge, Framework for CIR

1. Introduction
When using a search system with informational intent [1] to acquire information about a topic, users exhibit varying levels of familiarity with the topic. This knowledge disparity influences how different users formulate their search queries, resulting in different degrees of precision and search effectiveness. For example, an expert (knowledgeable) user can specify their information need in sufficient detail (a well-defined query that uses domain-specific vocabulary correctly) for the search system to retrieve relevant documents. In contrast, a non-expert (ill-informed) user typically has difficulty specifying their information need, producing under-specified (vague) queries and poorly matching retrieved documents. The range of query specificity, influenced by the user's knowledge, is illustrated in Figure 1. Users with limited knowledge about the topic often struggle to articulate their information needs accurately, a challenge referred to as the non-specifiability of need problem [2]. Consequently, their queries may not convey their requirements precisely, making it difficult to retrieve relevant documents and often resulting in unsatisfactory search results. A user's knowledge of the search topic therefore greatly influences the search process, and a search system that can adapt to this knowledge is highly desirable.

In exploratory information search scenarios, fulfilling a user's information need is typically a multi-turn, iterative process.
Unlike straightforward searches, exploratory searches involve dynamic interactions in which each query-response cycle provides valuable information that incrementally builds the user's understanding of the topic. This iterative process allows users to refine their queries based on the feedback received from previous searches. As users gather more information, their knowledge about the topic deepens, enabling them to formulate more precise queries. This ongoing cycle of query refinement and knowledge acquisition is often crucial for satisfying complex information needs. A search system designed specifically to support exploratory information retrieval should therefore facilitate this iterative process, offering mechanisms that guide users through successive stages of query formulation and refinement. Ultimately, such a system would help users navigate vast information spaces, progressively homing in on the information they seek.

Figure 1: Query specificity and user knowledge range.

UM-CIR 2024: The 1st Workshop on User Modelling in Conversational Information Retrieval, December 12, 2024, Tokyo, Japan
∗ Corresponding author.
Email: praveen.acharya2@mail.dcu.ie (P. Acharya); noriko.kando@nii.ac.jp (N. Kando); gareth.jones@dcu.ie (G. J.F. Jones)
ORCID: 0000-0001-5181-9831 (P. Acharya); 0000-0002-2133-0215 (N. Kando); 0000-0003-2923-8365 (G. J.F. Jones)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Traditional information retrieval (IR) systems lack the capability to engage interactively with users. These systems are entirely user-driven, offering no interaction beyond basic relevance feedback methods. This places a high cognitive load on users, who must formulate progressive queries, examine the retrieved results, and determine whether their information needs are satisfied.
If not, they must repeat the process, which can be cumbersome and inefficient. Conversational Information Retrieval (CIR) addresses this limitation by introducing an interactive, collaborative process between the system and the user. CIR systems reduce the user's cognitive load by adapting to user preferences and providing more personalized assistance. What the system learns through this interaction is often captured in a user model, which the system can exploit to better understand and meet the user's information needs, as illustrated in Figure 2.

An important component of a user model in a search system is a representation of the user's knowledge, potentially alongside that of other users with similar interests. As discussed earlier, knowledge plays a crucial role in the search process, and exploiting it can help systems better satisfy a user's information needs. Knowledge exploitation is even more important in CIR systems, because knowledge can help the system understand users' needs, conduct an effective dialogue with the user, and perform tasks such as query expansion, clarification, and information elicitation. Radlinski and Craswell [3] examined conversational approaches to information retrieval and identified adaptability in conversation as a key characteristic of effective CIR systems. This adaptability allows search systems to adjust the dialogue dynamically based on users' existing knowledge and newly acquired information, continuously refining the process until users' information needs are met.

We argue that, since knowledge plays such a critical role in effective CIR, the user's domain knowledge, both prior to and developed during a search conversation, should be modelled to help search systems better understand the user's information need and guide the progression of the dialogue [4]. Consequently, we seek to answer the following questions about the role of knowledge in conversational information retrieval:

1. RQ1: How can knowledge be captured and utilized in conversational information retrieval?
2. RQ2: Does knowledge integration improve the effectiveness of a conversational search system?

To address these questions, we propose a framework for knowledge integration in conversational information retrieval. In this paper, we describe the problem and discuss several research challenges associated with implementing this framework.

Figure 2: Knowledge Integration in Conversational Information Retrieval (CIR).

2. Background and Related Work
Modern information retrieval systems are used for many different tasks, depending on the user's specific objectives. Users approach these systems with varying intents, commonly classified as informational, navigational, or transactional [1]. A notable research focus has emerged on using retrieval systems for learning purposes, a concept termed searching as learning (SAL) [5, 6, 7]. This paradigm reflects a shift towards using retrieval systems not solely for accessing information but also as tools for learning. In the context of SAL, users engage with retrieval systems with the explicit aim of acquiring knowledge about a specific topic. Through iterative interactions with the retrieved documents, users progressively accumulate information to satisfy their learning goals. This process entails not only accessing relevant documents but also assimilating and comprehending the information they contain.

White et al. [8] studied differences in search behaviour between domain experts and non-experts searching on the same subject. Their findings highlight distinctions in query formulation strategies across levels of expertise, emphasizing the importance of understanding how different users approach search tasks.
Furthermore, the level of domain knowledge influences how users formulate their queries [9, 10, 11], with non-experts using more keywords than experts and experts producing more new keywords than non-experts [12, 13]. Zhang et al. [14] found that data collected during the search process can indicate a user's domain knowledge, suggesting that analyzing user interactions with retrieval systems can reveal what users know. Hagen et al. [15] showed that users can learn query terms while searching and reading. This suggests that users gradually refine their queries and deepen their understanding of the topic through iterative interaction with search results, which highlights the importance of incorporating knowledge mechanisms into retrieval systems. A search system that estimates what users already know, and how much they learn, can help them reach their learning objectives more quickly, for example by retrieving documents that match not only their query but also their existing knowledge of the topic.

Several frameworks have been developed to integrate user knowledge into retrieval systems [16, 17, 18]. Câmara et al. [19] introduced a framework for representing user knowledge during search sessions. It estimates a user's knowledge about a specific topic by maintaining an internal representation that is updated continuously throughout the session, using a combination of keyword-based and large language model (LLM)-based methods. Their study shows that this internal representation correlates well with users' actual knowledge levels.

Figure 3: Proposed Framework Architecture.
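To make the idea of retrieving documents that match both the query and the user's existing knowledge more concrete, the following is a minimal sketch. It is our own hypothetical illustration, not the method of [16, 17, 18, 19]: knowledge is assumed to be a set of known terms, and `terms`, `knowledge_aware_score`, `rank` and the weight `alpha` are all illustrative names and parameters.

```python
import re


def terms(text: str) -> set[str]:
    """Lower-cased alphabetic tokens; a crude stand-in for a real text analyzer."""
    return set(re.findall(r"[a-z]+", text.lower()))


def knowledge_aware_score(query: str, doc: str, known_terms: set[str], alpha: float = 0.5) -> float:
    """Hypothetical score: query match plus a bonus for familiar vocabulary.

    query_match -- fraction of the query's terms that occur in the document
    familiarity -- fraction of the document's terms the user already knows
    alpha       -- hypothetical weight balancing the two signals
    """
    q, d = terms(query), terms(doc)
    if not q or not d:
        return 0.0
    query_match = len(q & d) / len(q)
    familiarity = len(d & known_terms) / len(d)
    return query_match + alpha * familiarity


def rank(query: str, docs: list[str], known_terms: set[str], alpha: float = 0.5) -> list[str]:
    """Order documents by descending knowledge-aware score."""
    return sorted(docs, key=lambda doc: knowledge_aware_score(query, doc, known_terms, alpha), reverse=True)


if __name__ == "__main__":
    known = {"heart", "blood", "pressure"}
    docs = [
        "Hypertension and arterial stiffness in cardiology",
        "High blood pressure strains the heart",
    ]
    for doc in rank("blood pressure", docs, known):
        print(doc)
```

Because the familiarity term rewards documents written in vocabulary the user already knows, a learning-oriented system might instead reward a balance of known and new terms; the sketch does not settle that design choice.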
Expanding on this work, the framework was extended to incorporate named entities, leveraging the relationships between these entities to better represent and measure user knowledge during a search session [20]. This extension indicated that named entities complement earlier approaches, offering a more nuanced estimate of users' knowledge. Additionally, Nasser et al. [21] used knowledge graphs to represent both the user's knowledge and the knowledge goal, demonstrating that the graph-based approach can capture complementary aspects of knowledge.

Despite the important role of knowledge in the search process highlighted by this work, the formal use of knowledge in CIR remains surprisingly unexplored. Leveraging a user's prior knowledge, together with the knowledge of the search topic accumulated during a conversational dialogue, could significantly enhance CIR system performance by guiding subsequent system actions, resulting in more efficient and effective search outcomes. The remainder of this paper presents our proposed framework for exploiting knowledge in CIR, focusing on its theoretical aspects, and reviews relevant operational approaches used in previous studies; full implementation details are left for future research.

3. Framework
Our proposed framework for integrating knowledge into CIR is built around three primary components: the Knowledge Extractor (KE), the Current Knowledge State (cks), and the Knowledge Updater (KU). The following subsections describe the role of each component within the framework.

3.1. Components

3.1.1. Knowledge Extractor (KE)
The Knowledge Extractor is responsible for identifying and extracting pertinent information from various sources, including queries, documents, and the user's knowledge history. It analyzes these sources to extract relevant knowledge, such as topical knowledge and task knowledge.
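The extraction steps formalised in Equations (1) to (3) below can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: knowledge is assumed to be a set of content terms (the keyword option among the representations discussed at the end of this section), and `STOP_WORDS`, `knowledge_extractor`, `Document`, `initial_state` and `document_knowledge` are hypothetical names. How user interaction with a document is detected is left open in the framework, so it is modelled here as a boolean flag.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny stop-word list used only for illustration.
STOP_WORDS = {"a", "an", "and", "are", "for", "how", "in", "is", "of", "on", "the", "to", "what", "with"}


def knowledge_extractor(source: str) -> frozenset[str]:
    """KE(source) in the sense of Eq. (1): reduce a query, a document, or stored
    user knowledge to a set of content terms (the assumed representation)."""
    tokens = re.findall(r"[a-z]+", source.lower())
    return frozenset(t for t in tokens if t not in STOP_WORDS and len(t) > 2)


@dataclass
class Document:
    text: str
    interacted: bool  # whether the user engaged with this document (detection left open)


def initial_state(query: str, user_knowledge: str) -> dict[str, frozenset[str]]:
    """cks_initial = {K_query, K_Uk}, as in Eq. (2)."""
    return {
        "K_query": knowledge_extractor(query),
        "K_Uk": knowledge_extractor(user_knowledge),
    }


def document_knowledge(documents: list[Document]) -> frozenset[str]:
    """K_Document = KE(Documents), as in Eq. (3), applied here only to the
    documents the user interacted with."""
    extracted: frozenset[str] = frozenset()
    for doc in documents:
        if doc.interacted:
            extracted |= knowledge_extractor(doc.text)
    return extracted


if __name__ == "__main__":
    cks = initial_state("What is hypertension?", "blood pressure and heart rate")
    print(sorted(cks["K_query"]), sorted(cks["K_Uk"]))
    docs = [
        Document("Hypertension raises cardiovascular risk", interacted=True),
        Document("Unrelated advert", interacted=False),
    ]
    print(sorted(document_knowledge(docs)))
```

Swapping the term-set representation for named entities, a knowledge graph, or an LLM-based summary would change only the body of `knowledge_extractor`; the surrounding interface stays the same.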
$$K_{\text{source}} = \mathrm{KE}(\text{source}) \qquad (1)$$

where source is one of the following: the query (Q), a document (D), or the user knowledge ($U_k$).

The Knowledge Extractor plays a crucial role in several parts of the proposed framework. The user's interaction with a CIR system begins when the user issues a query. The KE extracts knowledge from the query ($K_{\text{query}}$) and from the user knowledge history stored in a user model ($K_{U_k}$). Together, these form the initial current knowledge state ($cks_{\text{initial}}$) of the CIR process:

$$cks_{\text{initial}} = \{K_{\text{query}}, K_{U_k}\} \qquad (2)$$

Subsequently, the KE is applied again after the CIR system retrieves relevant documents. Which documents knowledge is extracted from depends on whether the user has interacted with them:

$$K_{\text{Document}} = \mathrm{KE}(\text{Documents}) \qquad (3)$$

3.1.2. Current Knowledge State (cks)
The Current Knowledge State reflects the state of knowledge in the conversation at any given moment. It interacts primarily with the KU, which incorporates new information into it, and it is updated at each step of the conversation. As the dialogue progresses, the cks is continuously updated to represent the user's information needs and the context of the conversation accurately. This allows the conversational system to choose its actions using strategies tailored to the user's current knowledge state, supporting a more natural and fluid progression of the dialogue. Consequently, the system can adapt to the user's evolving needs throughout the interaction.

3.1.3. Knowledge Updater (KU)
The Knowledge Updater is responsible for integrating the extracted knowledge into the existing knowledge state (cks). This component ensures that the system is updated with new knowledge at each turn in the conversation.
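A minimal sketch of the update $cks_{\text{current}} = \mathrm{KU}(cks_{\text{prev}}, K_{\text{Document}})$ formalised in Equation 4 below is shown here. The framework leaves the representation and combination rule open; this sketch assumes a knowledge state that maps terms to weights and a hypothetical decay factor that lets recent evidence count more. `state_from_terms`, `knowledge_updater`, `run_turns` and `decay` are illustrative names, not part of the framework.

```python
def state_from_terms(*term_sets: set[str]) -> dict[str, float]:
    """Turn extracted term sets (e.g. K_query and K_Uk) into a weighted state,
    giving every term an initial weight of 1.0."""
    return {term: 1.0 for terms in term_sets for term in terms}


def knowledge_updater(cks_prev: dict[str, float], k_document: set[str], decay: float = 0.5) -> dict[str, float]:
    """KU(cks_prev, K_Document): return a new knowledge state.

    Existing weights are multiplied by a hypothetical decay factor so that
    recent evidence counts more, and each newly extracted term adds 1.0.
    cks_prev itself is left unchanged.
    """
    cks_current = {term: weight * decay for term, weight in cks_prev.items()}
    for term in k_document:
        cks_current[term] = cks_current.get(term, 0.0) + 1.0
    return cks_current


def run_turns(cks_initial: dict[str, float], turns: list[set[str]], decay: float = 0.5) -> dict[str, float]:
    """Apply KU once per conversational turn, in order."""
    cks = dict(cks_initial)
    for k_document in turns:
        cks = knowledge_updater(cks, k_document, decay)
    return cks


if __name__ == "__main__":
    cks0 = state_from_terms({"blood"}, {"pressure"})
    print(run_turns(cks0, [{"hypertension", "blood"}, {"hypertension"}]))
```

Setting `decay=1.0` reduces the update to plain accumulation of evidence; whether older knowledge should be down-weighted at all is an open design question that the choice of representation would need to answer.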
By continuously incorporating new information, the KU keeps the knowledge state current and comprehensive, enabling the CIR system to maintain an accurate, up-to-date understanding of the user's needs and the context of the conversation. This continuous updating is crucial for the system's ability to provide relevant and accurate responses and document rankings throughout the interaction.

$$cks_{\text{current}} = \mathrm{KU}(cks_{\text{prev}}, K_{\text{Document}}) \qquad (4)$$

where $cks_{\text{current}}$ is the updated knowledge state after the user's interaction with documents, $K_{\text{Document}}$ is the knowledge extracted from those documents by Equation 3, and KU is a function that takes $cks_{\text{prev}}$ and $K_{\text{Document}}$ as inputs and combines them into an updated representation of the knowledge state.

These components work together to integrate and update the relevant knowledge represented by the current knowledge state (cks) of the conversation. At each interaction, the KE extracts knowledge from the retrieved documents, and the KU then uses this extracted knowledge to update the cks. The updated knowledge state helps the conversational system understand the user's current needs and, based on this understanding, employ various strategies to take appropriate actions. The cks could also be used to rank relevant documents according to the user's current knowledge at each step. Figure 3 illustrates how the Knowledge Extractor and Knowledge Updater interact within the CIR system.

One significant challenge in dealing with knowledge is how to represent it effectively. Various approaches have been suggested, including extracting keywords [22, 17], using concept maps [23], identifying named entities [20], creating knowledge graphs [21], and using LLMs [19]. The choice of representation method has significant implications for how the knowledge state is modelled during the search process.
It also affects operational aspects of the KU component, influencing the efficiency and effectiveness of knowledge retrieval and application. Selecting an appropriate representation strategy is therefore crucial for managing and using knowledge effectively.

4. Concluding Remarks
This paper presents a framework for integrating knowledge into Conversational Information Retrieval (CIR) systems, addressing the challenge of making knowledge a central component of information retrieval. By leveraging both user knowledge and knowledge accumulated during conversations, the framework aims to improve CIR systems' ability to understand users' information needs, engage in contextually relevant dialogue, and provide more accurate responses. The framework consists of three main components: the Knowledge Extractor (KE), the Current Knowledge State (cks), and the Knowledge Updater (KU). The KE extracts relevant information from user queries, documents, and prior knowledge, while the KU integrates this knowledge into the current knowledge state, represented by the cks, keeping it up to date and comprehensive throughout the interaction. Future work will focus on implementing and refining these components, optimizing knowledge integration techniques, and conducting real-world evaluations. Advancing these areas should enhance CIR systems' ability to manage and use knowledge effectively to meet users' evolving information needs.

Acknowledgments
This work was conducted with the financial support of the Science Foundation Ireland Centre for Research Training in Artificial Intelligence under Grant No. 18/CRT/6223.

References
[1] A. Broder, A taxonomy of web search, in: ACM SIGIR Forum, volume 36, ACM New York, NY, USA, 2002, pp. 3–10.
[2] N. J. Belkin, Anomalous states of knowledge as a basis for information retrieval, Canadian Journal of Information Science 5 (1980) 133–143.
[3] F. Radlinski, N. Craswell, A theoretical framework for conversational search, CHIIR ’17, Association for Computing Machinery, New York, NY, USA, 2017, pp. 117–126. URL: https://doi.org/10.1145/3020165.3020183. doi:10.1145/3020165.3020183.
[4] P. Acharya, Towards effective modeling and exploitation of search and user context in conversational information retrieval, in: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM ’23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 5161–5164. URL: https://doi.org/10.1145/3583780.3616005. doi:10.1145/3583780.3616005.
[5] K. Collins-Thompson, P. Hansen, C. Hauff, Search as learning (Dagstuhl Seminar 17092) (2017).
[6] S. Y. Rieh, K. Collins-Thompson, P. Hansen, H.-J. Lee, Towards searching as a learning process: A review of current perspectives and future directions, Journal of Information Science 42 (2016) 19–34.
[7] J. Gwizdka, P. Hansen, C. Hauff, J. He, N. Kando, Search as learning (SAL) workshop 2016, in: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 1249–1250. URL: https://doi.org/10.1145/2911451.2917766. doi:10.1145/2911451.2917766.
[8] R. W. White, S. T. Dumais, J. Teevan, Characterizing the influence of domain expertise on web search behavior, in: Proceedings of the Second ACM International Conference on Web Search and Data Mining, 2009, pp. 132–141.
[9] H. L. O’Brien, A. Kampen, A. W. Cole, K. Brennan, The role of domain knowledge in search as learning, in: Proceedings of the 2020 Conference on Human Information Interaction and Retrieval, 2020, pp. 313–317.
[10] T. Willoughby, S. A. Anderson, E. Wood, J. Mueller, C. Ross, Fast searching for information on the internet to use in a learning context: The impact of domain knowledge, Computers & Education 52 (2009) 640–648.
[11] C. Dosso, L. Tamine, P.-V. Paubel, A. Chevalier, Navigational and thematic exploration–exploitation trade-offs during web search: effects of prior domain knowledge, search contexts and strategies on search outcome, Behaviour & Information Technology 43 (2024) 2232–2258.
[12] C. Dosso, L. Tamine, P.-V. Paubel, A. Chevalier, The impact of expertise on query formulation strategies during complex learning task solving: a study with students in medicine and computer science, in: Proceedings of the 21st Congress of the International Ergonomics Association (IEA 2021) Volume V: Methods & Approaches 21, Springer, 2022, pp. 621–627.
[13] M. Sanchiz, A. Chevalier, F. Amadieu, How do older and young adults start searching for information? Impact of age, domain knowledge and problem complexity on the different steps of information searching, Computers in Human Behavior 72 (2017) 67–78.
[14] X. Zhang, M. Cole, N. Belkin, Predicting users’ domain knowledge from search behaviors, in: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011, pp. 1225–1226.
[15] M. Hagen, M. Potthast, M. Völske, J. Gomoll, B. Stein, How writers search: Analyzing the search and writing logs of non-fictional essays, in: Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval, 2016, pp. 193–202.
[16] A. Hoppe, P. Holtz, Y. Kammerer, R. Yu, S. Dietze, R. Ewerth, Current challenges for studying search as learning processes, in: 7th Workshop on Learning & Education with Web Data (LILE2018), in conjunction with ACM Web Science, 2018.
[17] R. Syed, K. Collins-Thompson, Optimizing search results for human learning goals, Information Retrieval Journal 20 (2017) 506–523.
[18] R. Syed, K. Collins-Thompson, Retrieval algorithms optimized for human learning, in: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017, pp. 555–564.
[19] A. Câmara, D. El-Zein, C. da Costa-Pereira, Rulk: A framework for representing user knowledge in search-as-learning (2022).
[20] D. El Zein, A. Câmara, C. Da Costa Pereira, A. Tettamanzi, Rulkne: Representing user knowledge state in search-as-learning with named entities, in: Proceedings of the 2023 Conference on Human Information Interaction and Retrieval, 2023, pp. 388–393.
[21] H. Nasser, D. El Zein, C. da Costa Pereira, C. Escazut, A. Tettamanzi, Rulkkg: Estimating user’s knowledge gain in search-as-learning using knowledge graphs, in: Proceedings of the 2024 Conference on Human Information Interaction and Retrieval, 2024, pp. 364–369.
[22] D. El Zein, C. da Costa Pereira, A cognitive agent framework in information retrieval: Using user beliefs to customize results, in: PRIMA 2020: Principles and Practice of Multi-Agent Systems: 23rd International Conference, Nagoya, Japan, November 18–20, 2020, Proceedings 23, Springer, 2021, pp. 325–333.
[23] J. D. Novak, Learning, creating, and using knowledge: Concept maps as facilitative tools in schools and corporations, Routledge, 2010.