Towards Improving a Student Advisory Service Chatbot Using Knowledge Graphs Daniel Delev1 , Emmie Schiffer1 , Felix Vogl1 , Andreea Iana1 and Heiko Paulheim1,∗ 1 University of Mannheim, Data And Web Science Group, Germany Abstract Student advisory services at universities face a high volume of repetitive inquiries, which can be time-consuming and labor-intensive to address. In this paper, we explore the potential of chatbots to provide personalized support by leveraging university web pages and study regulation documents. Our prototype demonstrates the feasibility of chatbots in identifying relevant information and answering student queries. However, we also identify limitations in handling nuanced cases, particularly cohort-specific regulations. To address these challenges, we propose the integration of knowledge graphs as a potential extension to enhance the dialogue capabilities of the chatbot. Keywords Chatbot, Retrieval Augmented Generation (RAG), Large Language Model, ChatGPT, Knowledge Graph, Dialogue 1. Introduction Throughout their time at the university, students frequently contact the student advisory service with various questions about planning and conducting their studies. Many of those questions can be answered from information found on university web pages and study regulation documents. While there is clear evidence that student advisory is helpful and impacts students’ success [1], the capacity of student advisors is usually limited. At the same time, they spend a substantial amount of time with questions that are trivial enough to be answered directly from publicly available documents, such as university web pages and study regulation documents. In order to decrease the burden of study advisors, chatbots may help answering such simple questions, giving the advisors more time to care about the non-trivial cases. In this paper, we introduce a prototype of a chatbot based on the Large Language Model ChatGPT [2] built with LangChain [3], which can be used to answer standard questions about study programs that are often directed to study advisors. There are two main challenges that need to be faced: • The answers need to be truthful and free from hallucinations [4]. Especially LLMs trained from large amounts of texts from various sources are likely to have ingested many university Web pages, but should only give answers which are in line with the regulations of the specific university where the chatbot is deployed. • The answers need to be tailored to the student at hand. The answer to many questions, e.g., whether a student can register for a particular course, may depend on the study program the student is enrolled in, their academic record so far, and other conditions. The first challenge is addressed by following the Retrieval Augmented Generation paradigm [5], pro- viding the LLM with pre-retrieved fragments from study regulation documents and Web pages about study programs. For the second challenge, the current prototype uses a fixed prompt template asking the student for their background and study program. Here, we argue that knowledge graphs could help improving the interaction, making the answers more concise and, at the same time, facilitating a better dialogue and user experience by limiting the amount of unnecessary questions asked upfront. Workshop on Retrieval-Augmented Generation Enabled by Knowledge Graphs (RAGE-KG), 2024 ∗ Corresponding author. Envelope-Open andreea.iana@uni-mannheim.de (A. Iana); heiko.paulheim@uni-mannheim.de (H. Paulheim) GLOBE https://www.heikopaulheim.com/ (H. Paulheim) Orcid 0000-0002-7248-7503 (A. Iana); 0000-0003-4386-8195 (H. Paulheim) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings Preprocessing 1 2 Crawling Text and Chunks Scraping Embedding Metadata Chunk Vectors 4a Retrieval User Interaction Question Embedding 3 Chunk User Filtering 4b User Question Context Chatbot Answer Generation 5 User Answer Figure 1: Schematic depiction of the prototype 2. Prototype The prototype introduced in this paper consists of two preprocessing steps: (1) web scraping, and (2) text chunking and augmentation. At the user interaction stage, the chatbot executes a fixed protocol, (3) collecting initial personal information on the candidate, (4a) retrieving and (4b) filtering relevant document chunks, and (5) generating an answer from those chunks. Figure 1 shows the overall process. The prototype has been developed and tested for study programs of the School of Business Informatics at the University of Mannheim, but can be applied to other study programs as well. 2.1. Data Preparation To collect relevant data, we run a web scraper starting from the School of Business Informatics Web page, and following links to Web pages and PDF files (which are the common format to provide documents such study regulations) up to a depth of 3. The final dataset consists of of 668 HTML files (43 MB) and 983 PDF files (1,234 GB). Not all of the documents are relevant for answering questions in the context of academic advisory (for example, by crawling PDFs from the faculty Web page, academic papers and CVs are also caught, among others). However, manual filtering is infeasible, so we rely on later processing steps and/or automatic filtering to identify the relevant documents for a question at hand. In order to use texts in a RAG setting, they need to be injected in the prompts (see below). Since there are token limits for prompts (4,096 tokens for ChatGPT-3.5 Turbo, which was used for this project), most of the texts are too large to be used directly. Therefore, they are divided into smaller chunks (using a chunk size of 1,000 characters, with an overlap of 200 characters) before further processing. Furthermore, each text chunk is augmented with metadata. The prototype uses two metadata fields, i.e., the study program (one of the study programs taught at the School of Business Informatics, or “general”), and a short summary. Both are generated by feeding the corresponding chunk into ChatGPT and making it determine the study program and a summary in a zero-shot setting. An evaluation on a small sample showed that the metadata are correct in 65% of the cases. For all text chunks, embedding vectors are created using LlamaIndex1 . Those are stored in a vector index so that they can be used for passage retrieval. Note that while the data collection and preparation has been done once for this proof-of-concept prototype, in a productive deployment, it would be re-run periodically in order to always deliver up to date responses. 2.2. User Interactions As shown in Fig 1, ChatGPT is not used directly, but invoked by the chatbot that interfaces with the user. When collecting the question, it asks for context like the study program the user is enrolled in. In parallel, the user’s question is embedded using the same method as for the text chunks, and the text chunks with the closest vectors are retrieved and filtered by the metadata according to the context provided by the user. The final prompt used to provide an answer to the user which is passed to ChatGPT looks as follows: Use t h e f o l l o w i n g p i e c e s o f c o n t e x t t o answer t h e q u e s t i o n a t t h e end . Execute these steps : 1 − a l w a y s answer i n t h e l a n g u a g e t h e q u e s t i o n was g i v e n in 2 − r e a d t h e c o n t e x t , do n o t u s e i n f o r m a t i o n o u t s i d e o f t h e c o n t e x t t o answer t h e q u e s t i o n 3 − i f t h e answer i s n o t p r o v i d e d i n t h e g i v e n c o n t e x t , s a y where more i n f o r m a t i o n can p o s s i b l y be f o u n d 4 − answer t h e q u e s t i o n −−−−−−−−−−−−−−−−−−−−−−−− Context : { context } Q u e s t i o n : I am s t u d y i n g t h e { s t u d y _ p r o g r a m } . { q u e s t i o n } where study_program and context are the study program asked for in the previous dialogue, and the text chunks retrieved, respectively, and question is the question provided by the user. 2.3. Evaluation We have evaluated the proposed approach using a set of 23 questions, both in English and German. Each question was tested with two different study programs as a context, leading to an overall set of 46 questions and gold standard answers. The answers given by the chatbot were manually evaluated against the gold standard. The final prototype yields an overall rate of correct answers of 83%. 1 https://www.llamaindex.ai/ doc54783 relatedTo MSc Data Science extractedFrom extractedFrom CS214 relatedTo chunk34723 chunk43987 contradicts hasPrerequisite CS101 relatedTo chunk51378 extractedFrom Msc Comp. relatedTo Science doc79832 Figure 2: Example for a Knowledge Graph describing the extracted text snippets In a preliminary study, we also evaluated PaperQA [6] as an out of the box end-to-end solution, but achieved less than 50% correct answers. Therefore, the approach was discarded. We also evaluated the document chunk retrieval step in isolation for each of the test questions, considering the precision@5 (i.e., the rate of relevant documents among the top 5 retrieved document chunks). The approach achieves a total rate of 87%, i.e., on average, 4.4 out of the top 5 document chunks are relevant for answering the question at hand. Interestingly, without considering the metadata, the rate drops to 63% (i.e., 3.1 out of the top 5 document chunks). 3. Potential of Using a Knowledge Graph As discussed above, the dialogue currently follows a fixed script. This also means that the same context information is always collected, regardless of whether that information is required or not. However, some questions require no context information (When do the lectures start in the fall semester? ), others may require the study program (How many credits do I need to collect in the fundamentals module? ), others may even require other information on the student’s individual track record (Can I attend the advanced course on software engineering?, e.g., if this course has specific requirements). Organizing the collected text information in a knowledge graph, as shown in Fig. 2, can help identifying those required pieces of context information in an interative process of retrieving document chunks and narrowing down the set of relevant chunks in an interactive dialogue with the user. The information in the knowledge graph may include the metadata discussed above, but also further information on the curriculum [7], like information extracted from a module catalogue (e.g., course prerequisites, as shown in the left part of the figure). Although the rate of relevant document chunks retrieved is rather good, as discussed above, we often observe the retrieval of contradicting chunks, which then leads to wrong or unspecific answers. This may be the case, e.g., for chunks extracted from documents concerning different study programs, in which different regulations are in place. Detecting such contradictions by means of automatic stance detection [8] and explicitly modeling them in the knowledge graph, as shown in the figure, is a good way to both identify those cases, as well as making the chatbot ask specific questions to narrow down the set of retrieved chunks. In the example shown in the figure, retrieving the two contradicting chunks chunk43987 and chunk51378 , the knowledge graph could be traversed to find out that both refer to different study programs, to make the chatbot ask for the user’s study program, and ultimately discard non-fitting document chunks before passing them to the answer generation. Finally, if the knowledge graph becomes deeper and more connected, encompassing more metadata and interlinks inbetween the text chunks, which are represented as nodes in the graph, knowledge graph embeddings [9] can be used to improve the retrieval process. 4. Conclusion In this paper, we have introduced a first prototype for a student advisory chatbot. The chatbot is based on a document collection harvested from the Web, which is preprocessed and enriched using an LLM. The text chunks are then used in the information retrieval block in a retrieval augmented generation (RAG) based chatbot implemented with LangChain and ChatGPT. In the future, it would be interesting to test the approach in a broader setting, covering more study programs and/or schools. While the approach itself is considered scalable, this will also pose challenges with respect to identifying relevant information if the amount of processed contents is larger. Moreover, we have discussed how a knowledge graph can help improving the behavior and output of the chatbot. Especially for identifying which context information is required from the user, a knowledge graph may be beneficial and help extending the system from a chatbot following a static script to an interactive bot asking directed questions based on information modeled in the knowledge graph. This will be even more crucial if the approach is used on a broader scale, as discussed above. References [1] M. Goemans, B. Kapinos, A quantitative study of community college student-advisor appointments and student success metrics, NACADA Journal 44 (2024) 38–54. [2] T. Wu, S. He, J. Liu, S. Sun, K. Liu, Q.-L. Han, Y. Tang, A brief overview of chatgpt: The history, status quo and potential future development, IEEE/CAA Journal of Automatica Sinica 10 (2023) 1122–1136. [3] O. Topsakal, T. C. Akinci, Creating large language model applications utilizing langchain: A primer on developing llm apps fast, in: International Conference on Applied Engineering and Natural Sciences, volume 1, 2023, pp. 1050–1056. [4] V. Rawte, A. Sheth, A. Das, A survey of hallucination in large foundation models, arXiv preprint arXiv:2309.05922 (2023). [5] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, et al., Retrieval-augmented generation for knowledge-intensive nlp tasks, Advances in Neural Information Processing Systems 33 (2020) 9459–9474. [6] J. Lála, O. O’Donoghue, A. Shtedritski, S. Cox, S. G. Rodriques, A. D. White, Paperqa: Retrieval- augmented generative agent for scientific research, arXiv preprint arXiv:2312.07559 (2023). [7] M. Zouri, A. Ferworn, An ontology-based approach for curriculum mapping in higher education, in: 2021 IEEE 11th Annual Computing and communication workshop and conference (CCWC), IEEE, 2021, pp. 0141–0147. [8] D. Küçük, F. Can, Stance detection: A survey, ACM Computing Surveys (CSUR) 53 (2020) 1–37. [9] H. Paulheim, P. Ristoski, J. Portisch, Embedding Knowledge Graphs with RDF2vec, Springer Nature, 2023.