
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Towards Improving a Student Advisory Service Chatbot Using Knowledge Graphs

Daniel Delev


Emmie Schifer


Felix Vogl


Andreea Iana

andreea.iana@uni-mannheim.de

Heiko Paulheim

heiko.paulheim@uni-mannheim.de


Keywords: Chatbot, Retrieval Augmented Generation (RAG), Large Language Model, ChatGPT, Knowledge Graph, Dialogue

University of Mannheim, Data and Web Science Group, Germany

Student advisory services at universities face a high volume of repetitive inquiries, which can be time-consuming and labor-intensive to address. In this paper, we explore the potential of chatbots to provide personalized support by leveraging university web pages and study regulation documents. Our prototype demonstrates the feasibility of chatbots in identifying relevant information and answering student queries. However, we also identify limitations in handling nuanced cases, particularly cohort-specific regulations. To address these challenges, we propose the integration of knowledge graphs as a potential extension to enhance the dialogue capabilities of the chatbot.

1. Introduction

https://www.heikopaulheim.com/ (H. Paulheim)

[Figure 1: Overall process of the prototype. Labels in the figure: (1) Crawling and Scraping, Text Chunks, (3) User Interaction, User Question, Question Embedding, User Context, Chatbot, (4a) Chunk Retrieval, (4b) Chunk Filtering, (5) Answer Generation, User Answer.]

2. Prototype

The prototype introduced in this paper consists of two preprocessing steps: (1) web scraping, and (2) text chunking and augmentation. At the user interaction stage, the chatbot executes a fixed protocol, (3) collecting initial personal information on the candidate, (4a) retrieving and (4b) filtering relevant document chunks, and (5) generating an answer from those chunks. Figure 1 shows the overall process.

The prototype has been developed and tested for study programs of the School of Business Informatics at the University of Mannheim, but can be applied to other study programs as well.

2.1. Data Preparation

To collect relevant data, we run a web scraper starting from the School of Business Informatics Web page, and following links to Web pages and PDF files (which are the common format to provide documents such as study regulations) up to a depth of 3. The final dataset consists of 668 HTML files (43 MB) and 983 PDF files (1,234 GB).

Not all of the documents are relevant for answering questions in the context of academic advisory (for example, by crawling PDFs from the faculty Web page, academic papers and CVs are also caught, among others). However, manual filtering is infeasible, so we rely on later processing steps and/or automatic filtering to identify the relevant documents for a question at hand.

In order to use texts in a RAG setting, they need to be injected in the prompts (see below). Since there are token limits for prompts (4,096 tokens for ChatGPT-3.5 Turbo, which was used for this project), most of the texts are too large to be used directly. Therefore, they are divided into smaller chunks (using a chunk size of 1,000 characters, with an overlap of 200 characters) before further processing.
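The chunking step can be illustrated with a minimal sketch. It assumes a plain sliding window over characters with the chunk size and overlap stated above; the splitter actually used in the prototype is not specified here, and the function name chunk_text is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`.

    Assumption: a simple sliding window; the prototype may use a smarter,
    structure-aware splitter.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # distance between the starts of two chunks
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With these defaults, consecutive chunks share 200 characters, so a sentence cut at a chunk boundary still appears in full in at least one of the two neighboring chunks as long as it is shorter than the overlap.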

Furthermore, each text chunk is augmented with metadata. The prototype uses two metadata fields, i.e., the study program (one of the study programs taught at the School of Business Informatics, or “general”), and a short summary. Both are generated by feeding the corresponding chunk into ChatGPT and making it determine the study program and a summary in a zero-shot setting. An evaluation on a small sample showed that the metadata are correct in 65% of the cases.
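The metadata augmentation could look roughly as follows. This is a sketch under assumptions: the LLM is passed in as a hypothetical callable llm that maps a prompt string to a response string, the prompt wording is invented for illustration, and falling back to "general" for unexpected labels is our own choice rather than documented behavior of the prototype.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class AugmentedChunk:
    text: str
    study_program: str  # one of the known programs, or "general"
    summary: str


def annotate_chunk(text: str, llm: Callable[[str], str],
                   programs: Sequence[str]) -> AugmentedChunk:
    """Ask the LLM (zero-shot) for a study program label and a short summary."""
    labels = list(programs) + ["general"]
    # Hypothetical prompt wording; the prompts used in the prototype are not given.
    program = llm(
        "Which study program does this text concern? "
        f"Answer with exactly one of: {', '.join(labels)}.\n\n{text}"
    ).strip()
    if program not in labels:
        program = "general"  # assumption: unknown labels fall back to "general"
    summary = llm(f"Summarize this text in one sentence.\n\n{text}").strip()
    return AugmentedChunk(text=text, study_program=program, summary=summary)
```

Restricting the label to a closed set makes the later metadata filter a simple equality check.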

For all text chunks, embedding vectors are created using LlamaIndex. Those are stored in a vector index so that they can be used for passage retrieval.

Note that while the data collection and preparation has been done once for this proof-of-concept prototype, in a production deployment, it would be re-run periodically in order to always deliver up-to-date responses.

2.2. User Interactions

As shown in Fig. 1, ChatGPT is not used directly, but invoked by the chatbot that interfaces with the user. When collecting the question, the chatbot asks for context, such as the study program the user is enrolled in. In parallel, the user's question is embedded using the same method as for the text chunks, and the text chunks with the closest vectors are retrieved and filtered by the metadata according to the context provided by the user.
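Steps (4a) and (4b) can be sketched without any particular vector store. The example below assumes cosine similarity as the closeness measure, a candidate pool of 20 chunks, and that chunks tagged "general" are kept for every study program; these choices, like the names Chunk and retrieve, are illustrative assumptions and not taken from the prototype.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    vector: list[float]   # embedding of the chunk text
    study_program: str    # metadata label, e.g. a program name or "general"


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(question_vector: list[float], chunks: list[Chunk], study_program: str,
             top_k: int = 5, candidates: int = 20) -> list[Chunk]:
    """(4a) take the nearest chunks, then (4b) filter them by metadata."""
    ranked = sorted(chunks, key=lambda c: cosine(question_vector, c.vector),
                    reverse=True)
    nearest = ranked[:candidates]
    # Assumption: "general" chunks are relevant to every study program.
    kept = [c for c in nearest if c.study_program in (study_program, "general")]
    return kept[:top_k]
```

Retrieving more candidates than are finally needed leaves room for the metadata filter to drop chunks from other study programs without starving the answer generation of context.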

The final prompt, which is passed to ChatGPT to provide an answer to the user, looks as follows:

    Use the following pieces of context to answer the question at the end.
    Execute these steps:
    1 - always answer in the language the question was given in
    2 - read the context, do not use information outside of the context to answer the question
    3 - if the answer is not provided in the given context, say where more information can possibly be found
    4 - answer the question
    ------------------------
    Context: {context}
    Question: I am studying the {study_program}. {question}

where study_program and context are the study program asked for in the previous dialogue, and the text chunks retrieved, respectively, and question is the question provided by the user.
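Filling the three placeholders of this prompt is a plain string substitution. The sketch below assumes that the retrieved chunks are joined with blank lines to form the context and that the line breaks in the template are as reconstructed; the helper name build_prompt is hypothetical.

```python
PROMPT_TEMPLATE = """Use the following pieces of context to answer the question at the end.
Execute these steps:
1 - always answer in the language the question was given in
2 - read the context, do not use information outside of the context to answer the question
3 - if the answer is not provided in the given context, say where more information can possibly be found
4 - answer the question
------------------------
Context: {context}
Question: I am studying the {study_program}. {question}"""


def build_prompt(chunks: list[str], study_program: str, question: str) -> str:
    """Insert the retrieved chunks, the study program, and the question."""
    # Assumption: chunks are separated by blank lines inside the context.
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(
        context=context, study_program=study_program, question=question
    )
```

Because the study program is stated inside the question line, the model receives the user's context even when none of the retrieved chunks mentions the program explicitly.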

2.3. Evaluation

We have evaluated the proposed approach using a set of 23 questions, both in English and German. Each question was tested with two different study programs as a context, leading to an overall set of 46 questions and gold standard answers. The answers given by the chatbot were manually evaluated against the gold standard. The final prototype yields an overall rate of correct answers of 83%.

[Figure 2: Example knowledge graph. Nodes in the figure: courses CS214 and CS101; study programs MSc Data Science and MSc Comp. Science; documents doc54783 and doc79832; chunks chunk34723, chunk43987, and chunk51378. Edge labels in the figure: hasPrerequisite, relatedTo, extractedFrom, contradicts.]

In a preliminary study, we also evaluated PaperQA [6] as an out-of-the-box end-to-end solution, but achieved less than 50% correct answers. Therefore, the approach was discarded.

We also evaluated the document chunk retrieval step in isolation for each of the test questions, considering the precision@5 (i.e., the rate of relevant documents among the top 5 retrieved document chunks). The approach achieves a total rate of 87%, i.e., on average, 4.4 out of the top 5 document chunks are relevant for answering the question at hand. Interestingly, without considering the metadata, the rate drops to 63% (i.e., 3.1 out of the top 5 document chunks).

3. Potential of Using a Knowledge Graph

As discussed above, the dialogue currently follows a fixed script. This also means that the same context information is always collected, regardless of whether that information is required or not. However, some questions require no context information (When do the lectures start in the fall semester?), others may require the study program (How many credits do I need to collect in the fundamentals module?), others may even require other information on the student's individual track record (Can I attend the advanced course on software engineering?, e.g., if this course has specific requirements).

Organizing the collected text information in a knowledge graph, as shown in Fig. 2, can help identify those required pieces of context information in an iterative process of retrieving document chunks and narrowing down the set of relevant chunks in an interactive dialogue with the user. The information in the knowledge graph may include the metadata discussed above, but also further information on the curriculum [7], like information extracted from a module catalogue (e.g., course prerequisites, as shown in the left part of the figure).

Although the rate of relevant document chunks retrieved is rather good, as discussed above, we often observe the retrieval of contradicting chunks, which then leads to wrong or unspecific answers. This may be the case, e.g., for chunks extracted from documents concerning different study programs, in which different regulations are in place. Detecting such contradictions by means of automatic stance detection [8] and explicitly modeling them in the knowledge graph, as shown in the figure, is a good way both to identify those cases and to make the chatbot ask specific questions to narrow down the set of retrieved chunks. In the example shown in the figure, when the two contradicting chunks chunk43987 and chunk51378 are retrieved, the knowledge graph could be traversed to find out that they refer to different study programs, to make the chatbot ask for the user's study program, and ultimately to discard non-fitting document chunks before passing the remaining ones to the answer generation.
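The traversal described in this example can be sketched over a small set of triples. The node and edge names come from the figure, but which document each chunk is extracted from and which program each document relates to are assumptions made for illustration, as are the helper functions; this is a proposal sketch, not part of the implemented prototype.

```python
# Assumed edges, loosely following the example in the figure.
TRIPLES = {
    ("chunk43987", "contradicts", "chunk51378"),
    ("chunk43987", "extractedFrom", "doc54783"),
    ("chunk51378", "extractedFrom", "doc79832"),
    ("doc54783", "relatedTo", "MSc Data Science"),
    ("doc79832", "relatedTo", "MSc Comp. Science"),
}


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def programs_of(triples, chunk):
    """Follow chunk -extractedFrom-> document -relatedTo-> study program."""
    return {prog
            for doc in objects(triples, chunk, "extractedFrom")
            for prog in objects(triples, doc, "relatedTo")}


def needs_study_program(triples, retrieved):
    """True if two retrieved chunks contradict and stem from different programs."""
    for a in retrieved:
        for b in objects(triples, a, "contradicts"):
            if b in retrieved and programs_of(triples, a) != programs_of(triples, b):
                return True
    return False


def discard_non_fitting(triples, retrieved, study_program):
    """Keep chunks of the user's program; chunks without program info are kept."""
    return [c for c in retrieved
            if not programs_of(triples, c) or study_program in programs_of(triples, c)]
```

In this sketch the chatbot would ask for the study program only when needs_study_program returns True, which replaces the fixed script with a question that is triggered by the retrieved chunks themselves.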

Finally, if the knowledge graph becomes deeper and more connected, encompassing more metadata and interlinks between the text chunks, which are represented as nodes in the graph, knowledge graph embeddings [9] can be used to improve the retrieval process.

4. Conclusion

In this paper, we have introduced a first prototype for a student advisory chatbot. The chatbot is based on a document collection harvested from the Web, which is preprocessed and enriched using an LLM. The text chunks are then used in the information retrieval block in a retrieval augmented generation (RAG) based chatbot implemented with LangChain and ChatGPT.

In the future, it would be interesting to test the approach in a broader setting, covering more study programs and/or schools. While the approach itself is considered scalable, this will also pose challenges with respect to identifying relevant information as the amount of processed content grows.

Moreover, we have discussed how a knowledge graph can help improve the behavior and output of the chatbot. Especially for identifying which context information is required from the user, a knowledge graph may be beneficial and help extend the system from a chatbot following a static script to an interactive bot asking directed questions based on information modeled in the knowledge graph. This will be even more crucial if the approach is used on a broader scale, as discussed above.

References

[1] Goemans, Kapinos, A quantitative study of community college student-advisor appointments and student success metrics, NACADA Journal 44 (2024) 38-54.

[2] Wu, He, Liu, Sun, Liu, Q.-L. Han, Tang, A brief overview of chatgpt: The history, status quo and potential future development, IEEE/CAA Journal of Automatica Sinica 10 (2023) 1122-1136.

[3] Topsakal, T. C. Akinci, Creating large language model applications utilizing langchain: A primer on developing llm apps fast, in: International Conference on Applied Engineering and Natural Sciences, volume 1, 2023, pp. 1050-1056.

[4] Rawte, Sheth, A. Das, A survey of hallucination in large foundation models, arXiv preprint arXiv:2309.05922 (2023).

[5] Lewis, Perez, Piktus, Petroni, Karpukhin, Goyal, Küttler, Lewis, W.-t. Yih, Rocktäschel, et al., Retrieval-augmented generation for knowledge-intensive nlp tasks, Advances in Neural Information Processing Systems 33 (2020) 9459-9474.

[6] Lála, O. O'Donoghue, Shtedritski, Cox, S. G. Rodriques, A. D. White, Paperqa: Retrieval-augmented generative agent for scientific research, arXiv preprint arXiv:2312.07559 (2023).

[7] Zouri, Ferworn, An ontology-based approach for curriculum mapping in higher education, in: 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), IEEE, 2021, pp. 0141-0147.

[8] Küçük, Can, Stance detection: A survey, ACM Computing Surveys (CSUR) 53 (2020) 1-37.

[9] Paulheim, Ristoski, Portisch, Embedding Knowledge Graphs with RDF2vec, Springer Nature, 2023.