
Uni-Mate: A Retrieval-Augmented Generation System to Provide High School Students with Accurate Academic Guidance

Samuele Mazzei

Lorenzo Zambotto

Gabriele Tealdo

Alberto Macagno

Alessio Palmero Aprosio

Department of Psychology and Cognitive Science, University of Trento, Corso Bettini 84, Rovereto, Italy

2025


This paper introduces the development and evaluation of a Retrieval-Augmented Generation (RAG) system designed to assist prospective students in navigating university options. The system provides accurate academic guidance by retrieving and synthesizing information on undergraduate and single-cycle master's degree programs, as well as library resources, from the University of Trento and the University of Verona. The RAG pipeline utilizes a streamlined toolchain, incorporating a Markdown parser for efficient data handling and the Llama3-8b-8192 Large Language Model (LLM) for query processing. The system's performance was assessed through both automated evaluation, using the Llama3-70b LLM as a reference, and blinded human evaluation. The results demonstrate the system's potential for providing relevant and accurate information to students. The evaluation also highlighted areas for further development, including enhanced retrieval mechanisms and expanded LLM testing. Future work aims to broaden the system's scope to include more degree levels and universities, ultimately creating a comprehensive platform to support students in their academic decision-making journey.

Keywords: Retrieval-Augmented Generation (RAG), Natural Language Processing (NLP), Large Language Models (LLMs), Dataset Creation, Academic Guidance

CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics, September 24-26, 2025, Cagliari, Italy
* Corresponding author.
† These authors contributed equally.
samuele.mazzei@studenti.unitn.it (S. Mazzei); lorenzo.zambotto@studenti.unitn.it (L. Zambotto); gabriele.tealdo@studenti.unitn.it (G. Tealdo); alberto.macagno@studenti.unitn.it (A. Macagno); a.palmeroaprosio@unitn.it (A. Palmero Aprosio)
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Choosing a university path is one of the most complex and significant decisions for students nearing the end of high school. This, combined with the overwhelming amount of new information encountered when browsing various and often inconsistent university websites, creates confusion and a sense of being lost, leading to wasted time and uncertainty. These challenges stem from both the dispersion of available information and the lack of intuitive tools to guide students through the decision-making process.

We address this problem by creating a platform called Uni-Mate (formerly referred to as MyVision and later renamed to better align with startup branding goals and to offer a more appealing name for potential users and investors). The system aims to integrate an AI-powered chatbot that provides relevant information about partner universities and online counseling services within a single interface.

A survey, conducted among 183 students from the Department of Psychology and Cognitive Science and the School of Innovation between October and November 2024, was instrumental in identifying a significant need among students for improved online educational guidance and revealed significant challenges faced by students in choosing their academic paths. A striking 74% reported at least one major difficulty in the orientation process. The most common issues included a lack of clear and comparable information across courses and institutions (43%), uncertainty regarding personal interests and aptitudes (38%), and confusion about the differences among European universities (29%). Additionally, limited access to insights from alumni was also noted (17%). When seeking guidance, students primarily relied on official university websites (65%) and personal networks such as parents or friends (58%), while only 21% consulted academic counselors. Moreover, fewer than 10% found digital comparison tools to be truly effective. The data also highlights a strong interest in innovative orientation tools. Notably, 81% of respondents expressed a willingness to use a platform like Uni-Mate, which would feature personalized course matching algorithms and structured reviews from former students. Furthermore, 67% indicated a readiness to pay for such a service if it proved to be effective. These results point to a clear gap in the current academic orientation offerings, which are seen as fragmented, non-interactive, and lacking personalization. There is a strong latent demand for comprehensive digital solutions that provide personalized guidance, real-life experiences, and comparative tools to support students in making well-informed educational decisions.

Following this initial validation, the team submitted MyVision as a proposal to DigiEduHack,¹ a European innovation challenge promoted by the European Commission and aimed at fostering technological advancements in the field of education. During the event, the concept was further developed and ultimately awarded first place in the Expert Category, the most competitive and high-level track of the competition.²

As a result, the team has been invited to present MyVision at the DigiEduHack Award Ceremony, scheduled for June 24th in Brussels, as part of the Digital Education Stakeholder Forum 2025, a major annual event organized by the European Commission to promote dialogue and policy development in digital education.

Notable previous attempts to address these challenges exist in the academic guidance space. In the United States, ScholarMatch³ focuses on helping first-generation, low-income students secure scholarships and complete their college education, addressing critical financial and support gaps. In contrast, a comparable all-in-one solution is lacking in Europe, where the challenges for students are less about tuition affordability and more about navigating a fragmented ecosystem of academic options. In the UK, Bonas MacFarlane⁴ offers premium consulting for school and university placements, primarily targeting affluent families. These examples highlight both the proven demand for personalized academic support and the gap that MyVision seeks to fill in the EU context by offering accessible, digital tools for orientation, comparison, reviews, and guidance all in one unified platform.

In Italy, UniversItaly⁵ is the official portal developed by the Italian Ministry of University and Research to support students, both Italian and international, in navigating the higher education system in Italy. The web portal integrates a conversational assistant powered by large language models, which helps users navigate content and find relevant information interactively.

In this paper, we aim to lay the foundation for the development of our chatbot by focusing on the academic offerings of two Italian universities: the University of Trento (Unitn) and the University of Verona (Univr). These institutions were selected due to their geographical proximity and the presence of interdisciplinary and interuniversity courses, which offer significant opportunities for prospective students interested in studying in these areas. The system was developed with a specific focus on post-diploma university orientation, considering only bachelor's degree programs and single-cycle master's degrees. This approach addresses the needs of recent high school graduates by providing an innovative tool to explore available academic options in a simple and immediate way. Furthermore, we included information about the universities' libraries to provide new students with access to a valuable resource that can support their studies.

¹ https://digieduhack.com/
² https://digieduhack.com/news/digieduhack-2024-winners-announced-meet-the-innovators-shaping-digital-education
³ https://www.scholarmatch.org/
⁴ https://bonasmacfarlane.co.uk/en
⁵ https://www.universitaly.it/orientamento-universitario

2. Related work

Numerous research groups and institutions have explored various strategies to support students in selecting the most suitable university.

For example, in one study [1], the researchers evaluated an educational app called GC Mobile and concluded that it enhanced the counseling process by leveraging technology to provide a scalable, accessible, and confidential platform for student guidance. As the authors noted, "The GC Mobile App allows students to see a counselor anytime and from any location without having to visit them in the office."

Another study [2] developed an AI-powered academic guidance and counseling system with the primary objective of supporting high school seniors in navigating the college application process and selecting suitable academic paths and universities for tertiary education. It also aimed to address the shortage of human resources in traditional counseling by providing an accessible, convenient, and time-saving alternative for students to obtain valuable insights without requiring face-to-face interaction or travel to gather university information.

Another approach was the creation of UniCompass [3], a platform designed to help students efficiently learn about and compare universities and departments, and to access diverse perspectives and shared experiences from peers. By consolidating information and providing structured guidance, UniCompass aims to save students time and support more informed academic and career decisions.

A similar application is "Major-Selection" [4], which functions as intelligent decision support software to assist students with major selection. It features a rule-based knowledge base containing information about university admission requirements and the skills and preferences relevant to various majors. This knowledge is derived from academic advisors and university guidelines.

In another study [5], the authors developed a web application that provides personalized recommendations and guidance to high school students. By using a questionnaire, the AI system builds a comprehensive profile of the student and delivers data-driven, customized guidance to support informed university and career decisions.

Lastly, myAlmaOrienta [6] was developed to support high school students in choosing a degree programme at the University of Bologna. It helps students navigate the selection process and identify programmes that match their skills and interests. The app was developed through a two-level co-design process involving both high school students (user-driven innovation) and university students (open innovation contest) to incorporate their needs and perspectives.

The chatbot in Uni-Mate uses Retrieval-Augmented Generation (RAG) to address the limitations inherent in traditional methods and standalone Large Language Models (LLMs) [7], such as limited context and possible hallucinations. Dieing et al. [8] describe a system for study program orientation that provides personalized recommendations using a Mixtral LLM paired with a RoBERTa embedding model. Their RAG approach retrieves data from a government website and achieves an average response accuracy above 0.75. Saha and Saha [9] report that a GPT-3.5-based chatbot enhances support for international graduate students by combining generative capabilities with precise retrieval from social media sources. Dakshit [10] explored the use of RAG in higher education, focusing on applications as virtual teaching assistants and teaching aids. Faculty perspectives gathered in the study highlighted the benefits of RAG in supporting teaching processes, such as the generation of study guides, quizzes, and assignment questions, while also assisting students by providing precise answers to academic queries. Faculty members emphasized the importance of integrating broader data sources and advanced functionalities, including the ability to process mathematical content and image-based inputs, to improve the system's effectiveness.

The potential of RAG-powered systems lies in their ability to provide accurate, contextually relevant, and personalized support by combining retrieval mechanisms with generation capabilities [7]. A retrieval component first searches for relevant information from a curated set of academic resources, ensuring the content is accurate and domain-specific. The generation component then synthesizes this information to produce coherent and contextually appropriate responses [11]. This dual approach not only improves the reliability of responses but also enables the system to adapt to individual learning styles and paces, making it a valuable tool for personalized education. These findings align with the goals of Uni-Mate, particularly in creating a chatbot that integrates multiple functions (academic guidance, counseling services, and information retrieval) into a cohesive platform. Drawing from the studies mentioned above, we plan to leverage RAG's strengths to ensure that Uni-Mate not only meets students' informational needs but also provides reliable, context-aware responses to enhance their educational journey.

3. Dataset

To collect the documents for our task, we accessed the course websites of Unitn⁶ and Univr⁷ to gather the necessary data. Since the main objective of this project is to provide orientation for high school students, we selected undergraduate degrees and single-cycle master's degrees. For Unitn, we obtained data from the "Prospective Student" section, which is divided into three parts: "Course Programme," providing an overview of the degree; "Course Content," listing all courses offered over the years along with their respective ECTS credits and, in some cases, detailed course descriptions; and "Application," which contains enrollment information. For Univr, we collected similar information. After selecting a degree, we retrieved the "Overview" section under the "Find out more" option, the study plan from the "Modules" section, and enrollment details from the "How to apply" option. All collected course data was converted into Markdown format with the help of an extension of ChatGPT-4 called Markdown converter⁸. ChatGPT-4 does not always structure the data in the same way, so we manually adjusted the formatting when discrepancies were too large. We also collected data on the libraries of both universities. In this case, the data were gathered manually to ensure a consistent file structure and order. The collected library data included: a general overview, with information on access, location, staff, and available spaces; the services offered by the libraries; and the opening hours.

We used the Markdown language for several reasons, including efficiency and flexibility. This format allows for a clear structuring of data through the use of headings, enabling the RAG to subsequently divide the information into well-defined and interconnected sections. This optimization facilitates the retrieval process, making it easier to identify and associate relevant information. Another advantage of Markdown is its ability to include tables, which are clearer and more understandable as responses for users. Finally, the Markdown format is more practical during the dataset creation phase, as it allows for the use of tools like scrapers to quickly extract text from web pages. This process simplifies and accelerates the assembly of necessary information while ensuring greater consistency and quality of the data. In total, we collected data for 29 degrees from Unitn and 41 degrees from Univr, resulting in 70 course documents. Additionally, we collected data from 5 libraries from Unitn and 34 libraries from Univr, resulting in 39 library documents. This yielded a total of 109 documents.

⁶ https://www.unitn.it/en/ateneo/1819/programmes-of-study
⁷ https://www.univr.it/en/degree-programmes
⁸ https://chatgpt.com/g/g-lnlmekbGd-markdown-converter
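To illustrate why heading-structured Markdown lends itself to sectioning, the following minimal sketch splits a document into sections keyed by their heading path. It is not the code used in the project: the function name `split_by_headings` and the sample text are hypothetical, and the sketch ignores edge cases such as `#` characters inside fenced code blocks.

```python
import re

# Matches ATX-style Markdown headings such as "## Application".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown_text):
    """Return a list of (heading_path, body) pairs, one per non-empty section."""
    sections = []
    path = []  # stack of (level, title) describing the current heading hierarchy
    body = []  # lines collected since the last heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            sections.append((" / ".join(title for _, title in path), text))
        body.clear()

    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Leave any heading at the same or a deeper level before entering the new one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return sections


# Hypothetical sample document (not taken from the dataset).
sample = """# Computer Science
## Course Programme
Three-year bachelor's degree.
## Application
Enrollment opens in spring.
"""
sections = split_by_headings(sample)
for heading_path, text in sections:
    print(heading_path, "->", text)
```

Each section keeps the titles of its parent headings, which is one simple way to preserve the link between a chunk and the document it came from.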

Additionally, all data were translated into English when the English version of the site did not contain sufficient information compared to its Italian counterpart, as the answers provided by our RAG system were more accurate on English text with the embedding model introduced during the course. The English version of the embedding model is trained and tested on more data and has access to a larger corpus than the Italian version, which typically results in better training, improved generalization, and richer language representations [12]. To verify this, we consulted the literature and found a paper titled "Retrieval-augmented generation in multilingual settings" [13], which supports our hypothesis.

4. Experiments

The objective of this study was to develop and evaluate a document retrieval system designed to query information from university course descriptions and library details. The system's performance was assessed based on its accuracy in retrieving relevant and contextually appropriate information. For this purpose, we utilized Groq⁹ as the provider for Large Language Models (LLMs). Specifically, two models were employed: Llama3-8b-8192 (8 billion parameters) served as the primary LLM for query processing, while Llama3-70b (70 billion parameters) functioned as the reference ("golden") model during evaluation.

4.1. RAG Pipeline

The experimental workflow starts with a corpus of structured Markdown documents detailing university courses and library information (as described in Section 3). The documents are loaded manually into the system from two separate folders, one for courses and one for libraries. For each file we then create a LlamaIndex Document object and add metadata extracted from the file title. Specifically, for the courses we extract the university name, in its shorter form, and the course name, translated into English where necessary and dash-separated. For the libraries we extract the university name and the name of the library, following the same convention. Because the individual documents are considerably long, we decided to split them into smaller chunks to obtain more meaningful embeddings.

Among the different strategies available, our final choice for processing documents was a specific node parser: MarkdownNodeParser¹⁰. This is a class provided by LlamaIndex that splits the documents into Nodes following a Markdown splitting logic, by separating the sources using headings. Moreover, through the use of the include_prev_next_rel and include_metadata parameters, we keep relationships between the nodes, supporting the retrieval process. Nodes are persisted in a local document store in a Google Drive folder.

Subsequently, these nodes are converted into vector embeddings. As the embedding model, we chose BAAI/bge-m3¹¹, which distinguishes itself especially for its multi-granularity and its ability to work with long documents when generating semantic representations of the text. The model is loaded using the HuggingFaceEmbedding¹² module of LlamaIndex, which provides a convenient interface for working with Hugging Face models. The embeddings are generated using the GPU acceleration provided by a T4 instance in Google Colab¹³, which significantly speeds up the embedding generation process, and are saved in a cache folder on Google Drive to avoid redundant computations during development.

The retrieval is performed using the BM25 algorithm¹⁴, a widely used keyword-based retrieval method that employs lexical matching to retrieve relevant document sections. The BM25 algorithm is implemented in LlamaIndex and is used to retrieve the top-k nodes (k = 15) based on their similarity with the user query.

A graphical representation of the whole pipeline is shown in Figure 1.
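As a rough illustration of the lexical matching behind BM25, the sketch below scores a handful of text chunks against a query and keeps the top k. It is a self-contained approximation, not the LlamaIndex retriever used in the pipeline: the function `bm25_top_k`, the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the example chunks are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def bm25_top_k(query, docs, k=15, k1=1.5, b=0.75):
    """Rank docs against query with Okapi BM25 and return the k best (index, score) pairs."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = tokenize(query)
    scored = []
    for idx, tokens in enumerate(tokenized):
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scored.append((idx, score))

    # Highest score first; ties are broken by original position.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical chunks, loosely modelled on the kind of content in the dataset.
chunks = [
    "Library opening hours: Monday to Friday",
    "Admission requires a high school diploma and a TOLC test",
    "Course content lists ECTS credits",
]
top = bm25_top_k("admission TOLC test", chunks, k=2)
print(top)
```

Because BM25 relies on exact term overlap, a query phrased with synonyms of the chunk's wording would score zero here, which is one reason lexical retrieval is often combined with rerankers or embedding-based search.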

⁹ https://groq.com/

¹⁰ https://docs.llamaindex.ai/en/v0.10.17/api/llama_index.core.node_parser.MarkdownNodeParser.html

4.2. Evaluation

Evaluation of the system's performance employed a dual approach: automated assessment using the Llama3-70b model and blinded human evaluation, ensuring objectivity. Both methods assessed the quality of the generated answers and, for the automated part, the suitability of the retrieved context.

For the automated evaluation of generated answers, the Llama3-70b model assessed relevance and correctness relative to the user query. It assigned a score on a 1-to-5 scale, which was subsequently normalized to a 0-to-4 scale for direct comparison with human scores. The model also generated a textual justification explaining its assessment, highlighting aspects like completeness or accuracy. Due to API call limitations with standard evaluation frameworks, custom requests were implemented to facilitate this automated assessment process.
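The score normalization can be sketched as follows. The text does not state how the 1-to-5 scores were mapped to the 0-to-4 scale, so the simple shift by one used here is an assumption, as is expressing a 0-to-4 score as a percentage of the maximum; the function names and the sample scores are hypothetical.

```python
from statistics import mean


def normalize_score(raw_score):
    """Shift a 1-to-5 LLM score onto the 0-to-4 scale (assumed mapping: subtract 1)."""
    if not 1 <= raw_score <= 5:
        raise ValueError(f"score {raw_score} is outside the 1-to-5 range")
    return raw_score - 1


def to_percent(score_0_to_4):
    """Express a 0-to-4 score as a percentage of the maximum (assumed convention)."""
    return score_0_to_4 / 4 * 100


# Hypothetical raw scores returned by the evaluator model.
raw_scores = [5, 4, 3, 5]
normalized = [normalize_score(s) for s in raw_scores]
percentages = [to_percent(s) for s in normalized]
mean_percentage = mean(percentages)
print(normalized, percentages, mean_percentage)
```

With both evaluators on the same 0-to-4 scale, per-question scores from the model and from the human annotators can be compared directly or averaged in the same units.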

Automated context assessment focused on the text passages retrieved by the BM25 algorithm before answer generation. The Llama3-70b model evaluated the context based on two criteria: (1) the relevance of the retrieved context to the subject matter of the user's query, and (2) the degree to which the context contained sufficient information to fully answer the query. These assessments contributed to a final context alignment score presented on a 0-to-4 scale.

The prompts used by the Llama3-70b model were adapted from the correctness evaluation¹⁵ and context relevancy evaluation¹⁶ modules available within the LlamaIndex framework. These prompt templates are included as an attachment at the end of this paper for full transparency and reproducibility.

In parallel, two human annotators independently evaluated the final generated answers. They assessed relevance and correctness on a 0-to-4 scale and provided qualitative notes detailing their reasoning, pointing out strengths or weaknesses such as omissions or inaccuracies. To evaluate the reliability of the annotations, we computed inter-annotator agreement using Krippendorff's Alpha [14, 15], which is particularly well-suited for ordinal data. The calculation results in a value of 0.90, suggesting strong agreement between annotators. In case of disagreement between the two annotators, a third annotator evaluated the instance to determine which of the two grades was more in line with the guidelines (see Appendix C, Guidelines for Human Annotation). Consensus was then reached by majority vote.

This comprehensive evaluation process utilized a dataset of 71 question-answer pairs, selected from a larger pool generated across all 109 source documents (covering both university courses and libraries). Notably, 10 of these 71 pairs were specifically designed to query information contained within the library documents, ensuring assessment of the system's performance on that subset of data.

Overall, the system demonstrated comparable performance across both evaluation methodologies. It achieved an average normalized accuracy score of 83.63% (SD = 16.45%) in the AI evaluation and 79.22% (SD = 28.34%) in the human evaluation. This similarity in overall scores suggests reasonably consistent performance, although individual query evaluations could differ between the AI and human assessors, underscoring the value of the dual approach. Notably, the context evaluation score was 76.36% (SD = 20.89%). Results for some randomly chosen test pairs are shown in Tables 1, 2, 3, 4 and 5.

Detailed implementation procedures, including data processing scripts, model configurations, and complete evaluation results, are documented in the associated Jupyter notebook.

[Figure 1: RAG pipeline diagram. Panels: Data ingestion; Embedding and indexing; Retrieval and augmentation; LLM generation. Labels: Markdown sources, Metadata extraction, Documents, Markdown node parser, Vector embeddings (Hugging Face BAAI/bge-m3), LlamaIndex database (Vector db, Document store), User query, Query embedding, BM25 retriever, Retrieve top-15 docs, Augment query with context, Llama3-8B, Generate response, Final answer.]

¹¹ https://huggingface.co/BAAI/bge-m3
¹² https://docs.llamaindex.ai/en/stable/examples/embeddings/huggingface/
¹³ https://colab.research.google.com/
¹⁴ https://docs.llamaindex.ai/en/stable/examples/retrievers/bm25_retriever/
¹⁵ https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/evaluation/correctness.py
¹⁶ https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/evaluation/context_relevancy.py

5. Discussion

5.1. Advantages

A significant advantage of the implemented system lies in its rapid deployment capability, stemming from the simplified toolchain. The streamlined setup process enabled quick deployment, facilitating efficient testing and development cycles. This ease of use facilitated the integration of various components, reducing the learning curve and making the system accessible even for individuals with limited prior experience.

Another notable benefit was the availability of multiple components, particularly the Markdown parser, which proved invaluable. The parser effectively handled document processing, ensuring accurate interpretation and formatting of content. This feature enhanced the system's overall functionality, enabling seamless handling of structured documents and consequently improving the user experience.

Despite certain challenges, the system achieved relatively high accuracy in its responses. However, document retrieval remains an area for improvement, presenting an opportunity for optimization to further enhance precision and relevance. Nevertheless, the current results demonstrate promising potential, indicating that the fundamental approach is sound and can be further refined with additional efforts.

5.2. Limitations

A primary difficulty encountered was the extensive documentation, which contained a wealth of information requiring considerable time for comprehension and analysis. Understanding the optimal implementation and optimization strategies demanded significant effort due to the complexity of the available options, which necessitated careful evaluation.

Another challenge arose from the numerous potential "blocks," such as different retrievers and rerankers, that could be integrated into the workflow. The wide array of choices required extensive experimentation to determine the most effective combination, leading to increased development time and complexity.

The necessity of a GPU to support computationally demanding embedding models presented another hurdle. While Google Colab offered an accessible environment for initial development, it occasionally failed to provide adequate hardware resources for intensive tasks. This issue was eventually resolved by transitioning to a local PC equipped with a dedicated graphics card, which provided a more stable and powerful development environment.

A particularly limiting factor was the API rate-limiting imposed by the LLM provider. While high-level methods offered precise functionality, they required multiple API calls per query, resulting in significant costs and increased response times. To mitigate this, a delay was implemented between successive API calls, which, although effective in managing costs, considerably slowed down the evaluation process. Furthermore, the inability to modify built-in API functions to define specific rate limits led to challenges such as unnecessary calls and system crashes.

5.3. Other Attempts

One of the most complex approaches attempted was the creation of agents capable of responding to specific questions for each document to enhance response accuracy. However, we ultimately discarded this idea due to the excessive response times, which rendered the approach impractical for real-time applications.

Another challenge was to implement a more comprehensive, state-of-the-art evaluation system, such as Ragas. While this approach showed theoretical promise, API limits prevented us from using more sophisticated evaluation systems.

In conclusion, while the project encountered several challenges, the overall results were promising, demonstrating the potential of the approach. Future efforts should focus on optimizing document retrieval, improving workflow efficiency, and addressing hardware and API limitations to further enhance the system's performance and usability.

6. Release

The source code of the RAG pipeline and the dataset used are available on the GitHub repository of the project.¹⁷

The data downloaded from the websites of the University of Trento and the University of Verona is available along with the source from which the documents were taken. The Python code of the tool is released under the Apache 2.0 license.

¹⁷ https://github.com/Samu01Tech/myVision-universities-RAG

7. Conclusions and Future Work

In this paper, we presented the development of a Retrieval-Augmented Generation (RAG) system designed to provide students with accurate academic guidance, specifically focusing on university course and library information. The system leverages a streamlined toolchain, incorporating a Markdown parser for efficient data handling and the Llama3-8b-8192 LLM for query processing.

While the system demonstrates promising results, there are areas for enhancement.

Future work will concentrate on several key improvements. Firstly, we aim to enhance the evaluation framework to provide a more comprehensive assessment of the RAG model’s performance, incorporating metrics for contextual relevance, accuracy, and adaptability. Secondly, the integration of reranking mechanisms will be explored to prioritize retrieved results based on relevance and quality. Thirdly, to ensure robust and scalable performance, we plan to test the model with a wider range of LLMs, such as Gemini, Claude and others.

Finally, we plan to extend the current dataset, which remains relatively small, to improve both the retrieval and generation components of the system. This expansion will allow for more robust model training and better generalization across academic contexts. In addition, we will conduct user studies to evaluate the system's effectiveness in real-world scenarios, gathering insights from student interactions to refine and improve the overall user experience.

Beyond these technical refinements, the myVision service will be expanded to serve a broader audience, including bachelor's degree graduates and students interested in specialized master's programs, and to include more universities. We envision the chatbot as a core component of a larger platform that will offer a dedicated user interface, informative podcasts, and direct interaction with student advisors. Ultimately, this work lays the groundwork for a powerful tool to aid students in navigating their academic journeys.

References

[14] A. F. Hayes, K. Krippendorff, Answering the call for a standard reliability measure for coding data, Communication Methods and Measures 1 (2007) 77–89.
[15] K. Krippendorff, Content analysis: An introduction to its methodology, Sage Publications, 2018.

A. Correctness Evaluation Prompt

You are an expert evaluation system for a question answering chatbot.

You are given the following information:
- a user query, and
- a generated answer

You may also be given a reference answer to use for reference in your evaluation.

Your job is to judge the relevance and correctness of the generated answer. Output a single score that represents a holistic evaluation. You must return your response in a line with only the score. Do not return answers in any other format. On a separate line provide your reasoning for the score as well.

Follow these guidelines for scoring:
- Your score has to be between 1 and 5, where 1 is the worst and 5 is the best.
- If the generated answer is not relevant to the user query, you should give a score of 1.
- If the generated answer is relevant but contains mistakes, you should give a score between 2 and 3.
- If the generated answer is relevant and fully correct, you should give a score between 4 and 5.

Example Response:
4.0
The generated answer has the exact same metrics as the reference answer, but it is not as concise.

B. Context Relevancy Evaluation Prompt

Your task is to evaluate if the retrieved context from the document sources are relevant to the query. The evaluation should be performed in a step-by-step manner by answering the following questions:
1. Does the retrieved context match the subject matter of the user's query?
2. Can the retrieved context be used exclusively to provide a full answer to the user's query?

Each question above is worth 2 points, where partial marks are allowed and encouraged. Provide detailed feedback on the response according to the criteria questions previously mentioned. After your feedback provide a final result by strictly following this format: '[RESULT] followed by the float number representing the total score assigned to the response'

Query: \n {query_str}
Context: \n {context_str}
Feedback:

C. Guidelines for Human Annotation

0: Wrong Answer. The RAG pipeline generated a factually incorrect or completely irrelevant response.

1: Misses Crucial Information. The answer provided is generally correct but fails to include essential details, making it incomplete or misleading.

2: Responded Only in Part. The RAG pipeline addressed only a fraction of the user's query, leaving significant portions unanswered.

3: Could Add More Information. The answer is good, but additional relevant details or elaboration would significantly improve its quality and completeness.

4: Good Answer. The RAG pipeline provided a comprehensive, accurate, and highly relevant response that fully addresses the user's query.

D. Tables

The following tables contain two examples of questions used to test the system. The full list of questions, along with the answers, the human annotations, and the AI evaluation and context scores, is available in the GitHub repository (see Section 6).

• Table 1 contains the text of the two questions (IDs 43 and 60).
• Table 2 shows the corresponding gold and RAG answers.
• Table 3 contains the AI evaluation scores.
• Table 4 includes the AI context scores.
• Table 5 shows the human evaluation.
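The score formats requested by the prompts in Appendices A and B (a bare score line followed by the reasoning, and a '[RESULT]' tag followed by a float) lend themselves to simple parsing. The Python sketch below is illustrative only: the function names `parse_correctness` and `parse_context_relevancy` are hypothetical and not taken from the project's code, and the parsing assumes the model follows the requested format. The regular expression also tolerates a trailing denominator such as "3.3/4.0", as seen in Table 4.

```python
import re
from typing import Optional, Tuple


def parse_correctness(output: str) -> Tuple[Optional[float], str]:
    """Split a correctness-evaluator reply into (score, reasoning).

    Assumes the Appendix A format: the score alone on the first
    non-empty line, followed by the reasoning on the remaining lines.
    Returns (None, full_text) if the first line is not a number.
    """
    lines = [ln.strip() for ln in output.strip().splitlines() if ln.strip()]
    if not lines:
        return None, ""
    try:
        score = float(lines[0])
    except ValueError:
        return None, " ".join(lines)
    return score, " ".join(lines[1:])


# Matches the number right after the '[RESULT]' tag (Appendix B format).
_RESULT_RE = re.compile(r"\[RESULT\]\s*([0-9]+(?:\.[0-9]+)?)")


def parse_context_relevancy(feedback: str) -> Optional[float]:
    """Return the total score after '[RESULT]', or None if the tag is absent."""
    match = _RESULT_RE.search(feedback)
    return float(match.group(1)) if match else None


# Hypothetical usage with replies shaped like those in Tables 3 and 4.
score, reasoning = parse_correctness("4.5\nThe generated answer is highly relevant.")
total = parse_context_relevancy("Score: 1.5/2.0 Final Result: [RESULT] 3.3/4.0")
print(score, total)
```

Returning `None` instead of raising keeps malformed replies visible for manual inspection, which seems preferable when the evaluator is itself an LLM that may occasionally ignore the format.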

Table 1: Questions.

ID 43: What are the admission requirements for the Environmental Engineering program, and what further study opportunities are available after graduation?

ID 60: What are the two different orientations that students can choose within the Business Administration and Law degree program, and where can students undertake internships?

Table 2: Gold and RAG answers.

ID 43, Gold Answer: Admission requires an **Italian high school diploma or equivalent foreign qualification, and passing a TOLC test**. The program has limited enrolment and seats allocated to Italian, EU, and non-EU citizens residing in Italy. Graduates have direct access to the **Master’s program in Environmental and Land Engineering** at the University of Trento, and can also access other Master’s programs in engineering that meet admission requirements. Transfers from other degree programs or universities are also possible, with applications due by February 3, 2025.

ID 43, RAG Answer: Admission requirements for the Environmental Engineering program: No specific admission requirements mentioned. Further study opportunities after graduation:
* Direct access to the Master’s program in Environmental and Land Engineering at the University of Trento.
* Access to other Master’s programs in engineering disciplines that meet admission requirements.
* Graduates can continue their education with a Master’s Degree in Civil Engineering or related fields such as Energy Engineering or Environmental Meteorology.
* Alternative options include enrolling in specialized programs at other universities to further enhance expertise in industrial engineering disciplines.

ID 60, Gold Answer: Students can choose between two orientations: **one towards private professions and the other towards public professions**. Those choosing the private professions orientation will complete their mandatory internship mainly within **professional firms or auditing companies**. Students choosing the public professions orientation can complete their internship in **public entities and companies**.

ID 60, RAG Answer: The two different orientations that students can choose within the Business Administration and Law degree program are:
1. Orientation towards private professions, mainly within professional firms or auditing companies.
2. Orientation towards public professions, it can take place in public entities and companies.
Students can undertake internships in either of these orientations.

Table 3: AI evaluation scores (1-5).

ID 43: 3.5. The generated answer is relevant to the user query, as it addresses both the admission requirements and further study opportunities. However, it lacks specific details, such as the Italian high school diploma or equivalent foreign qualification, and the TOLC test, which are mentioned in the reference answer. Additionally, the generated answer provides some correct information, like direct access to the Master’s program in Environmental and Land Engineering, but also includes some vague or incorrect information, such as "no specific admission requirements mentioned" and "alternative options include enrolling in specialized programs at other universities".

ID 60: 4.5. The generated answer is highly relevant and correct, as it accurately identifies the two orientations within the Business Administration and Law degree program and provides the correct information about where students can undertake internships. The generated answer is very similar to the reference answer, with only minor differences in wording and structure. The generated answer is clear and concise, making it easy to understand for the user.

Table 4: AI context scores.

ID 43:
1. Does the retrieved context match the subject matter of the user’s query? The retrieved context partially matches the subject matter of the user’s query. The query asks about the admission requirements for the Environmental Engineering program and further study opportunities after graduation. The context provides information about various programs, including Environmental Engineering, but it does not directly address the admission requirements. However, it does provide information about further study opportunities, which is relevant to the query. Score: 1.5
2. Can the retrieved context be used exclusively to provide a full answer to the user’s query? The retrieved context cannot be used exclusively to provide a full answer to the user’s query. While it provides some relevant information about further study opportunities, it does not address the admission requirements for the Environmental Engineering program. Additional information would be needed to fully answer the query. Score: 1
Final Result: [RESULT] 2.5

ID 60:
1. Does the retrieved context match the subject matter of the user’s query? The retrieved context is highly relevant to the subject matter of the user’s query. The context provides detailed information about the Business Administration and Law degree program, including its objectives, curriculum, and career opportunities. The context specifically mentions the two orientations within the program, which aligns with the user’s query. However, the context is quite lengthy and covers a wide range of topics, making it challenging to extract the exact information related to the query. Score: 1.8/2.0
2. Can the retrieved context be used exclusively to provide a full answer to the user’s query? The retrieved context provides a comprehensive overview of the Business Administration and Law degree program, including the two orientations mentioned in the query. However, the context does not directly answer the question about where students can undertake internships. Although the context mentions internships and provides some information about the internship experiences, it does not explicitly state where students can undertake them. Score: 1.5/2.0
Final Result: [RESULT] 3.3/4.0

Table 5: Human evaluation (0-4).

ID 43: Score 2/4. Notes: The RAG answer provides an accurate and detailed overview of postgraduate study opportunities, including direct access to the relevant Master’s program and other engineering-related fields, which aligns well with the Gold answer. However, it entirely omits the admission requirements, including the essential TOLC test and diploma criteria, as well as the program’s limited enrolment structure. This missing information is critical to the question, resulting in a response that is only partially complete.

ID 60: Score 4/4. Notes: The RAG answer accurately identifies the two orientations (private professions and public professions) and correctly associates each with the corresponding internship opportunities. The phrasing is slightly different but conveys the same meaning as the Gold answer. The response is complete, accurate, and fully aligned with the reference.

Declaration on Generative AI

During the preparation of this work, the author(s) did not use any generative AI tools or services.

[1] Ukaoha, Ndunagu, Osang, et al., A guidance and counseling mobile application (gc mobile app) for educational institutions, NIPES-Journal of Science and Technology Research 2 (2020).

[2] Majjate, Bellarhmouch, Jeghal, Yahyaouy, Tairi, K. A. Zidani, AI-powered academic guidance and counseling system based on student profile and interests, Applied System Innovation 7 (2023) 6.

[3] L.-C. Lin, Y.-C. Lai, W.-C. Chang, H.-L. Chiu, T.-Y. Chen, Unicompass: Helping high school students find the right college major, in: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, pp. 1–6.

[4] A. M. A. Al, Prototype rule-based expert system with an object-oriented database for university undergraduate major selection, International Journal of Applied Information Systems (IJAIS), Foundation of Computer Science FCS, New York, USA (2012).

[5] Jawhar, Bitar, J. R. Miller, Jawhar, AI-powered customized university and career guidance, in: 2024 Intermountain Engineering, Technology and Computing (IETC), IEEE, 2024, pp. 157–161.

[6] Mirri, Prandi, Parisini, Amico, Bracuto, Salomoni, User-driven and open innovation as app design tools for high school students, in: 2018 IEEE 29th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), IEEE, 2018, pp. 6–10.

[7] Lewis, Perez, Piktus, Petroni, Karpukhin, Goyal, Küttler, Lewis, W.-t. Yih, Rocktäschel, et al., Retrieval-augmented generation for knowledge-intensive NLP tasks, Advances in Neural Information Processing Systems 33 (2020) 9459–9474.

[8] T. I. Dieing, Scheffler, L. Cohausz, Enhancing chatbot-assisted study program orientation, in: Proceedings of DELFI Workshops 2024, Gesellschaft für Informatik eV, 2024, pp. 10 - 18420.

[9] Saha, U. Saha, Enhancing international graduate student experience through AI-driven support systems: A LLM and RAG-based approach, in: 2024 International Conference on Data Science and Its Applications (ICoDSA), IEEE, 2024, pp. 300–304.

[10] Dakshit, Faculty perspectives on the potential of RAG in computer science higher education, in: Proceedings of the 25th Annual Conference on Information Technology Education, 2024, pp. 19–24.

[11] Modran, I. C. Bogdan, D. Ursuțiu, C. Samoila, P. L. Modran, LLM intelligent agent tutoring in higher education courses using a RAG approach, Preprints 2024, 2024070519 (2024).

[12] Chen, Xiao, Zhang, Luo, Lian, Z. Liu, M3-Embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation, in: L.-W. Ku, A. Martins, V. Srikumar (Eds.), Findings of the Association for Computational Linguistics: ACL 2024, Association for Computational Linguistics, Bangkok, Thailand, 2024, pp. 2318–2335. URL: https://aclanthology.org/2024.findings-acl.137/. doi:10.18653/v1/2024.findings-acl.137.

[13] Chirkova, Rau, Déjean, Formal, Clinchant, Nikoulina, Retrieval-augmented generation in multilingual settings, arXiv preprint arXiv:2407.01463 (2024).