<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Uni-Mate: A Retrieval-Augmented Generation System to Provide High School Students with Accurate Academic Guidance</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Samuele Mazzei</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorenzo Zambotto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriele Tealdo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Macagno</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessio Palmero Aprosio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Psychology and Cognitive Science, University of Trento</institution>
          ,
          <addr-line>Corso Bettini 84, Rovereto</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <fpage>0</fpage>
      <lpage>4</lpage>
      <abstract>
        <p>This paper introduces the development and evaluation of a Retrieval-Augmented Generation (RAG) system designed to assist prospective students in navigating university options. The system provides accurate academic guidance by retrieving and synthesizing information on undergraduate and single-cycle master's degree programs, as well as library resources, from the University of Trento and the University of Verona. The RAG pipeline utilizes a streamlined toolchain, incorporating a Markdown parser for efficient data handling and the Llama3-8b-8192 Large Language Model (LLM) for query processing. The system's performance was assessed through both automated evaluation, using the Llama3-70b LLM as a reference, and blinded human evaluation. The results demonstrate the system's potential for providing relevant and accurate information to students. The evaluation also highlighted areas for further development, including enhanced retrieval mechanisms and expanded LLM testing. Future work aims to broaden the system's scope to include more degree levels and universities, ultimately creating a comprehensive platform to support students in their academic decision-making journey.</p>
      </abstract>
      <kwd-group>
        <kwd>Retrieval-Augmented Generation (RAG)</kwd>
        <kwd>Natural Language Processing (NLP)</kwd>
        <kwd>Large Language Models (LLMs)</kwd>
        <kwd>Dataset Creation</kwd>
        <kwd>Academic Guidance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Choosing a university path is one of the most complex and significant decisions for students nearing the end of high school. This, combined with the overwhelming amount of new information encountered when browsing various and often inconsistent university websites, creates confusion and a sense of being lost, leading to wasted time and uncertainty. These challenges stem from both the dispersion of available information and the lack of intuitive tools to guide students through the decision-making process.</p>
      <p>We address this problem by creating a platform called Uni-Mate (formerly referred to as MyVision and later renamed to better align with startup branding goals, offering a more appealing name for potential users and investors). The system aims to integrate an AI-powered chatbot that provides relevant information about partner universities and online counseling services within a single interface.</p>
      <p>A survey conducted among 183 students from the Department of Psychology and Cognitive Science and the School of Innovation between October and November 2024 was instrumental in identifying a significant need among students for improved online educational guidance, and it revealed significant challenges faced by students in choosing their academic paths. A striking 74% reported at least one major difficulty in the orientation process. The most common issues included a lack of clear and comparable information across courses and institutions (43%), uncertainty regarding personal interests and aptitudes (38%), and confusion about the differences among European universities (29%). Additionally, limited access to insights from alumni was also noted (17%). When seeking guidance, students primarily relied on official university websites (65%) and personal networks such as parents or friends (58%), while only 21% consulted academic counselors. Moreover, fewer than 10% found digital comparison tools to be truly effective.</p>
      <p>The data also highlight a strong interest in innovative orientation tools. Notably, 81% of respondents expressed a willingness to use a platform like Uni-Mate, which would feature personalized course matching algorithms and structured reviews from former students. Furthermore, 67% indicated a readiness to pay for such a service if it proved to be effective. These results point to a clear gap in the current academic orientation offerings, which are seen as fragmented, non-interactive, and lacking personalization. There is a strong latent demand for comprehensive digital solutions that provide personalized guidance, real-life experiences, and comparative tools to support students in making well-informed educational decisions.</p>
      <p>Following this initial validation, the team submitted MyVision as a proposal to DigiEduHack,<sup>1</sup> a European innovation challenge promoted by the European Commission and aimed at fostering technological advancements in the field of education. During the event, the concept was further developed and ultimately awarded first place in the Expert Category, the most competitive and high-level track of the competition.<sup>2</sup></p>
      <p>As a result, the team has been invited to present MyVision at the DigiEduHack Award Ceremony, scheduled for June 24th in Brussels, as part of the Digital Education Stakeholder Forum 2025, a major annual event organized by the European Commission to promote dialogue and policy development in digital education.</p>
      <p>Notable earlier attempts to address these challenges exist in the academic guidance space. In the United States, ScholarMatch<sup>3</sup> focuses on helping first-generation, low-income students secure scholarships and complete their college education, addressing critical financial and support gaps. In contrast, a comparable all-in-one solution is lacking in Europe, where the challenges for students are less about tuition affordability and more about navigating a fragmented ecosystem of academic options. In the UK, Bonas MacFarlane<sup>4</sup> offers premium consulting for school and university placements, primarily targeting affluent families. These examples highlight both the proven demand for personalized academic support and the gap that MyVision seeks to fill in the EU context by offering accessible, digital tools for orientation, comparison, reviews, and guidance all in one unified platform.</p>
      <p>In Italy, UniversItaly<sup>5</sup> is the official portal developed by the Italian Ministry of University and Research to support students, both Italian and international, in navigating the higher education system in Italy. The web portal integrates a conversational assistant powered by large language models, which helps users navigate content and find relevant information interactively.</p>
      <p>In this paper, we aim to lay the foundation for the development of our chatbot by focusing on the academic offerings of two Italian universities: the University of Trento (Unitn) and the University of Verona (Univr). These institutions were selected due to their geographical proximity and the presence of interdisciplinary and interuniversity courses, which offer significant opportunities for prospective students interested in studying in these areas. The system was developed with a specific focus on post-diploma university orientation, considering only bachelor’s degree programs and single-cycle master’s degrees. This approach addresses the needs of recent high school graduates by providing an innovative tool to explore available academic options in a simple and immediate way. Furthermore, we included information about the universities’ libraries to provide new students with access to a valuable resource that can support their studies.</p>
      <p>CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics, September 24-26, 2025, Cagliari, Italy. * Corresponding author. † These authors contributed equally. samuele.mazzei@studenti.unitn.it (S. Mazzei); lorenzo.zambotto@studenti.unitn.it (L. Zambotto); gabriele.tealdo@studenti.unitn.it (G. Tealdo); alberto.macagno@studenti.unitn.it (A. Macagno); a.palmeroaprosio@unitn.it (A. Palmero Aprosio). © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p><sup>1</sup> https://digieduhack.com/</p>
      <p><sup>2</sup> https://digieduhack.com/news/digieduhack-2024-winners-announced-meet-the-innovators-shaping-digital-education</p>
      <p><sup>3</sup> https://www.scholarmatch.org/</p>
      <p><sup>4</sup> https://bonasmacfarlane.co.uk/en</p>
      <p><sup>5</sup> https://www.universitaly.it/orientamento-universitario</p>
      <sec id="sec-1-2">
        <title>2. Related work</title>
        <p>Numerous research groups and institutions have explored various strategies to support students in selecting the most suitable university.</p>
        <p>For example, in a study [<xref ref-type="bibr" rid="ref1">1</xref>], the researchers evaluated an educational app called GC Mobile and concluded that it enhanced the counseling process by leveraging technology to provide a scalable, accessible, and confidential platform for student guidance. As the authors noted, “The GC Mobile App allows students to see a counselor anytime and from any location without having to visit them in the office.”</p>
        <p>Another study [<xref ref-type="bibr" rid="ref2">2</xref>] developed an AI-powered academic guidance and counseling system with the primary objective of supporting high school seniors in navigating the college application process and selecting suitable academic paths and universities for tertiary education. It also aimed to address the shortage of human resources in traditional counseling by providing an accessible, convenient, and time-saving alternative for students to obtain valuable insights without requiring face-to-face interaction or travel to gather university information.</p>
        <p>Another approach was the creation of UniCompass [<xref ref-type="bibr" rid="ref3">3</xref>], a platform designed to help students efficiently learn about and compare universities and departments, and to access diverse perspectives and shared experiences from peers. By consolidating information and providing structured guidance, UniCompass aims to save students time and support more informed academic and career decisions.</p>
        <p>A similar application is "Major-Selection" [<xref ref-type="bibr" rid="ref4">4</xref>], which functions as intelligent decision support software to assist students with major selection. It features a rule-based knowledge base containing information about university admission requirements and the skills and preferences relevant to various majors. This knowledge is derived from academic advisors and university guidelines.</p>
        <p>In another study [<xref ref-type="bibr" rid="ref5">5</xref>], the authors developed a web application that provides personalized recommendations and guidance to high school students. By using a questionnaire, the AI system builds a comprehensive profile of the student and delivers data-driven, customized guidance to support informed university and career decisions.</p>
        <p>Lastly, myAlmaOrienta [<xref ref-type="bibr" rid="ref6">6</xref>] was developed to support high school students in choosing a degree programme at the University of Bologna. It helps students navigate the selection process and identify programmes that match their skills and interests. The app was developed through a two-level co-design process involving both high school students (user-driven innovation) and university students (open innovation contest) to incorporate their needs and perspectives.</p>
        <p>The chatbot involved in Uni-Mate uses Retrieval-Augmented Generation (RAG) to address the limitations inherent in traditional methods and standalone Large Language Models (LLMs) [<xref ref-type="bibr" rid="ref7">7</xref>], such as limited context and possible hallucinations. Dieing et al. [<xref ref-type="bibr" rid="ref8">8</xref>] describe a system for study program orientation that provides personalized recommendations using a Mixtral LLM paired with a RoBERTa embedding model. Their RAG approach retrieves data from a government website and achieves an average response accuracy above 0.75. Saha and Saha [<xref ref-type="bibr" rid="ref9">9</xref>] report that a GPT-3.5-based chatbot enhances support for international graduate students by combining generative capabilities with precise retrieval from social media sources. Dakshit [<xref ref-type="bibr" rid="ref10">10</xref>] explored the use of RAG in higher education, focusing on applications as virtual teaching assistants and teaching aids. Faculty perspectives gathered in the study highlighted the benefits of RAG in supporting teaching processes, such as the generation of study guides, quizzes, and assignment questions, while also assisting students by providing precise answers to academic queries. Faculty members emphasized the importance of integrating broader data sources and advanced functionalities, including the ability to process mathematical content and image-based inputs, to improve the system’s effectiveness.</p>
        <p>The potential of RAG-powered systems lies in their ability to provide accurate, contextually relevant, and personalized support by combining retrieval mechanisms with generation capabilities [<xref ref-type="bibr" rid="ref7">7</xref>]. A retrieval component first searches for relevant information from a curated set of academic resources, ensuring the content is accurate and domain-specific. The generation component then synthesizes this information to produce coherent and contextually appropriate responses [<xref ref-type="bibr" rid="ref11">11</xref>]. This dual approach not only improves the reliability of responses but also enables the system to adapt to individual learning styles and paces, making it a valuable tool for personalized education. These findings align with the goals of Uni-Mate, particularly in creating a chatbot that integrates multiple functions (academic guidance, counseling services, and information retrieval) into a cohesive platform. Drawing from the studies mentioned above, we plan to leverage RAG’s strengths to ensure that Uni-Mate not only meets students’ informational needs but also provides reliable, context-aware responses to enhance their educational journey.</p>
      </sec>
      <sec id="sec-1-7">
        <title>3. Dataset</title>
        <p>To collect the documents for our task, we accessed the course websites of Unitn<sup>6</sup> and Univr<sup>7</sup>, gathering the necessary data. Since the main objective of this project is to provide orientation for high school students, we selected undergraduate degrees and single-cycle master’s degrees. For Unitn, we obtained data from the "Prospective Student" section, which is divided into three parts: "Course Programme," providing an overview of the degree; "Course Content," listing all courses offered over the years along with their respective ECTS credits and, in some cases, detailed course descriptions; and "Application," which contains enrollment information. For Univr, we collected similar information. After selecting a degree, we retrieved the "Overview" section under the “Find out more” option, the study plan from the “Modules” section, and enrollment details from the “How to apply” option.</p>
        <p>All collected course data were converted into Markdown format with the help of an extension of ChatGPT-4 called Markdown converter<sup>8</sup>. ChatGPT-4 does not always structure the data in the same way, so we manually adjusted the formatting when discrepancies were too large. We also collected data on the libraries of both universities. In this case, the data were gathered manually to ensure a consistent file structure and order. The collected library data included: a general overview, with information on access, location, staff, and available spaces; the services offered by the libraries; and the opening hours.</p>
        <p>We used Markdown for several reasons, including efficiency and flexibility. This format allows for a clear structuring of data through the use of headings, enabling the RAG to subsequently divide the information into well-defined and interconnected sections. This optimization facilitates the retrieval process, making it easier to identify and associate relevant information. Another advantage of Markdown is its ability to include tables, which are clearer and more understandable as responses for users. Finally, the Markdown format is more practical during the dataset creation phase, as it allows for the use of tools like scrapers to quickly extract text from web pages. This process simplifies and accelerates the assembly of necessary information while ensuring greater consistency and quality of the data. In total, we collected data for 29 degrees from Unitn and 41 degrees from Univr, resulting in 70 course documents. Additionally, we collected data from 5 libraries from Unitn and 34 libraries from Univr, resulting in 39 library documents. This yielded a total of 109 documents.</p>
        <p><sup>6</sup> https://www.unitn.it/en/ateneo/1819/programmes-of-study</p>
        <p><sup>7</sup> https://www.univr.it/en/degree-programmes</p>
        <p><sup>8</sup> https://chatgpt.com/g/g-lnlmekbGd-markdown-converter</p>
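        <p>The role that Markdown headings play in structuring the documents can be illustrated with a minimal sketch. It is not the authors' implementation: the sample course document, the function name split_by_headings, and the rule that every heading starts a new section are assumptions made only for this example.</p>
        <preformat>
```python
import re

# A Markdown heading: one to six '#' characters followed by the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown_text):
    """Split a Markdown document into sections, one per heading."""
    sections = []
    title, body = "", []

    def flush():
        # Store the section collected so far, skipping empty leading text.
        if title or any(line.strip() for line in body):
            sections.append({"heading": title, "text": "\n".join(body).strip()})

    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return sections


# Hypothetical course document, invented for illustration only.
sample = """
# Computer Science
Overview text.

## Course Content
| Course | ECTS |
|---|---|
| Algorithms | 12 |

## Application
Enrollment opens in spring.
"""

sections = split_by_headings(sample)
for section in sections:
    print(section["heading"], "::", len(section["text"]), "characters")
```
        </preformat>
        <p>Each resulting section keeps its heading next to its text, so a table such as the one under "Course Content" stays in a single chunk.</p>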
        <p>Additionally, all data were translated into English when the English version of the site did not contain sufficient information compared to its Italian counterpart, as the answers provided by our RAG system were more accurate in English owing to the embedding model introduced during the course. The English version of the embedding model is trained and tested on more data and has access to a larger corpus than the Italian version, which typically results in better training, improved generalization, and richer language representations [<xref ref-type="bibr" rid="ref12">12</xref>]. To verify this, we consulted the literature and found a paper titled “Retrieval-augmented generation in multilingual settings” [<xref ref-type="bibr" rid="ref13">13</xref>], which confirms our hypothesis.</p>
      </sec>
      <sec id="sec-1-8">
        <title>4. Experiments</title>
        <p>The objective of this study was to develop and evaluate a document retrieval system designed to query information from university course descriptions and library details. The system’s performance was assessed based on its accuracy in retrieving relevant and contextually appropriate information. For this purpose, we utilized Groq<sup>9</sup> as the provider for Large Language Models (LLMs). Specifically, two models were employed: Llama3-8b-8192 (8 billion parameters) served as the primary LLM for query processing, while Llama3-70b (70 billion parameters) functioned as the reference ("golden") model during evaluation.</p>
      </sec>
      <sec id="sec-1-9">
        <title>4.1. RAG Pipeline</title>
        <p>The experimental workflow starts with a corpus of structured Markdown documents detailing university courses and library information (as described in Section 3). The documents are loaded manually into the system from the two separate folders for courses and libraries. For each file we then create a LlamaIndex Document object and add metadata to it, extracted from the file title. Specifically, for the courses we extract the university name, in its shorter form, and the course name, translated into English where necessary and dash-separated. For the libraries we extract the university name and the name of the library, following the same convention. Because the single documents are considerably long, we decided to split them into smaller chunks to have more meaningful embeddings. Among the different strategies available, our ultimate choice for processing documents relied on a specific node parser: MarkdownNodeParser<sup>10</sup>. This is a class provided by LlamaIndex that splits the documents into Nodes following a Markdown splitting logic, by separating the sources using headings. Moreover, through the use of the include_prev_next_rel and include_metadata parameters, we keep relationships between the nodes, supporting the retrieval process. Nodes are persisted in a local document store in a Google Drive folder.</p>
        <p>Subsequently, these nodes are converted into vector embeddings. As the embedding model, we chose the BAAI/bge-m3 model<sup>11</sup>, which distinguished itself especially for its multi-granularity and its ability to work with long documents in generating semantic representations of the text. The model is loaded using the HuggingFaceEmbedding<sup>12</sup> module of LlamaIndex, which provides a convenient interface for working with Hugging Face models. The embeddings are generated using the GPU acceleration provided by a T4 instance in Google Colab<sup>13</sup>, which significantly speeds up the embedding generation process, and are saved in a cache folder on Google Drive to avoid redundant computations in development.</p>
        <p>The retrieval is performed using the BM25 algorithm<sup>14</sup>, a widely used keyword-based retrieval method that employs lexical matching to retrieve relevant document sections. The BM25 algorithm is implemented in LlamaIndex and is used to retrieve the top 15 nodes based on their similarity with the user query.</p>
        <p>A graphical representation of the whole pipeline is shown in Figure 1.</p>
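        <p>As a rough illustration of two of the steps in this pipeline, the sketch below derives metadata from a file name and ranks text chunks with a from-scratch Okapi BM25 score. It is not the authors' code, which relies on the BM25 retriever shipped with LlamaIndex: the file-name pattern, the function names, the sample chunks, and the parameter values k1 = 1.5 and b = 0.75 are all assumptions made for the example.</p>
        <preformat>
```python
import math
import re
from collections import Counter
from pathlib import Path


def metadata_from_filename(filename):
    """Assumed (hypothetical) convention: 'university_dash-separated-name.md'."""
    university, _, name = Path(filename).stem.partition("_")
    return {"university": university, "name": name.replace("-", " ")}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_top_k(query, documents, k=15, k1=1.5, b=0.75):
    """Return indices of the k best-scoring documents under Okapi BM25."""
    if not documents:
        return []
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs or 1.0
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))

    scored = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = 1.0 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scored.append((score, index))

    # Highest score first; ties are broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for score, index in scored[:k] if score > 0]


# Hypothetical chunks and file name, invented for illustration only.
chunks = [
    "The library is open Monday to Friday from 8 to 20.",
    "The degree in Computer Science awards 180 ECTS credits.",
    "Enrollment requires passing an admission test in spring.",
]
best = bm25_top_k("library opening hours", chunks, k=2)
meta = metadata_from_filename("unitn_computer-science.md")
print(best, meta)
```
        </preformat>
        <p>Because BM25 matches tokens literally, the query term "opening" does not match "open" in the first chunk; only "library" contributes to its score, which is one reason lexical retrieval can miss relevant passages.</p>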
      </sec>
      <sec id="sec-1-10">
        <p><sup>9</sup> https://groq.com/</p>
        <p><sup>10</sup> https://docs.llamaindex.ai/en/v0.10.17/api/llama_index.core.node_parser.MarkdownNodeParser.html</p>
      </sec>
      <sec id="sec-1-11">
        <title>4.2. Evaluation</title>
        <p>Evaluation of the system’s performance employed a dual approach: automated assessment using the Llama3-70b model and blinded human evaluation, ensuring objectivity. Both methods assessed the quality of the generated answers and, for the automated part, the suitability of the retrieved context.</p>
        <p>For the automated evaluation of generated answers, the Llama3-70b model assessed relevance and correctness relative to the user query. It assigned a score on a 1-to-5 scale, which was subsequently normalized to a 0-to-4 scale for direct comparison with human scores. The model also generated a textual justification explaining its assessment, highlighting aspects like completeness or accuracy. Due to API call limitations with standard evaluation frameworks, custom requests were implemented to facilitate this automated assessment process.</p>
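        <p>The rescaling step can be sketched as follows. This is an illustrative example rather than the authors' code: the scores are made up, and both the conversion of a 0-to-4 score into a percentage of the maximum and the use of the population standard deviation are assumptions about how summary figures of this kind could be computed.</p>
        <preformat>
```python
from statistics import mean, pstdev


def to_zero_four(score_1_to_5):
    """Shift a 1-to-5 judge score onto the 0-to-4 scale used by annotators."""
    if not (5 >= score_1_to_5 >= 1):
        raise ValueError("score must lie between 1 and 5")
    return score_1_to_5 - 1


def as_percentage(score_0_to_4):
    """Express a 0-to-4 score as a percentage of the maximum (assumption)."""
    return 100.0 * score_0_to_4 / 4.0


# Hypothetical judge scores, invented for illustration only.
llm_scores = [5, 4, 3, 5]
normalized = [to_zero_four(score) for score in llm_scores]   # [4, 3, 2, 4]
percentages = [as_percentage(score) for score in normalized]

average = mean(percentages)
spread = pstdev(percentages)  # population SD; the sample SD is another option
print(f"mean = {average:.2f}%, SD = {spread:.2f}%")
```
        </preformat>
        <p>Subtracting one keeps the spacing between grades unchanged, so automated and human scores can be compared directly on the same 0-to-4 scale.</p>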
        <p>Automated context assessment focused on the text passages retrieved by the BM25 algorithm before answer generation. The Llama3-70b model evaluated the context based on two criteria: (1) the relevance of the retrieved context to the subject matter of the user’s query, and (2) the degree to which the context contained sufficient information to fully answer the query. These assessments contributed to a final context alignment score presented on a 0-to-4 scale.</p>
        <p>The prompts used by the Llama3-70b model were adapted from the correctness evaluation<sup>15</sup> and context relevancy evaluation<sup>16</sup> modules available within the LlamaIndex framework. These prompt templates are included as an attachment at the end of this paper for full transparency and reproducibility.</p>
        <p>In parallel, two human annotators independently evaluated the final generated answers. They assessed relevance and correctness on a 0-to-4 scale and provided qualitative notes detailing their reasoning, pointing out strengths or weaknesses such as omissions or inaccuracies. To evaluate the reliability of the annotations, we computed inter-annotator agreement using Krippendorff’s Alpha [14, 15], which is particularly well-suited for ordinal data. The calculation results in a value of 0.90, suggesting strong agreement between annotators. In case of disagreement between the two annotators, a third annotator evaluated the instance to determine which of the two grades was more in line with the guidelines (see C. Guidelines for Human Annotation). Consensus was then reached by majority vote.</p>
        <p>This comprehensive evaluation process utilized a dataset of 71 question-answer pairs, selected from a larger pool generated across all 109 source documents (covering both university courses and libraries). Notably, 10 of these 71 pairs were specifically designed to query information contained within the library documents, ensuring assessment of the system’s performance on that subset of data. Overall, the system demonstrated comparable performance across both evaluation methodologies. It achieved an average normalized accuracy score of 83.63% (SD = 16.45%) in the AI evaluation and 79.22% (SD = 28.34%) in the human evaluation. This similarity in overall scores suggests reasonably consistent performance, although individual query evaluations could differ between the AI and human assessors, underscoring the value of the dual approach. Notably, the context evaluation score was 76.36% (SD = 20.89%). Results for some randomly selected test pairs are shown in Tables 1, 2, 3, 4 and 5.</p>
        <p>Detailed implementation procedures, including data processing scripts, model configurations, and complete evaluation results, are documented in the associated Jupyter notebook.</p>
        <p>Figure 1: Overview of the RAG pipeline (figure not reproduced). Data ingestion: Markdown sources, metadata extraction, documents. Embedding and indexing: Markdown node parser, vector embeddings (Hugging Face BAAI/bge-m3), LlamaIndex document store and vector database. Retrieval and augmentation: user query, query embedding, BM25 retriever, retrieval of the top 15 documents, query augmented with context. LLM generation: Llama3-8B generates the response and returns the final answer.</p>
        <p><sup>11</sup> https://huggingface.co/BAAI/bge-m3</p>
        <p><sup>12</sup> https://docs.llamaindex.ai/en/stable/examples/embeddings/huggingface/</p>
        <p><sup>13</sup> https://colab.research.google.com/</p>
        <p><sup>14</sup> https://docs.llamaindex.ai/en/stable/examples/retrievers/bm25_retriever/</p>
        <p><sup>15</sup> https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/evaluation/correctness.py</p>
        <p><sup>16</sup> https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/evaluation/context_relevancy.py</p>
      </sec>
      <sec id="sec-1-12">
        <title>5. Discussion</title>
        <sec id="sec-1-12-1">
          <title>5.1. Advantages</title>
          <p>A significant advantage of the implemented system lies in its rapid deployment capability, stemming from the simplified toolchain. The streamlined setup process enabled quick deployment, facilitating efficient testing and development cycles. This ease of use facilitated the integration of various components, reducing the learning curve and making the system accessible even for individuals with limited prior experience.</p>
          <p>Another notable benefit was the availability of multiple components, particularly the Markdown parser, which proved invaluable. The parser effectively handled document processing, ensuring accurate interpretation and formatting of content. This feature enhanced the system’s overall functionality, enabling seamless handling of structured documents and consequently improving the user experience.</p>
          <p>Despite certain challenges, the system achieved relatively high accuracy in its responses. However, document retrieval remains an area for improvement, presenting an opportunity for optimization to further enhance precision and relevance. Nevertheless, the current results demonstrate promising potential, indicating that the fundamental approach is sound and can be further refined with additional efforts.</p>
        </sec>
        <sec id="sec-1-12-2">
          <title>5.2. Limitations</title>
          <p>A primary difficulty encountered was the extensive documentation, which contained a wealth of information requiring considerable time for comprehension and analysis. Understanding the optimal implementation and optimization strategies demanded significant effort due to the complexity of the available options, which necessitated careful evaluation.</p>
          <p>Another challenge arose from the numerous potential "blocks," such as different retrievers and rerankers, that could be integrated into the workflow. The wide array of choices required extensive experimentation to determine the most effective combination, leading to increased development time and complexity.</p>
          <p>The necessity of a GPU to support computationally demanding embedding models presented another hurdle. While Google Colab offered an accessible environment for initial development, it occasionally failed to provide adequate hardware resources for intensive tasks. This issue was eventually resolved by transitioning to a local PC equipped with a dedicated graphics card, which provided a more stable and powerful development environment.</p>
          <p>A particularly limiting factor was the API rate-limiting imposed by the LLM provider. While high-level methods offered precise functionality, they required multiple API calls per query, resulting in significant costs and increased response times. To mitigate this, a delay was implemented between successive API calls, which, although effective in managing costs, considerably slowed down the evaluation process. Furthermore, the inability to modify built-in API functions to define specific rate limits led to challenges such as unnecessary calls and system crashes.</p>
        </sec>
        <sec id="sec-1-12-3">
          <title>5.3. Other Attempts</title>
          <p>One of the most complex approaches attempted was the creation of agents capable of responding to specific questions for each document to enhance response accuracy. However, we ultimately discarded this idea due to the excessive response times, which rendered the approach impractical for real-time applications.</p>
          <p>Another challenge was to implement a more comprehensive, state-of-the-art evaluation system, such as Ragas. While this approach showed theoretical promise, API limits prevented us from using more sophisticated evaluation systems.</p>
          <p>In conclusion, while the project encountered several challenges, the overall results were promising, demonstrating the potential of the approach. Future efforts should focus on optimizing document retrieval, improving workflow efficiency, and addressing hardware and API limitations to further enhance the system’s performance and usability.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>6. Release</title>
      <p>The source code of the RAG pipeline and the dataset used are available on the GitHub repository of the project.<sup>17</sup> The data downloaded from the websites of the University of Trento and the University of Verona is available along with the source from which the documents were taken. The Python code of the tool is released under the Apache 2.0 license.</p>
      <p><sup>17</sup> https://github.com/Samu01Tech/myVision-universities-RAG</p>
    </sec>
    <sec id="sec-3">
      <title>7. Conclusions and Future Work</title>
      <p>In this paper, we presented the development of a Retrieval-Augmented Generation (RAG) system designed to provide students with accurate academic guidance, specifically focusing on university course and library information. The system leverages a streamlined toolchain, incorporating a Markdown parser for efficient data handling and the Llama3-8b-8192 LLM for query processing. While the system demonstrates promising results, there are areas for enhancement.</p>
      <p>Future work will concentrate on several key improvements. Firstly, we aim to enhance the evaluation framework to provide a more comprehensive assessment of the RAG model’s performance, incorporating metrics for contextual relevance, accuracy, and adaptability. Secondly, the integration of reranking mechanisms will be explored to prioritize retrieved results based on relevance and quality. Thirdly, to ensure robust and scalable performance, we plan to test the model with a wider range of LLMs, such as Gemini, Claude and others.</p>
      <p>Finally, we plan to extend the current dataset, which remains relatively small, to improve both the retrieval and generation components of the system. This expansion will allow for more robust model training and better generalization across academic contexts. In addition, we will conduct user studies to evaluate the system’s effectiveness in real-world scenarios, gathering insights from student interactions to refine and improve the overall user experience.</p>
      <p>Beyond these technical refinements, the myVision service will be expanded to serve a broader audience, including bachelor’s degree graduates and students interested in specialized master’s programs, and to include more universities. We envision the chatbot as a core component of a larger platform that will offer a dedicated user interface, informative podcasts, and direct interaction with student advisors. Ultimately, this work lays the groundwork for a powerful tool to aid students in navigating their academic journeys.</p>
      <ref-list>
        <ref id="ref14">
          <mixed-citation>
            [14]
            <string-name>
              <given-names>A. F.</given-names>
              <surname>Hayes</surname>
            </string-name>
            ,
            <string-name>
              <given-names>K.</given-names>
              <surname>Krippendorff</surname>
            </string-name>
            ,
            <article-title>Answering the call for a standard reliability measure for coding data</article-title>
            ,
            <source>Communication methods and measures</source>
            <volume>1</volume>
            (
            <year>2007</year>
            )
            <fpage>77</fpage>
            -
            <lpage>89</lpage>
            .
          </mixed-citation>
        </ref>
        <ref id="ref15">
          <mixed-citation>
            [15]
            <string-name>
              <given-names>K.</given-names>
              <surname>Krippendorff</surname>
            </string-name>
            ,
            <source>Content analysis: An introduction to its methodology</source>
            , Sage publications,
            <year>2018</year>
            .
          </mixed-citation>
        </ref>
      </ref-list>
    </sec>
    <sec id="sec-4">
      <title>A. Correctness Evaluation Prompt</title>
      <p>You are an expert evaluation system for a question answering chatbot. You are given the following information:</p>
      <list list-type="bullet">
        <list-item><p>a user query, and</p></list-item>
        <list-item><p>a generated answer</p></list-item>
      </list>
      <p>You may also be given a reference answer to use for reference in your evaluation. Your job is to judge the relevance and correctness of the generated answer. Output a single score that represents a holistic evaluation. You must return your response in a line with only the score. Do not return answers in any other format. On a separate line provide your reasoning for the score as well.</p>
      <p>Follow these guidelines for scoring:</p>
      <list list-type="bullet">
        <list-item><p>Your score has to be between 1 and 5, where 1 is the worst and 5 is the best.</p></list-item>
        <list-item><p>If the generated answer is not relevant to the user query, you should give a score of 1.</p></list-item>
        <list-item><p>If the generated answer is relevant but contains mistakes, you should give a score between 2 and 3.</p></list-item>
        <list-item><p>If the generated answer is relevant and fully correct, you should give a score between 4 and 5.</p></list-item>
      </list>
      <p>Example Response:</p>
      <preformat>4.0
The generated answer has the exact same metrics as the reference answer, but it is not as concise.</preformat>
    </sec>
    <sec id="sec-5">
      <title>C. Guidelines for Human Annotation</title>
      <list list-type="simple">
        <list-item><p><bold>0: Wrong Answer.</bold> The RAG pipeline generated a factually incorrect or completely irrelevant response.</p></list-item>
        <list-item><p><bold>1: Misses Crucial Information.</bold> The answer provided is generally correct but fails to include essential details, making it incomplete or misleading.</p></list-item>
        <list-item><p><bold>2: Responded Only in Part.</bold> The RAG pipeline addressed only a fraction of the user’s query, leaving significant portions unanswered.</p></list-item>
        <list-item><p><bold>3: Could Add More Information.</bold> The answer is good, but additional relevant details or elaboration would significantly improve its quality and completeness.</p></list-item>
        <list-item><p><bold>4: Good Answer.</bold> The RAG pipeline provided a comprehensive, accurate, and highly relevant response that fully addresses the user’s query.</p></list-item>
      </list>
    </sec>
    <sec id="sec-6">
      <title>B. Context Relevancy Evaluation Prompt</title>
      <p>Your task is to evaluate if the retrieved context from the document sources are relevant to the query. The evaluation should be performed in a step-by-step manner by answering the following questions:</p>
      <list list-type="order">
        <list-item><p>Does the retrieved context match the subject matter of the user’s query?</p></list-item>
        <list-item><p>Can the retrieved context be used exclusively to provide a full answer to the user’s query?</p></list-item>
      </list>
      <p>Each question above is worth 2 points, where partial marks are allowed and encouraged. Provide detailed feedback on the response according to the criteria questions previously mentioned. After your feedback provide a final result by strictly following this format: ’[RESULT] followed by the float number representing the total score assigned to the response’</p>
      <preformat>Query: \n {query_str}
Context: \n {context_str}
Feedback:</preformat>
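      <p>Both evaluation prompts in these appendices ask the judging LLM for a machine-readable score: the correctness prompt expects the score alone on the first line, and the context relevancy prompt expects a final "[RESULT]" tag followed by a number. The following is a minimal sketch of how such replies could be parsed. It is not the code used in this work; the function names parse_correctness and parse_context_relevancy are hypothetical, and the tolerance for a trailing "/4.0" is an assumption based on the example outputs reported in Appendix D.</p>
      <preformat><![CDATA[
```python
import re
from typing import Optional, Tuple

# Hypothetical helpers for illustration; the paper does not describe its parser.
_NUMBER = r"\d+(?:\.\d+)?"


def parse_correctness(response: str) -> Tuple[Optional[float], str]:
    """Split a correctness reply into (score, reasoning).

    The prompt asks for the score alone on the first line and the
    reasoning on a separate line. Returns (None, text) when the first
    line is not a number or falls outside the 1-5 range.
    """
    lines = [line.strip() for line in response.strip().splitlines() if line.strip()]
    if not lines:
        return None, ""
    if re.fullmatch(_NUMBER, lines[0]) is None:
        return None, " ".join(lines)
    score = float(lines[0])
    reasoning = " ".join(lines[1:])
    if not 1.0 <= score <= 5.0:
        return None, reasoning
    return score, reasoning


def parse_context_relevancy(feedback: str) -> Optional[float]:
    """Return the total score after the last '[RESULT]' tag.

    Accepts both '[RESULT] 2.5' and '[RESULT] 3.3/4.0'. Two questions
    worth 2 points each give a valid range of 0 to 4.
    """
    pattern = r"\[RESULT\]\s*(" + _NUMBER + r")(?:\s*/\s*" + _NUMBER + r")?"
    matches = re.findall(pattern, feedback)
    if not matches:
        return None
    score = float(matches[-1])
    return score if 0.0 <= score <= 4.0 else None
```
]]></preformat>
      <p>Returning None for a malformed or out-of-range reply, instead of raising an error, lets a batch evaluation skip or retry individual questions without stopping the whole run.</p>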
    </sec>
    <sec id="sec-7">
      <title>D. Tables</title>
      <p>The following tables present two examples of the questions used to test the system. The full list of questions, along with the answers, the human annotations, and the AI evaluation and context scores, is available in the GitHub repository (see Section 6).</p>
      <list list-type="bullet">
        <list-item><p>Table 1 contains the text of two questions (IDs 43 and 60).</p></list-item>
        <list-item><p>Table 2 shows the gold and RAG answers, respectively.</p></list-item>
        <list-item><p>Table 3 contains the AI evaluation scores.</p></list-item>
        <list-item><p>Table 4 includes the AI context scores.</p></list-item>
        <list-item><p>Table 5 shows the human evaluation.</p></list-item>
      </list>
      <table-wrap>
        <label>Table 1</label>
        <caption><p>Text of the two example questions.</p></caption>
        <table>
          <thead>
            <tr><th>ID</th><th>Question</th></tr>
          </thead>
          <tbody>
            <tr>
              <td>43</td>
              <td>What are the admission requirements for the Environmental Engineering program, and what further study opportunities are available after graduation?</td>
            </tr>
            <tr>
              <td>60</td>
              <td>What are the two different orientations that students can choose within the Business Administration and Law degree program, and where can students undertake internships?</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <label>Table 2</label>
        <caption><p>Gold and RAG answers.</p></caption>
        <table>
          <thead>
            <tr><th>ID</th><th>Gold Answer</th><th>RAG Answer</th></tr>
          </thead>
          <tbody>
            <tr>
              <td>43</td>
              <td>Admission requires an **Italian high school diploma or equivalent foreign qualification, and passing a TOLC test**. The program has limited enrolment and seats allocated to Italian, EU, and non-EU citizens residing in Italy. Graduates have direct access to the **Master’s program in Environmental and Land Engineering** at the University of Trento, and can also access other Master’s programs in engineering that meet admission requirements. Transfers from other degree programs or universities are also possible, with applications due by February 3, 2025.</td>
              <td>Admission requirements for the Environmental Engineering program: No specific admission requirements mentioned. Further study opportunities after graduation: * Direct access to the Master’s program in Environmental and Land Engineering at the University of Trento. * Access to other Master’s programs in engineering disciplines that meet admission requirements. * Graduates can continue their education with a Master’s Degree in Civil Engineering or related fields such as Energy Engineering or Environmental Meteorology. * Alternative options include enrolling in specialized programs at other universities to further enhance expertise in industrial engineering disciplines.</td>
            </tr>
            <tr>
              <td>60</td>
              <td>Students can choose between two orientations: **one towards private professions and the other towards public professions**. Those choosing the private professions orientation will complete their mandatory internship mainly within **professional firms or auditing companies**. Students choosing the public professions orientation can complete their internship in **public entities and companies**.</td>
              <td>The two different orientations that students can choose within the Business Administration and Law degree program are: 1. Orientation towards private professions, mainly within professional firms or auditing companies. 2. Orientation towards public professions, it can take place in public entities and companies. Students can undertake internships in either of these orientations.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <label>Table 3</label>
        <caption><p>AI evaluation scores.</p></caption>
        <table>
          <thead>
            <tr><th>ID</th><th>AI Evaluation Score (1-5)</th></tr>
          </thead>
          <tbody>
            <tr>
              <td>43</td>
              <td>3.5 The generated answer is relevant to the user query, as it addresses both the admission requirements and further study opportunities. However, it lacks specific details, such as the Italian high school diploma or equivalent foreign qualification, and the TOLC test, which are mentioned in the reference answer. Additionally, the generated answer provides some correct information, like direct access to the Master’s program in Environmental and Land Engineering, but also includes some vague or incorrect information, such as "no specific admission requirements mentioned" and "alternative options include enrolling in specialized programs at other universities".</td>
            </tr>
            <tr>
              <td>60</td>
              <td>4.5 The generated answer is highly relevant and correct, as it accurately identifies the two orientations within the Business Administration and Law degree program and provides the correct information about where students can undertake internships. The generated answer is very similar to the reference answer, with only minor differences in wording and structure. The generated answer is clear and concise, making it easy to understand for the user.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <label>Table 4</label>
        <caption><p>AI context scores.</p></caption>
        <table>
          <thead>
            <tr><th>ID</th><th>AI Context Score</th></tr>
          </thead>
          <tbody>
            <tr>
              <td>43</td>
              <td>1. Does the retrieved context match the subject matter of the user’s query? The retrieved context partially matches the subject matter of the user’s query. The query asks about the admission requirements for the Environmental Engineering program and further study opportunities after graduation. The context provides information about various programs, including Environmental Engineering, but it does not directly address the admission requirements. However, it does provide information about further study opportunities, which is relevant to the query. Score: 1.5 2. Can the retrieved context be used exclusively to provide a full answer to the user’s query? The retrieved context cannot be used exclusively to provide a full answer to the user’s query. While it provides some relevant information about further study opportunities, it does not address the admission requirements for the Environmental Engineering program. Additional information would be needed to fully answer the query. Score: 1 Final Result: [RESULT] 2.5</td>
            </tr>
            <tr>
              <td>60</td>
              <td>1. Does the retrieved context match the subject matter of the user’s query? The retrieved context is highly relevant to the subject matter of the user’s query. The context provides detailed information about the Business Administration and Law degree program, including its objectives, curriculum, and career opportunities. The context specifically mentions the two orientations within the program, which aligns with the user’s query. However, the context is quite lengthy and covers a wide range of topics, making it challenging to extract the exact information related to the query. Score: 1.8/2.0 2. Can the retrieved context be used exclusively to provide a full answer to the user’s query? The retrieved context provides a comprehensive overview of the Business Administration and Law degree program, including the two orientations mentioned in the query. However, the context does not directly answer the question about where students can undertake internships. Although the context mentions internships and provides some information about the internship experiences, it does not explicitly state where students can undertake them. Score: 1.5/2.0 Final Result: [RESULT] 3.3/4.0</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <label>Table 5</label>
        <caption><p>Human evaluation.</p></caption>
        <table>
          <thead>
            <tr><th>ID</th><th>Human Evaluation Score (0-4)</th><th>Human Evaluation Notes</th></tr>
          </thead>
          <tbody>
            <tr>
              <td>43</td>
              <td>2/4</td>
              <td>The RAG answer provides an accurate and detailed overview of postgraduate study opportunities, including direct access to the relevant Master’s program and other engineering-related fields, which aligns well with the Gold answer. However, it entirely omits the admission requirements, including the essential TOLC test and diploma criteria, as well as the program’s limited enrolment structure. This missing information is critical to the question, resulting in a response that is only partially complete.</td>
            </tr>
            <tr>
              <td>60</td>
              <td>4/4</td>
              <td>The RAG answer accurately identifies the two orientations (private professions and public professions) and correctly associates each with the corresponding internship opportunities. The phrasing is slightly different but conveys the same meaning as the Gold answer. The response is complete, accurate, and fully aligned with the reference.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec>
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) did not use any generative AI tools or services.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ukaoha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ndunagu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osang</surname>
          </string-name>
          , et al.,
          <article-title>A guidance and counseling mobile application (gc mobile app) for educational institutions</article-title>
          ,
          <source>NIPES-Journal of Science and Technology Research</source>
          <volume>2</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Majjate</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bellarhmouch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jeghal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yahyaouy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tairi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Zidani</surname>
          </string-name>
          ,
          <article-title>Ai-powered academic guidance and counseling system based on student profile and interests</article-title>
          ,
          <source>Applied System Innovation</source>
          <volume>7</volume>
          (
          <year>2023</year>
          )
          <elocation-id>6</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.-C.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-C.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-L.</given-names>
            <surname>Chiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Unicompass: Helping high school students find the right college major</article-title>
          ,
          <source>in: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A. M. A.</given-names>
            <surname>Al</surname>
          </string-name>
          ,
          <article-title>Prototype rule-based expert system with an object-oriented database for university undergraduate major selection</article-title>
          ,
          <source>International Journal of Applied Information Systems (IJAIS) Foundation of Computer Science FCS</source>
          , New York, USA (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jawhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Bitar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jawhar</surname>
          </string-name>
          ,
          <article-title>Ai-powered customized university and career guidance</article-title>
          , in:
          <source>2024 Intermountain Engineering, Technology and Computing (IETC)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>157</fpage>
          -
          <lpage>161</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mirri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Prandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parisini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Amico</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bracuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Salomoni</surname>
          </string-name>
          ,
          <article-title>User-driven and open innovation as app design tools for high school students</article-title>
          ,
          <source>in: 2018 IEEE 29th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>6</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , W.-t. Yih,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          , et al.,
          <article-title>Retrieval-augmented generation for knowledge-intensive nlp tasks</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>9459</fpage>
          -
          <lpage>9474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T. I.</given-names>
            <surname>Dieing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Scheffler</surname>
          </string-name>
          , L. Cohausz,
          <article-title>Enhancing chatbot-assisted study program orientation</article-title>
          ,
          <source>in: Proceedings of DELFI Workshops</source>
          <year>2024</year>
          , Gesellschaft für Informatik eV,
          <year>2024</year>
          , pp.
          <fpage>10</fpage>
          -
          <lpage>18420</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Saha</surname>
          </string-name>
          , U. Saha,
          <article-title>Enhancing international graduate student experience through ai-driven support systems: A llm and rag-based approach</article-title>
          , in: 2024
          <source>International Conference on Data Science and Its Applications (ICoDSA)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>300</fpage>
          -
          <lpage>304</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dakshit</surname>
          </string-name>
          ,
          <article-title>Faculty perspectives on the potential of rag in computer science higher education</article-title>
          ,
          <source>in: Proceedings of the 25th Annual Conference on Information Technology Education</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Modran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. C.</given-names>
            <surname>Bogdan</surname>
          </string-name>
          , D. Ursuțiu, C. Samoila,
          <string-name>
            <given-names>P. L.</given-names>
            <surname>Modran</surname>
          </string-name>
          ,
          <article-title>Llm intelligent agent tutoring in higher education courses using a rag approach</article-title>
          ,
          <source>Preprints</source>
          2024, 2024070519 (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>M3-embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation</article-title>
          , in:
          <string-name>
            <given-names>L.-W.</given-names>
            <surname>Ku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Martins</surname>
          </string-name>
          , V. Srikumar (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL 2024</source>
          , Association for Computational Linguistics, Bangkok, Thailand,
          <year>2024</year>
          , pp.
          <fpage>2318</fpage>
          -
          <lpage>2335</lpage>
          . URL: https://aclanthology.org/2024.findings-acl.137/. doi:10.18653/v1/2024.findings-acl.137.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Chirkova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Déjean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Formal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clinchant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Nikoulina</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation in multilingual settings</article-title>
          ,
          <source>arXiv preprint arXiv:2407.01463</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>