=Paper=
{{Paper
|id=Vol-3938/ELEARNING_paper_1
|storemode=property
|title=Augmenting AI with Curated Learning Analytics Literature: Building and Initial Exploration of a Local RAG for Supporting Teachers (LARAG)
|pdfUrl=https://ceur-ws.org/Vol-3938/Paper_1.pdf
|volume=Vol-3938
|authors=Sonsoles López-Pernas,Ibrahim Belayachi,Hesham Ahmed,Ramy Elmoazen,Mohammed Saqr
}}
==Augmenting AI with Curated Learning Analytics Literature: Building and Initial Exploration of a Local RAG for Supporting Teachers (LARAG)==
Sonsoles López-Pernas1*, Ibrahim Belayachi1,2, †, Hesham Ahmed1, †, Ramy Elmoazen1, † and
Mohammed Saqr1†
1 University of Eastern Finland, Yliopistokatu 2, 80100 Joensuu, Finland
2 Université de Technologie de Compiègne, R. du docteur Schweitzer CS 60319, 60203 Compiègne Cedex, France
Abstract
Though LLMs have completely taken the world by storm, their use in academic settings still faces significant
challenges. One of these challenges is that LLMs sometimes “hallucinate” when they do not have the
necessary information to reply to the user prompt and, even when they do, they fail to provide a trusted
source to back up their claims. In this article, we explore the use of retrieval-augmented generation (RAG)
as a way to overcome the aforementioned limitation and enable evidence-based LLM-generated insights.
Specifically, we provide the results of our initial exploration of LARAG, a RAG-based system aimed at
providing learning analytics recommendations based on the existing literature. Our initial impressions
about the system are that it may offer some benefits over traditional LLMs. However, these benefits
are far from groundbreaking, and the generated responses are not always accurate.
Keywords
large language models (LLMs), retrieval-augmented generation (RAG), generative artificial intelligence,
learning analytics
1. Introduction
Large Language Models (LLMs) are deep-learning algorithms that have been trained on massive
amounts of textual data. They contain numerous parameters that enable them to model and predict
text [1]. Such characteristics allow LLMs to handle practically any topic ever discussed without any
additional training. LLMs have been used in education in diverse ways, for example, to personalize
tutoring, ease content creation, and provide automated scoring and feedback [2].
However, among the worries about LLMs' use in education is that LLMs sometimes struggle with
factual accuracy [3] and fail to provide a source for the information they present [4]. Researchers
have come up with a solution to ensure that LLMs draw their information from a trusted source:
Retrieval-Augmented Generation (RAG). RAG consists of supplementing an LLM, which has already
been trained on vast amounts of text for generating responses to various user requests, with
additional documents that provide context [5]. This process enables an already powerful LLM
to draw on external documents to contextualize and optimize its responses, without having to re-train
the model each time it is used. Moreover, the responses provided by a RAG system are more reliable
and accompanied by sources, unlike those of plain LLMs which, because they always generate an
answer, can sometimes give false information when they lack the necessary knowledge. In terms
of cost, building a RAG system is much more advantageous, being less time-consuming and less
computationally intensive than re-training an LLM [6].
Due to the novelty of RAG, its use in academic settings has so far been scarce, with few studies
examining its potential. An example is the work by Lee [7], who developed a RAG-based tutor
Proceedings for the 15th International Conference on e-Learning 2024, September 26-27, 2024, Belgrade, Serbia
∗ Corresponding author.
† These authors contributed equally.
sonsoles.lopez@uef.fi (S. López-Pernas); ibrahim.belayachi@etu.utc.fr (I. Belayachi); hesham.ahmed@uef.fi (H. Ahmed); ramy.elmoazen@uef.fi (R. Elmoazen); mohammed.saqr@uef.fi (M. Saqr)
0000-0002-9621-1392 (S. López-Pernas); 0000-0002-5792-1340 (R. Elmoazen); 0000-0001-5881-3109 (M. Saqr)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
capable of providing correct and relevant answers regarding R programming. Similarly, Henkel et
al. [8] used a RAG-based system to generate responses to students’ math questions based on a math
textbook. Zhong et al. [9] developed a RAG-augmented generative artificial intelligence agent
capable of effectively fostering learners’ performance in collaborative problem-solving. Lastly, Yan
et al. [10] used RAG to enhance learning analytics dashboards with contextualized explanations.
In this article, we provide an initial look at how the RAG system —augmented with curated up-
to-date learning analytics research— can guide teachers to obtain evidence-based insights and
recommendations in learning analytics. The system was built at the University of Eastern Finland
(UEF) and was named LARAG (referring to Learning Analytics RAG). It is worth noting that this is
an initial exploration that aims at demonstrating the system rather than an exhaustive evaluation,
which will follow in an extended paper.
2. The RAG process
RAG consists of two main phases: (1) document indexing, and (2) search and answer generation (see
Figure 1). In the first phase —document indexing—, the user uploads the curated documents to be
indexed by the RAG system. In our case, we systematically curated all empirical learning analytics
research that attempted to predict student performance. These files are typically text-based and
should be as structured as possible to aid the next steps of analysis. The input files are then divided
into smaller chunks (e.g., paragraphs or sentences). Each chunk is converted into a high-dimensional
vector (e.g., 768 or 1024 dimensions) using an embedding model [5], typically based on Transformer
architectures (such as BERT). The embeddings capture the semantic content of each chunk, enabling
similarity comparison between chunks and user queries. The embeddings are stored in a vector
database, optimized for similarity search. Some examples of vector databases are Pinecone, Weaviate,
Faiss, and Milvus [11]. The vector database supports efficient search algorithms such as the
Approximate Nearest Neighbor (ANN) search, which finds vectors closest to the query vector.
Metadata (e.g., source document ID or section) is often stored alongside each vector, making it
possible to trace back the source of the retrieved chunks at the time of answer generation.
Figure 1: RAG process
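To make the indexing phase concrete, a minimal sketch is shown below, assuming a sentence-transformers embedding model and a ChromaDB collection; the model name, collection name, chunking parameters, and file contents are illustrative and not necessarily those used by LARAG.

import chromadb
from sentence_transformers import SentenceTransformer

def chunk_text(text, chunk_size=800, overlap=100):
    """Split a document into overlapping character-based chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # maps each chunk to a 384-dimensional vector
client = chromadb.PersistentClient(path="./larag_index")
collection = client.get_or_create_collection("learning_analytics_papers")

# doc_id -> extracted full text (placeholder content for illustration)
documents = {"paper_001.pdf": "Full text of the first predictive learning analytics article ..."}

for doc_id, text in documents.items():
    chunks = chunk_text(text)
    embeddings = embedder.encode(chunks).tolist()
    collection.add(
        ids=[f"{doc_id}_{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embeddings,
        # metadata keeps the source document traceable at answer-generation time
        metadatas=[{"source": doc_id, "chunk": i} for i in range(len(chunks))],
    )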
The second phase of RAG —search and answer generation— starts with the user inputting a query
—i.e., prompt—, similar to interacting with any LLM. The query is vectorized using the same
embedding model that was used on the document chunks so that the query’s vector representation
is in the same semantic space as the chunk vectors, thus, enabling comparison. The query vector is
matched against chunk vectors in the vector database using similarity measures like cosine similarity
or dot product. These metrics calculate the closeness of vectors based on the angle or magnitude in
vector space. The search algorithm efficiently finds the top k closest vectors (relevant chunks) to the
query vector, returning the most contextually relevant document parts. The retrieved chunks (top k
closest matches) are provided to the LLM for answer synthesis. These chunks are often concatenated
or structured for the LLM to interpret them as context.
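As a brief illustration of these similarity measures, the following toy NumPy sketch (with hypothetical three-dimensional vectors) computes the dot product and cosine similarity between a query vector and two chunk vectors and picks the closest chunk:

import numpy as np

query_vec = np.array([0.2, 0.7, 0.1])
chunk_vecs = np.array([[0.1, 0.8, 0.0],   # chunk close in meaning to the query
                       [0.9, 0.0, 0.3]])  # unrelated chunk

# The dot product reflects both angle and magnitude; cosine similarity normalizes
# by the vector norms, so it depends on the angle only.
dot_products = chunk_vecs @ query_vec
cosine = dot_products / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec))

top_k = np.argsort(-cosine)[:1]  # index of the most similar chunk (k = 1)
print(cosine, top_k)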
The LLM generates a response by using the query and relevant chunks as context. The model
synthesizes an answer based on both the question and the retrieved information, which results in
accurate and contextually appropriate responses. The final output includes the answer synthesized
by the LLM, often along with citations or references to the original documents. Thus, using RAGs,
as opposed to only LLMs, increases transparency and trust, since users know where the factual
information came from.
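A minimal sketch of this retrieve-then-generate step is shown below, reusing the illustrative ChromaDB collection and embedding model from the indexing sketch and assuming a local model served through Ollama's Python client; the model name and prompt template are assumptions rather than LARAG's actual configuration.

import chromadb
import ollama
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./larag_index")
collection = client.get_collection("learning_analytics_papers")

query = ("According to the learning analytics literature, which engagement "
         "indicators are predictive of student academic performance?")

# Vectorize the query with the same embedding model used for the document chunks
query_embedding = embedder.encode([query]).tolist()

# Approximate nearest-neighbour search for the top-k most similar chunks
results = collection.query(query_embeddings=query_embedding, n_results=5)
chunks = results["documents"][0]
sources = [m["source"] for m in results["metadatas"][0]]

# Concatenate the retrieved chunks, tagged with their source, as context for the LLM
context = "\n\n".join(f"[{src}] {chunk}" for src, chunk in zip(sources, chunks))
prompt = ("Answer the question using only the context below and cite the sources "
          f"in brackets.\n\nContext:\n{context}\n\nQuestion: {query}")

# Generate the answer with a locally served open-source model
response = ollama.chat(model="llama3", messages=[{"role": "user", "content": prompt}])
print(response["message"]["content"])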
3. Initial Look at LARAG to Explore How It Generates Evidence-based Insights in Learning Analytics
In this section, we describe our initial results regarding the suitability of using LARAG to generate
evidence-based insights following the latest learning analytics literature. The purpose is to enable
teachers, administrators, and even students themselves —i.e., those who are not familiar with
research practices and results— to obtain rigorous recommendations based on existing literature. In
this way, we hope to contribute to bridging the gap between learning analytics research and practice.
To achieve this goal, we used an open-source RAG system —Kotaemon [12]— that supports both
phases of the RAG process: the document upload and the querying of the documents in a chat-like
format. Kotaemon is built with Python and integrates technologies such as LangChain for
orchestrating the RAG pipeline and ChromaDB as a vector database to store and retrieve document
embeddings, and it supports LLMs from multiple sources, such as OpenAI, Azure, or local (open-source)
models served through Ollama. Moreover, Kotaemon provides a web user interface based on Gradio that
allows both the uploading of documents and the interaction with them through chat.
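For illustration only, and not Kotaemon's actual implementation, a chat-style Gradio front end of this kind can be sketched around a retrieval-augmented answering function like the one outlined in Section 2:

import gradio as gr

def rag_answer(query: str) -> str:
    # Placeholder for the retrieve-then-generate pipeline sketched in Section 2
    return f"(retrieved context and generated answer for: {query})"

def chat(message, history):
    # Gradio passes the current message and the chat history; only the message is used here
    return rag_answer(message)

gr.ChatInterface(fn=chat, title="LARAG").launch()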
After installing and deploying Kotaemon, we uploaded a corpus of learning analytics articles
using Kotaemon’s web interface. Specifically, we provided the full text of 136 articles that cover the
topic of predictive learning analytics since this is one of the central themes of learning analytics that
is of interest to researchers and practitioners alike. To find all relevant articles about this topic, we
first downloaded all the related literature reviews using the following query: (predictive AND learning
AND analytics AND systematic AND literature AND review), limited to after 2011, and journal and
conference papers. This step resulted in 13 reviews. We obtained the references of each review,
which amounted to 1517 (1427 after duplicate removal). We screened all studies and included only
those that predict a target variable (e.g., engagement, achievement, dropout), contain the word
predict* in the title, abstract, or keywords, rely on student data, and create a predictive model to
predict the target variable. As a final result, 136 papers were included and their respective PDF files
were downloaded and loaded to LARAG.
Once the documents are loaded, the LARAG system is ready to respond to related questions. As our
LLM, we used an open-source model served locally through Ollama, since such models achieve
accuracy comparable to that of commercial solutions. Figure 2 shows the web interface of LARAG.
The left side of the
interface allows choosing which documents are used to retrieve the responses. The middle part
contains the conversation prompts and responses. The right side showcases the documents that were
used to generate the response.
To evaluate the suitability of LARAG to provide evidence-based insights compared to a usual
LLM (without context augmentation), we used the same query to prompt our system with and
without RAG. Table 1 shows the prompt and the results with and without context augmentation
from the uploaded articles.
Figure 2: LARAG web interface provided by Kotaemon
Table 1
Prompt and results with and without RAG

Prompt: According to the learning analytics literature, which engagement indicators are predictive of student academic performance?

Without RAG:
According to the learning analytics literature, several engagement indicators have been found to predict student academic performance. These include:
1. Classroom participation: Studies have shown that students who actively participate in class discussions and activities tend to perform better academically.
2. Homework completion rates: Students who regularly complete their homework assignments tend to demonstrate higher levels of academic success.
3. Online activity metrics: Engagement with online learning platforms, such as logging in frequently or completing online activities, has been linked to improved academic performance.
4. Self-reported engagement measures: Students' self-reports of their motivation and engagement have been found to be strong predictors of academic success.
5. Screen time and session duration: Research has suggested that students who spend more time on specific learning platforms or screens tend to demonstrate better academic performance.
6. Quiz and exam scores: Consistently high scores on quizzes and exams are robust indicators of student engagement and academic achievement.
7. Social presence metrics: Measures of students' social interactions and engagement with peers through online platforms have been linked to improved academic outcomes.
It is essential to note that the specific engagement indicators used can vary depending on the context, discipline, and population being studied.

With RAG:
According to the learning analytics literature, the following types of engagement indicators are predictive of student academic performance:
1. Participation indicators:
   * Number of discussion posts viewed
   * Number of content pages viewed
   * Time spent viewing discussion pages
2. Interpersonal interaction indicators:
   * Participation in online discussions
   * Interactions with peers (e.g., messaging, commenting)
3. Learning outcomes indicators:
   * Grade on assignments and exams
   * Completion of learning materials and activities
4. Frequency and consistency of logins indicators:
   * Frequency of LMS login attempts
   * Consistency of usage patterns
These engagement indicators have been shown to be predictive of student academic performance, including grades, course completion rates, and overall success in online courses.
We evaluated the quality of both responses based on the following criteria: accuracy, specificity,
justification, usefulness, implementability, coherence, and equity. These criteria are based on different
user-centric evaluation frameworks used to evaluate recommendation systems in general and
conversational-based ones specifically [13]–[17]. Accuracy refers to how aligned the output is with the
desired request; specificity refers to how precise and specific the response is; justification refers to
the provision of an explanation as to why the output was provided; usefulness denotes the utility of
the output; implementability refers to the ease with which a recommendation can be applied; coherence
refers to the level of consistency between the different parts of the output; and equity describes the
fairness of resource and support distribution to account for imbalances and differences.
Based on these criteria, it seems that both responses point to comparably accurate engagement
metrics that are known to be predictive of student academic achievement. The RAG response uses
terminology that is more aligned with learning analytics research, whereas the non-RAG response
uses general concepts that may be more understandable to a general audience. As such, the RAG
response contains much more specific metrics to represent similar constructs to what the non-RAG
response proposed. For instance, the response without RAG pointed to “social presence” broadly as
a relevant engagement indicator, while the RAG-based response went one step further and proposed
specific quantitative metrics: “number of discussion posts viewed”, or “time spent viewing discussion
pages”.
Both responses justify the results by broadly declaring “according to the learning analytics
literature”, although they may be simply mimicking the user prompt. An advantage of RAG is that
we can see the actual sources used to generate the RAG response (see Figure 2), whereas we cannot
access that information for the non-RAG response. However, although the RAG points to the part of
the paper that was used to generate the response, the RAG did not provide a direct citation for each
engagement metric pointing to which articles were consulted to generate that specific
recommendation.
Regarding usefulness, the non-RAG response contains more general —or shall we say generic—
indicators that can be applied broadly to more learning contexts whereas the RAG response was
tailored to online learning (even though that was not specified in the prompt). Nonetheless, the RAG
response points to specific quantitative indicators that have been empirically proven to be associated
with student achievement in the specific context of online or blended learning. In turn, the non-RAG
response includes more broad metrics that may or may not be associated with performance
depending on their operationalization. Perhaps, for a non-expert audience, the non-RAG response is
more useful for understanding what the engagement indicators to be collected represent, whereas the
RAG response points to their specific operationalization.
In terms of implementability, the RAG response suggested mostly engagement metrics that can
be easily derived from unobtrusive data collection through the Learning Management System (LMS).
The non-RAG response is mostly unclear in terms of how to calculate the metrics; although some
also involve using LMS data, others are more costly, such as using self-reports (there is no specific
instrument suggested for this).
When it comes to coherence, both responses contain significant overlap in the metrics suggested.
For instance, the non-RAG response suggested measuring “online activity” and “session duration”,
whereas one could argue that session duration is a measure of online activity. The RAG response is
better structured in categories and related metrics, although some of these categories somewhat
overlap, such as “participation indicators” and “interpersonal interaction indicators”. However,
neither of the two responses included contradictory recommendations and both were mostly
coherent.
Lastly, regarding equity, neither of the two responses explicitly mentions the possible bias in this
data collection or the existence of individual differences that need to be taken into account. However,
the non-RAG response did point out that “the specific engagement indicators used can vary
depending on the context, discipline, and population being studied”, highlighting that these indicators
do not generalize to all contexts, which is one of the main findings of the learning analytics literature
on performance prediction [18]–[20].
In summary, the RAG responses are of higher quality overall, providing more accurate,
implementable, and structured insights relevant to learning analytics. The non-RAG response might
appeal to a broader audience but lacks precision and implementability in an educational context.
4. Reflection and Future Directions
In this exploratory paper, we introduced our initial thoughts on LARAG, a local RAG system built
with curated learning analytics research at the University of Eastern Finland. The system is based
on open-source software that is easy to build and customize. The curated learning analytics research
was highly relevant and represents the current state of the art. Our initial impressions about the
system are that it may offer some benefits over traditional LLMs, namely more specific and easy-to-
implement recommendations. However, these benefits are far from groundbreaking, and the responses
are not always accurate. For instance, although the RAG points to the specific chunks that were used to generate
the response, it does not map specific parts of the response to these chunks (i.e., citations).
In the future, we aim to offer a systematic evaluation across several research domains and relevant
questions, evaluating the system with a rigorous methodology. The LARAG system described here is
our initial build and is still in an experimental stage. Future versions are planned to incorporate far
larger corpora, including massive educational datasets.
Acknowledgments
This article has been co-funded by the European Commission through the ISILA (2023-1-FI01-
KA220-HED-000159757) project.
Declaration on Generative AI
During the preparation of this work, the author(s) used Ollama and Kotaemon to generate the content of Table 1.
After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and
take(s) full responsibility for the publication’s content.
References
[1] M. McDonough, “Large language model,” Encyclopedia Britannica, 22-Jan-2024. [Online].
Available: https://www.britannica.com/topic/large-language-model. [Accessed: 05-Nov-2024].
[2] E. Kasneci et al., “ChatGPT for good? On opportunities and challenges of large language
models for education,” Learn. Individ. Differ., vol. 103, no. 102274, p. 102274, Apr. 2023.
[3] I. Augenstein et al., “Factuality challenges in the era of large language models and
opportunities for fact-checking,” Nat. Mach. Intell., vol. 6, no. 8, pp. 852–863, Aug. 2024.
[4] D. Peskoff and B. Stewart, “Credible without credit: Domain experts assess generative
language models,” in Proceedings of the 61st Annual Meeting of the Association for
Computational Linguistics (Volume 2: Short Papers), Toronto, Canada, 2023, pp. 427–438.
[5] P. Lewis et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,”
Advances in Neural Information Processing Systems, vol. 33, pp. 9459–9474, 2020.
[6] H. Soudani, E. Kanoulas, and F. Hasibi, “Fine tuning vs. Retrieval Augmented Generation for
less popular knowledge,” arXiv [cs.CL], 03-Mar-2024.
[7] Y. Lee, “Developing a computer-based tutor utilizing Generative Artificial Intelligence (GAI)
and Retrieval-Augmented Generation (RAG),” Educ. Inf. Technol., pp. 1–22, Nov. 2024.
[8] O. Henkel, Z. Levonian, C. Li, and M. Postle, “Retrieval-augmented generation to improve
math question-answering: Trade-offs between groundedness and human preference,”
Proceedings of the 17th International Conference on Educational Data Mining. International
Educational Data Mining Society, pp. 315–320, 2024.
[9] X. Zhong, H. Xin, W. Li, Z. Zhan, and M.-H. Cheng, “The Design and application of RAG-
based conversational agents for collaborative problem solving,” in Proceedings of the 2024 9th
International Conference on Distance Education and Learning, Guangzhou China, 2024, pp. 62–
68.
[10] L. Yan et al., “VizChat: Enhancing learning analytics dashboards with contextualised
explanations using multimodal generative AI chatbots,” in Lecture Notes in Computer Science,
Cham: Springer Nature Switzerland, 2024, pp. 180–193.
[11] P. N. Singh, S. Talasila, and S. V. Banakar, “Analyzing embedding models for embedding
vectors in vector databases,” in 2023 IEEE International Conference on ICT in Business Industry
& Government (ICTBIG), Indore, India, 2023, pp. 1–7.
[12] Cinnamon, kotaemon: An open-source RAG-based tool for chatting with your documents. GitHub,
2024.
[13] P. Pu, L. Chen, and R. Hu, “A user-centric evaluation framework for recommender systems,”
in Proceedings of the fifth ACM conference on Recommender systems, Chicago Illinois USA, 2011.
[14] H. Kunstmann, J. Ollier, J. Persson, and F. von Wangenheim, “EventChat: Implementation and
user-centric evaluation of a large language model-driven conversational recommender system
for exploring leisure events in an SME context,” arXiv [cs.IR], 05-Jul-2024.
[15] Y. Jin, L. Chen, W. Cai, and X. Zhao, “CRS-Que: A user-centric evaluation framework for
conversational recommender systems,” ACM Trans. Recomm. Syst., Nov. 2023.
[16] P. Jurado de Los Santos, A.-J. Moreno-Guerrero, J.-A. Marín-Marín, and R. Soler Costa, “The
term equity in education: A literature review with scientific mapping in web of science,” Int. J.
Environ. Res. Public Health, vol. 17, no. 10, p. 3526, May 2020.
[17] D. Jannach, “Evaluating conversational recommender systems: A landscape of research,” Artif.
Intell. Rev., vol. 56, no. 3, pp. 2365–2400, Mar. 2023.
[18] R. Conijn, C. Snijders, A. Kleingeld, and U. Matzat, “Predicting Student Performance from LMS
Data: A Comparison of 17 Blended Courses Using Moodle LMS,” IEEE Trans. Learn. Technol.,
vol. 10, no. 1, pp. 17–29, Jan. 2017.
[19] M. Saqr, J. Jovanović, O. Viberg, and D. Gašević, “Is there order in the mess? A single paper
meta-analysis approach to identification of predictors of success in learning analytics,” Studies
in Higher Education, vol. 47, no. 12, pp. 2370–2391, Dec. 2022.
[20] J. Jovanovic, S. López-Pernas, and M. Saqr, “Predictive modelling in learning analytics: A
machine learning approach in R,” in Learning Analytics Methods and Tutorials, Cham: Springer
Nature Switzerland, 2024, pp. 197–229.