Augmenting AI with Curated Learning Analytics Literature: Building and Initial Exploration of a Local RAG for Supporting Teachers (LARAG)

Sonsoles López-Pernas1,∗, Ibrahim Belayachi1,2,†, Hesham Ahmed1,†, Ramy Elmoazen1,† and Mohammed Saqr1,†

1 University of Eastern Finland, Yliopistokatu 2, 80100 Joensuu, Finland
2 Université de Technologie de Compiègne, Rue du Docteur Schweitzer, CS 60319, 60203 Compiègne Cedex, France

Proceedings for the 15th International Conference on e-Learning 2024, September 26–27, 2024, Belgrade, Serbia
∗ Corresponding author.
† These authors contributed equally.
sonsoles.lopez@uef.fi (S. López-Pernas); ibrahim.belayachi@etu.utc.fr (I. Belayachi); hesham.ahmed@uef.fi (H. Ahmed); ramy.elmoazen@uef.fi (R. Elmoazen); mohammed.saqr@uef.fi (M. Saqr)
0000-0002-9621-1392 (S. López-Pernas); 0000-0002-5792-1340 (R. Elmoazen); 0000-0001-5881-3109 (M. Saqr)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
Though LLMs have taken the world by storm, their use in academic settings still faces significant challenges. One of these challenges is that LLMs sometimes “hallucinate” when they do not have the necessary information to reply to the user prompt and, even when they do have it, they often fail to provide a trusted source to back up their claims. In this article, we explore the use of retrieval-augmented generation (RAG) as a way to overcome this limitation and enable evidence-based LLM-generated insights. Specifically, we provide the results of our initial exploration of LARAG, a RAG-based system aimed at providing learning analytics recommendations grounded in the existing literature. Our initial impressions of the system are that it may offer some benefits over traditional LLMs. However, these initial benefits are far from groundbreaking or highly accurate.

Keywords
large language models (LLMs), retrieval-augmented generation (RAG), generative artificial intelligence, learning analytics

1. Introduction

Large Language Models (LLMs) are deep-learning algorithms that have been trained on massive amounts of textual data. They contain numerous parameters that enable them to model and predict text [1]. These characteristics allow LLMs to address a wide range of topics without any additional training. LLMs have been used in education in diverse ways, for example, to personalize tutoring, ease content creation, and provide automated scoring and feedback [2]. However, among the concerns about the use of LLMs in education is that they sometimes struggle with factual accuracy [3] and fail to provide a source for the information they present [4].

Researchers have come up with a solution to ensure that LLMs draw their information from a trusted source: Retrieval-Augmented Generation (RAG). RAG consists of supplementing an LLM, which can already generate text in response to a wide variety of user requests, with additional documents that provide context [5]. This process enables an already powerful LLM to draw on external documents to contextualize and optimize its responses, without having to re-train the model each time it is used. Moreover, the responses provided by a RAG system are more reliable and accompanied by sources, unlike plain LLMs which, because they are designed to always generate an answer, can give false information when they lack the necessary training data. In terms of cost, setting up a RAG system is much more advantageous, being less time-consuming and less computationally intensive than re-training an LLM [6].

Due to the novelty of RAG, its use in academic settings has so far been scarce, with few studies examining its potential. An example is the work by Lee [7], who developed a RAG-based tutor capable of providing correct and relevant answers regarding R programming.
Similarly, Henkel et al. [8] used a RAG-based system to generate responses to students’ math questions based on a math textbook. Zhong et al. [9] developed a RAG-augmented generative artificial intelligence agent capable of effectively fostering learners’ performance in collaborative problem-solving. Lastly, Yan et al. [10] used RAG to enhance learning analytics dashboards with contextualized explanations.

In this article, we provide an initial look at how a RAG system, augmented with curated, up-to-date learning analytics research, can guide teachers to obtain evidence-based insights and recommendations in learning analytics. The system was built at the University of Eastern Finland (UEF) and was named LARAG (Learning Analytics RAG). It is worth noting that this is an initial exploration that aims to demonstrate the system rather than evaluate it exhaustively; a thorough evaluation will follow in an extended paper.

2. The RAG process

RAG consists of two main phases: (1) document indexing, and (2) search and answer generation (see Figure 1). In the first phase, document indexing, the user uploads the curated documents to be indexed by the RAG system. In our case, we systematically curated all empirical learning analytics research that attempted to predict student performance. These files are typically text-based and should be as structured as possible to aid the next steps of analysis. The input files are then divided into smaller chunks (e.g., paragraphs or sentences). Each chunk is converted into a high-dimensional vector (e.g., 768 or 1024 dimensions) using an embedding model [5], typically based on Transformer architectures (such as BERT). The embeddings capture the semantic content of each chunk, enabling similarity comparison between chunks and user queries. The embeddings are stored in a vector database optimized for similarity search. Some examples of vector databases are Pinecone, Weaviate, Faiss, and Milvus [11]. The vector database supports efficient search algorithms such as Approximate Nearest Neighbor (ANN) search, which finds the vectors closest to the query vector. Metadata (e.g., source document ID or section) is often stored alongside each vector, allowing the source of the retrieved chunks to be traced back at answer-generation time.

Figure 1: RAG process

The second phase of RAG, search and answer generation, starts when the user inputs a query (i.e., a prompt), similar to interacting with any LLM. The query is vectorized using the same embedding model that was used on the document chunks, so that the query’s vector representation lies in the same semantic space as the chunk vectors, thus enabling comparison. The query vector is matched against the chunk vectors in the vector database using similarity measures such as cosine similarity or the dot product. These metrics calculate the closeness of vectors based on the angle between them or their magnitude in vector space. The search algorithm efficiently finds the top k vectors closest to the query vector (the relevant chunks), returning the most contextually relevant document parts.
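As a concrete illustration of the two phases, the following Python sketch indexes a few document chunks and retrieves the top-k matches for a query, using sentence-transformers for the embeddings and ChromaDB as the vector database. It is a minimal sketch under illustrative assumptions (the embedding model, collection name, and toy chunks are our own choices) and is not the pipeline used by the LARAG build described later.

# Minimal sketch of the two RAG phases described above (illustrative only; the
# deployed LARAG system relies on Kotaemon's own pipeline rather than this code).
# Requires: pip install sentence-transformers chromadb
import chromadb
from sentence_transformers import SentenceTransformer

# Phase 1: document indexing
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model (384 dimensions)
client = chromadb.Client()                          # in-memory vector database
collection = client.create_collection(name="la_papers")

chunks = [
    # In LARAG these would be chunks of the 136 predictive learning analytics
    # articles; two toy chunks are used here for illustration.
    "LMS login frequency was a significant predictor of course grades.",
    "The number of discussion posts viewed correlated with final achievement.",
]
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embedder.encode(chunks).tolist(),
    metadatas=[{"source": f"paper-{i}.pdf"} for i in range(len(chunks))],  # for traceability
)

# Phase 2: search, i.e., retrieval of the top-k chunks closest to the query vector
query = "Which engagement indicators are predictive of student academic performance?"
results = collection.query(
    query_embeddings=embedder.encode([query]).tolist(),
    n_results=2,
)
for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
    print(meta["source"], "->", doc)

The vector database ranks results by vector distance, and the metadata stored at indexing time is what allows the retrieved chunks to be traced back to their source documents.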
The retrieved chunks (the top k closest matches) are provided to the LLM for answer synthesis. These chunks are often concatenated or structured so that the LLM can interpret them as context. The LLM then generates a response using both the query and the relevant chunks, synthesizing an answer from the question and the retrieved information, which results in more accurate and contextually appropriate responses. The final output includes the answer synthesized by the LLM, often along with citations or references to the original documents. Thus, using RAG, as opposed to an LLM alone, increases transparency and trust, since users know where the factual information came from.

3. Initial Look at LARAG to Explore How It Generates Evidence-based Insights in Learning Analytics

In this section, we describe our initial results regarding the suitability of using LARAG to generate evidence-based insights following the latest learning analytics literature. The purpose is to enable teachers, administrators, and even students themselves (i.e., those who are not familiar with research practices and results) to obtain rigorous recommendations based on the existing literature. In this way, we hope to contribute to bridging the gap between learning analytics research and practice.

To achieve this goal, we used an open-source RAG system, Kotaemon [12], that supports both phases of the RAG process: the uploading of documents and the querying of those documents through a chat-like interface. Kotaemon is built with Python and integrates technologies such as LangChain for orchestrating the RAG pipeline and ChromaDB as a vector database for storing and retrieving document embeddings, and it supports LLMs from multiple sources, such as OpenAI, Azure, or local open-source models served through Ollama. Moreover, Kotaemon provides a web user interface based on Gradio that allows both the uploading of documents and the interaction with them through chat.

After installing and deploying Kotaemon, we uploaded a corpus of learning analytics articles using Kotaemon’s web interface. Specifically, we provided the full text of 136 articles covering the topic of predictive learning analytics, since this is one of the central themes of learning analytics and is of interest to researchers and practitioners alike. To find all relevant articles on this topic, we first downloaded all the related literature reviews using the following query: (predictive AND learning AND analytics AND systematic AND literature AND review), limited to works published after 2011 and to journal and conference papers. This step resulted in 13 reviews. We obtained the references of each review, which amounted to 1517 (1427 after duplicate removal). We screened all studies and included only those that predict a target variable (e.g., engagement, achievement, dropout), contain the word predict* in the title, abstract, or keywords, rely on student data, and build a predictive model for the target variable. As a final result, 136 papers were included and their respective PDF files were downloaded and loaded into LARAG. Once the documents are loaded, the LARAG system is ready to respond to related questions. As the LLM back end, we chose open-source models served locally through Ollama, since their accuracy approaches that of commercial solutions. Figure 2 shows the web interface of LARAG. The left side of the interface allows choosing which documents are used to retrieve the responses. The middle part contains the conversation prompts and responses. The right side showcases the documents that were used to generate the response.

Figure 2: LARAG web interface provided by Kotaemon
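To illustrate the answer-generation step described in Section 2 with a locally served model, the sketch below concatenates the retrieved chunks into a prompt and sends it to an Ollama server through its HTTP generation endpoint. This is a minimal sketch under illustrative assumptions (the llama3 model name, the prompt template, and the reuse of the embedder and collection from the indexing sketch are our own choices); the deployed LARAG system uses Kotaemon’s built-in pipeline rather than this code.

# Minimal sketch of the search-and-answer-generation phase with a local model
# served by Ollama (illustrative only; LARAG itself uses Kotaemon's pipeline).
# Assumes an Ollama server running at its default address and reuses the
# `embedder` and `collection` objects from the indexing sketch above.
import requests


def retrieve(collection, embedder, query, k=2):
    """Return the top-k chunks most similar to the query (k <= indexed chunks)."""
    results = collection.query(
        query_embeddings=embedder.encode([query]).tolist(),
        n_results=k,
    )
    return results["documents"][0]


def answer(collection, embedder, query, model="llama3"):
    """Build a context-augmented prompt and generate an answer with Ollama."""
    context = "\n\n".join(retrieve(collection, embedder, query))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    response = requests.post(
        "http://localhost:11434/api/generate",  # Ollama's text-generation endpoint
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    return response.json()["response"]


# Example usage, assuming `collection` and `embedder` from the earlier sketch:
# print(answer(collection, embedder,
#              "Which engagement indicators are predictive of student academic performance?"))

In LARAG, this step is handled by Kotaemon, which additionally keeps track of which chunks were retrieved so that they can be displayed alongside the answer (Figure 2).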
To evaluate the suitability of LARAG for providing evidence-based insights compared to a regular LLM (without context augmentation), we used the same query to prompt our system with and without RAG. Table 1 shows the prompt and the results with and without context augmentation from the uploaded articles.

Table 1
Prompt and results with and without RAG

Prompt: According to the learning analytics literature, which engagement indicators are predictive of student academic performance?

Without RAG:
According to the learning analytics literature, several engagement indicators have been found to predict student academic performance. These include:
1. Classroom participation: Studies have shown that students who actively participate in class discussions and activities tend to perform better academically.
2. Homework completion rates: Students who regularly complete their homework assignments tend to demonstrate higher levels of academic success.
3. Online activity metrics: Engagement with online learning platforms, such as logging in frequently or completing online activities, has been linked to improved academic performance.
4. Self-reported engagement measures: Students' self-reports of their motivation and engagement have been found to be strong predictors of academic success.
5. Screen time and session duration: Research has suggested that students who spend more time on specific learning platforms or screens tend to demonstrate better academic performance.
6. Quiz and exam scores: Consistently high scores on quizzes and exams are robust indicators of student engagement and academic achievement.
7. Social presence metrics: Measures of students' social interactions and engagement with peers through online platforms have been linked to improved academic outcomes.
It is essential to note that the specific engagement indicators used can vary depending on the context, discipline, and population being studied.

With RAG:
According to the learning analytics literature, the following types of engagement indicators are predictive of student academic performance:
1. Participation indicators:
* Number of discussion posts viewed
* Number of content pages viewed
* Time spent viewing discussion pages
2. Interpersonal interaction indicators:
* Participation in online discussions
* Interactions with peers (e.g., messaging, commenting)
3. Learning outcomes indicators:
* Grade on assignments and exams
* Completion of learning materials and activities
4. Frequency and consistency of logins indicators:
* Frequency of LMS login attempts
* Consistency of usage patterns
These engagement indicators have been shown to be predictive of student academic performance, including grades, course completion rates, and overall success in online courses.

We evaluated the quality of both responses based on the following criteria: accuracy, specificity, justification, usefulness, implementability, coherence, and equity. These criteria are based on different user-centric evaluation frameworks used to evaluate recommendation systems in general and conversational ones in particular [13]–[17].
Accuracy refers to how aligned the output is with the desired request; specificity refers to how precise and specific the response is; justification refers to the provision of an explanation of why the output was provided; usefulness denotes the utility of the output; implementability refers to the ease with which a recommendation can be applied; coherence refers to the consistency between the different parts of the output; and equity describes the fairness of the distribution of resources and support, accounting for imbalances and individual differences.

Based on these criteria, it seems that both responses point to comparably accurate engagement metrics that are known to be predictive of student academic achievement. The RAG response uses terminology that is more aligned with learning analytics research, whereas the non-RAG response uses general concepts that may be more understandable to a general audience. As such, the RAG response contains much more specific metrics representing constructs similar to those proposed by the non-RAG response. For instance, the response without RAG pointed to “social presence” broadly as a relevant engagement indicator, while the RAG-based response went one step further and proposed specific quantitative metrics such as “number of discussion posts viewed” or “time spent viewing discussion pages”. Both responses justify the results by broadly stating “according to the learning analytics literature”, although they may simply be mimicking the user prompt. An advantage of RAG is that we can see the actual sources used to generate the RAG response (see Figure 2), whereas we cannot access that information for the non-RAG response. However, although the RAG points to the parts of the papers that were used to generate the response, it did not provide a direct citation for each engagement metric indicating which articles were consulted to generate that specific recommendation.

Regarding usefulness, the non-RAG response contains more general (or rather generic) indicators that can be applied broadly across learning contexts, whereas the RAG response was tailored to online learning (even though that was not specified in the prompt). Nonetheless, the RAG response points to specific quantitative indicators that have been empirically shown to be associated with student achievement in the specific context of online or blended learning. In turn, the non-RAG response includes broader metrics that may or may not be associated with performance depending on their operationalization. Perhaps for a non-expert audience the non-RAG response is more useful for understanding what the engagement indicators to be collected represent, whereas the RAG response points to their specific operationalization.

In terms of implementability, the RAG response suggested mostly engagement metrics that can easily be derived from unobtrusive data collection through the Learning Management System (LMS). The non-RAG response is mostly unclear in terms of how to calculate the metrics; although some also involve using LMS data, others are more costly to collect, such as self-reports (for which no specific instrument is suggested). When it comes to coherence, both responses contain significant overlap in the metrics suggested. For instance, the non-RAG response suggested measuring “online activity” and “session duration”, whereas one could argue that session duration is a measure of online activity.
The RAG response is better structured into categories and related metrics, although some of these categories overlap somewhat, such as “participation indicators” and “interpersonal interaction indicators”. However, neither of the two responses included contradictory recommendations, and both were mostly coherent. Lastly, regarding equity, neither of the two responses explicitly mentions possible biases in the data collection or the existence of individual differences that need to be taken into account. However, the non-RAG response did point out that “the specific engagement indicators used can vary depending on the context, discipline, and population being studied”, highlighting that these indicators do not generalize to all contexts, which is one of the main findings of the learning analytics literature on performance prediction [18]–[20]. In summary, the RAG response is of higher quality overall, providing more accurate, implementable, and structured insights relevant to learning analytics. The non-RAG response might appeal to a broader audience but lacks precision and implementability in an educational context.

4. Reflection and Future Directions

In this exploratory paper, we introduced our initial thoughts on LARAG, a local RAG system built with curated learning analytics research at the University of Eastern Finland. The system is based on open-source software that is easy to build and customize. The curated learning analytics research was highly relevant and represents the current state of the art. Our initial impressions of the system are that it may offer some benefits over traditional LLMs, namely more specific and easy-to-implement recommendations. However, these initial benefits are far from groundbreaking or highly accurate. For instance, although the RAG points to the specific chunks that were used to generate the response, it does not map specific parts of the response to these chunks (i.e., inline citations). In the future, we aim to carry out a systematic evaluation across several research domains and relevant questions using a rigorous methodology. The LARAG system described here is our initial build and is still at an experimental stage. Future versions are planned to include far larger collections, containing massive educational datasets.

Acknowledgments

This article has been co-funded by the European Commission through the ISILA project (2023-1-FI01-KA220-HED-000159757).

Declaration on Generative AI

During the preparation of this work, the author(s) used Ollama and Kotaemon to generate the responses shown in Table 1. After using these tools/services, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication’s content.

References

[1] M. McDonough, “Large language model,” Encyclopedia Britannica, 22-Jan-2024. [Online]. Available: https://www.britannica.com/topic/large-language-model. [Accessed: 05-Nov-2024].
[2] E. Kasneci et al., “ChatGPT for good? On opportunities and challenges of large language models for education,” Learn. Individ. Differ., vol. 103, p. 102274, Apr. 2023.
[3] I. Augenstein et al., “Factuality challenges in the era of large language models and opportunities for fact-checking,” Nat. Mach. Intell., vol. 6, no. 8, pp. 852–863, Aug. 2024.
[4] D. Peskoff and B. Stewart, “Credible without credit: Domain experts assess generative language models,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Toronto, Canada, 2023, pp. 427–438.
[5] P. Lewis et al., “Retrieval-augmented generation for knowledge-intensive NLP tasks,” Advances in Neural Information Processing Systems, vol. 33, pp. 9459–9474, 2020.
[6] H. Soudani, E. Kanoulas, and F. Hasibi, “Fine tuning vs. retrieval augmented generation for less popular knowledge,” arXiv [cs.CL], 03-Mar-2024.
[7] Y. Lee, “Developing a computer-based tutor utilizing Generative Artificial Intelligence (GAI) and Retrieval-Augmented Generation (RAG),” Educ. Inf. Technol., pp. 1–22, Nov. 2024.
[8] O. Henkel, Z. Levonian, C. Li, and M. Postle, “Retrieval-augmented generation to improve math question-answering: Trade-offs between groundedness and human preference,” in Proceedings of the 17th International Conference on Educational Data Mining, International Educational Data Mining Society, 2024, pp. 315–320.
[9] X. Zhong, H. Xin, W. Li, Z. Zhan, and M.-H. Cheng, “The design and application of RAG-based conversational agents for collaborative problem solving,” in Proceedings of the 2024 9th International Conference on Distance Education and Learning, Guangzhou, China, 2024, pp. 62–68.
[10] L. Yan et al., “VizChat: Enhancing learning analytics dashboards with contextualised explanations using multimodal generative AI chatbots,” in Lecture Notes in Computer Science, Cham: Springer Nature Switzerland, 2024, pp. 180–193.
[11] P. N. Singh, S. Talasila, and S. V. Banakar, “Analyzing embedding models for embedding vectors in vector databases,” in 2023 IEEE International Conference on ICT in Business Industry & Government (ICTBIG), Indore, India, 2023, pp. 1–7.
[12] Cinnamon, kotaemon: An open-source RAG-based tool for chatting with your documents. GitHub, 2024.
[13] P. Pu, L. Chen, and R. Hu, “A user-centric evaluation framework for recommender systems,” in Proceedings of the Fifth ACM Conference on Recommender Systems, Chicago, Illinois, USA, 2011.
[14] H. Kunstmann, J. Ollier, J. Persson, and F. von Wangenheim, “EventChat: Implementation and user-centric evaluation of a large language model-driven conversational recommender system for exploring leisure events in an SME context,” arXiv [cs.IR], 05-Jul-2024.
[15] Y. Jin, L. Chen, W. Cai, and X. Zhao, “CRS-Que: A user-centric evaluation framework for conversational recommender systems,” ACM Trans. Recomm. Syst., Nov. 2023.
[16] P. Jurado de Los Santos, A.-J. Moreno-Guerrero, J.-A. Marín-Marín, and R. Soler Costa, “The term equity in education: A literature review with scientific mapping in Web of Science,” Int. J. Environ. Res. Public Health, vol. 17, no. 10, p. 3526, May 2020.
[17] D. Jannach, “Evaluating conversational recommender systems: A landscape of research,” Artif. Intell. Rev., vol. 56, no. 3, pp. 2365–2400, Mar. 2023.
[18] R. Conijn, C. Snijders, A. Kleingeld, and U. Matzat, “Predicting student performance from LMS data: A comparison of 17 blended courses using Moodle LMS,” IEEE Trans. Learn. Technol., vol. 10, no. 1, pp. 17–29, Jan. 2017.
[19] M. Saqr, J. Jovanović, O. Viberg, and D. Gašević, “Is there order in the mess? A single paper meta-analysis approach to identification of predictors of success in learning analytics,” Studies in Higher Education, vol. 47, no. 12, pp. 2370–2391, Dec. 2022.
[20] J. Jovanovic, S. López-Pernas, and M. Saqr, “Predictive modelling in learning analytics: A machine learning approach in R,” in Learning Analytics Methods and Tutorials, Cham: Springer Nature Switzerland, 2024, pp. 197–229.