Utilizing Interaction Metrics in a Virtual Learning
Environment for Early Prediction of Students’
Academic Performance
Saša Brdnik* , Vili Podgorelec and Tjaša Heričko
Faculty of Electrical Engineering and Computer Science, University of Maribor, Koroška cesta 46, Maribor, Slovenia


                                      Abstract
                                      A plethora of data on students’ activities and interactions is acquired through the use of virtual learning
                                      environments. The analysis of gathered data and the prediction of withdrawal and academic success have
                                      been an ongoing point of discussion in the field of Learning Analytics. The main aim of this paper is
                                      an early prediction of students’ academic performance in terms of the final grade, while also targeting
                                      the early identification of students at risk of failing the course. We introduce new handcrafted features
                                      encompassing interaction patterns of students’ virtual learning environment usage, and highlight their
                                      predictive powers in monthly predictions. The experiment showed that a combination of the proposed
                                      features improved the Root Mean Square Error by up to 7.2 grade points compared to a model without
                                      the proposed features. Improvement of up to 0.52 was also observed in the coefficient of determination.
                                      The proposed model’s accuracy in predicting students at risk of failure reached 74% at the end of April.

                                      Keywords
                                      virtual learning environments, academic performance, at-risk students, interaction behavior, interaction
                                      patterns, machine learning, regression




1. Introduction
Virtual learning environments (VLEs) have become essential tools for educational institutions
in recent years [1]. By facilitating access to various teaching-learning materials and
enabling learning without spatial or temporal limitations [1, 2], they provide support for both
distance and classroom teaching [3]. Through the use of VLEs, an abundance of data capturing
learners’ activities and interactions is collected in log records [3, 4]. The gathered data
can be analyzed to mine learners’ behaviors and extract patterns, especially those concerning
distinct groups of learners [4, 5, 6]. Among others, applying such learning analytics helps
identify students possibly at risk of failing to complete the course, which is a major challenge
confronting the education sector [7]. This represents the first step toward addressing the prob-
lem of at-risk students by attaining a meaningful understanding of students’ behavior and their
learning outcomes, as well as assisting in the designing of data-driven corrective strategies and
timely interventions aimed to help improve the students’ academic performance [6, 8, 9, 10].
SQAMIA 2022: Workshop on Software Quality, Analysis, Monitoring, Improvement, and Applications, September 11–14,
2022, Novi Sad, Serbia
* Corresponding author.
sasa.brdnik@um.si (S. Brdnik); vili.podgorelec@um.si (V. Podgorelec); tjasa.hericko@um.si (T. Heričko)
ORCID: 0000-0001-6955-7868 (V. Podgorelec); 0000-0002-0410-7724 (T. Heričko)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




   The analysis of data collected by VLEs has received much research attention in the last
decade, for its potential to impact and improve the teaching-learning process and address signif-
icant challenges in education, including identifying students at risk of failing or withdrawing
from their respective courses. Many research endeavors have applied machine learning-based
approaches, which confirmed that data from VLE activity logs present good predictors for fore-
casting students’ academic performance [1, 3, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16]. Identifying
potentially at-risk students accurately while a course is still in progress is crucial to providing
timely help aimed at minimizing the student failure rate [7, 16]. This paper aims to predict
students’ final grades and risk of failure as early as possible during a course, based on their
interactions with the VLE. New features were also proposed to improve the prediction model.
Motivated by finding significant predictors for the early prediction of student academic success,
we used the Open University Learning Analytics (OULA) dataset to extract students’ interaction
patterns in VLE. We investigated how the interaction patterns captured by the proposed hand-
crafted features relate to academic performance. Then, we approached the prediction problem
from a regression perspective – the final mark in points was predicted for each student, instead
of a simpler pass-fail prediction. The proposed predictive model was constructed
during different points in the course’s timeline from data available until that point, instead of
using all data known after the course had already finished.
   In summary, the key contributions of this work are threefold: (1) We present an insightful
prediction of student academic performance in terms of final mark forecast at a fine-grained
level, i.e., month-wise granularity, focusing on the early identification of students at risk of
failing the course. (2) By introducing new handcrafted features encompassing interaction
patterns of students’ VLE usage, we highlight their predictive powers, and provide an additional
understanding of the characteristics of at-risk students, which builds on findings from previous
work. (3) The proposed features, on average, improve the Root Mean Square Error (RMSE) of
predicted grades by 3.6 in all observed months, with improvement of up to 7.5 grade points in
particular months. Compared to the models without the proposed features, our model improved
the explained proportion of observed variation in grades, with a mean R2 improvement of 0.245.
   The rest of the paper is organized as follows. Section 2 presents a literature review of related
work. The methodology is outlined in Section 3. Section 4 reports the results of the experiment.
In Section 5, the results are discussed and compared to approaches from related work. Section 6
concludes the paper with summarized remarks and proposed future research directions.


2. Literature Review
Predicting students at risk of failing a course with a classification approach has been explored
by [3, 4, 7, 9, 12, 13, 17], namely using C4.5 Decision Tree, K-Nearest Neighbors, Naïve Bayes,
Support Vector Machines, and Neural Networks. A regression approach is less common in
similar studies, and can be observed in [11, 18], mostly using Logistic Regression. Other focuses
in the field are predicting students at risk of dropout [19, 20, 21], and students who fail to finish
their academic obligations in time [10]. Some studies gathered and presented their datasets
[11, 13, 16, 22, 23], while most have conducted experiments on publicly available datasets
[1, 3, 4, 6, 7, 9, 10, 12, 15]. Related work differs in the scope of the predictions, with studies




focusing on generalized predictions [11, 12, 22] and on specific course predictions [13, 24]. Most
studies analyze data from the entire course timeline [1, 12], while others focus on predictions
in selected periods [4, 14, 15, 16]. Prediction models generally use students’ demographic data,
VLE logs, assessment scores, and previous academic records. The raw VLE data are in the
form of simple logs. Researchers, therefore, depend on different features extracted from them.
Features vary across studies; the most common are daily or weekly summarizations of activities
in VLE, e.g., the sum of all the daily clicks in a VLE [14], the daily number of sessions [18],
and the number of interactions with each activity type [15]. Mubarak et al. [21], for example,
presented a feature capturing the number of active days per week, while Waheed et al. [6]
included the sum of clicks on different content types before and after the course. Building on
these, we observed an opportunity to explore new metrics derived from basic log data.
   Classification studies focusing on student success (or failure) recognition at different times
in the semester report high prediction accuracy. Accuracy is defined as the number
of correct predictions divided by the total number of predictions, while precision refers to the
number of true positives divided by the total number of positive predictions. On the OULA
dataset, Kuzilek et al. [8] conducted weekly predictions of at-risk students, and reported 47%
precision of prediction at the start of the semester and 90% precision at the end of the semester.
Aljohani et al. [4] reported accuracy of around 80% by week 5, 90% by week 10, and 93% by the
end of the semester. Al Azawei et al. [15] conducted daily predictions and reported 70% accuracy
on day 53 (week 9), and 91% accuracy on the day before the end of the semester. Herrmannova
et al. [14] achieved an F-score of 0.80 in week 4 and 0.93 in week 8. Our study differs from
existing work in presenting new features obtained from the OULA dataset and in using regression
for student success prediction, offering additional insight into the predicted range of students’
final grades at different time frames through the semester. A review of selected related work
using VLE logs for predicting student success is presented in Table 1.

3. Methodology
This study uses an experimental design. The experiment was conducted in Python 3.8.5 in
a Jupyter Notebook environment, using Pandas 1.2.4, NumPy 1.19.2, scikit-learn 0.23.2, and
Matplotlib 3.3.2. Statistical analyses were performed using Jamovi 1.2.27.

3.1. Dataset
The OULA dataset [17] was used for experimental analysis. The publicly available dataset
provides anonymized data about courses, students’ demographics, their interaction with a VLE,
and their academic performance. The dataset was gathered by the Open University in the UK in
the academic year 2013/2014. It comprises seven courses, four semesters, and 32,593 students
altogether. For our analysis, the course DDD was selected, as it is the only one with available
final exam scores. Students could obtain grades between 0 and 100, with 40 set as the threshold for a
positive grade. The course consists of Tutor Marked Assessment (TMA) and a Final Exam, each
worth 50% of the final grade. The TMA grade is calculated from six assessments, weighted at 5%,
10%, 10%, 25%, 25%, and 25% of the TMA grade, respectively. The course was organized in the fall
semester, starting in October 2013, with the final exam in May 2014.
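
For illustration, the final grade combines the two components with equal weight. A hypothetical student with TMA assessment scores of 80, 60, 70, 50, 90, and 75, and a final exam score of 65 (example values, not taken from the dataset) would obtain

$\mathrm{TMA} = 0.05 \cdot 80 + 0.10 \cdot 60 + 0.10 \cdot 70 + 0.25 \cdot 50 + 0.25 \cdot 90 + 0.25 \cdot 75 = 70.75$,

$\mathrm{Final} = 0.5 \cdot \mathrm{TMA} + 0.5 \cdot \mathrm{Exam} = 0.5 \cdot 70.75 + 0.5 \cdot 65 \approx 67.9$,

comfortably above the 40-point passing threshold.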




Table 1
Overview of related work
 Study Prediction                     Features                          Time Interval    Results

 [1]    Course pass rate              Demographics, VLE log             Entire course  Acc=78.5%
 [3]    Student success/failure       Demographics, VLE log             Daily          Acc=88%
 [4]    Students at risk of failing   VLE log                           Weekly         AccWeek 5 =80%,
                                                                                       AccWeek 10 =90%
                                                                                       AccWeek 25 =93%
 [6]    Students at risk of failing   Demographics, VLE log             Semester       Acc=84%–93%
        and withdrawing                                                 quartiles      AccAt-risk =88%
 [7]    Students at risk of failure   VLE log                           Milestone      AccWeek 10 =75%
        and dropout
 [8]    At-risk students (failing,    Demographics, VLE log, course     Weekly,       PrecSemester Start =47%
        submitting assessments)       data, grades in other courses     milestones    PrecSemester End =90%
 [9]    Students at risk of failing   Demographics, VLE log, assess-    Daily         F1 =0.25–0.71
                                      ment data, date of registration
 [10]   Students at risk of not       VLE log, assessment data          Milestones       F1 =0.98
        completing on time
 [12]   Students at risk of failure   Demographics, VLE log             Entire course    AccAt-risk =92.2%–93.8%
        and marginal students                                                            AccMarginal =91.3%–93.5%
 [14]   Students at risk of failing   Demographics, VLE log             Weekly           F1, Week 4 =0.80
                                                                                         F1, Week 8 =0.93
 [15]   Student final score           Demographics, VLE log,       Milestones            AccDay 53 =70%
                                      student scores                                     AccDay 207 =91%
                                 Acc = Accuracy, Prec = Precision, F1 = F-score


3.2. Data Preparation and Feature Computation
Data preprocessing included several strategies and techniques, with the main goal of selecting
data for analysis and computing features from the data. First, tables with the course, assessments,
and student data were filtered to the values connected to the DDD course. For each assessment,
we joined the data (grades and submission dates) from the table with student assessments.
Missing values were converted to zero. From the scores and assessment weights, the TMA grade
was first calculated following the formula $\mathrm{TMA} = \sum_{i=1}^{n} weight_i \times score_i$,
with $n$ being the number of assessments. The final exam and TMA scores were used to calculate
the final score. The resulting
dataset was then joined with the students’ demographics. At this point, 1,803 students were
observed from the DDD course. Due to the high unregistration (i.e., student withdrawal) rate, only
students who remained registered during the semester and submitted at least one assignment or
took the final exam were kept. As we were not interested in student performance across repeated
course attempts, students who had previously attended the course were removed. The final
dataset included data for 777 unique students. Categorical data was encoded as numerical by
converting it into dummy variables. The earliest date of a student’s interaction with the VLE was
extracted. VLE interaction dates were transformed into DateTime format, and information
about the date, weekday, and month of each VLE activity were added to the dataframe. The
VLE activity logs were then grouped separately in months and weekdays for each student. The




sum of clicks on each weekday and each month were calculated, as well as weekday (Mon–Fri)
activity percentage, compared to weekend (Sat, Sun) activity. The overall sum of clicks and
clicks on a particular activity type were calculated for the whole semester and each month,
including the activities in September before the course started. We were interested in observing
patterns of inactivity in the VLE. Thus, we included features for the number of days when a
student was active in a VLE from the start of the semester, the number of inactive days since
the start of the semester, as well as the highest number of consecutive active and inactive days
during the observed time period, which were calculated as the longest intervals of consecutive
days students interacted (or had not interacted) with the VLE. Eight copies of the dataset were
created, where the metrics were recalculated for observations on the last day of each month
between September and April. The final datasets included 24 to 68 features, with the number
varying due to the available assessment grades and the number of previous months. Features
that have, to the best of our knowledge, not been observed in related work on the OULA dataset,
are presented in Table 2. Additionally, normalization of all inputs with MinMaxScaler was
conducted for prediction, which scaled and translated each feature into a given range fitted on
the training set (i.e., between 0 and 1), ensuring features with higher values (e.g., number of
clicks) do not inherently have higher importance than features with lower values (e.g., the first
day of interaction).
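
As a rough illustration (not the authors’ actual code), the preprocessing steps above could be sketched with the libraries listed in Section 3 as follows; the column names id_student, weight, and score follow the OULA schema, while the helper functions themselves are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def compute_tma(assessments):
    """Weighted TMA grade per student: TMA = sum(weight_i / 100 * score_i).

    `assessments` is assumed to hold one row per assessment result with
    columns 'id_student', 'weight' (in percent), and 'score' (OULA schema).
    """
    assessments = assessments.copy()
    assessments["score"] = assessments["score"].fillna(0)  # missing submissions count as zero
    assessments["weighted"] = assessments["weight"] / 100 * assessments["score"]
    return assessments.groupby("id_student")["weighted"].sum()


def encode_and_scale(train, test, categorical_cols):
    """Dummy-encode categorical columns and scale all features to [0, 1],
    fitting the scaler on the training fold only."""
    train = pd.get_dummies(train, columns=categorical_cols)
    test = pd.get_dummies(test, columns=categorical_cols).reindex(columns=train.columns, fill_value=0)
    scaler = MinMaxScaler().fit(train)
    return scaler.transform(train), scaler.transform(test)
```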

Table 2
Proposed novel interaction metrics

 Metric                   Description

 WeekdaysClicksRatio      Represents the ratio of VLE activity (clicks) on weekdays (Mon–Fri), compared to
                          the weekends (Sat, Sun) in the observed time period.
 DailyClicksPreSemester   Mean number of daily clicks in the VLE before the start of the semester.
 NrOfDaysActive           Represents the number of days a student interacted with the VLE since the start of
                          the semester. The opposite metric was also computed for inactive days.
 LongestNrOfDaysActive-   Represents the number of consecutive days a student was active in the observed
 Consequently             time period. The opposite metric was also computed for inactive days.


3.3. Exploratory Data Analysis
Our dataset includes 467 (60%) male and 310 (40%) female students. Most students (570 or 73%)
were in the age bracket of 0–35 years, 202 (26%) were aged between 35 and 55, and only five
were older than 55. Most students (395 or 51%) achieved A level or equivalent education before
taking the course; another 259 (33%) had education lower than A level, 114 (15%) had higher
education qualifications, 3 had no formal qualification, and 6 had post-graduate qualifications.
Based on the university’s performance standards thresholds, students were divided into seven
brackets (bad fail: 0–14, fail: 15–29, bare fail: 30–39, pass 4: 40–54, pass 3: 55–69, pass 2:
70–84, pass 1: 85–100). Out of 777 students, 124 failed the class. By exploring students’ activity,
measured with the sum of clicks in VLE, and their success in passing the course, we found out
the following. The hypothesis for normal distribution of the number of clicks in the VLE was
rejected with the Kolmogorov-Smirnov (KS) test (p<.001, D=.199). The Mann-Whitney U (MW-U)
test showed a significant difference (U=12763, p<.001) between the number of VLE clicks of




students who passed and those who failed. Successful students interacted more with the VLE
(N=653, M=1338, SD=1357), while students who failed the course interacted less (N=124, M=432,
SD=345). The hypothesis for normal distribution of the date of the first interaction with VLE
in the groups of successful and unsuccessful students was rejected with the KS test (p<.001,
D=.24). The MW-U test also showed a significant difference (U=28865, p<.001) in successful
(N=653, M=-20.3, SD=6.5) and unsuccessful students (N=124, M=-17.5, SD=7.1) based on the
date of their first interaction with the VLE, with 0 being the first day of the semester. Further
insight into VLE interaction is presented in Figure 1, displaying the sum of students’ daily
clicks in correspondence with course events. A surge in students’ interaction with the VLE at
the beginning of the semester is visible, as well as increased interaction before the final exam
in May. Students’ average activity in the VLE during the week was highest on
Thursdays and Fridays, while the least activity was observed on Tuesdays. The trend of increased
clicks on Thursdays and Fridays and the lowest activity on Tuesdays was similar in groups
of successful and unsuccessful students. On average, students interacted with the VLE more
over the weekend than on Monday, Tuesday, or Wednesday. The mean share of weekday
clicks through the semester was 68.5%, meaning almost one-third of interactions in the VLE
were conducted at the weekends. Observing all VLE activities in the semester by activity type,
students most commonly clicked on forums (M=427), homepage (M=303), subpages (M=170),
and oucontent (M=145).




Figure 1: Daily VLE activity and course calendar

   We observed differences between academically successful and unsuccessful students through
the proposed metrics. The mean values of the percentage of VLE clicks over weekdays, average
daily VLE clicks before the semester, the number of active days, and the number of consecutively
active days were higher for successful students. On the contrary, the mean values of the
overall number of inactive days and the overall number of consecutively inactive days were
higher in the group of unsuccessful students. Comparing the ratio of weekday clicks, the mean
number was higher for successful students (M=.688) than unsuccessful (M=.665), suggesting
they were more active during the weekdays than at the weekend. The hypothesis for the normal
distribution of the weekday clicks ratio in groups of successful and unsuccessful students was
confirmed with the KS test (p=.187, D=.0401). The assumption of homogeneity of variances
was violated according to Levene’s test (F=19.7, p<.001). Thus, Welch’s t-test was used, and no
significant difference (t(111)=-1.86, p=.065) was observed between the weekday clicks ratio in
the groups of successful and unsuccessful students. Further, the average daily pre-semester
clicks were observed. The hypothesis for the normal distribution of average pre-semester clicks
in groups of successful and unsuccessful students was rejected with the KS test (p<.001, D=.229).




The MW-U test showed a significant difference (U=30219, p<.001) in daily VLE clicks pre-
semester between the observed groups. Before the semester, successful students interacted
with the VLE more per day (N=653, M=4, SD=5.7) than unsuccessful students (N=124, M=2.3,
SD=3.1). The mean daily number of pre-semester clicks for successful students (M=4) was
almost double compared to unsuccessful students (M=2.3). Additionally, the KS test rejected the
hypothesis for normal distribution of the overall number of active days in groups of successful
and unsuccessful students (p<.001, D=.0995). The MW-U test showed a significant difference
(U=10376, p<.001) between the overall number of active days of students who passed and those
who failed. Successful students interacted with the VLE on more days (N=653,
M=101, SD=50.9), while students who failed the course (N=124, M=42.3, SD=24.8) accessed
the VLE on fewer days. The KS test rejected the hypothesis for
normal distribution of the overall number of consecutive active days in groups of successful
and unsuccessful students (p<.001, D=.0261). The MW-U test showed a significant difference
(U=-9.34, p<.001) between the overall number of consecutive active days of successful and
unsuccessful students. Successful students had longer consecutive active streaks in the VLE (N=653,
M=11.1, SD=14.7), while students who failed the course (N=124, M=5, SD=3.5) interacted with
the VLE with larger gaps between accesses. Conversely, successful students had shorter consecutive inactive periods
of VLE usage (N=653, M=22.2, SD=11.9), while unsuccessful students had longer ones (N=124, M=98.6,
SD=66.1). The KS test rejected the hypothesis for normal distribution of the overall number
of consecutive inactive days in the two groups of students (p<.001, D=.0209). The MW-U test
showed that the difference was significant (U=12.8, p<.001).
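
The test-selection workflow used throughout this subsection (normality check, then a non-parametric or parametric comparison) could be sketched as follows. SciPy is our assumption here, as the paper lists Jamovi for statistical analyses, and the exact test configuration may differ:

```python
from scipy import stats


def compare_groups(passed, failed, alpha=0.05):
    """Select and run the group-comparison test, mirroring Section 3.3.

    `passed`/`failed` are NumPy arrays of one metric (e.g., total VLE
    clicks) for successful and unsuccessful students; names are ours.
    """
    def is_normal(x):
        # Kolmogorov-Smirnov test against a normal fitted to the sample
        _, p = stats.kstest(x, "norm", args=(x.mean(), x.std()))
        return p >= alpha

    if not (is_normal(passed) and is_normal(failed)):
        # Non-normal data: Mann-Whitney U test
        u, p = stats.mannwhitneyu(passed, failed, alternative="two-sided")
        return "Mann-Whitney U", u, p

    # Normal data: Levene's test for homogeneity of variances; when it is
    # violated, ttest_ind with equal_var=False performs Welch's t-test
    _, p_levene = stats.levene(passed, failed)
    t, p = stats.ttest_ind(passed, failed, equal_var=(p_levene >= alpha))
    return "Welch's t-test" if p_levene < alpha else "Student's t-test", t, p
```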

3.4. Prediction Task
Linear regression was used for prediction. Due to the small sample size, 5-fold cross-validation
with scikit-learn’s KFold was used to improve the estimate of the model performance. RMSE and
R2 were obtained for each period, calculated as
$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
and unadjusted
$R^2(y, \hat{y}) = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$,
where $\hat{y}_i$ is the predicted value of the $i$-th sample, $y_i$ is the corresponding true
value, $\bar{y}$ is the mean of the true values, and $n$ is the number of samples.
   To compare the impact of the proposed metrics on the performance, three other models
were created – MB containing grades and VLE metrics, used most commonly in related work;
MC containing demographics, VLE metrics, and grades; MD containing demographics and
grades. A detailed overview of the features included in each model is presented in Table 3.
Prediction models were compared monthly, at the times indicated in Figure 1, with mean RMSE and
R2 calculated from all folds for each monthly prediction. The evaluations were conducted after
the end of each month, on days 0, 31, 61, 92, 123, 151, 182, and 212. Accuracy in predicting failure
was an important point of interest. It was calculated with the same formula as recall, as proposed
in [24]: $\mathrm{Accuracy(Fail)} = \frac{TP}{TP + FN}$, where TP (true positives) is the number of
students who failed and were identified, and FN (false negatives) is the number of students who
failed but were not identified.
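
A sketch of the monthly evaluation loop under these definitions is given below, assuming X and y are NumPy arrays of the features available up to the evaluated month (already scaled) and the final grades in points; the function name and the fixed random seed are illustrative, not taken from the paper’s code:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error, r2_score


def evaluate_month(X, y, pass_threshold=40, n_splits=5, seed=0):
    """5-fold cross-validation of the linear model for one monthly snapshot."""
    rmses, r2s, fail_accs = [], [], []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True, random_state=seed).split(X):
        pred = LinearRegression().fit(X[train_idx], y[train_idx]).predict(X[test_idx])
        rmses.append(mean_squared_error(y[test_idx], pred, squared=False))  # RMSE
        r2s.append(r2_score(y[test_idx], pred))
        # Accuracy(Fail) = TP / (TP + FN): share of actual failures predicted below the threshold
        failed = y[test_idx] < pass_threshold
        if failed.any():
            fail_accs.append((pred[failed] < pass_threshold).mean())
    return np.mean(rmses), np.mean(r2s), np.mean(fail_accs)
```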




Table 3
Features used in each model, predictions for each model were made monthly, from Sep-Apr

 Model    Features

 MA Sep-Apr Average Pre-semester Daily Clicks, First Date of VLE Interaction (num. of days before/after the first
            day of the semester), Longest Number of Days Active Consequently for Each Passed Month (Sep-Apr),
            Longest Number of Days Inactive Consequently for Each Passed Month (Sep-Apr), Number of Active
            Days for Each Passed Month (Sep-Apr), Number of Inactive Days for Each Passed Month (Sep-Apr),
            Overall Longest Number of Days Active Consequently, Overall Longest Number of Days Inactive
            Consequently, Overall Number of Active Days, Overall Number of Inactive Days, Sum of All VLE
            Clicks, Sum of Clicks for Each Passed Month (Sep-Apr), Sum of Clicks for Each VLE Type, TMA Grades,
            Weekday Percentage Clicks
 MB Sep-Apr First Date of VLE Interaction, Sum of All VLE Clicks, Sum of Clicks for Each Passed Month (Sep-Apr),
            Sum of Clicks for Each VLE Type, TMA Grades
 MC Sep-Apr Age Band, Disability, Gender, Highest Education, Studied Credits, Sum of Clicks for Each Passed Month
            (Sep-Apr), Sum of Clicks for Each VLE Type, TMA Grades
 MD Sep-Apr Age Band, Disability, Gender, Highest Education, Studied Credits, TMA Grades


4. Experimental Results
The predictions of MA are visualized in Figure 2. We can observe a tendency in the improvement
of predictions through time as more data were introduced to the model. A detailed overview
of RMSE and R2 for each time period, compared to the scores of the other models, is presented
in Table 4. In April, after gathering all TMAs, the model was capable of explaining
86% of the variability in final grade predictions, with an RMSE of 8.0, which, in most cases,
was within one grade bracket. MA was relatively successful in predicting the variability in
data, even before the start of the semester (RMSE=14.2). In the first months, we observed the
highest prediction errors in students whose actual grade fell below the passing threshold of 40
points. Compared to other models, MA covered more variance in predicted final grades in all
the observed months. The mean RMSE was also lower in all observed months in MA compared
to other models. The accuracy of grade predictions for students failing the test was relatively
low, which is visible from Figure 2, where the grades around the 40-point threshold were the
least accurately predicted, as well as from Table 4. In April, a month before the final exam, 74%
of the failing students were detected.




Figure 2: Monthly predictions




Table 4
Performance evaluation of the prediction models
             Model    Metric      Sep    Oct    Nov     Dec    Jan    Feb    Mar    Apr

             MA       RMSEMean    14.2   13.1   12.7    12.2   11.0   8.8    8.7    8.0
                      R2 Mean     0.54   0.61   0.64    0.67   0.73   0.83   0.83   0.86
                      Acc(Fail)   51%    51%    53%     56%    63%    69%    69%    74%

             MB       RMSEMean    21.4   19.7   17.6    15.4   14.8   10.5   10.5   8.8
                      R2 Mean     0.02   0.12   0.31    0.47   0.43   0.74   0.75   0.83
                      Acc(Fail)   0%     12%    34%     41%    55%    72%    73%    76%

             MC       RMSEMean    21.6   20.1   18.1    15.9   15.1   10.6   10.4   8.8
                      R2 Mean     0.04   0.08   0.27    0.43   0.41   0.74   0.75   0.83
                      Acc(Fail)   0%     15%    32%     44%    55%    70%    71%    75%

             MD       RMSEMean    21.7   19.5   17.6    15.3   12.3   9.8    9.7    8.4
                      R2 Mean     0.01   0.16   0.31    0.47   0.67   0.78   0.78   0.84
                      Acc(Fail)   0%     10%    27%     44%    54%    69%    69%    75%


   The correlation between features was observed with Spearman’s correlation coefficient for
MA . As expected, there was a moderate to strong positive correlation between the final result
and the assessment grades (correlation coefficients of 0.44, 0.46, 0.65, 0.76, 0.79, and
0.77 for the six TMAs, respectively). The ratio of weekday clicks had a very weak correlation
with the final score (0.05), and the mean number of daily clicks before the start of the semester
had a weak correlation (0.29). A moderate negative correlation was observed between the
longest number of consecutively inactive days and the final score. The negative correlation
increased through the months, with the highest correlations observed in March (-0.46), April
(-0.49), and May (-0.53). A weak positive correlation was observed between the longest number
of consecutively active days and the final score, again increasing through the months, with
the highest correlations observed in March (0.32), April (0.32), and May (0.35). The number
of inactive days in each month also correlated negatively with the final score, though the
correlation was weak to moderate, increasing slowly from October (-0.33) to April (-0.49) and
May (-0.51). Oppositely, a positive correlation was observed with the number of active days
observed each month, again increasing from October (0.33) to April (0.49) and May (0.51). The
exact contribution of the proposed metrics is visible in the differences in the RMSE and R2 scores
between MA and MB across all the observed months, as the only difference between the two
models was the introduction of the proposed metrics in MA .
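
For reference, the reported coefficients could be reproduced along these lines, assuming a hypothetical DataFrame features holding the MA feature matrix together with a final_score target column (both names are ours):

```python
# Spearman rank correlation of every feature with the final score;
# 'final_score' is an assumed column name for the target variable.
spearman = features.corr(method="spearman")["final_score"].drop("final_score")
print(spearman.sort_values())  # most negative (e.g., inactivity streaks) first
```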


5. Discussion
Different models were deployed to predict the final grades of the students based on their
behavior, demographic characteristics, and VLE engagement patterns. The model with the
proposed metrics, created only with data from VLE interactions, performed best for the observed
course in terms of RMSE and R2, while its accuracy of predicted failure was higher than that of
the comparison models until January. While the accuracy of failure prediction in MB,




MC, and MD was very low in the first months, it improved visibly after February, reaching
up to 3% higher accuracy than MA. Comparing R2 between the models, we can observe the
predictions of MA approximated the real data points better than the other three models up to
January. From February to April, MD was comparable to MA, with MD having a 0.02 to 0.05
lower R2. Comparing RMSE, MD reached the most comparable values from January to April,
with 0.4 to 1.3 higher RMSE values compared to MA.
   Though not directly comparable, the classification approaches to predicting at-risk students in
related work reached high accuracy: 84% accuracy at 20% of the course (day 50, mid-November)
and up to 94% accuracy at the end of the course [2], or 70% accuracy at the beginning of
the semester and 94% at the end [4], with [12] also reporting 93% accuracy at the end of the
semester for the observed DDD course. Various classification approaches yielded better results
than linear regression on the observed course data in similar timeframes, suggesting
linear regression is not the optimal model in this case. The accuracy of predicting failure in
students at the end of the semester presented by Rivas et al. [1] (78.5%) and Wang et al. [7] (75%)
was only slightly higher than the accuracy obtained with MA at the end of April (74%), one
month before the end of the semester. Using the regression approach offered more insight into
the exact predicted grade, offering students and educators more opportunities to react during
the semester. Admittedly, the limitations of linear regression can be observed in this case. As
the final score was calculated based on some features, a degree of multicollinearity could not be
avoided (i.e., correlation among several independent variables in a model). The highest correlation
among the computed metrics was observed between the score of assessment 5 (ID=25366) and the final
score. Furthermore, due to the use of linear regression, the predictions for outliers were less
accurate. Comparison of different prediction approaches in search of the optimal algorithm is
out of the scope of this work; however, in the observed case, the use of ensemble models or
multilayer perceptron might improve the model’s performance.
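
One way to quantify the multicollinearity noted above is through variance inflation factors. The sketch below relies on statsmodels, which is our assumption (it is not among the tools listed in Section 3), with X standing for a numeric feature DataFrame:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Add an intercept column so the VIFs reflect the regression actually fitted;
# values above ~10 are a common rule of thumb for problematic collinearity.
exog = sm.add_constant(X)
vif = pd.Series(
    [variance_inflation_factor(exog.values, i) for i in range(exog.shape[1])],
    index=exog.columns,
).drop("const")
print(vif.sort_values(ascending=False).head())
```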
   We observed increasing negative correlations between monthly inactive days and the final
score, which can be explained by the lower weight of the first TMAs, as, until the end of
November, students can achieve up to 25% of their final TMA grade with three out of six finished
assessments. In later months, the weights of TMAs increase, and with them, the importance
of interaction with the material available in the VLE. Similarly, the increasing negative correlation
between the longest number of consecutively inactive days and the final score can be interpreted
to mean that inactivity in the first months of the semester can still be compensated for later in
the semester. However, due to the rising weight of the TMAs, inactivity in the last months of the
semester contributes greatly to a lower final score. Regular student activity in the VLE (measured
by consecutive active days) can be interpreted as a prerequisite for a positive grade rather than a
direct contributor to it; however, the lack of consecutive activity has
noticeable negative consequences. Correlations between the six new proposed metrics and the
final score in MA suggest that metrics measuring the daily clicks pre-semester, the number of
active and inactive days, and the longest number of consecutively active and inactive days reach
weak to moderate correlation with the student’s final score, while the metric measuring the
percentage of weekday clicks achieved very weak correlation, and was overall less suitable for
prediction. The findings suggest that it is meaningful for learning analytic researchers to take
into account the possibilities of calculating new, custom features from the available raw data.




6. Conclusions and future work
This study aimed to predict students’ final grades by converting the problem into monthly
formats and observing the performance evaluation based on different features presented to
the model. The primary motivation of this paper was to draw attention to the exploration of
previously unused metrics on an existing dataset, and observation of their impact on prediction.
The metrics which contributed to this discussion on the OULA dataset are the mean daily
number of pre-semester VLE clicks, the number of active and inactive days, and the number
of consecutive active and inactive days. The main limitations of this work are: (1) the narrow
focus and validation on a single course, with a chance that the proposed features prove less useful
on large-scale or generalized data; (2) the linear model used is simple and could be optimized
further to improve performance, especially around the passing threshold; (3) limited practical implications,
as withdrawn students and students retaking the course were not included. In future work, the
usefulness of the proposed features in different courses will be explored, as well as testing the
proposed features with classification models.


Acknowledgments
The authors acknowledge the financial support from the Slovenian Research Agency (Research
Core Funding No. P2-0057).


References
 [1] A. Rivas, A. González-Briones, G. Hernández, J. Prieto, P. Chamoso, Artificial neural
     network analysis of the academic performance of students in virtual learning environments,
     Neurocomputing 423 (2021) 713–720.
 [2] M. Adnan, A. Habib, J. Ashraf, S. Mussadiq, A. A. Raza, M. Abid, M. Bashir, S. U. Khan,
     Predicting at-risk students at different percentages of course length for early intervention
     using machine learning models, IEEE Access 9 (2021) 7519–7539.
 [3] H. Heuer, A. Breiter, Student success prediction and the trade-off between big data and
     data minimization, in: D. Krömker, U. Schroeder (Eds.), DeLFI 2018 - Die 16. E-Learning
     Fachtagung Informatik, Gesellschaft für Informatik e.V., Bonn, 2018, pp. 219–230.
 [4] N. R. Aljohani, A. Fayoumi, S.-U. Hassan, Predicting at-risk students using clickstream
     data in the virtual learning environment, Sustainability 11 (2019).
 [5] L. Haiyang, Z. Wang, P. Benachour, P. Tubman, A time series classification method for
     behaviour-based dropout prediction, in: 2018 IEEE 18th International Conference on
     Advanced Learning Technologies (ICALT), 2018, pp. 191–195.
 [6] H. Waheed, S.-U. Hassan, N. R. Aljohani, J. Hardman, S. Alelyani, R. Nawaz, Predicting
     academic performance of students from vle big data using deep learning models, Computers
     in Human Behavior 104 (2020) 106189.
 [7] X. Wang, B. Guo, Y. Shen, Predicting the at-risk online students based on the click data
     distribution characteristics, Scientific Programming 2022 (2022).




 [8] J. Kuzilek, M. Hlosta, D. Herrmannova, Z. Zdrahal, J. Vaclavek, A. Wolff, OU Analyse:
     analysing at-risk students at the Open University, Learning Analytics Review (2015) 1–16.
 [9] M. Hlosta, Z. Zdrahal, J. Zendulka, Ouroboros: Early identification of at-risk students
     without models based on legacy data, in: Proceedings of the Seventh International Learning
     Analytics & Knowledge Conference, LAK ’17, ACM, New York, NY, USA, 2017, p. 6–15.
[10] S. H., K. Ravikumar, Student risk identification learning model using machine learning
     approach, International Journal of Electrical and Computer Engineering 9 (2019) 3872.
[11] D. Gašević, S. Dawson, T. Rogers, D. Gasevic, Learning analytics should not promote one
     size fits all: The effects of instructional conditions in predicting academic success, The
     Internet and Higher Education 28 (2016) 68–84.
[12] K. T. Chui, D. C. L. Fung, M. D. Lytras, T. M. Lam, Predicting at-risk university students in
     a virtual learning environment via a machine learning algorithm, Computers in Human
     Behavior 107 (2020) 105584.
[13] Y.-H. Hu, C.-L. Lo, S.-P. Shih, Developing early warning systems to predict students’ online
     learning performance, Computers in Human Behavior 36 (2014) 469–478.
[14] D. Herrmannova, M. Hlosta, J. Kuzilek, Z. Zdráhal, Evaluating weekly predictions of
     at-risk students at the open university: Results and issues, in: Proceedings of the European
     Distance and E-Learning Network 2015 Annual Conference, 2015.
[15] A. Al-Azawei, M. Al-Masoudy, Predicting learners’ performance in virtual learning environ-
     ment (vle) based on demographic, behavioral and engagement antecedents, International
     Journal of Emerging Technologies in Learning (IJET) 15 (2020) 60–75.
[16] H.-C. Chen, E. Prasetyo, S.-S. Tseng, K. T. Putra, Prayitno, S. S. Kusumawardani, C.-E.
     Weng, Week-wise student performance early prediction in virtual learning environment
     using a deep explainable artificial intelligence, Applied Sciences 12 (2022).
[17] J. Kuzilek, M. Hlosta, Z. Zdrahal, Open university learning analytics dataset, Scientific
     data 4 (2017) 1–8.
[18] S. Palmer, Modelling engineering student academic performance using academic analytics,
     International Journal of Engineering Education 29 (2013) 132–138.
[19] S.-U. Hassan, H. Waheed, N. R. Aljohani, M. Ali, S. Ventura, F. Herrera, Virtual learning
     environment to predict withdrawal by leveraging deep learning, International Journal of
     Intelligent Systems 34 (2019) 1935–1952.
[20] S. Pongpaichet, S. Jankapor, S. Janchai, T. Tongsanit, Early detection at-risk students using
     machine learning, in: 2020 International Conference on Information and Communication
     Technology Convergence (ICTC), 2020, pp. 283–287.
[21] A. A. Mubarak, H. Cao, W. Zhang, Prediction of students’ early dropout based on their
     interaction logs in online learning environment, Interactive Learning Environments (2020)
     1–20.
[22] P. Jia, T. Maloney, Using predictive modelling to identify students at risk of poor university
     outcomes, Higher Education 70 (2015) 127–149.
[23] F. Marbouti, H. A. Diefes-Dux, J. Strobel, Building course-specific regression-based
     models to identify at-risk students, in: 2015 ASEE Annual Conference & Exposition, ASEE
     Conferences, Seattle, Washington, 2015.
[24] F. Marbouti, H. A. Diefes-Dux, K. Madhavan, Models for early prediction of at-risk students
     in a course using standards-based grading, Computers & Education 103 (2016) 1–15.



