=Paper=
{{Paper
|id=Vol-3024/paper5
|storemode=property
|title=Analysing students' interaction sequences on Moodle to predict academic performance
|pdfUrl=https://ceur-ws.org/Vol-3024/paper5.pdf
|volume=Vol-3024
|authors=Andreia Cunha,Alvaro Figueira
}}
==Analysing students' interaction sequences on Moodle to predict academic performance==
Andreia Cunha, Álvaro Figueira
Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal

Abstract

As e-Learning systems become increasingly prevalent, imposing a (sometimes necessary) physical distance between lecturers and their students, new methods need to emerge to fill this widening gap. Educators need, more than ever, systems capable of warning them (and the students) of situations that might create future problems for the learning process. The capacity to give and receive feedback is naturally the best way to address this problem. However, in e-learning contexts with dozens or hundreds of students, the solution becomes less simple. In this work we propose a system capable of continuously giving feedback on the performance of students based on the interaction sequences they undertake with the LMS. The novelty of this work lies in combining the sequences of activity accesses with the computed durations of these online learning activities, which are then encoded and fed into machine learning algorithms. We used longitudinal data spanning five academic years. From our set of classifiers, the Random Forest obtained the best results for preventing low grades, with an accuracy of nearly 87%.

Keywords: Student grade prediction, Interaction sequences, Machine learning, Moodle logs

1. Introduction

The current pandemic situation changed how students throughout the world are learning. With education leaving the classrooms and entering students' homes, e-learning platforms such as Moodle have become popular not only as mere content distribution platforms, but as virtual spaces where students can interact with each other and with their professors. However, with the transition to digital learning, face-to-face interactions have been replaced with e-mails and video conferencing. The loss of these kinds of interactions can lead to professors not having a full understanding of their students' performance, and students might not feel as motivated, resulting in lower grades or even in a complete lack of interest in the course. Under these constraints, learning analytics become relevant, since educators have a reduced capability to provide the continuous feedback that is helpful in their activities. As such, we aim to provide a platform that eases the burden on educators of assessing whether a student is performing well in their course. By introducing a machine learning and automated statistical analysis tool to educators, we are in essence aiming to provide them with a smart learning environment focused on a particular online educational content distribution platform. Our research focuses on Moodle as the LMS platform, due to its usage in our institution and, consequently, the ease of access to the relevant data.

LA4SLE@EC-TEL 2021: Learning Analytics for Smart Learning Environments, September 21, 2021, Bolzano, Italy. Contact: afdcunha42@gmail.com (A. Cunha); arf@dcc.fc.up.pt (Á. Figueira). © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Moodle stores user interaction information in a log file for each course. The fields in these log files are shown in Table 1. The collected information gives educators a wider capacity to understand the areas in which their students have the most difficulties.

Table 1: Fields in the Moodle log
| Pertaining Fields | Example |
| Event context | Course: Technical Communication Activity |
| Component | System |
| Student's Name | AName Student |
| Origin | web |
| IP Address | 127.0.0.1 |
| Date and Time | 4/Jul/16, 15:43 |
| Affected user | - (if not applicable) |
| Interaction Description | The user with id 'x' viewed the course with id 'y'. |
| Event name | Course viewed |

Moodle allows educators to distribute Resources and Activities. Resources are defined as static material, used for exposition or consultation, while Activities are defined as any item that requires user interaction, such as tests, forums, or work submissions. Thus, a tool capable of analysing the students' interactions with the platform and comparing them to a perceived optimal access pattern might provide insights to both professors and students about their potential shortcomings. Such a tool should be able to adapt itself to any course and to warn both professors and students should the latter's actions put them at risk of failing the course. Ideally, the tool should also be explainable, so that the people involved can understand why it raised the alerts. The optimal medium for this tool's analysis is the activity log of the platform in use, since it contains time, student, and activity information. It also stands to reason that an educator may design their course with an optimal track (the order in which resources are accessed) in mind. Comparing this track to a student's activity records allows for a measurement of how distant the student is from the intended course consumption.

2. Background

Other works in this field also attempt to create a tool such as the one we envision, the most relevant being the following.

Ademi et al. [1] used Moodle logs for a single course over two academic years, 2016/2017 and 2017/2018, totalling over 326000 log entries. They filtered the data to remove non-Latin characters, as well as entries not corresponding to students. The classification was carried out with J48 decision trees using 2, 3, and 5 classes; Bayesian networks with 2 classes; and Support Vector Machines with 2 classes, with each class representing a different range of grades. The best performing algorithm was the J48 decision tree, with an accuracy of 89.32%.

Yang et al. [2] used data from a course at the University of Tartu, Estonia, involving 242 students. Their goal was to predict whether a student would pass or fail the class based on homework grades and procrastination metrics. They used k-means clustering for feature extraction and classified the data with the following algorithms: L-SVM, R-SVM, Gaussian processes, Decision Trees, Random Forests, Artificial Neural Networks, AdaBoost, and Naïve Bayes, of which the best performing was L-SVM, with an accuracy of 84.6%.

Hashim et al. [3] used data from bachelor study programmes of the College of Computer Science and Information Technology, University of Basra, for the years 2017–2018 and 2018–2019. Their aim was to predict student performance in final examinations. The data was cleaned by removing entries with empty fields.
Each entry represents a student; after cleaning, 499 entries remained. The classification algorithms used were: Decision Tree, Naïve Bayes, Logistic Regression, Support Vector Machine, K-Nearest Neighbour, Sequential Minimal Optimisation, and Neural Network, with the best performing being Logistic Regression, at 88.8% in predicting failing students.

In a previous work [4], the authors extracted from Moodle logs a set of existing and derived features that allowed them to predict a student's academic grade. The system was based on a three-scale classification (low grades, margin grades, high grades). Some of the features are based on the time a student spends interacting with a given activity or resource. While Moodle registers the time a particular page or activity was accessed, it does not register when the user left the page. Therefore, the length of any given session is computed by the authors using an algorithm that takes into account the start time of the next activity, together with a set of heuristics that set per-activity time thresholds. These times are then used to analyse student activity across ten different kinds of activity, including slide decks, tests, group formation, and workshop activities. The authors then use machine learning algorithms to classify learning paths based on similarities between students in a multidimensional n-space, where n is the number of different resources or activities taken during the course and the time spent is used as the 'intensity' along the respective axis of that n-dimensional space. This work achieved 82% accuracy in the prediction of final grades at the end of the semester for low-grade students, as well as 67% (for the same class) up to the first third of the semester [4].

3. Our Proposal

This section describes a paradigm shift from that classification model, such that: 1) the predictions can be extended to any number of resources/activities undertaken during a course in an LMS; 2) the computation of the time taken by each student in each activity is enhanced and fine-tuned; and 3), especially, we revise how the optimal track is compared with the student's path. In our perspective, the time a student spends in an activity or a resource is also important, but what we consider fundamental is the order in which the resources and activities are accessed, which is not considered in [4]. In other words, if we were to sum all times of the same type of activity, much information regarding the sequence of accesses (namely, coherent access paths) and returns to previous resources (which may, for example, indicate difficulties with earlier topics) would be lost; preserving it yields much richer and more informative input for the system and, ultimately, for the instructor seeking to mitigate those problems.

3.1. Automatic data anonymization and filtering

Moodle's logs contain a number of identifying fields, such as students' full names, unique identifying IDs, and IP addresses. IP addresses are discarded. Names and numeric IDs are replaced with SHA-256-based hashes. The logs can also contain entries pertaining to professors and other faculty staff, so it is possible to filter which kinds of users appear in the anonymized logs: students, professors, and admins. This distinction is based on blacklists for professors, educators, assistants, and administrators: if a name is not present on any of those lists, it is categorised as a student. A sketch of this step is given below.
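To make the anonymization step concrete, the following is a minimal Python sketch that discards IP addresses, filters out staff entries, and replaces identifying fields with SHA-256 hashes. The column names, blacklist file names, and the "Student ID" field are our own assumptions for illustration; the paper does not specify its implementation.

```python
import hashlib

import pandas as pd

# Hypothetical file names for the role blacklists; the paper only states that
# such lists exist for professors, educators, assistants and administrators.
STAFF_LISTS = ["professors.txt", "educators.txt", "assistants.txt", "admins.txt"]


def load_staff_names(paths):
    """Collect all names appearing on any staff blacklist."""
    names = set()
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            names.update(line.strip() for line in fh if line.strip())
    return names


def sha256_pseudonym(value):
    """Deterministic SHA-256-based pseudonym for a name or numeric ID."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def anonymise(log: pd.DataFrame, staff_names: set) -> pd.DataFrame:
    """Drop IP addresses, keep only student entries, hash names and IDs."""
    log = log.drop(columns=["IP Address"])  # IP addresses are discarded
    # Any name not found on a staff blacklist is treated as a student.
    log = log[~log["Student's Name"].isin(staff_names)].copy()
    for column in ("Student's Name", "Student ID"):  # assumed column names
        log[column] = log[column].map(sha256_pseudonym)
    return log
```

Because the hash is deterministic, the same student keeps the same pseudonym across entries, which preserves per-student interaction sequences after anonymization.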
3.2. Activity/Resource extraction

To achieve better generalisation, our system adapts itself to accommodate all activities provided via Moodle, so that it does not need to be tailored to each course. For this purpose, the original logs are traversed to find occurrences of existing activities, which are prefixed with the relevant course identifier and academic year and, if deemed relevant for analysis, assigned a single letter to be used in activity strings. An example of the outcome of this process is shown in Table 2. Activities can be left out of this analysis by means of a blocklist, which in the case below includes the course's landing page as well as forums.

Table 2: Examples of the outcome of the activity extraction process
| Activity ID | Activity Name | Identifier |
| DPI1001_1920_76792 | File: Presentation Lecture | A |
| DPI1001_1920_78330 | Test: Test 1 (Thursday) | T1 |
| DPI1001_1920_78728 | File: Class02 | B |
| DPI1001_1920_87716 | File: Statement (PDF) | E |
| DPI1001_1920_87717 | Test: Test 2 (Wednesday) | T2 |
| DPI1001_1920_96220 | Workshop: Article and Submission Evaluation | G |
| DPI1001_1920_108210 | Group choice: Group Creation | H |
| DPI1001_1920_116444 | Assignment: Slide Submission | K |

3.3. Dealing with activity durations

While it is fair to assume that a student might navigate from one resource/activity to another within minutes or even seconds (the difference being the time spent in the former), that is not always the case. This poses a number of issues: there may be no further access from that student in the log; there may be further activity, but only on another day; or it may simply be unreasonable to assume that the student spent that much time on that precise activity. These situations need to rely on a threshold, after which it can be assumed that the student has left the activity. Our approach differs from others in that we do not rely on a pre-established threshold common to all Moodle activities. To handle these issues, as an initial step we take the same approach: an initial set of times, considered reasonable thresholds, is defined for each activity. But then, for each entry, we check the activity duration in the data. The data distribution and its outliers are weighed against our established thresholds and, should a duration exceed the thresholds, it is flagged as "doubtful". The numbers of interactions flagged as doubtful in our test data (a "Technical Communication" course) across five academic years were: 11692 in 2015/16 (37.67% of the total); 9193 in 2016/17 (35.14%); 7689 in 2017/18 (36.64%); 4187 in 2018/19 (23.11%); and 13079 in 2019/20 (36.75%), which makes them fairly consistent across the years. Although one third of the duration times raising concerns is not ideal, it is valuable to have a system that can assure (to a good degree) that the session durations of the remaining two thirds of the activities are correctly computed. A sketch of the duration computation and flagging follows.
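The sketch below assumes a pandas DataFrame with `student`, `activity`, and `timestamp` columns, and uses a Q3 + 1.5·IQR outlier rule as a stand-in for the paper's unpublished distribution heuristics; the baseline thresholds are illustrative values, not the ones actually used.

```python
import pandas as pd

# Illustrative per-activity baseline thresholds (seconds); the paper defines
# an initial threshold per activity but does not publish the values.
BASE_THRESHOLD = {"File": 1800, "Test": 3600, "Workshop": 5400}
DEFAULT_THRESHOLD = 1800


def compute_durations(log: pd.DataFrame) -> pd.DataFrame:
    """Duration of an access = time until the same student's next log entry."""
    log = log.sort_values(["student", "timestamp"]).copy()
    next_access = log.groupby("student")["timestamp"].shift(-1)
    log["duration"] = (next_access - log["timestamp"]).dt.total_seconds()
    return log  # each student's last access is left with a missing duration


def flag_doubtful(log: pd.DataFrame) -> pd.DataFrame:
    """Flag durations that exceed the per-activity threshold or are missing."""
    log = log.copy()
    base = log["activity"].map(BASE_THRESHOLD).fillna(DEFAULT_THRESHOLD)
    # Stand-in distribution check: anything beyond Q3 + 1.5*IQR for that
    # activity type is also treated as an outlier.
    q1 = log.groupby("activity")["duration"].transform(lambda s: s.quantile(0.25))
    q3 = log.groupby("activity")["duration"].transform(lambda s: s.quantile(0.75))
    statistical = q3 + 1.5 * (q3 - q1)
    threshold = pd.concat([base, statistical], axis=1).min(axis=1)
    log["doubtful"] = log["duration"].isna() | (log["duration"] > threshold)
    return log  # doubtful and missing durations are later imputed (Section 3.4)
```

Taking the minimum of the baseline and the statistical bound means a duration is flagged as soon as either criterion considers it implausible, which matches the conservative spirit of the flagging described above.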
3.4. Data imputation for missing and doubtful durations

Once doubtful duration times are identified, or when durations are missing, they are handled via the imputation of new values taken to be a reasonable approximation of a typical access duration for the particular activity. The replacement of a doubtful time with a more reasonable value depends on the current distribution of the non-doubtful duration values. The average duration and the k-nearest neighbours are the most used methods, the choice depending on the size of the sample for that particular activity or resource [5]. For example, this means that if the time spent in Resource A by Student X is doubtful, the system examines all the times available for Resource A, but also how the use of resources by all students compares to the use of resources by Student X.

3.5. Creation of activity strings and feature generation

The authors in [4] also create activity strings to describe accesses to resources and activities; however, they condense the times into the sum of the time the user spent in each particular activity, so the order of activity accesses is not preserved, resulting in a huge loss of information. In this work we opt for a two-branch approach: i) we generate activity strings with no time information but preserving the access order. This allows us to directly form and compare patterns of activity between a given student and one who attained a good grade in past years. The choice not to include time information was made on the premise that it might not accurately represent a student's effort: a student might, for example, access a slide deck only to download it for offline study; ii) we still use the activity times to train a machine learning classifier that helps us prevent and mitigate future problematic low grades.

The student activity strings are compared to a string we consider optimal, reflecting the way the educator considers the course's information should be 'consumed'. The optimal path naturally varies by course and by year, and is obtained by asking the educator running the tool to input it. It should be noted that what we call the optimal path is not optimal in a pedagogical sense, because different students have different educational needs, so what is optimal for one student may not be optimal for another. However, we consider it 'scientifically optimal' because it provides a single point of comparison based on the educator's course design. Also, it is preferable to be warned that a student is in danger of failing when they end up passing the course (a false positive) than to receive no warning because the system considers that a student will pass, only for them to fail (a false negative).

To keep the generated strings to reasonable lengths, only activities deemed relevant (and not in a blocklist) are used to generate them. These student activity strings are space-separated single characters, except that for tests the 'Tn' combination counts as a single token. An example of such a string containing 5 activities is: "A T1 B A T2". This representation allows us to extract bi- and tri-grams, i.e. sequences of two or three consecutive tokens, which in this case represent online activities (or accessed resources). The presence of a bi-gram in this context means that there is a pair of activities performed in a given order; analogously, for tri-grams there is an order between three activities. The motivation is that there are coherent sequences in which pedagogic materials are accessed: for example, accessing handouts before undertaking quizzes, reading posts before posting, or obtaining templates before submitting papers. These strings allow us to generate new features to complement those already present in the previous work, namely: a) String Length: the total activity count in a given access string (note that this is the activity count, not the character count); b) Distance: the Levenshtein distance from a student's access string to the optimal access string; c) Similarity: a score for the best-fitting partial substring with respect to the optimal string; d) the number of unique 2- and 3-grams in common with the optimal string. A sketch of this feature generation is given below.
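The feature generation over activity strings can be sketched as follows. The function names are ours; the Levenshtein distance operates on activity tokens rather than characters (so "T1" counts as one symbol), and difflib's `SequenceMatcher` ratio is used only as a rough stand-in for the paper's partial-substring similarity score.

```python
from difflib import SequenceMatcher


def levenshtein(a, b):
    """Edit distance between two token sequences (classic two-row DP)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (x != y)))  # substitution
        previous = current
    return previous[-1]


def ngrams(tokens, n):
    """Set of unique n-grams over a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def string_features(student_string, optimal_string):
    """Features a)–d) of Section 3.5 for one student activity string."""
    s, o = student_string.split(), optimal_string.split()
    return {
        "string_length": len(s),        # activity count, not character count
        "distance": levenshtein(s, o),  # edits to reach the optimal track
        "similarity": SequenceMatcher(None, s, o).ratio(),  # stand-in score
        "common_bigrams": len(ngrams(s, 2) & ngrams(o, 2)),
        "common_trigrams": len(ngrams(s, 3) & ngrams(o, 3)),
    }


# Example, with a hypothetical optimal track:
# string_features("A T1 B A T2", "A B T1 E T2")
```

Working on tokens rather than characters keeps the distance meaningful when activity identifiers span several characters, as with the test tokens T1 and T2.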
4. Experimental settings and preliminary results

4.1. Data balancing

First, we divide the student grades (originally on a scale from 0 to 20) into three categories: A (0 to 8), B (9 to 11), and C (12 to 20). Due to the imbalanced nature of the data, each classifier was tested using different sampling techniques to address this imbalance, such as oversampling, undersampling, and class weights. The dataset of initially 150k interactions was reduced to 93k (student interactions only). For the testing we had 625 rows, each aggregating all features and corresponding to a single student attending the course in the school years 2015/16 up to 2018/19. They were split 70/30 into training and testing sets, meaning 437 rows for training and 188 for testing, before any class-balancing effort. The training set consists of 83 samples of class A, 81 of class B, and 273 of class C; the testing set consists of 38 samples of class A, 33 of class B, and 117 of class C.

4.2. Classifiers

Three types of classification algorithms were tested: Decision Trees (DT), Random Forests (RF), and k-Nearest Neighbours (KNN). DT tests were carried out with various maximum tree depths: 6, 7, and 8. The best performing DT had a depth of 8 and utilised class weights. For RF, tests were carried out over a grid with different numbers of trees (100, 150, 200) and maximum depths (6, 7, 8). The best performing RF used a maximum depth of 8 with 200 trees and utilised oversampling. For KNN, tests were carried out with 2, 4, 8, 10, and 20 neighbours. Together with oversampling, lower numbers of neighbours produced better results than the higher numbers tested; the best KNN run used 2 neighbours. The results of each of the classifiers can be seen in Table 3. Taking the work in [4] as the baseline, we globally improved the accuracy by 5% and the precision by 11% (up to 97%) in the focus class A. A sketch of the tuning procedure is given after Table 3.

Table 3: Results for the best-performing classifiers (per-class values given as A / B / C)
| Classifier | Accuracy | Precision (A/B/C) | Recall (A/B/C) | F1 score (A/B/C) | Support (A/B/C) |
| DT | 0.72 | 0.81 / 0.72 / 0.85 | 0.59 / 0.81 / 0.76 | 0.69 / 0.76 / 0.70 | 37 / 32 / 34 |
| RF | 0.87 | 0.97 / 0.82 / 0.82 | 0.90 / 0.83 / 0.87 | 0.94 / 0.82 / 0.84 | 121 / 117 / 113 |
| KNN | 0.80 | 0.82 / 0.77 / 0.81 | 0.90 / 0.83 / 0.87 | 0.94 / 0.82 / 0.84 | 121 / 117 / 113 |
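One way to reproduce the Random Forest grid described above is sketched below with scikit-learn and imbalanced-learn. The use of GridSearchCV with 5-fold cross-validation and the fixed random seeds are our assumptions; the paper does not detail its tuning procedure.

```python
from imblearn.over_sampling import RandomOverSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split


def tune_random_forest(X, y):
    """X: one row of Section 3.5 features per student; y: grade class A/B/C."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    # Oversample the minority classes in the training split only, so the
    # test set keeps the original class distribution.
    X_train, y_train = RandomOverSampler(random_state=0).fit_resample(
        X_train, y_train)
    grid = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [100, 150, 200], "max_depth": [6, 7, 8]},
        scoring="accuracy",
        cv=5,
    )
    grid.fit(X_train, y_train)
    # Per-class precision, recall, F1 and support, as in Table 3.
    print(classification_report(y_test, grid.predict(X_test)))
    return grid.best_estimator_
```

Resampling only the training split avoids leaking synthetic duplicates into the test set, so the reported per-class metrics reflect the original class proportions.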
5. Discussion

The reported results indicate that the classifiers have a good capability to separate the chosen classes, as is apparent from the achieved metrics. Whilst the data is quite noisy, with no individual feature being a particularly good indicator across all of its possible scores, the interactions between the features seem to provide the classifiers with sufficient information to achieve good results. A potential issue with the system is that the optimal string for one course in a given school year may not be applicable to other courses, or even to different instances of the same course. Depending on how long, or short, those strings end up being, individual features may become too clustered around a small range of values and thus incapable of providing significant discriminatory performance on the classes.

6. Conclusions and Future Work

The results achieved by the different classifiers show promising accuracy, the best being the Random Forest classifier. These results also indicate that there is a margin for further fine-tuning of the classifiers and for narrowing down the semi-arbitrary characteristics of some of the features, such as the optimal string, which should lead to better results in future research. Furthermore, the ability to choose, for each activity, the duration beyond which it is considered doubtful, as well as the ability to tune the optimal string to the lecturer's preferences, allows the system to be customised and adapted to different circumstances, courses, and evaluation schemes.

References

[1] N. Ademi, S. Loshkovska, S. Kalajdziski, Prediction of student success through analysis of Moodle logs: Case study, in: Int. Conf. on ICT Innovations, Springer, 2019, pp. 27–40.
[2] Y. Yang, D. Hooshyar, M. Pedaste, M. Wang, Y.-M. Huang, H. Lim, Predicting course achievement of university students based on their procrastination behaviour on Moodle, Soft Computing 24 (2020) 18777–18793.
[3] A. S. Hashim, W. A. Awadh, A. K. Hamoud, Student performance prediction model based on supervised machine learning algorithms, in: Materials Science and Engineering, volume 928, IOP Publishing, 2020, p. 032019.
[4] B. Cabral, Á. Figueira, A machine learning model to early detect low performing students from LMS logged interactions, in: EMENA-ISTL Information Systems and Technologies to Support Learning, Springer, 2019, pp. 145–154.
[5] A. Aleryani, W. Wang, B. Iglesia, Dealing with missing data and uncertainty in the context of data mining, in: Hybrid Artificial Intelligence Systems, Springer, 2018, pp. 289–301.