=Paper=
{{Paper
|id=Vol-1137/LA_machinelearning_submission_2
|storemode=property
|title=Developing predictive models for early detection of at-risk students on distance learning modules
|pdfUrl=https://ceur-ws.org/Vol-1137/LA_machinelearning_submission_2.pdf
|volume=Vol-1137
|dblpUrl=https://dblp.org/rec/conf/lak/WolffZHKH14
}}
==Developing predictive models for early detection of at-risk students on distance learning modules==
Annika Wolff, Zdenek Zdrahal, Drahomira Herrmannova, Jakub Kuzilek, Martin Hlosta
Knowledge Media Institute, The Open University
Walton Hall
Milton Keynes, MK7 6AA
+44(0)1908 659462, 654512, 652477, 659109, 653800
{a.l.wolff; z.zdrahal; drahomira.herrmannova; jakub.kuzilek; martin.hlosta}@open.ac.uk
ABSTRACT
Not all students who fail or drop out would have done so if they had been offered help at the right time. This is particularly true on distance learning modules, where there is no direct tutor/student contact but where it has been shown that making contact at the right time can improve a student's chances. This paper explores the latest work conducted at the Open University, one of Europe's largest distance learning institutions, to identify the optimum time to make student interventions and to develop models that identify the at-risk students in this time frame. This work in progress is taking real-time data and feeding it back to module teams as the module is running. Module teams will be indicating which of the predicted at-risk students have received an intervention, and the nature of the intervention.

Categories and Subject Descriptors
H.2.8 [Database Applications]: Data Mining; D.4.8 [Performance]: Modelling and prediction

General Terms
Algorithms, Design, Experimentation, Human Factors, Theory.

Keywords
predictive models, machine learning, student data, Bayesian models, distance learning

1. INTRODUCTION
Predictive modelling techniques can be applied to student data to identify students who are at risk of failing or withdrawing from a module. Tutors or module teams can use this information to aid their decision-making about whom they should contact to offer help, leading to better strategic use of resources and improved retention. For example, the Course Signals system has been successfully in place at Purdue University for some time, providing feedback to students based on predictions from multiple data sources (Arnold and Pistilli, 2012). The Open University (OU) is one of the largest distance learning institutions in Europe. OU modules are making increasing use of the Virtual Learning Environment (VLE), Moodle, to supply learning materials, instead of the paper materials previously supplied in the post.

This paper explores the latest work at the Open University using data from the VLE, combined with demographic data, to predict student failure or dropout. This ongoing work is already providing real-time information to module teams and will be fully evaluated later in the year. The methods investigate the role of VLE activity compared with demographic data and attempt to make predictions about a student before they submit their first assessment. This first assessment has been found to be a very good predictor of a student's final outcome on a module.

This work is the culmination of a number of previous projects investigating the potential for different methods to produce accurate predictions. We will first describe briefly some of the previous work at the OU before examining the current methods, preliminary feedback on them, and plans for future evaluation.

2. Previous work with OU data
Decision trees have proved a fairly popular method for exploring the potential for building predictive models from student data (see Baradwaj and Pal, 2011; Pandey and Sharma, 2013; Kabra and Bichkar, 2011). Initial work with OU data focused on using decision trees to predict student outcome from VLE data combined with assessment scores. Each OU module evaluates students periodically with a Tutor Marked Assessment (TMA). The exact number may vary from module to module, but seven TMAs is quite typical. Three modules, each with fairly typical VLE usage and a large student cohort (between 1200 and 4400 students registered), were chosen for building and testing the models. The main findings were that decision trees were fairly good at predicting both a drop in performance in a subsequent assessment and the final outcome of the module. Prediction was overall better when combining VLE and TMA data. This preliminary work also suggested that the absolute amount of clicking within the VLE was not directly correlated with outcome: students could click a lot but still fail, or not click at all and still pass. However, a reduction in clicking was a warning sign.
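The "reduction in clicking" warning sign can be sketched as a simple per-student check, flagging a sharp drop relative to the student's own recent baseline rather than any absolute click level. The function name, window size and drop ratio below are illustrative assumptions, not the rule used in the study:

```python
# Minimal sketch of the "reduction in clicking" warning sign:
# flag a student when the latest week's VLE clicks fall well below
# their own recent average, regardless of the absolute click level.

def flag_declining(weekly_clicks, window=3, drop_ratio=0.5):
    """Return True when the latest week's clicks fall below
    `drop_ratio` times the mean of the preceding `window` weeks."""
    if len(weekly_clicks) < window + 1:
        return False  # not enough history yet
    baseline = sum(weekly_clicks[-window - 1:-1]) / window
    return baseline > 0 and weekly_clicks[-1] < drop_ratio * baseline

# A consistently low clicker is not flagged (absolute level alone
# was not predictive), but a sharp decline is.
print(flag_declining([3, 2, 3, 2]))      # low but steady -> False
print(flag_declining([40, 38, 42, 10]))  # sharp decline -> True
```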
The models were developed and tested on single presentations of
the three modules, then they were tested on a future presentation
of the same module. Finally, they were tested on each other (in
other words, developed on one module and applied to another).
As expected, accuracy was reduced in the latter two cases, but
interestingly not as much as might have been expected. A brief
investigation into including demographic data revealed that
prediction could be improved with this data source. This work is
described in detail in Wolff et al. (2013a).
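The transfer tests described above (training on one presentation, then testing on a later presentation of the same module and on a different module) follow a simple protocol, sketched below. The threshold rule is an invented stand-in for the actual decision-tree models, and all data and values are hypothetical:

```python
# Sketch of the cross-presentation evaluation protocol. A "presentation"
# is a list of (weekly_click_vector, outcome) pairs; all data is made up.

def fit_threshold_rule(train):
    """Fit a one-feature rule: predict 'fail' when total clicks fall
    below the midpoint between the class means (an illustrative
    stand-in for the decision-tree models in the original study)."""
    def total(row):
        return sum(row[0])
    passes = [total(r) for r in train if r[1] == "pass"]
    fails = [total(r) for r in train if r[1] == "fail"]
    threshold = (sum(passes) / len(passes) + sum(fails) / len(fails)) / 2
    def predict(features):
        return "fail" if sum(features) < threshold else "pass"
    return predict

def accuracy(model, test):
    return sum(model(f) == y for f, y in test) / len(test)

# Hypothetical data: presentation A (training), a later presentation B
# of the same module, and a different module C.
pres_a = [([30, 25, 20], "pass"), ([28, 30, 26], "pass"),
          ([5, 2, 0], "fail"), ([8, 4, 1], "fail")]
pres_b = [([26, 22, 19], "pass"), ([6, 3, 2], "fail")]
module_c = [([40, 35, 30], "pass"), ([2, 1, 0], "fail")]

model = fit_threshold_rule(pres_a)
for name, data in [("same module, later presentation", pres_b),
                   ("different module", module_c)]:
    print(name, accuracy(model, data))
```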
The next phase of work investigated more fully the potential for using demographic data and focused on Bayesian models for prediction, which were compared with simpler methods of linear and logistic regression and a weighted score. The key findings were that a) including VLE data improved the accuracy of predictions compared to using demographic data alone, and b) there was little real difference between the different methods evaluated - accuracy increased as the module progressed. However, the majority of dropout occurs early in the module (Wolff et al. 2013b).

Some focused investigation into the role of the first TMA in predicting the final outcome found that failing the first assessment had a significant negative impact. Therefore, the key to improving retention is in identifying those students who are at risk of either submitting but failing, or not submitting, this first assessment. This is described in more detail in the next section, where the overall problem is specified.

3. Problem Specification
For identifying students at risk we can use knowledge about students' behavior and performance in the current presentation, their demographic data, and data about the module and the performance of other students in previous presentations. In this task we do not consider students' overall learning objectives, nor their previous or current performance in other modules. This is diagrammatically shown in Figure 1. A1-An indicate the times at which a module assessment is due. Vle1-Vlen are the VLE clicks in the periods between either the start of the module and the first assessment, or else in between assessments.

Figure 1. Prediction problem

Given demographic data, the results of TMAs so far and VLE activities, the goal is to identify as early as possible the students who are at risk and for whom an intervention is meaningful. By meaningful intervention, we mean that the student can be helped to pass the module and the overall cost of interventions is affordable. The reasoning about the future behavior of the student is based on experience with students with similar characteristics in previous presentations of the same module.

Our analysis indicates that VLE data is more important than demographic characteristics. Moreover, performance at the early stages of the module presentation is a very good predictor of final success or failure. In the analysed modules, the students who fail or do not submit the first TMA have a high probability of overall failure. For this reason it is crucial to concentrate the effort to identify at-risk students before the TMA1 deadline. This is indicated in Figure 2.

Figure 2. TMA1 is a strong predictor of the final result

The VLE opens two to three weeks prior to the start of the module presentation so that students can smoothly engage early in a number of module-related activities. In order to achieve early predictions for TMA1 we start analysing records from the very opening of the VLE, i.e. well before the presentation start. VLE activities can be classified into a number of actions and activity types depending on what the student is trying to do. Out of many, we have identified four activity types that provide useful information for prediction. They are denoted as follows:

Resource contains books and other educational materials for the students.
Forum is a web site where students communicate with their tutors and with each other.
Subpage is the means of navigation in the VLE structure.
OU Content refers to the specification of TMAs and the guidelines for their elaboration.

Our predictive modelling algorithms use, for each student, weekly aggregates of all four activity types and all their combinations. Therefore, for each student and each week we get a 16-dimensional vector (N, R, F, S, O, RF, RS, …, RFSO), where N means "No VLE activity". Some algorithms use numeric values describing the number of accesses of a particular activity type; others use mutually exclusive Boolean values representing the fact that the student engaged in the particular combination of activity types.

4. Methods for early detection of failure
Predictions of at-risk students are calculated and updated every week starting from the opening of the VLE. The prediction combines the results of four machine learning algorithms:
1. k Nearest Neighbours (k-NN) makes use of weekly aggregates represented as 16-dimensional numerical activity type vectors compared with legacy data of previous presentations.
2. k Nearest Neighbours (k-NN) is based on a similar approach but uses only demographic data. Since demographic data typically has nominal values, an important part of the algorithm was how to define the distance between two demographic sets.
3. Classification and Regression Tree (CART) is calculated from VLE data and TMA1 of previous presentations and then used for the classification of current students.
4. Bayes network combines both demographic and VLE data. Chi-square tests showed that a statistically independent subset of demographic data exists. For a smaller number of demographic variables a full Bayes network has been constructed. For the complete set, we implemented naïve Bayes.
The results of these methods are combined by majority voting.
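To make this concrete, the weekly combination encoding, the activity-based k-NN, and the majority vote over models can be sketched as below. The data structures, parameter values and example data are illustrative assumptions, not the implemented system:

```python
# Sketch: weekly activity aggregates, k-NN over them, and majority voting.
import math
from collections import Counter

TYPES = ["R", "F", "S", "O"]  # Resource, Forum, Subpage, OU Content

def weekly_vector(counts):
    """counts: activity type -> clicks for one student in one week.
    Returns the numeric count vector and the mutually exclusive
    combination label (one of the 16 values N, R, F, ..., RFSO)."""
    numeric = [counts.get(t, 0) for t in TYPES]
    label = "".join(t for t, c in zip(TYPES, numeric) if c > 0) or "N"
    return numeric, label

def knn_predict(legacy, query, k=3):
    """legacy: (vector, outcome) pairs from previous presentations;
    predict by majority outcome of the k nearest vectors under
    Euclidean distance."""
    nearest = sorted(legacy, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(outcome for _, outcome in nearest).most_common(1)[0][0]

def combine(predictions):
    """Majority vote over the predictions of the individual models."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical legacy data and one current student's week.
legacy = [([10, 5, 3, 2], "pass"), ([8, 6, 2, 1], "pass"),
          ([0, 0, 1, 0], "fail"), ([1, 0, 0, 0], "fail")]
vec, label = weekly_vector({"R": 9, "F": 5, "S": 2, "O": 1})
print(label, knn_predict(legacy, vec))     # -> RFSO pass
print(combine(["pass", "fail", "pass", "pass"]))  # -> pass
```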
Figure 3a. Mockup module dashboard

Figure 3b. Mockup dashboard describing an individual student

The mockups of the dashboards for presenting the results are shown in Figures 3a and 3b. Figure 3a demonstrates a view across the students of a module. The upper graph presents an overview of VLE activities. The lower table organizes students according to their predicted outcome at the current point in the module, including an explanation for the prediction. Figure 3b shows the view of an individual student.

The detail of the interface that allows the tutor to change the balance of predictions based on demographic and VLE data is shown in Figure 4.

Figure 4. 3-nearest neighbours based on demographic and VLE distances

The icon representing the evaluated student is in the centre. The upper right quadrant shows the three nearest neighbours in the current presentation. The nearest neighbours of three previous presentations are organised anticlockwise. In the quadrants representing the previous presentations, the red icons indicate that the student failed and the green ones indicate a pass. In the current presentation the icons show the predicted outcome. The amber icons show the borderline cases. The default split is calculated by the algorithms, but the tutors can express their experience by moving the slider.

The list of students identified as at risk is passed to the module team for possible interventions. Currently, the data is passed in a spreadsheet, whilst the dashboard mockups are also being evaluated with module teams and will be completed and integrated with the models and data when the design is finalized. The spreadsheet ranks the students in order of their weighted risk score. An explanation for the prediction of each of the models is given. The first two explanations point to the nearest neighbours from the previous presentations (first the closest by their VLE activity and secondly the closest by their demographic data). Next, the prediction according to the decision tree is explained in terms of the applied rule, which may combine the student's level of VLE activity with some demographic attributes (these are the normal demographic attributes that are collected about students, e.g. age, previous academic background, etc.). Finally, the prediction of the Bayes classifier is presented, along with an explanation similar to that of the decision tree, combining VLE with demographic information. In some cases, the predictions from the four models do not match.

5. Evaluation
Evaluation of the latest methods will occur when the module has
completed. Regardless of the predictive methods being used, there
is a general prediction by module teams that retention will
improve in this presentation due to other factors, such as
improved module design and also changes to student funding and
the financial commitments that students are now making. This will
clearly impact on the ability to draw any firm conclusions about
what to attribute improved retention to, should that turn out to be
the case. However, it is still worth looking at the overall retention
compared to previous modules. It is also possible to use
qualitative methods, such as looking at the student feedback, or
speaking with the module tutors and module teams. In addition, it
is possible to make a hypothesis about the accuracy of the methods where interventions have been made. If interventions are having an effect, then this should reduce the accuracy of the predictions. Specifically, it should be the case that predictions made for a student prior to an intervention will give a false positive result for failure. The precision and recall of the methods on this module at this point in time can be compared to the methods applied to other modules at the same point in time, to test for significant differences.

The first set of predicted outcomes for TMA1 has been provided to one of the pilot module teams and action will be taken in the very near future. While it is not possible to know yet what the final evaluation will show, the module team, as well as the wider support networks for OU students, have been looking at the initial outputs and feel very positive about the potential for the technology to be integrated into wider OU practice and to provide an important source of information, both for strategically targeting support to students when they need it and for improving the advice given to students as they begin their studies.

6. Conclusion
Where previous work has demonstrated that it is possible to accurately identify at-risk students throughout a module presentation, this latest work focuses specifically on increasing accuracy for early detection. Most students who fail get into difficulties very early on, so this is the critical point at which to make an intervention. Predictions are made with reference to a student's nearest neighbours, based firstly on demographic data and secondly on VLE data, allowing the two data sources to be balanced against each other and, over time, the role of each to be better understood. In addition, CART and Bayes models are applied to the combined VLE and demographic data. Predictions from the four models are weighed against each other to produce a list of students ranked in order of risk. Currently, this is provided in a spreadsheet to module teams, along with explanations from each of the models. Dashboards are being constructed to visualize this information. The feedback from the first set of output data has been very positive. A full evaluation will occur later in the year when the module is complete.

7. REFERENCES
[1] Arnold, K.E., Pistilli, M.D. 2012. Course Signals at Purdue: Using Learning Analytics to increase student success. In: Learning Analytics and Knowledge, 29 April - 2 May, Vancouver, Canada.
[2] Baradwaj, B., Pal, S. 2011. Mining Educational Data to Analyze Students' Performance. International Journal of Advanced Computer Science and Applications, vol. 2, no. 6, pp. 63-69.
[3] Pandey, M., Sharma, V.K. 2013. A Decision Tree Algorithm Pertaining to the Student Performance Analysis and Prediction. International Journal of Computer Applications 61(13): 1-5, New York, USA.
[4] Kabra, R.R. and Bichkar, R.S. 2011. Performance Prediction of Engineering Students using Decision Trees. International Journal of Computer Applications 36(11): 8-12, New York, USA.
[5] Wolff, A., Zdrahal, Z., Nikolov, A., Pantucek, M. 2013a. Improving retention: predicting at-risk students by analysing clicking behaviour in a virtual learning environment. In: Third Conference on Learning Analytics and Knowledge (LAK 2013), 8-12 April 2013, Leuven, Belgium.
[6] Wolff, A., Zdrahal, Z., Herrmannová, D. and Knoth, P. 2013b. Predicting Student Performance from Combined Data Sources. In: Alejandro Peña-Ayala (ed.), Educational Data Mining: Applications and Trends, 524, Springer.