Developing predictive models for early detection of at-risk students on distance learning modules

Annika Wolff, Zdenek Zdrahal, Drahomira Herrmannova, Jakub Kuzilek, Martin Hlosta
Knowledge Media Institute, The Open University
Walton Hall, Milton Keynes, MK7 6AA
+44(0)1908 659462, 654512, 652477, 659109, 653800
{a.l.wolff; z.zdrahal; drahomira.herrmannova; jakub.kuzilek; martin.hlosta}@open.ac.uk

ABSTRACT
Not all students who fail or drop out would have done so if they had been offered help at the right time. This is particularly true on distance learning modules, where there is no direct tutor/student contact but where it has been shown that making contact at the right time can improve a student's chances. This paper explores the latest work conducted at the Open University, one of Europe's largest distance learning institutions, to identify the optimum time to make student interventions and to develop models that identify the at-risk students in this time frame. This work in progress is taking real-time data and feeding it back to module teams as the module is running. Module teams will be indicating which of the predicted at-risk students have received an intervention, and the nature of the intervention.

Categories and Subject Descriptors
H.2.8 [Database Applications]: Data Mining; D.4.8 [Performance]: Modelling and prediction

General Terms
Algorithms, Design, Experimentation, Human Factors, Theory.

Keywords
predictive models, machine learning, student data, Bayesian models, distance learning

1. INTRODUCTION
Predictive modelling techniques can be applied to student data to identify students who are at risk of failing or withdrawing from a module. Tutors or module teams can use this information to aid their decision-making about whom they should contact to offer help, leading to better strategic use of resources and improved retention. For example, the Course Signals system has been successfully in place at Purdue University for some time, providing feedback to students based on predictions from multiple data sources (Arnold and Pistilli, 2012). The Open University (OU) is one of the largest distance learning institutions in Europe. OU modules are making increasing use of the Virtual Learning Environment (VLE), Moodle, to supply learning materials, instead of the paper materials previously supplied in the post.

This paper explores the latest work at the Open University using data from the VLE, combined with demographic data, to predict student failure or dropout. This ongoing work is already providing real-time information to module teams and will be fully evaluated later in the year. The methods investigate the role of VLE activity compared with demographic data and attempt to make predictions about a student before they submit their first assessment. This first assessment has been found to be a very good predictor of a student's final outcome on a module.

This work is the culmination of a number of previous projects investigating the potential for different methods to produce accurate predictions. We will first briefly describe some of the previous work at the OU before examining the current methods, preliminary feedback on these, and plans for future evaluation.

2. Previous work with OU data
Decision trees have proved a fairly popular method for exploring the potential for building predictive models from student data (see Baradwaj and Pal, 2011; Pandey and Sharma, 2013; Kabra and Bichkar, 2011). Initial work with OU data focused on using decision trees to predict student outcome from VLE data combined with assessment scores. Each OU module evaluates students periodically with a Tutor Marked Assessment (TMA). The exact number may vary from module to module, but seven TMAs is quite typical. Three modules, each with fairly typical VLE usage and a large student cohort (between 1200 and 4400 students registered), were chosen for building and testing the models.

The main findings were that decision trees were fairly good both at predicting a drop in performance in a subsequent assessment and at predicting the final outcome of the module. Prediction was overall better when combining VLE and TMA data. This preliminary work also suggested that the absolute amount of clicking within the VLE was not directly correlated with outcome: students could click a lot but still fail, or not click at all and still pass. However, a reduction in clicking was a warning sign. The models were developed and tested on single presentations of the three modules, then tested on a future presentation of the same module. Finally, they were tested on each other (in other words, developed on one module and applied to another). As expected, accuracy was reduced in the latter two cases, but interestingly not by as much as might have been expected. A brief investigation into including demographic data revealed that prediction could be improved with this data source. This work is described in detail in Wolff et al. (2013a).

The next phase of work investigated more fully the potential for using demographic data and focused on Bayesian models for prediction, which were compared with simpler methods of linear and logistic regression and a weighted score. The key findings were that a) including VLE data improved the accuracy of predictions compared to using demographic data alone, and b) there was little real difference between the different methods evaluated; accuracy increased as the module progressed. However, the majority of dropout occurs early in the module (Wolff et al., 2013b).

Figure 2. TMA1 is a strong predictor of the final result
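The earlier finding that a reduction in clicking, rather than its absolute level, is the warning sign could be operationalised roughly as follows. This is an illustrative sketch only: the window and drop threshold are hypothetical choices of ours, not parameters taken from the OU models.

```python
def clicking_warning(weekly_clicks, drop_ratio=0.5, window=2):
    """Flag a student whose recent VLE clicking has fallen sharply.

    weekly_clicks: list of click counts, one per week (oldest first).
    drop_ratio: fraction of the earlier average below which the recent
        average counts as a warning (hypothetical value).
    window: number of weeks in each comparison period.
    """
    if len(weekly_clicks) < 2 * window:
        return False  # not enough history to compare yet
    earlier = weekly_clicks[-2 * window:-window]
    recent = weekly_clicks[-window:]
    earlier_avg = sum(earlier) / window
    recent_avg = sum(recent) / window
    # A student who never clicked is a separate case: there is no drop
    # to detect, consistent with "not clicking at all but still passing".
    if earlier_avg == 0:
        return False
    return recent_avg < drop_ratio * earlier_avg

print(clicking_warning([40, 38, 12, 8]))  # sharp decline in activity
print(clicking_warning([5, 6, 5, 6]))     # steady low-level activity
```

A steady low clicker is deliberately not flagged, matching the observation that absolute click volume did not correlate directly with outcome.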
Some focused investigation into the role of the first TMA in predicting the final outcome found that failing the first assessment had a significant negative impact. Therefore, the key to improving retention is identifying those students who are at risk of either submitting but failing, or not submitting, this first assessment. This is described in more detail in the next section, where the overall problem is specified.

3. Problem Specification
For identifying students at risk we can use knowledge about students' behaviour and performance in the current presentation, their demographic data, and data about the module and the performance of other students in previous presentations. In this task we do not consider students' overall learning objectives, nor their previous or current performance in other modules. This is shown diagrammatically in Figure 1. A1-An indicate the times at which module assessments are due. Vle1-Vlen are the VLE clicks in the periods between either the start of the module and the first assessment, or between consecutive assessments.

Figure 1. Prediction problem

Given the demographic data, the results of TMAs so far and the VLE activities, the goal is to identify as early as possible the students who are at risk and for whom an intervention is meaningful. By a meaningful intervention, we mean that the student can be helped to pass the module and the overall cost of interventions is affordable. Reasoning about the future behaviour of a student is based on experience with students with similar characteristics in previous presentations of the same module.

Our analysis indicates that VLE data is more important than demographic characteristics. Moreover, performance at the early stages of the module presentation is a very good predictor of final success or failure. In the analysed modules, students who fail or do not submit the first TMA have a high probability of overall failure. For this reason it is crucial to concentrate the effort to identify at-risk students before the TMA1 deadline. This is indicated in Figure 2.

The VLE opens two to three weeks prior to the start of the module presentation so that students can smoothly engage early in a number of module-related activities. In order to achieve early predictions for TMA1, we start analysing records from the very opening of the VLE, i.e. well before the presentation starts. VLE activities can be classified into a number of actions and activity types depending on what the student is trying to do. Of these, we have identified four activity types that provide useful information for prediction. They are denoted as follows:
• Resource contains books and other educational materials for the students
• Forum is a web site where students communicate with their tutors and with each other
• Subpage is the means of navigation in the VLE structure
• OU Content refers to the specification of TMAs and the guidelines for their elaboration

Our predictive modelling algorithms use, for each student, weekly aggregates of all four activity types and all their combinations. Therefore, for each student and each week we get a 16-dimensional vector (N, R, F, S, O, RF, RS, ..., RFSO), where N means "no VLE activity". Some algorithms use numeric values describing the number of accesses of a particular activity type; others use mutually exclusive Boolean values representing the fact that the student engaged in the particular combination of activity types.

4. Methods for early detection of failure
Predictions of at-risk students are calculated and updated every week, starting from the opening of the VLE. The prediction combines the results of four machine learning algorithms:
1. k Nearest Neighbours (k-NN) makes use of weekly aggregates represented as 16-dimensional numerical activity-type vectors compared with legacy data from previous presentations.
2. k Nearest Neighbours (k-NN) is based on a similar approach but uses only demographic data. Since demographic data typically has nominal values, an important part of the algorithm was how to define the distance between two demographic sets.
3. Classification and Regression Tree (CART) is calculated from the VLE data and TMA1 of previous presentations and then used for the classification of current students.
4. Bayes network combines both demographic and VLE data. Chi-square tests showed that a statistically independent subset of demographic data exists. For the smaller number of demographic variables a full Bayes network has been constructed; for the complete set, we implemented naive Bayes.

The results of these methods are combined by majority voting.

Figure 4. 3-nearest neighbours based on demographic and VLE distances

The icon representing the evaluated student is in the centre. The upper right quadrant shows the three nearest neighbours in the current presentation. The nearest neighbours of three previous presentations are organised anticlockwise. In the quadrants representing the previous presentations, the red icons indicate that the student failed, the green ones indicate a pass. In the current presentation the icons show the predicted outcome. The amber icons show the borderline cases. The default split is calculated by the algorithms, but the tutors can express their experience by moving the slider.

Figure 3a. Mockup module dashboard

The list of students identified as at risk is passed to the module team for possible interventions. Currently, the data is passed in a spreadsheet, whilst the dashboard mockups are being evaluated with module teams; they will be completed and integrated with the models and data when the design is finalized. The spreadsheet rank-orders the students by their weighted risk score.
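The weekly 16-dimensional activity-type encoding and the majority vote over the four classifiers can be sketched as follows. The function and variable names are our own; this is a minimal illustration of the encoding described above, not the project's implementation.

```python
from itertools import combinations

# The four activity types; N (no VLE activity) is the empty combination.
TYPES = ["R", "F", "S", "O"]  # Resource, Forum, Subpage, OU Content

# All 16 combinations: N, R, F, S, O, RF, RS, ..., RFSO.
COMBOS = ["N"] + ["".join(c) for k in range(1, 5)
                  for c in combinations(TYPES, k)]

def weekly_boolean_vector(clicks):
    """Encode one student-week as a mutually exclusive 16-dim Boolean vector.

    clicks: dict mapping activity type ("R", "F", "S", "O") to click count.
    Exactly one component is 1: the combination of types the student used
    that week (or N if there was no activity at all).
    """
    used = "".join(t for t in TYPES if clicks.get(t, 0) > 0)
    key = used if used else "N"
    return [1 if combo == key else 0 for combo in COMBOS]

def majority_vote(predictions):
    """Combine at-risk flags from the individual models by simple majority."""
    return sum(predictions) > len(predictions) / 2

vec = weekly_boolean_vector({"R": 3, "F": 0, "S": 1, "O": 0})
print(len(vec), COMBOS[vec.index(1)])        # 16 RS
print(majority_vote([True, True, False, True]))  # 3 of 4 models agree: True
```

With four models, a 2-2 split in this sketch defaults to "not at risk"; the paper does not specify how ties are broken, so that choice is ours.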
An explanation for the prediction of each of the models is given. The first two explanations point to the nearest neighbours from the previous presentations (first the closest by their VLE activity and secondly the closest by their demographic data). Next, the prediction according to the decision tree is explained in terms of the applied rule, which may combine the student's level of VLE activity with some demographic attributes (these are the normal demographic attributes that are collected about students, e.g. age and previous academic background). Finally, the prediction of the Bayes classifier is presented, with an explanation similar to that of the decision tree, combining VLE with demographic information. In some cases, the predictions from the four models do not match.

The mockups of the dashboards for presenting the results are shown in Figures 3a and 3b. Figure 3a demonstrates a view across the students of a module. The upper graph presents an overview of VLE activities. The lower table organizes students according to their predicted outcome at the current point in the module, including an explanation for the prediction. Figure 3b shows the view of an individual student.

Figure 3b. Mockup dashboard describing an individual student

The detail of the interface that allows the tutor to change the balance of predictions based on demographic and VLE data is shown in Figure 4.

5. Evaluation
Evaluation of the latest methods will occur when the module has completed. Regardless of the predictive methods being used, there is a general expectation among module teams that retention will improve in this presentation due to other factors, such as improved module design and changes to student funding and the financial commitments that students are now making. This will clearly affect the ability to draw any firm conclusions about what to attribute improved retention to, should that turn out to be the case.
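Once TMA1 results are in, one quantitative check consistent with this evaluation plan is to score the early at-risk predictions against the actual outcomes. A minimal sketch with hypothetical student identifiers (not the project's evaluation code):

```python
def precision_recall(predicted_at_risk, actually_failed):
    """Precision and recall of at-risk predictions against real outcomes.

    Both arguments are sets of student identifiers: those flagged as
    at risk before TMA1, and those who failed or did not submit it.
    """
    true_pos = len(predicted_at_risk & actually_failed)
    precision = true_pos / len(predicted_at_risk) if predicted_at_risk else 0.0
    recall = true_pos / len(actually_failed) if actually_failed else 0.0
    return precision, recall

# Hypothetical data: predictions made before TMA1, outcomes known after.
predicted = {"s01", "s02", "s03", "s04"}
failed = {"s02", "s03", "s05"}
p, r = precision_recall(predicted, failed)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Note that where interventions work, some apparent false positives are really successes (the student was correctly flagged and then helped to pass), so falling precision after interventions is not necessarily a failure of the models.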
However, it is still worth looking at the overall retention compared to previous presentations. It is also possible to use qualitative methods, such as looking at student feedback, or speaking with the module tutors and module teams. In addition, it is possible to make a hypothesis about the accuracy of the methods where interventions have been made. If interventions are having an effect, then this should reduce the accuracy of the predictions. Specifically, it should be the case that predictions made for a student prior to an intervention will give a false positive result for failure. The precision and recall of the methods on this module at this point in time can be compared with the methods applied to other modules at the same point in time, to test for significant differences.

The first set of predicted outcomes for TMA1 has been provided to one of the pilot module teams and action will be taken in the very near future. While it is not possible to know yet what the final evaluation will show, the module team, as well as wider support networks for OU students, have been looking at the initial outputs and feel very positive about the potential for the technology to integrate into wider OU practice and provide an important source of information, both for strategically targeting support to students when they need it and for improving the advice given to students as they begin their studies. The feedback from the first set of output data has been very positive. A full evaluation will occur later in the year, when the module is complete.

6. Conclusion
Where previous work has demonstrated that it is possible to accurately identify at-risk students throughout a module presentation, this latest work focuses specifically on increasing accuracy for early detection. Most students who fail get into difficulties very early on, so this is the critical point at which to make an intervention. Predictions are made with reference to a student's nearest neighbours, based firstly on demographic data and secondly on VLE data, allowing the two data sources to be balanced against each other and, over time, allowing a better understanding of the role of each. In addition, CART and Bayes models are applied to the combined VLE and demographic data. Predictions from the four models are weighed against each other to produce a list of students ranked in order of risk. Currently, this is provided in a spreadsheet to module teams, along with explanations from each of the models. Dashboards are being constructed to visualize this information.

7. REFERENCES
[1] Arnold, K.E. and Pistilli, M.D. 2012. Course Signals at Purdue: Using Learning Analytics to increase student success. In: Learning Analytics and Knowledge, 29 April - 2 May 2012, Vancouver, Canada.
[2] Baradwaj, B. and Pal, S. 2011. Mining Educational Data to Analyze Students' Performance. International Journal of Advanced Computer Science and Applications 2(6): 63-69.
[3] Pandey, M. and Sharma, V.K. 2013. A Decision Tree Algorithm Pertaining to the Student Performance Analysis and Prediction. International Journal of Computer Applications 61(13): 1-5, New York, USA.
[4] Kabra, R.R. and Bichkar, R.S. 2011. Performance Prediction of Engineering Students using Decision Trees. International Journal of Computer Applications 36(11): 8-12, New York, USA.
[5] Wolff, A., Zdrahal, Z., Nikolov, A. and Pantucek, M. 2013a. Improving retention: predicting at-risk students by analysing clicking behaviour in a virtual learning environment. In: Third Conference on Learning Analytics and Knowledge (LAK 2013), 8-12 April 2013, Leuven, Belgium.
[6] Wolff, A., Zdrahal, Z., Herrmannová, D. and Knoth, P. 2013b. Predicting Student Performance from Combined Data Sources. In: Peña-Ayala, A. (ed.), Educational Data Mining: Applications and Trends, Springer, 524.