=Paper=
{{Paper
|id=Vol-1137/LA_machinelearning_submission_4
|storemode=property
|title=Modelling student online behaviour in a virtual learning environment
|pdfUrl=https://ceur-ws.org/Vol-1137/LA_machinelearning_submission_4.pdf
|volume=Vol-1137
|dblpUrl=https://dblp.org/rec/conf/lak/HlostaHVKZW14
}}
==Modelling student online behaviour in a virtual learning environment==
Martin Hlosta†, Drahomira Herrmannova†, Lucie Vachova††, Jakub Kuzilek†, Zdenek Zdrahal†, Annika Wolff† ∗

† Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, MK7 6AA; {martin.hlosta; d.herrmannova; jakub.kuzilek; z.zdrahal; annika.wolff}@open.ac.uk

†† University of Economics, Prague, Department of Exact Methods, Faculty of Management, Jarosovska 1117/II, Jindrichuv Hradec, 377 01; vachova@fm.vse.cz

∗ This work was carried out at the Knowledge Media Institute.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. LAK '14, March 24 – 28 2014, Indianapolis, IN, USA. Copyright 2014 ACM 978-1-4503-2664-3/14/03 ...$15.00.

===ABSTRACT===
In recent years, distance education has enjoyed a major boom. Much work at The Open University (OU) has focused on improving retention rates in its modules by providing timely support to students who are at risk of failing the module. In this paper we explore two methods for analysing student activity in an online virtual learning environment (VLE), General Unary Hypotheses Automaton (GUHA) and Markov chain-based analysis, and we explain how this analysis can be relevant for module tutors and other student support staff. We show that both methods are a valid approach to modelling student activities. An advantage of the Markov chain-based approach lies in its graphical output and in the possibility to model time dependencies of the student activities.

===Categories and Subject Descriptors===
D.4.8 [Performance]: Modelling and Prediction; H.2.8 [Database Applications]: Data Mining

===General Terms===
Algorithms, Design, Experimentation, Human Factors

===Keywords===
Student Data, Distance Learning, Predictive Models, Machine Learning, Information Visualisation

===1. INTRODUCTION===
Recent years have seen a massive growth in the possibilities for online education, such as the well-known massive open online courses (MOOCs) [Cormier,2008]. The concept of distance education is, however, not new. The Open University is an institution with over forty years of experience with distance education, historically based on off-line materials and nowadays making increasing use of the Internet. The great advantage of online courses is that they are accessible to virtually anyone with Internet access.

The other side of the coin is that the retention rates in these courses are often low. [Koller et al.,2013] mention that the average retention rate of a Coursera course is around 5% (Coursera, https://www.coursera.org/, is a well-known platform and one of the biggest providing open online courses). The situation at traditional universities as well as at The Open University is significantly better; however, there is still room for improvement.

There might be many reasons for the low retention rates, from the fact that online courses are often offered to anybody interested to the fact that the performance of each student depends almost exclusively on how much they are willing to study on their own at home. Our work at The Open University aims at analysing students' activities in the online courses in order to gain insight into their behavioural patterns, which can be utilised for building prediction models.

====1.1 Problem Description====
The Open University (http://www.open.ac.uk/) is the biggest university in the United Kingdom, offering several hundred distance learning modules, which can be studied either as standalone modules or as part of a university degree. Anybody can sign up for a module provided by The OU, without any previous education whatsoever. The students receive their study material and submit their assignments through an online virtual learning environment.

Students participating in a module are generally split into smaller study groups of no more than a few tens of students, typically according to their geographic location. Each group has an assigned tutor. The tutors grade the students' assignments and exams, answer their questions in the online forums, provide general advice and guidance, etc.

In order to support the students who are at risk of failing the module, The OU also implements various interventions (such as phone calls from specialised student support teams) during the course of the module. Because the number of students studying each module can reach several thousand (the modules used in our analysis have an enrolment of around two thousand students) and the resources available for the interventions are limited, the interventions have to be carefully planned. Therefore, an important question is how to identify students at risk of failing the module so that the intervention is meaningful and efficient.

Improving student retention through these focused interventions and helping the tutors to focus on the students who require help provides many benefits, from improved student satisfaction to financial savings for the university. In order to support the identification of students who are currently at risk, we utilise several statistical and machine learning methods. The available data contain both information about the students' activity in the VLE and their demographic information. However, for modelling student behaviour, only the data from the VLE were used. A more detailed description of our data set can be found in Section 3.

===2. PREVIOUS WORK===
The current work builds on previous research done at The Open University. The initial experiments with machine learning techniques used the VLE and assessment data [Wolff and Zdrahal,2012]. One of the main findings of this research was that decision trees generally outperform the other methods [Wolff and Zdrahal,2012]. This research also included the creation of a dashboard providing the university staff with real-time information about student performance. Additional methods were tested in [Wolff et al.,2013b] and in [Wolff et al.,2013a]. In the latter work, demographic data were added to the predictions; however, this research did not confirm that these data provide a significant increase in performance.

===3. DATA SPECIFICATION===
The analyses we performed were done using real data from several modules of The Open University. We examined a number of subsequent presentations of each module.

The available data contain two types of information:
* Information about the results of student assignments (TMAs, tutor marked assignments). There are several assignments in each module, typically between five and seven. Generally, the module ends with a final exam.
* Data about student activity from the virtual learning environment (VLE).

The VLE data are aggregated by day and content type (e.g. forum, wiki, resource, ...). This means that for each day we know how many times the student interacted with a given content type. For our analysis we summarise the data by week and content type. Summarising the data by week seemed reasonable, since it simplifies our analyses without losing too much detail. The features generated by the summarisation and used in the methods are:
* click counts aggregated by week,
* click counts aggregated by week and content type,
* binary flags indicating whether the student was active in the VLE and in various content types.

===4. METHODS===
For the analysis of student behaviour in virtual learning environments, we have used two different approaches: GUHA [Hájek et al.,1966] and modelling based on Markov chains [Norris,1998].

====4.1 Activity types analysis====
As mentioned in Section 3, the VLE data contain information about the type of content the student accessed. The content type can be, for example, forum, wiki, resource, quiz, etc. In total there are 11 different content types. Using the binary flags indicating whether the student was active in a given week and content type, we utilised Bayes' theorem [Bishop and Nasrabadi,2006] to determine the probability that the student will fail to complete the module. Moreover, we analysed each of the content types in terms of the mean number of students succeeding in the module based on activity or inactivity in the given content type. Based on this investigation we selected a set of content types which significantly influence students' performance; these were then used in the further analyses.

====4.2 GUHA====
General Unary Hypotheses Automaton (GUHA), originally published in [Hájek et al.,1966], is one of the oldest data mining methods for the automatic discovery of new interesting hypotheses from data. To achieve this goal, GUHA uses various procedures (ASSOC, IMPL, CORREL). The choice of procedure depends mainly on the user's needs and experience. We have selected the ASSOC procedure [Rauch and Simunek,2001], which discovers interesting associations between attributes in the data. The interestingness of an association is mostly based on the co-occurrence of the attributes. The ASSOC procedure allows the resulting rules to be limited by specifying constraints on the attributes. This property is important for our field of interest.

For our research, we used the ASSOC procedure as implemented in the 4ft-Miner module of the LISp-Miner software tool (LISp-Miner, lispminer.vse.cz/, is a software tool implementing the GUHA method). The specification of the constraints enabled us to restrict the rules to those which cover students that fail or succeed in the TMA. We used the three basic types of features introduced in Section 3.

The search space for both types of binary flags was small enough to analyse directly. For the weekly aggregated counts, it was necessary to reduce the search space via interval discretisation. For this purpose, we utilised LISp-Miner and unsupervised equal frequency discretisation [Wong and Chiu,1987].

The GUHA method produces a large set of hypotheses. An example of such results with the categorised binary flags is depicted in Figure 1. Unfortunately, the information contained in the output is difficult to interpret. Moreover, information about the time dimension is lost, and this is even worse when using various content types. For us, this was the motivation to look for another modelling method.

Figure 1: Screenshot with discovered rules from LISp-Miner.

====4.3 Markov chain-based analysis of student activity====
In this part of the analysis we examined the differences in intensity of student activity between students who were successful in the first TMA (TMA 1) and those who did not submit TMA 1. Students who failed TMA 1 are not included in this analysis, because they represent only a small portion of the students who submit TMA 1. At this stage of the research we analysed only students who had zero VLE activity in at least one of the weeks under consideration. Students who are passive in the VLE form a specific group that is important when looking for potentially at-risk students. For this group we studied different scenarios; they are listed in Table 1. Table 1 also shows the percentage of students who behaved according to a given scenario and were successful in TMA 1, in comparison to those who did not submit TMA 1. The numbers in the Scenario column refer to particular weeks of the course.

{| class="wikitable"
|+ Table 1: Summary of examined situations for students' behaviour evaluation (data in % of total number of course students)
! Scenario !! TMA 1 not submitted !! TMA 1 success
|-
| 1. zero in any of 0 - 4 || 51.6 || 42.6
|-
| 2. zero only in 1 - 4 || 95.7 || 4.3
|-
| 3. zero only in 2 - 4 || 92.3 || 7.7
|-
| 4. zero only in 3 - 4 || 100.0 || 0.0
|-
| 5. zero only in 4 || 54.3 || 45.7
|-
| 6. zero only in 0 || 15.4 || 71.8
|-
| 7. zero only in 0 - 1 || 6.7 || 86.7
|-
| 8. zero only in 0 - 2 || 15.4 || 69.2
|-
| 9. zero only in 0 - 3 || 57.1 || 42.9
|-
| 10. zero in at least one of 0 - 3, non-zero in 4 || 18.2 || 71.6
|-
| 11. zero in at least one of 0 - 2, non-zero in 3 - 4 || 14.6 || 75.1
|-
| 12. zero in at least one of 0 - 1, non-zero in 2 - 4 || 9.3 || 80.4
|}

Based on the data in Table 1, we can identify the behaviour of at-risk students. This is evident especially in scenarios 2, 3, 4 and 7. Students who tend to reach zero VLE activity in later weeks are more likely not to submit TMA 1 (scenarios 2, 3, 4). On the other hand, those who have zero VLE activity in earlier weeks and later start to show interest, represented by VLE activity, raise their chance of success in TMA 1 (scenarios 6, 7, 8).

Figure 2: Representation of students' behaviour in scenario 3 (TMA 1 was not submitted).

Figure 3: Representation of students' behaviour in scenario 3 (TMA 1 was passed).

Figures 2 and 3 show scenario 3 in more detail (with TMA 1 not submitted and TMA 1 passed, respectively). The colour tones of the arrows differ (from white to red) depending on the percentage of students who moved in the given direction. The redder the colour, the bigger the percentage of students it represents. The rows represent the weeks in which VLE activity was measured. The first row shows activity before the beginning of the course (Week 0); the other four rows capture the VLE activity in weeks 1, 2, 3 and 4 respectively. The columns represent the intervals of VLE activity; a different node colour means a different interval. The first column is zero VLE activity, while in the other columns the activity is divided into intervals with cut points at multiples of 30.

This type of analysis enables us to look for specific patterns in students' behaviour. In a similar way we can analyse the different types of VLE activities, as shown in Figure 4. In this case the nodes do not represent the intensity of student activity, but capture the interest of the student in a specific content type. Unlike the previous two figures, this directed acyclic graph as a whole depicts the Markov chain [Norris,1998].

Figure 4: Markov chain for the various activity type combinations in weeks 0-5.

===5. CONCLUSION===
In this paper we have examined two methods for analysing the activity of students in the online virtual learning environment before the first tutor marked assignment: GUHA and Markov chain-based graphical models. Both methods provide useful insights into the students' behaviour during their studies. The benefit of the latter lies mostly in its graphical output, which might be easier to interpret and could potentially provide support in planning interventions, and in the possibility to model time dependencies of the student activities. We believe that an understanding of student behavioural patterns will also be useful for building better predictive models of student performance.

===References===
* Christopher M. Bishop and Nasser M. Nasrabadi. 2006. Pattern recognition and machine learning, volume 1. Springer, New York.
* Dave Cormier. 2008. The CCK08 MOOC–connectivism course, 1/4 way. Dave's Educational Blog, 2.
* Petr Hájek, I. Havel, and Michal Chytil. 1966. The GUHA method of automatic hypotheses determination. Computing, 1(4):293–308.
* Daphne Koller, Andrew Ng, Chuong Do, and Zhenghao Chen. 2013. Retention and intention in massive open online courses: In depth. EDUCAUSE, June.
* James R. Norris. 1998. Markov chains. Number 2008 in Cambridge series in statistical and probabilistic mathematics. Cambridge University Press.
* Jan Rauch and Milan Simunek. 2001. Mining for association rules by 4ft-miner. In INAP, pages 285–295.
* Annika Wolff and Zdenek Zdrahal. 2012. Improving retention by identifying and supporting "at-risk" students. EDUCAUSE Review Online, July/Summer.
* Annika Wolff, Zdenek Zdrahal, Drahomira Herrmannova, and Petr Knoth. 2013a. Predicting student performance from combined data sources. In Alejandro Peña-Ayala, editor, Educational Data Mining: Applications and Trends, number 524 in Studies in Computational Intelligence, pages 175–202. Springer International Publishing, Cham.
* Annika Wolff, Zdenek Zdrahal, Andriy Nikolov, and Michal Pantucek. 2013b. Improving retention: predicting at-risk students by analysing clicking behaviour in a virtual learning environment. In Third Conference on Learning Analytics and Knowledge (LAK 2013). ISBN 978-1-4503-1785-6.
* Andrew K. C. Wong and David K. Y. Chiu. 1987. Synthesizing statistical knowledge from incomplete mixed-mode data. IEEE Trans. Pattern Anal. Mach. Intell., 9(6):796–805, June.
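
The use of Bayes' theorem on the binary activity flags (Section 4.1) can be illustrated with a minimal sketch. The paper does not give its exact formulation, so everything below is an assumption made for illustration: the function name `p_fail_given_flag`, the record layout, and the toy data are hypothetical and are not taken from the OU data set.

```python
from fractions import Fraction

# Hypothetical toy records: (active_in_content_type, failed_module).
# Not real OU data; 4 inactive students (3 failed), 6 active students (1 failed).
TOY_RECORDS = (
    [(False, True)] * 3 + [(False, False)] * 1 +
    [(True, True)] * 1 + [(True, False)] * 5
)

def p_fail_given_flag(records, flag_value):
    """Return P(fail | flag == flag_value) via Bayes' theorem:
    P(fail | flag) = P(flag | fail) * P(fail) / P(flag)."""
    n = len(records)
    n_fail = sum(1 for _, failed in records if failed)
    n_flag = sum(1 for active, _ in records if active == flag_value)
    n_both = sum(1 for active, failed in records
                 if active == flag_value and failed)
    if n_fail == 0 or n_flag == 0:
        raise ValueError("not enough data to estimate the probabilities")
    prior = Fraction(n_fail, n)          # P(fail)
    likelihood = Fraction(n_both, n_fail)  # P(flag | fail)
    evidence = Fraction(n_flag, n)       # P(flag)
    return likelihood * prior / evidence

print(p_fail_given_flag(TOY_RECORDS, False))  # 3/4
print(p_fail_given_flag(TOY_RECORDS, True))   # 1/6
```

With a single flag this reduces to a conditional frequency; the Bayes form becomes useful when flags from several weeks or content types are combined, which would require additional modelling assumptions not stated in the paper.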
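
The week-to-week structure described in Section 4.3 (rows as weeks, columns as activity intervals, arrows weighted by the share of students moving between them) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the exact bin edges (whether a count of 30 falls in the first or second interval), the function names, and the toy click counts are assumptions.

```python
from collections import Counter, defaultdict

BIN_WIDTH = 30  # cut points at multiples of 30, as described for Figures 2 and 3

def activity_state(clicks):
    """Map a weekly click count to an interval index.
    0 means zero activity; 1 covers 1..30, 2 covers 31..60, and so on.
    Which side of each cut point is inclusive is an assumption."""
    if clicks < 0:
        raise ValueError("click counts cannot be negative")
    return 0 if clicks == 0 else (clicks - 1) // BIN_WIDTH + 1

def transition_probabilities(weekly_clicks):
    """weekly_clicks: one list of weekly click counts per student.
    Returns {(week, from_state): {to_state: share of students}} for each
    transition from `week` to `week + 1`."""
    counts = defaultdict(Counter)
    for clicks in weekly_clicks:
        states = [activity_state(c) for c in clicks]
        for week, (src, dst) in enumerate(zip(states, states[1:])):
            counts[(week, src)][dst] += 1
    return {
        key: {dst: n / sum(row.values()) for dst, n in row.items()}
        for key, row in counts.items()
    }

# Hypothetical click counts for four students over weeks 0..2.
TOY_CLICKS = [[0, 10, 0], [0, 45, 31], [5, 0, 0], [0, 0, 0]]
PROBS = transition_probabilities(TOY_CLICKS)
for (week, src), row in sorted(PROBS.items()):
    print(f"week {week} -> {week + 1}, from interval {src}: {row}")
```

Keeping a separate row of probabilities per week, rather than one matrix pooled over all weeks, mirrors the layered layout of the figures, where the same activity interval can lead to different outcomes depending on the week.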