Using Process Mining to determine the relevance and impact of performing non-evaluative quizzes before evaluative assessments

Juan Antonio Martínez-Carrascal and Teresa Sancho-Vinuesa
Universitat Oberta de Catalunya, Rambla del Poblenou, 156, 08018 Barcelona, Spain

Abstract
Different data mining techniques are commonly used to extract behavioural patterns from activity logs. However, they often offer static views and lack interpretability. In this paper, we describe a procedure for obtaining meaningful learning processes for a mathematics course based on Moodle logs. Log data is transformed into process models in order to be analysed. We use this method to analyse the potential relevance and detailed impact of performing non-evaluative assessments before evaluation tests in an online university mathematics course. Our preliminary results outline statistically significant differences between those students who practise before submitting evaluative tests and those who proceed directly to evaluation. Beyond this particular result, the procedure can be extended to detect behavioural patterns and differences in learning processes among groups of students, particularly in online learning scenarios.

Keywords
Process Mining, Behavioural patterns, Formative assessment, Academic performance, Learning Process

Learning Analytics Summer Institute Spain (LASI Spain) 2022, June 20–21, 2022, Salamanca, Spain
EMAIL: jmartinezcarra@uoc.edu (A. 1); tsancho@uoc.edu (A. 2)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Process mining constitutes a discipline between business process management and data mining [1]. Although it is a discipline in its own right, it was initially focused on industrial processes and business management analysis. However, its potential and the interpretability of its results have attracted interest from related fields such as learning analytics. In this paper, we use process mining to extract meaningful information from Moodle logs, aiming to obtain relevant outcomes that can help improve the learning process. This introductory work aims to set out the basics, envision the potential of the approach and show one specific application.

As a specific application, we analyse the impact of student behaviour before taking evaluative tests. In particular, we focus on the impact of taking online non-evaluative quizzes before carrying out evaluative tests. The focus is not set on the simple execution of the formative quiz, but on whether it is executed before taking the evaluative activity. We choose this specific aspect because previous research on a similar course indicates that practising has an impact [2]. That work relies on classical statistical techniques. We aim to use this experience to develop a framework allowing deeper analysis of learning processes and, in particular, of behavioural learning patterns.

The rest of this paper is structured as follows. Section 2 sets the theoretical framework, reviewing the basics behind process mining and its link to learning analytics. The section ends by stating the research question covered in this paper. Section 3 then provides details on the course under analysis, as well as on the steps carried out to build a meaningful process model from Moodle log data. Section 4 shows the results obtained. Section 5 discusses these results, and finally, Section 6 concludes with open lines for future research.
2. Theoretical framework

Process mining is a discipline oriented to 'discover, monitor, and improve real processes by extracting knowledge from event logs' [1]. Although it was originally focused on business-oriented models, the range of applications is broad. In particular, in recent years the learning analytics community has shown growing interest in this topic.

Process mining comprises three main categories: process discovery, conformance checking and process enhancement.

Process discovery consists of identifying underlying processes hidden in log data. Given a log file with information regarding real process executions, three main characteristics are extracted: a set of cases, a set of actions, and a timestamp associated with each action. Each case is ultimately an execution of the process, consisting of a set of actions that take place in a specific order at given timestamps. While a single execution can be represented through a graph, integrating a huge number of executions is far more complex and makes interpretation difficult. Directly-follows graphs (DFGs) constitute a first level of representation; they comprise a set of states (based on the actions in the log) linked by arcs, but have intrinsic limitations [3]. When the number of cases is high, the DFG turns into a spaghetti-like flow, which makes interpretation more complex. The analysis of these DFGs normally requires a filtering process, oriented to retain either commonly performed activities or common transitions. Different tools, both commercial [4, 5] and non-commercial [6, 7], are available for processing these flows.

From a formal perspective, Petri nets [8] provide the theoretical foundation of these models. Such nets can both represent processes and simulate executions on them, by placing a token on the start point and replaying the process. Petri nets can be designed from a theoretical perspective using the BPMN language, common to generic process analysis tools [9]. However, the potential behind process mining is to obtain these nets automatically by analysing logs. This task is called process discovery [10] and constitutes a discipline on its own. While theoretical models represent what should happen, discovered models show what is actually happening. For a given log, there is neither a single technique to infer models nor a single resulting model. In fact, as the number of cases and associated actions increases, the problem can become computationally challenging. Specific algorithms try to reduce computational complexity at the cost of providing models which lack some characteristics of more complex models.

Characteristics of discovered models include fitness, precision, generalisation and simplicity [11]. Fitness indicates to what extent the model reflects all the activity seen in the event log. Precision refers to the model not allowing behaviours that are completely unrelated to the log from which it derives. As logs are finite, generalisation indicates the ability of the model to generalise similar behaviours. Finally, simplicity refers to the model being as simple as possible. There is always a balance between these characteristics. For instance, a model can potentially be made simpler at the cost of not reflecting all behaviour, or of introducing potentially unrelated behaviour.
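To make the notion of a DFG concrete, the following minimal Python sketch derives one from a toy event log built on the three characteristics just mentioned (case, action, timestamp). The events are invented for illustration and simply mimic the naming scheme introduced later in this paper:

```python
from collections import Counter, defaultdict

# Toy event log: one (case_id, activity, timestamp) triple per logged action.
events = [
    ("User2", "M1-PR-INIT",     "2021-02-25 21:45"),
    ("User2", "M1-PR-SUBMIT",   "2021-02-25 22:33"),
    ("User2", "M1-EVAL-INIT",   "2021-02-28 11:12"),
    ("User2", "M1-EVAL-SUBMIT", "2021-02-28 21:27"),
    ("User3", "M1-EVAL-INIT",   "2021-02-27 10:00"),
    ("User3", "M1-EVAL-SUBMIT", "2021-02-27 10:40"),
]

# Group events into one trace per case, ordered by timestamp
# (ISO-formatted timestamps sort correctly as strings).
traces = defaultdict(list)
for case, activity, ts in sorted(events, key=lambda e: (e[0], e[2])):
    traces[case].append(activity)

# Count directly-follows pairs: activity a immediately followed by b.
dfg = Counter()
for trace in traces.values():
    for a, b in zip(trace, trace[1:]):
        dfg[(a, b)] += 1

for (a, b), n in dfg.most_common():
    print(f"{a} -> {b}: {n}")
```

The resulting weighted arcs are exactly what DFG-based tools draw; with hundreds of cases the arc set explodes, which is why the filtering step described above becomes necessary.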
Specific algorithms oriented to practical applications fall into two main categories: heuristic and fuzzy [11]. The heuristic miner focuses on capturing the main behaviour reflected in the log. The fuzzy miner is used when the number of activities and cases is particularly high and the behaviour reflected in the log is unstructured. However, other algorithms are also used: the alpha-miner algorithm is common, and some articles on educational data mining make use of the inductive miner [12].

The generated models can be compared with a log associated with the same process. This comparison is called conformance checking [10, 11] and indicates to what extent the information contained in the log is consistent with a given model. From a practical perspective, conformance checking aims to detect where the logged behaviour diverges from the model. While different possibilities exist to establish conformance indexes, the most common approach is to check alignments. It is considered an accurate technique, which overcomes the limitations of previous algorithms. It takes a trace that does not necessarily fit the model and finds either where the log differs from the model or where the model differs from the log.
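As an intuition-level illustration only: full alignment algorithms search over all paths a Petri net allows, but the core idea of mapping a logged trace onto an expected one can be sketched with a standard sequence matcher. The traces below are invented, and Python's difflib is used merely as a stand-in for a proper alignment search:

```python
import difflib

# Expected path for one unit (model) vs. a student's logged trace.
model_trace = ["PR-INIT", "PR-SUBMIT", "EVAL-INIT", "EVAL-SUBMIT"]
logged_trace = ["EVAL-INIT", "EVAL-SUBMIT", "PR-INIT", "PR-SUBMIT"]

# get_opcodes() yields a cheapest edit script between the two sequences;
# each tag marks agreement or a divergence between log and model.
matcher = difflib.SequenceMatcher(a=model_trace, b=logged_trace)
for tag, m1, m2, l1, l2 in matcher.get_opcodes():
    if tag == "equal":
        print("log and model agree on:", model_trace[m1:m2])
    elif tag == "delete":
        print("model move skipped by the log:", model_trace[m1:m2])
    elif tag == "insert":
        print("log move not allowed by the model:", logged_trace[l1:l2])
    else:  # "replace": the log deviates from the model here
        print("deviation:", logged_trace[l1:l2], "instead of", model_trace[m1:m2])
```

Run on these traces, the script flags that the student carried out the evaluative test before the practice quiz, which is precisely the kind of divergence the rest of this paper quantifies.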
Besides conformance indicators, the disagreements between the log and the model set the basis for process enhancement. The idea is to improve the model based on the information gathered from real executions of the process. With this idea in mind, and focusing on learning scenarios, we approach the improvement of learning processes.

A compilation of works classified by education domains can be found in [13]. This compilation reveals works in different unrelated areas, such as curriculum mining, analysis of student registration processes or professional training. To cite some relevant works on aspects linked to student behaviour in courses, [14] remarks the potential of process mining to detect behavioural patterns in learning environments, focusing on conformance analysis in a blended learning course. [15] uses process mining to improve the learning experience of students enrolled in MOOC courses. [16] analyses online assessment data, and more recently [17] focuses on assessments in self-regulated learning environments. However, and to the best of our knowledge, there are no references in the literature that use process mining to study the impact of non-evaluative assessments on academic performance. The existing literature on this impact is based on classical techniques. Results indicate that practising improves overall outcomes [2, 18]. The work in [2], which analyses the same course we study here, also indicates the potential of formative quizzes, concluding that there is a link between questionnaire scores and overall performance. However, the link with the order of the activities is not considered there.

Given this background, and linking these ideas together, we aim to analyse the learning process of a mathematics course, based on the activity logged and an analysis of the underlying process. Specifically, we raise the following research question:

RQ: Does completing non-evaluative online quizzes prior to performing evaluative assessments help students pass the course? If so, what is its quantitative impact?

The next section provides additional details on the course and specifies the methods used.

3. Methodology

3.1. Course description

The course under analysis is an introductory mathematics course. It is offered online and covers the mathematical requirements for students entering Computer Science and Multimedia related degrees. The course is structured into 11 learning units, ranging from calculus basics to differential calculus, and lasts for one academic semester.

The course is offered completely online. Students have a course landing page, where they can access materials in different formats, post doubts and contact the instructor. However, the core of the course is contained in a Moodle environment, where non-evaluative quizzes and evaluative activities are offered.

Each unit has a similar structure. Students are expected to begin by practising with non-evaluative quizzes. Upon submission, they can check the answers. Additional retries are possible: successive attempts are generated with slight modifications, allowing for extra practice. There is neither a penalty for failing these tests nor for not submitting them. For each unit, students should then carry out an evaluative assignment. Both kinds of tests are opened on a periodic basis (weekly or bi-weekly, depending on the unit). Non-evaluative quizzes are made available two days before evaluative activities, with the aim of increasing student practice. Once a non-evaluative quiz is open, it remains open for the rest of the course, and students can check it later and even resubmit. Evaluative tests usually have a deadline, normally a week after they are offered. However, once a student opens an evaluative test, they must submit it within 24 hours. Figure 1 shows the schematic of the expected process for a given unit of the course.

Figure 1: Expected student behaviour within a course unit.

Figure 1 reflects the Petri net associated with the expected behaviour for a given unit. As the unit starts, the student can choose to perform the non-evaluative quiz (going through place P2) or simply skip it (through P1). In either case, they can still take non-evaluative quizzes again, as many times as needed. When students feel confident, they take the evaluative assessment and proceed. This is the last step before entering a new unit.
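This expected model can also be encoded programmatically. The analysis in this paper uses Celonis (Section 4), but as an illustration, the sketch below builds a simplified version of Figure 1 with the open-source pm4py library, assuming its 2.x object model. For brevity, the P1/P2 branching is collapsed into a single place with a practice self-loop, which accepts the same traces (zero or more PR attempts followed by EVAL):

```python
from pm4py.objects.petri_net.obj import PetriNet, Marking
from pm4py.objects.petri_net.utils import petri_utils

net = PetriNet("expected_unit_process")

# One place for "unit in progress", one for "unit finished".
p_unit, p_done = PetriNet.Place("p_unit"), PetriNet.Place("p_done")
net.places.update({p_unit, p_done})

# PR is a self-loop on p_unit, so it may fire zero or more times;
# EVAL consumes the token and ends the unit.
t_pr = PetriNet.Transition("pr", "PR quiz")
t_eval = PetriNet.Transition("eval", "EVAL test")
net.transitions.update({t_pr, t_eval})

petri_utils.add_arc_from_to(p_unit, t_pr, net)
petri_utils.add_arc_from_to(t_pr, p_unit, net)
petri_utils.add_arc_from_to(p_unit, t_eval, net)
petri_utils.add_arc_from_to(t_eval, p_done, net)

# Initial and final markings: a single token travels through the unit.
initial_marking, final_marking = Marking({p_unit: 1}), Marking({p_done: 1})
```

A net like this, chained once per unit, is the reference against which logged student traces can be checked for conformance.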
It is also noticeable that satisfaction surveys carried out in different editions of the course show a high level of satisfaction. The passing ratio for the course under analysis is normally over 80%. The edition analysed in this paper has 144 students: 121 of them pass the course, while 12 fail and 11 withdraw.

3.2. Fitting data for process discovery

Beyond the theoretical perspective of the process described in Figure 1, we aim to obtain a process model based on real executions by the students. To do so, we need to map a case identifier, an activity and a timestamp. Before performing the mapping, we examine the Moodle data we are working with. Table 1 shows a sample of this data, where users have been anonymised.

Table 1
Sample of Moodle data gathered

| Date | User | Activity | Scope | Description | Detail |
|---|---|---|---|---|---|
| 18/02/2021 17:05 | User1 | Cuestionario | Cuestionario: Nombres PRÀCTICA | Modulo de curso visto | The user with id '125433' viewed the 'quiz' activity with course module id '25433'. |
| 18/02/2021 17:06 | User2 | Cuestionario | Cuestionario: Qüestionari inicial sobre el funcionament del curs | Intento de cuestionario visualizado | The user with id '152301' has viewed the attempt with id '1347943' belonging to the user with id '152301' for the quiz with course module id '87859'. |
| 18/02/2021 17:07 | User2 | Cuestionario | Cuestionario: Qüestionari inicial sobre el funcionament del curs | Intento del cuestionario revisado | The user with id '152301' has had their attempt with id '1347943' reviewed by the user with id '152301' for the quiz with course module id '87859'. |
| 18/02/2021 17:07 | User2 | Cuestionario | Cuestionario: Qüestionari inicial sobre el funcionament del curs | Intento enviado | The user with id '152301' has submitted the attempt with id '1347943' for the quiz with course module id '87859'. |

A preliminary view indicates that we can recover time information from the Date column of Table 1. We take the User column as the case identifier; in other words, we consider that each student performs one execution of the learning process throughout the course. Finally, regarding the activities, we adjusted the level of granularity so that potential findings reflect the process depicted in Figure 1. To do so, and to provide shorter mnemonics, we ordered the learning units and assigned each a short name, the first learning unit being identified as M1. Non-evaluative assessments are marked as PR, indicating that they are intended for practice. They correspond to non-evaluative quizzes and, as indicated, are expected to be covered before evaluative assessments (shortened as EVAL). In addition, and to allow for extra granularity, we also distinguish between the different states Moodle records for a given activity. In particular, for assessments we can have INIT (when the user initiates an attempt), SUBMIT (when the student effectively submits) and REVIEW (when the user checks an attempt already sent).

All these transformations are common in case studies that use Moodle [12, 17], and in general should be performed when dealing with LMS data. In our case, they have been performed through Python scripts, providing an output which is suitable for process analysis; a simplified sketch of such a script closes this subsection. Table 2 provides a sample of the resulting output.

Table 2
Sample of the Moodle log adapted for process analysis

| Timestamp | Case ID | Activity |
|---|---|---|
| 25/02/2021 21:45 | User2 | M1-PR-INIT |
| 25/02/2021 22:33 | User2 | M1-PR-SUBMIT |
| 26/02/2021 22:34 | User2 | M1-PR-INIT |
| 26/02/2021 23:09 | User2 | M1-PR-SUBMIT |
| 28/02/2021 11:12 | User2 | M1-EVAL-INIT |
| 28/02/2021 21:27 | User2 | M1-EVAL-SUBMIT |

According to this sample, User2 tried the non-evaluative quiz twice before attempting the evaluative assessment. With this information, we can proceed to graph the process data for the course under analysis.
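The actual scripts are not published; the following pandas sketch shows one plausible shape of the transformation from Table 1 to Table 2. The QUIZ_MAP and EVENT_MAP lookup tables are hypothetical: the real quiz names and Moodle event descriptions would have to be enumerated from the course log.

```python
import pandas as pd

# Hypothetical lookup tables: each quiz name is assigned a unit/type code
# (e.g. M1-PR) and each Moodle event description a state (INIT/SUBMIT/REVIEW).
QUIZ_MAP = {
    "Cuestionario: Nombres PRÀCTICA": "M1-PR",   # assumed unit assignment
}
EVENT_MAP = {
    "Intento iniciado": "INIT",                  # assumed event wording
    "Intento enviado": "SUBMIT",
    "Intento del cuestionario revisado": "REVIEW",
}

def to_event_log(raw: pd.DataFrame) -> pd.DataFrame:
    """Map raw Moodle rows (Table 1 layout) to the event log of Table 2."""
    unit = raw["Scope"].map(QUIZ_MAP)
    state = raw["Description"].map(EVENT_MAP)
    keep = unit.notna() & state.notna()          # drop unmapped events
    df = pd.DataFrame({
        "Timestamp": pd.to_datetime(raw.loc[keep, "Date"],
                                    format="%d/%m/%Y %H:%M"),
        "Case ID": raw.loc[keep, "User"],
        "Activity": unit[keep] + "-" + state[keep],
    })
    return df.sort_values(["Case ID", "Timestamp"]).reset_index(drop=True)
```

Keeping the mapping in explicit lookup tables makes the granularity decisions discussed above (what counts as an activity, which events to drop) easy to revisit without touching the rest of the pipeline.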
4. Results

Once the data for the course had been pre-processed according to the procedure described in Section 3, it was loaded into Celonis [4] for detailed analysis. Figure 2 shows an initial view of the process.

Figure 2: Overview of the discovered learning process model.

While the image is only provided as a sample, we can clearly see that the process begins in a fairly linear way but, after the initial week, turns into a spaghetti-like process. This is in fact typical of complex processes with a high number of cases and activities. The model can be filtered by specific activities, transitions or groups of students. Previous analysis of the same course indicates the relevance of practising before evaluative activities [2]. For this reason, we filtered the log to determine to what extent students complete the practice tests before the evaluative ones.

Table 3 indicates the number of students performing non-evaluative quizzes before evaluation for each unit, classified according to their final course result.

Table 3
Distribution of students performing non-evaluative quizzes before evaluative assessments for the different modules, based on final course mark (pass, fail, withdrawn).

| Actions | Pass (121) | Fail (12) | WD (11) |
|---|---|---|---|
| M1-PR before M1-EVAL | 101 | 10 | 3 |
| M2-PR before M2-EVAL | 104 | 9 | 3 |
| M3-PR before M3-EVAL | 98 | 6 | 1 |
| M4-PR before M4-EVAL | 97 | 5 | 1 |
| M5-PR before M5-EVAL | 95 | 2 | 1 |
| M6-PR before M6-EVAL | 108 | 3 | 1 |
| M7-PR before M7-EVAL | 90 | 1 | 1 |
| M8-PR before M8-EVAL | 88 | 0 | 1 |
| M9-PR before M9-EVAL | 83 | 0 | 1 |
| M10-PR before M10-EVAL | 88 | 0 | 1 |

The data in Table 3 can be analysed in terms of probability. We compute the conditional probability of passing the course depending on whether students perform non-evaluative quizzes before evaluative assessments or not. In addition, we determine statistical significance through a χ² test; the results are shown in Table 4, and a sketch of the computation closes this section.

Table 4
Probability of passing the subject depending on whether the student carries out formative quizzes before evaluative assessments in each module. Legend: ns: non-significant; *: p<0.05; **: p<0.01; ***: p<0.001; ****: p<0.0001

| Module | P(pass), PR before EVAL | P(pass), no PR before EVAL | Significance |
|---|---|---|---|
| M1 | 0.85 | 0.80 | ns |
| M2 | 0.90 | 0.61 | ** |
| M3 | 0.92 | 0.62 | *** |
| M4 | 0.94 | 0.59 | **** |
| M5 | 0.97 | 0.57 | **** |
| M6 | 0.96 | 0.41 | **** |
| M7 | 0.98 | 0.60 | **** |
| M8 | 0.99 | 0.60 | **** |
| M9 | 0.99 | 0.63 | **** |
| M10 | 0.99 | 0.60 | **** |

From a graphical perspective, Figure 3 shows the probability of passing depending on whether non-evaluative activities are performed in advance or not.

Figure 3: Differences in probabilities to pass the course.
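As a sketch of this computation, the snippet below reproduces the M6 row of Table 4 from the counts in Table 3, assuming withdrawals are counted as not passing (which matches the reported probabilities). SciPy's chi2_contingency is used for the significance test; the exact test configuration of the original analysis is not specified, so this is illustrative only:

```python
from scipy.stats import chi2_contingency

# Counts for unit M6, derived from Table 3 (WD counted as not passing):
# 112 students practised before the evaluative test, 32 did not.
practised = [108, 3 + 1]                              # [pass, fail + withdrawn]
not_practised = [121 - 108, (12 - 3) + (11 - 1)]      # [13, 19]

table = [practised, not_practised]

# Conditional probabilities of passing, as reported in Table 4.
p_practised = practised[0] / sum(practised)           # 108/112 ~ 0.96
p_not = not_practised[0] / sum(not_practised)         # 13/32  ~ 0.41

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"P(pass | PR before EVAL)    = {p_practised:.2f}")
print(f"P(pass | no PR before EVAL) = {p_not:.2f}")
print(f"chi2 = {chi2:.1f}, p = {p_value:.1e}")        # p < 0.0001 -> ****
```

Repeating this per module yields the full table; note that chi2_contingency applies Yates' continuity correction by default for 2×2 tables.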
5. Discussion

The results in the previous section allow us to answer the question stated in Section 2. Performing non-evaluative quizzes before the evaluative assessment of the unit under study positively impacts the chances of passing the subject. The differences in passing ratios have been quantified and shown to be statistically significant. While the effect is not significant for the first unit, the differences become noticeable from the second unit onwards, and the difference in passing ratios reaches 55 percentage points in Unit 6. The irrelevance of the first unit can be considered normal from a conceptual perspective, all the more so given the propaedeutic nature of the course: the first unit may be too introductory for some students, who simply approach the evaluative activity directly. From the second unit, however, and consistent with the increasing complexity, the relevance of practising in advance also increases.

These findings are also consistent with previous findings linked to this course [2]. That work indicates that students who submit practice tests show higher performance than those who choose not to submit them. While it is based on classical techniques, its results agree with the process mining approach.

Regarding process mining, and while this article is part of a preliminary research stage, the results are promising. Specific findings have emerged from the analysis performed. We aim to further explore its application to deepen the analysis of student behaviour and learning paths. We believe it can outperform other tools, due to its focus on the process rather than on static views, and to its ability to deliver meaningful results.

Finally, we would like to remark on the pedagogical implications. Based on these findings, we would encourage mathematics teachers in general to design non-evaluative quiz activities and to stress the relevance of practising before approaching evaluative tests. At the same time, students should be encouraged to always practise before evaluation. Finally, from a learning analytics perspective, we believe that systems oriented to raising alerts should focus not only on the activities performed but also on the process carried out.

6. Limitations and future work

This article reports an initial stage of our research on the potential use of process mining to analyse online courses. In this sense, the results should be considered those of preliminary work. Nevertheless, the work carried out outlines the potential of process mining in learning analytics scenarios. Among its potential applications, we currently focus on the analysis of student behaviour along the course and, in particular, on the learning paths students follow. This analysis can provide relevant insight into when and why students abandon the expected course path and how this impacts academic outcomes. This is relevant both for detecting behaviours that can potentially lead to unsuccessful course outcomes and for improving course design. The authors are open to collaborating in initiatives aimed at achieving these goals.
7. References

[1] van der Aalst W (2012) Process mining: Overview and opportunities. ACM Transactions on Management Information Systems 3:1–17
[2] Figueroa-Cañas J, Sancho-Vinuesa T (2021) Investigating the relationship between optional quizzes and final exam performance in a fully asynchronous online calculus module. Interactive Learning Environments 29. https://doi.org/10.1080/10494820.2018.1559864
[3] van der Aalst WMP (2019) A practitioner's guide to process mining: Limitations of the directly-follows graph. Procedia Computer Science 164:321–328. https://doi.org/10.1016/J.PROCS.2019.12.189
[4] Process Mining and Execution Management Software | Celonis. https://www.celonis.com/. Accessed 22 Apr 2022
[5] Process Mining and Automated Process Discovery Software for Professionals - Fluxicon Disco. https://fluxicon.com/disco/. Accessed 22 Apr 2022
[6] van Hee K, Oanea O, Post R, et al (2006) Yasper: A tool for workflow modeling and analysis. Proceedings - International Conference on Application of Concurrency to System Design, ACSD 279–281. https://doi.org/10.1109/ACSD.2006.37
[7] start | ProM Tools. https://www.promtools.org/doku.php. Accessed 22 Apr 2022
[8] Peterson JL (1977) Petri Nets. ACM Computing Surveys (CSUR) 9:223–252. https://doi.org/10.1145/356698.356702
[9] Dijkman RM, Dumas M, Ouyang C (2008) Semantics and analysis of business process models in BPMN. Information and Software Technology 50. https://doi.org/10.1016/j.infsof.2008.02.006
[10] Leemans SJJ, Fahland D, van der Aalst WMP (2018) Scalable process discovery and conformance checking. Software and Systems Modeling 17. https://doi.org/10.1007/s10270-016-0545-x
[11] van der Aalst WMP (2014) Process mining in the large: A tutorial. In: Lecture Notes in Business Information Processing
[12] Bogarín A, Cerezo R, Romero C (2018) Discovering learning processes using inductive miner: A case study with learning management systems (LMSs). Psicothema 30:322–329. https://doi.org/10.7334/psicothema2018.116
[13] Bogarín A, Cerezo R, Romero C (2018) A survey on educational process mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8:e1230. https://doi.org/10.1002/widm.1230
[14] Mukala P, Buijs J, Leemans M, van der Aalst W (2015) Exploring Students' Learning Behaviour in MOOCs using Process Mining Techniques. Computing Conference
[15] Umer R, Susnjak T, Mathrani A, Suriadi S (2017) On predicting academic performance with process mining in learning analytics. Journal of Research in Innovative Teaching & Learning 10. https://doi.org/10.1108/jrit-09-2017-0022
[16] Pechenizkiy M, Trčka N, Vasilyeva E, et al (2009) Process mining online assessment data. In: EDM'09 - Educational Data Mining 2009: 2nd International Conference on Educational Data Mining
[17] Cerezo R, Bogarín A, Esteban M, Romero C (2020) Process mining for self-regulated learning assessment in e-learning. Journal of Computing in Higher Education. https://doi.org/10.1007/s12528-019-09225-y
[18] Fitkov-Norris E, Lees B (2012) Online formative assessment: Does it add up to better performance in quantitative modules? In: Proceedings of the 11th ECRM