Using Process Mining to determine the relevance and impact of performing non-evaluative quizzes before evaluative assessments

Juan Antonio Martínez-Carrascal and Teresa Sancho-Vinuesa
Universitat Oberta de Catalunya, Rambla del Poblenou, 156, 08018 Barcelona, Spain

Abstract
Different data mining techniques are commonly used to extract behavioural patterns from activity logs. However, they often offer static views and lack interpretability. In this paper, we describe a procedure for obtaining meaningful learning processes for a mathematics course based on Moodle logs. Log data is transformed into process models in order to be analysed. We use this method to analyse the potential relevance and detailed impact of performing non-evaluative assessments before evaluation tests in an online university mathematics course. Our preliminary results outline statistically significant differences between those students who practise before submitting evaluative tests and those who proceed directly to evaluation. Beyond this particular result, the procedure can be extended to detect behavioural patterns and differences in learning processes among groups of students, particularly in online learning scenarios.

Keywords
Process Mining, Behavioural patterns, Formative assessment, Academic performance, Learning Process

Learning Analytics Summer Institute Spain (LASI Spain) 2022, June 20–21, 2022, Salamanca, Spain
EMAIL: jmartinezcarra@uoc.edu (A. 1); tsancho@uoc.edu (A. 2)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Process mining constitutes a discipline between business process management and data mining [1]. Although it is a discipline in its own right, it was initially focused on industrial processes and business management analysis. However, its potential and the interpretability of its results have attracted interest from related fields such as learning analytics. In this paper, we use process mining to extract meaningful information from Moodle logs, aiming to obtain relevant outcomes that can help improve the learning process. This introductory work aims to set out the basics, envision the potential of the approach and show one specific application.

As a specific application, we analyse the impact of student behaviour before taking evaluative tests. In particular, we focus on the impact of taking online non-evaluative quizzes before carrying out evaluative tests. The focus is not set on the simple execution of the formative quiz, but on whether it is executed before taking the evaluative activity. We choose this specific aspect because previous research on a similar course indicates that practising has an impact [2]. That work relies on classical statistical techniques. We aim to use this experience to develop a framework allowing deeper analysis of learning processes and, in particular, of behavioural learning patterns.

The rest of this paper is structured as follows. Section 2 sets the theoretical framework, reviewing the basics behind process mining and its link to learning analytics. The section ends by stating the research question covered in this paper. Section 3 then provides details on the course under analysis, as well as on the steps carried out to build a meaningful process model from Moodle log data. Section 4 shows the results obtained. Section 5 discusses these results, and finally, Section 6 concludes with open lines for future research.
2. Theoretical framework

Process mining is a discipline oriented to 'discover, monitor, and improve real processes by extracting knowledge from event logs' [1]. Although it was originally focused on business-oriented models, the range of applications is broad. In particular, in recent years the learning analytics community has shown growing interest in this topic.

Process mining comprises three main categories: process discovery, conformance checking and process enhancement.

Process discovery consists of identifying underlying processes hidden in log data. Given a log file with information regarding real process executions, three main characteristics are extracted: a set of cases, a set of actions, and a timestamp associated with each action. Each case is ultimately an execution of the process, consisting of a set of actions that take place in a specific order at given timestamps. While a single execution can be represented through a graph, integrating a huge number of executions is far more complex and makes interpretation difficult. Directly-follows graphs (DFGs) constitute a first level of representation; they comprise a set of states (based on the actions in the log) linked by arcs, but have intrinsic limitations [3]. When the number of cases is high, the DFG turns into a spaghetti-like flow, which makes interpretation more complex. The analysis of these DFGs normally requires a filtering process, oriented to retain either commonly performed activities or common transitions. Different tools, both commercial [4, 5] and non-commercial [6, 7], are available for processing these flows.

From a formal perspective, Petri nets [8] provide the theoretical foundation of these models. Such nets can both represent processes and simulate executions on them, by placing a token on the start point and replaying the process. Petri nets can be designed from a theoretical perspective using the BPMN language, common to generic process analysis tools [9]. However, the potential behind process mining is to obtain these nets automatically by analysing logs. This task is called process discovery [10] and constitutes a discipline on its own. While theoretical models represent what should happen, discovered models show what is actually happening. For a given log, there is neither a single technique to infer models nor a single resulting model. In fact, as the number of cases and associated actions increases, the problem can become computationally challenging. Specific algorithms try to reduce computational complexity at the cost of providing models which lack some characteristics of more complex models.

Characteristics of discovered models include fitness, precision, generalisation and simplicity [11]. Fitness indicates to what extent the model reflects all the activity seen in the event log. Precision refers to the model not allowing behaviours that are completely unrelated to the log from which it derives. As logs are finite, generalisation indicates the ability of the model to generalise similar behaviours. Finally, simplicity refers to the model being as simple as possible. There is always a balance between these characteristics. For instance, a model can potentially be made simpler at the cost of not reflecting all behaviour, or of introducing potentially unrelated behaviour.
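To make the notion of a DFG concrete, the following minimal Python sketch derives one from a toy event log built on the three characteristics just mentioned (case, action, timestamp). The events are invented for illustration and simply mimic the naming scheme introduced later in this paper:

```python
from collections import Counter, defaultdict

# Toy event log: one (case_id, activity, timestamp) triple per logged action.
events = [
    ("User2", "M1-PR-INIT",     "2021-02-25 21:45"),
    ("User2", "M1-PR-SUBMIT",   "2021-02-25 22:33"),
    ("User2", "M1-EVAL-INIT",   "2021-02-28 11:12"),
    ("User2", "M1-EVAL-SUBMIT", "2021-02-28 21:27"),
    ("User3", "M1-EVAL-INIT",   "2021-02-27 10:00"),
    ("User3", "M1-EVAL-SUBMIT", "2021-02-27 10:40"),
]

# Group events into one trace per case, ordered by timestamp
# (ISO-formatted timestamps sort correctly as strings).
traces = defaultdict(list)
for case, activity, ts in sorted(events, key=lambda e: (e[0], e[2])):
    traces[case].append(activity)

# Count directly-follows pairs: activity a immediately followed by b.
dfg = Counter()
for trace in traces.values():
    for a, b in zip(trace, trace[1:]):
        dfg[(a, b)] += 1

for (a, b), n in dfg.most_common():
    print(f"{a} -> {b}: {n}")
```

The resulting weighted arcs are exactly what DFG-based tools draw; with hundreds of cases the arc set explodes, which is why the filtering step described above becomes necessary.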
Specific algorithms oriented to practical applications fall into two main categories: heuristic and fuzzy [11]. The heuristic miner focuses on capturing the main behaviour reflected in the log. The fuzzy miner is used when the number of activities and cases is particularly high and the behaviour reflected in the log is unstructured. However, other algorithms are also used: the alpha-miner algorithm is common, and some articles on educational data mining make use of the inductive miner [12].

The generated models can be compared with a log associated with the same process. This comparison is called conformance checking [10, 11] and indicates to what extent the information contained in the log is consistent with a given model. From a practical perspective, conformance checking aims to detect where the logged behaviour diverges from the model. While different possibilities exist to establish conformance indexes, the most common approach is to check alignments. It is considered an accurate technique, which overcomes the limitations of previous algorithms. It takes a trace that does not necessarily fit the model and finds either where the log differs from the model or where the model differs from the log.
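As an intuition-level illustration only: full alignment algorithms search over all paths a Petri net allows, but the core idea of mapping a logged trace onto an expected one can be sketched with a standard sequence matcher. The traces below are invented, and Python's difflib is used merely as a stand-in for a proper alignment search:

```python
import difflib

# Expected path for one unit (model) vs. a student's logged trace.
model_trace = ["PR-INIT", "PR-SUBMIT", "EVAL-INIT", "EVAL-SUBMIT"]
logged_trace = ["EVAL-INIT", "EVAL-SUBMIT", "PR-INIT", "PR-SUBMIT"]

# get_opcodes() yields a cheapest edit script between the two sequences;
# each tag marks agreement or a divergence between log and model.
matcher = difflib.SequenceMatcher(a=model_trace, b=logged_trace)
for tag, m1, m2, l1, l2 in matcher.get_opcodes():
    if tag == "equal":
        print("log and model agree on:", model_trace[m1:m2])
    elif tag == "delete":
        print("model move skipped by the log:", model_trace[m1:m2])
    elif tag == "insert":
        print("log move not allowed by the model:", logged_trace[l1:l2])
    else:  # "replace": the log deviates from the model here
        print("deviation:", logged_trace[l1:l2], "instead of", model_trace[m1:m2])
```

Run on these traces, the script flags that the student carried out the evaluative test before the practice quiz, which is precisely the kind of divergence the rest of this paper quantifies.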
Besides conformance indicators, the disagreements between the log and the model set the basis for process enhancement. The idea is to improve the model based on the information gathered from real executions of the process. With this idea in mind, and focusing on learning scenarios, we approach the improvement of learning processes.

A compilation of works classified by education domains can be found in [13]. This compilation reveals works in different unrelated areas, such as curriculum mining, analysis of student registration processes or professional training. To cite some relevant works on aspects linked to student behaviour in courses, [14] remarks the potential of process mining to detect behavioural patterns in learning environments, focusing on conformance analysis in a blended learning course. [15] uses process mining to improve the learning experience of students enrolled in MOOC courses. [16] analyses online assessment data, and more recently [17] focuses on assessments in self-regulated learning environments. However, and to the best of our knowledge, there are no references in the literature that use process mining to study the impact of non-evaluative assessments on academic performance. The existing literature on this impact is based on classical techniques. Results indicate that practising improves overall outcomes [2, 18]. The work in [2], which analyses the same course we study here, also indicates the potential of formative quizzes, concluding that there is a link between questionnaire scores and overall performance. However, the link with the order of the activities is not considered there.

Given this background, and linking these ideas together, we aim to analyse the learning process of a mathematics course, based on the activity logged and an analysis of the underlying process. Specifically, we raise the following research question:

RQ: Does completing non-evaluative online quizzes prior to performing evaluative assessments help students pass the course? If so, what is its quantitative impact?

The next section provides additional details on the course and specifies the methods used.

3. Methodology

3.1. Course description

The course under analysis is an introductory mathematics course. It is offered online and covers the mathematical requirements for students entering Computer Science and Multimedia related degrees. The course is structured into 11 learning units, ranging from calculus basics to differential calculus, and lasts for one academic semester.

The course is offered completely online. Students have a course landing page, where they can access materials in different formats, post doubts and contact the instructor. However, the core of the course is contained in a Moodle environment, where non-evaluative quizzes and evaluative activities are offered.

Each unit has a similar structure. Students are expected to begin by practising with non-evaluative quizzes. Upon submission, they can check the answers. Additional retries are possible: successive attempts are generated with slight modifications, allowing for extra practice. There is neither a penalty for failing these tests nor for not submitting them. For each unit, students should then carry out an evaluative assignment. Both kinds of tests are opened on a periodic basis (weekly or bi-weekly, depending on the unit). Non-evaluative quizzes are made available two days before evaluative activities, with the aim of increasing student practice. Once a non-evaluative quiz is open, it remains open for the rest of the course, and students can check it later and even resubmit. Evaluative tests usually have a deadline, normally a week after they are offered. However, once a student opens an evaluative test, they must submit it within 24 hours. Figure 1 shows the schematic of the expected process for a given unit of the course.

Figure 1: Expected student behaviour within a course unit.

Figure 1 reflects the Petri net associated with the expected behaviour for a given unit. As the unit starts, the student can choose to perform the non-evaluative quiz (going through place P2) or simply skip it (through P1). In either case, they can still take non-evaluative quizzes again, as many times as needed. When students feel confident, they take the evaluative assessment and proceed. This is the last step before entering a new unit.
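This expected model can also be encoded programmatically. The analysis in this paper uses Celonis (Section 4), but as an illustration, the sketch below builds a simplified version of Figure 1 with the open-source pm4py library, assuming its 2.x object model. For brevity, the P1/P2 branching is collapsed into a single place with a practice self-loop, which accepts the same traces (zero or more PR attempts followed by EVAL):

```python
from pm4py.objects.petri_net.obj import PetriNet, Marking
from pm4py.objects.petri_net.utils import petri_utils

net = PetriNet("expected_unit_process")

# One place for "unit in progress", one for "unit finished".
p_unit, p_done = PetriNet.Place("p_unit"), PetriNet.Place("p_done")
net.places.update({p_unit, p_done})

# PR is a self-loop on p_unit, so it may fire zero or more times;
# EVAL consumes the token and ends the unit.
t_pr = PetriNet.Transition("pr", "PR quiz")
t_eval = PetriNet.Transition("eval", "EVAL test")
net.transitions.update({t_pr, t_eval})

petri_utils.add_arc_from_to(p_unit, t_pr, net)
petri_utils.add_arc_from_to(t_pr, p_unit, net)
petri_utils.add_arc_from_to(p_unit, t_eval, net)
petri_utils.add_arc_from_to(t_eval, p_done, net)

# Initial and final markings: a single token travels through the unit.
initial_marking, final_marking = Marking({p_unit: 1}), Marking({p_done: 1})
```

A net like this, chained once per unit, is the reference against which logged student traces can be checked for conformance.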
It is also noticeable that satisfaction surveys carried out in different editions of the course show a high level of satisfaction. The passing ratio for the course under analysis is normally over 80%. The edition analysed in this paper has 144 students: 121 of them pass the course, while 12 fail and 11 withdraw.

3.2. Fitting data for process discovery

Beyond the theoretical perspective of the process described in Figure 1, we aim to obtain a process model based on real executions by the students. To do so, we need to map a case identifier, an activity and a timestamp. Before performing the mapping, we examine the Moodle data we are working with. Table 1 shows a sample of this data, where users have been anonymised.

Table 1
Sample of Moodle data gathered

| Date | User | Activity | Scope | Description | Detail |
|---|---|---|---|---|---|
| 18/02/2021 17:05 | User1 | Cuestionario | Cuestionario: Nombres PRÀCTICA | Modulo de curso visto | The user with id '125433' viewed the 'quiz' activity with course module id '25433'. |
| 18/02/2021 17:06 | User2 | Cuestionario | Cuestionario: Qüestionari inicial sobre el funcionament del curs | Intento de cuestionario visualizado | The user with id '152301' has viewed the attempt with id '1347943' belonging to the user with id '152301' for the quiz with course module id '87859'. |
| 18/02/2021 17:07 | User2 | Cuestionario | Cuestionario: Qüestionari inicial sobre el funcionament del curs | Intento del cuestionario revisado | The user with id '152301' has had their attempt with id '1347943' reviewed by the user with id '152301' for the quiz with course module id '87859'. |
| 18/02/2021 17:07 | User2 | Cuestionario | Cuestionario: Qüestionari inicial sobre el funcionament del curs | Intento enviado | The user with id '152301' has submitted the attempt with id '1347943' for the quiz with course module id '87859'. |

A preliminary view indicates that we can recover time information from the Date column of Table 1. We take the User column as the case identifier; in other words, we consider that each student performs one execution of the learning process throughout the course. Finally, regarding the activities, we adjusted the level of granularity so that potential findings reflect the process depicted in Figure 1. To do so, and to provide shorter mnemonics, we ordered the learning units and assigned each a short name, the first learning unit being identified as M1. Non-evaluative assessments are marked as PR, indicating that they are intended for practice. They correspond to non-evaluative quizzes and, as indicated, are expected to be covered before evaluative assessments (shortened as EVAL). In addition, and to allow for extra granularity, we also distinguish between the different states Moodle records for a given activity. In particular, for assessments we can have INIT (when the user initiates an attempt), SUBMIT (when the student effectively submits) and REVIEW (when the user checks an attempt already sent).

All these transformations are common in case studies that use Moodle [12, 17], and in general should be performed when dealing with LMS data. In our case, they have been performed through Python scripts, providing an output which is suitable for process analysis; a simplified sketch of such a script closes this subsection. Table 2 provides a sample of the resulting output.

Table 2
Sample of the Moodle log adapted for process analysis

| Timestamp | Case ID | Activity |
|---|---|---|
| 25/02/2021 21:45 | User2 | M1-PR-INIT |
| 25/02/2021 22:33 | User2 | M1-PR-SUBMIT |
| 26/02/2021 22:34 | User2 | M1-PR-INIT |
| 26/02/2021 23:09 | User2 | M1-PR-SUBMIT |
| 28/02/2021 11:12 | User2 | M1-EVAL-INIT |
| 28/02/2021 21:27 | User2 | M1-EVAL-SUBMIT |

According to this sample, User2 tried the non-evaluative quiz twice before attempting the evaluative assessment. With this information, we can proceed to graph the process data for the course under analysis.
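The actual scripts are not published; the following pandas sketch shows one plausible shape of the transformation from Table 1 to Table 2. The QUIZ_MAP and EVENT_MAP lookup tables are hypothetical: the real quiz names and Moodle event descriptions would have to be enumerated from the course log.

```python
import pandas as pd

# Hypothetical lookup tables: each quiz name is assigned a unit/type code
# (e.g. M1-PR) and each Moodle event description a state (INIT/SUBMIT/REVIEW).
QUIZ_MAP = {
    "Cuestionario: Nombres PRÀCTICA": "M1-PR",   # assumed unit assignment
}
EVENT_MAP = {
    "Intento iniciado": "INIT",                  # assumed event wording
    "Intento enviado": "SUBMIT",
    "Intento del cuestionario revisado": "REVIEW",
}

def to_event_log(raw: pd.DataFrame) -> pd.DataFrame:
    """Map raw Moodle rows (Table 1 layout) to the event log of Table 2."""
    unit = raw["Scope"].map(QUIZ_MAP)
    state = raw["Description"].map(EVENT_MAP)
    keep = unit.notna() & state.notna()          # drop unmapped events
    df = pd.DataFrame({
        "Timestamp": pd.to_datetime(raw.loc[keep, "Date"],
                                    format="%d/%m/%Y %H:%M"),
        "Case ID": raw.loc[keep, "User"],
        "Activity": unit[keep] + "-" + state[keep],
    })
    return df.sort_values(["Case ID", "Timestamp"]).reset_index(drop=True)
```

Keeping the mapping in explicit lookup tables makes the granularity decisions discussed above (what counts as an activity, which events to drop) easy to revisit without touching the rest of the pipeline.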
4. Results

Once the data for the course had been pre-processed according to the procedure described in Section 3, it was loaded into Celonis [4] for detailed analysis. Figure 2 shows an initial view of the process.

Figure 2: Overview of the discovered learning process model.

While the image is only provided as a sample, we can clearly see that the process begins in a fairly linear way but, after the initial week, turns into a spaghetti-like process. This is in fact typical of complex processes with a high number of cases and activities. The model can be filtered by specific activities, transitions or groups of students. Previous analysis of the same course indicates the relevance of practising before evaluative activities [2]. For this reason, we filtered the log to determine to what extent students complete the practice tests before the evaluative ones.

Table 3 indicates the number of students performing non-evaluative quizzes before evaluation for each unit, classified according to their final course result.

Table 3
Distribution of students performing non-evaluative quizzes before evaluative assessments for the different modules, based on final course mark (pass, fail, withdrawn).

| Actions | Pass (121) | Fail (12) | WD (11) |
|---|---|---|---|
| M1-PR before M1-EVAL | 101 | 10 | 3 |
| M2-PR before M2-EVAL | 104 | 9 | 3 |
| M3-PR before M3-EVAL | 98 | 6 | 1 |
| M4-PR before M4-EVAL | 97 | 5 | 1 |
| M5-PR before M5-EVAL | 95 | 2 | 1 |
| M6-PR before M6-EVAL | 108 | 3 | 1 |
| M7-PR before M7-EVAL | 90 | 1 | 1 |
| M8-PR before M8-EVAL | 88 | 0 | 1 |
| M9-PR before M9-EVAL | 83 | 0 | 1 |
| M10-PR before M10-EVAL | 88 | 0 | 1 |

The data in Table 3 can be analysed in terms of probability. We compute the conditional probability of passing the course depending on whether students perform non-evaluative quizzes before evaluative assessments or not. In addition, we determine statistical significance through a χ² test; the results are shown in Table 4, and a sketch of the computation closes this section.

Table 4
Probability of passing the subject depending on whether the student carries out formative quizzes before evaluative assessments in each module. Legend: ns: non-significant; *: p<0.05; **: p<0.01; ***: p<0.001; ****: p<0.0001

| Module | P(pass), PR before EVAL | P(pass), no PR before EVAL | Significance |
|---|---|---|---|
| M1 | 0.85 | 0.80 | ns |
| M2 | 0.90 | 0.61 | ** |
| M3 | 0.92 | 0.62 | *** |
| M4 | 0.94 | 0.59 | **** |
| M5 | 0.97 | 0.57 | **** |
| M6 | 0.96 | 0.41 | **** |
| M7 | 0.98 | 0.60 | **** |
| M8 | 0.99 | 0.60 | **** |
| M9 | 0.99 | 0.63 | **** |
| M10 | 0.99 | 0.60 | **** |

From a graphical perspective, Figure 3 shows the probability of passing depending on whether non-evaluative activities are performed in advance or not.

Figure 3: Differences in probabilities to pass the course.
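As a sketch of this computation, the snippet below reproduces the M6 row of Table 4 from the counts in Table 3, assuming withdrawals are counted as not passing (which matches the reported probabilities). SciPy's chi2_contingency is used for the significance test; the exact test configuration of the original analysis is not specified, so this is illustrative only:

```python
from scipy.stats import chi2_contingency

# Counts for unit M6, derived from Table 3 (WD counted as not passing):
# 112 students practised before the evaluative test, 32 did not.
practised = [108, 3 + 1]                              # [pass, fail + withdrawn]
not_practised = [121 - 108, (12 - 3) + (11 - 1)]      # [13, 19]

table = [practised, not_practised]

# Conditional probabilities of passing, as reported in Table 4.
p_practised = practised[0] / sum(practised)           # 108/112 ~ 0.96
p_not = not_practised[0] / sum(not_practised)         # 13/32  ~ 0.41

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"P(pass | PR before EVAL)    = {p_practised:.2f}")
print(f"P(pass | no PR before EVAL) = {p_not:.2f}")
print(f"chi2 = {chi2:.1f}, p = {p_value:.1e}")        # p < 0.0001 -> ****
```

Repeating this per module yields the full table; note that chi2_contingency applies Yates' continuity correction by default for 2×2 tables.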
5. Discussion

The results in the previous section allow us to answer the question stated in Section 2. Performing non-evaluative quizzes before the evaluative assessment of the unit under study positively impacts the chances of passing the subject. The differences in passing ratios have been quantified and shown to be statistically significant. While the effect is not significant for the first unit, the differences become noticeable from the second unit onwards, and the difference in passing ratios reaches 55 percentage points in Unit 6. The irrelevance of the first unit can be considered normal from a conceptual perspective, all the more so given the propaedeutic nature of the course: the first unit may be too introductory for some students, who simply approach the evaluative activity directly. From the second unit, however, and consistent with the increasing complexity, the relevance of practising in advance also increases.

These findings are also consistent with previous findings linked to this course [2]. That work indicates that students who submit practice tests show higher performance than those who choose not to submit them. While it is based on classical techniques, its results agree with the process mining approach.

Regarding process mining, and while this article is part of a preliminary research stage, the results are promising. Specific findings have emerged from the analysis performed. We aim to further explore its application to deepen the analysis of student behaviour and learning paths. We believe it can outperform other tools, due to its focus on the process rather than on static views, and to its ability to deliver meaningful results.

Finally, we would like to remark on the pedagogical implications. Based on these findings, we would encourage mathematics teachers in general to design non-evaluative quiz activities and to stress the relevance of practising before approaching evaluative tests. At the same time, students should be encouraged to always practise before evaluation. Finally, from a learning analytics perspective, we believe that systems oriented to raising alerts should focus not only on the activities performed but also on the process carried out.

6. Limitations and future work

This article reports an initial stage of our research on the potential use of process mining to analyse online courses. In this sense, the results should be considered those of preliminary work. Nevertheless, the work carried out outlines the potential of process mining in learning analytics scenarios. Among its potential applications, we currently focus on the analysis of student behaviour along the course and, in particular, on the learning paths students follow. This analysis can provide relevant insight into when and why students abandon the expected course path and how this impacts academic outcomes. This is relevant both for detecting behaviours that can potentially lead to unsuccessful course outcomes and for improving course design. The authors are open to collaborating in initiatives aimed at achieving these goals.
7. References

[1] van der Aalst W (2012) Process mining: Overview and opportunities. ACM Transactions on Management Information Systems 3:1–17
[2] Figueroa-Cañas J, Sancho-Vinuesa T (2021) Investigating the relationship between optional quizzes and final exam performance in a fully asynchronous online calculus module. Interactive Learning Environments 29. https://doi.org/10.1080/10494820.2018.1559864
[3] van der Aalst WMP (2019) A practitioner's guide to process mining: Limitations of the directly-follows graph. Procedia Computer Science 164:321–328. https://doi.org/10.1016/J.PROCS.2019.12.189
[4] Process Mining and Execution Management Software | Celonis. https://www.celonis.com/. Accessed 22 Apr 2022
[5] Process Mining and Automated Process Discovery Software for Professionals - Fluxicon Disco. https://fluxicon.com/disco/. Accessed 22 Apr 2022
[6] van Hee K, Oanea O, Post R, et al (2006) Yasper: A tool for workflow modeling and analysis. Proceedings - International Conference on Application of Concurrency to System Design, ACSD 279–281. https://doi.org/10.1109/ACSD.2006.37
[7] start | ProM Tools. https://www.promtools.org/doku.php. Accessed 22 Apr 2022
[8] Peterson JL (1977) Petri Nets. ACM Computing Surveys (CSUR) 9:223–252. https://doi.org/10.1145/356698.356702
[9] Dijkman RM, Dumas M, Ouyang C (2008) Semantics and analysis of business process models in BPMN. Information and Software Technology 50. https://doi.org/10.1016/j.infsof.2008.02.006
[10] Leemans SJJ, Fahland D, van der Aalst WMP (2018) Scalable process discovery and conformance checking. Software and Systems Modeling 17. https://doi.org/10.1007/s10270-016-0545-x
[11] van der Aalst WMP (2014) Process mining in the large: A tutorial. In: Lecture Notes in Business Information Processing
[12] Bogarín A, Cerezo R, Romero C (2018) Discovering learning processes using inductive miner: A case study with learning management systems (LMSs). Psicothema 30:322–329. https://doi.org/10.7334/psicothema2018.116
[13] Bogarín A, Cerezo R, Romero C (2018) A survey on educational process mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8:e1230. https://doi.org/10.1002/widm.1230
[14] Mukala P, Buijs J, Leemans M, van der Aalst W (2015) Exploring Students' Learning Behaviour in MOOCs using Process Mining Techniques. Computing Conference
[15] Umer R, Susnjak T, Mathrani A, Suriadi S (2017) On predicting academic performance with process mining in learning analytics. Journal of Research in Innovative Teaching & Learning 10. https://doi.org/10.1108/jrit-09-2017-0022
[16] Pechenizkiy M, Trčka N, Vasilyeva E, et al (2009) Process mining online assessment data. In: EDM'09 - Educational Data Mining 2009: 2nd International Conference on Educational Data Mining
[17] Cerezo R, Bogarín A, Esteban M, Romero C (2020) Process mining for self-regulated learning assessment in e-learning. Journal of Computing in Higher Education. https://doi.org/10.1007/s12528-019-09225-y
[18] Fitkov-Norris E, Lees B (2012) Online formative assessment: Does it add up to better performance in quantitative modules? In: Proceedings of the 11th ECRM