=Paper= {{Paper |id=Vol-2415/paper06 |storemode=property |title=Analyzing students’ persistence using an event-based model |pdfUrl=https://ceur-ws.org/Vol-2415/paper06.pdf |volume=Vol-2415 |authors=Pedro Manuel Moreno-Marcos,Pedro José Muñoz-Merino,Carlos Alario-Hoyos,Carlos Delgado-Kloos |dblpUrl=https://dblp.org/rec/conf/lasi-spain/Moreno-MarcosMA19 }} ==Analyzing students’ persistence using an event-based model== https://ceur-ws.org/Vol-2415/paper06.pdf
                                                      Analyzing Students’ Persistence
                                                       using an Event-Based Model

                            Pedro Manuel Moreno-Marcos[0000-0003-0835-1414], Pedro J. Muñoz-Merino[0000-0002-2552-4674],
                            Carlos Alario-Hoyos[0000-0002-3082-0814] and Carlos Delgado Kloos[0000-0003-4093-3705]

                                                         Universidad Carlos III de Madrid
                                              {pemoreno, pedmume, calario, cdk}@it.uc3m.es



                                    Abstract. In education, persistence can be defined as students’ ability to keep
                                    working on assigned tasks (e.g., exercises) despite difficulties. Previous
                                    studies suggest that persistence might be an important factor in students’
                                    performance. However, these studies were limited because they relied only on
                                    students’ self-reported data to measure persistence. This article contributes
                                    a novel model to measure persistence from students’ logs, which is general
                                    enough to be applied to different educational platforms. In this work, persistence
                                    is measured from students’ interactions with automatic correction exercises.
                                    Simple metrics such as the average number of attempts are not valid for a
                                    precise calculation of persistence for three reasons: exercises that have been
                                    answered incorrectly many times should count more towards persistence; this
                                    weight needs a limit so that a single exercise cannot bias the indicator; and
                                    attempts made after a student answers correctly should not be counted. We
                                    propose a model to measure persistence on exercises that is valid for many
                                    digital online educational platforms. The analysis of students’ persistence shows
                                    that there are no statistically significant differences in persistence between
                                    students who drop out of the course and those who do not, although persistence
                                    has a positive relationship with average grades in most cases. In contrast,
                                    persistence is not related to engagement with videos. These results provide an
                                    initial exploration of students' persistence, which can be important to understand
                                    how students behave and to properly adapt the course to students’ needs.

                                    Keywords: persistence, learning analytics, students’ behaviors.


                          1         Introduction

                          Digital learning platforms, such as Open edX and Moodle, offer the possibility to gather
                          a lot of information about how students are interacting and engaging with the course
                          contents. This information can be exploited to detect difficulties in the learning process
                          so as to support stakeholders in decision making, for example, through visualizations
                          and dashboards [1]. Students can face many possible difficulties, and these may lead to
                          risk of dropout, failure, lack of engagement or motivation, etc. [2].
                             Among those difficulties, low persistence can also be a problem, and it can be
                          studied through the analysis of students’ interactions. There are many ways




Copyright © 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.
                                           LASI Spain 2019: Learning Analytics in Higher Education                                               57




                          to define persistence. In general, persistence is a personality feature. In the educational
                          context, many articles consider persistence as students continuing their degrees until
                          they complete them (e.g., [3-4]). For example, Kimbark et al. [5] considered students
                          persistent when they enrolled in the following spring semester. However, for many
                          other authors (e.g., [6-7]), persistence is treated as a synonym of perseverance, so that a
                          student is considered persistent when he/she keeps on working on a task (e.g., an
                          exercise) after trying to solve it incorrectly [6]. This article adopts the latter definition,
                          and from now on all references to persistence use it. In particular, we focus on
                          specific activities in a course, and persistence is measured from interactions with
                          individual exercises, so that a student with high persistence is a student who attempts
                          exercises again and again until the correct solution is obtained. However, persistence
                          should not be the average number of attempts, a typical indicator in many previous
                          studies. Only the attempts made until the student solves the exercise correctly should
                          be considered. In addition, not all exercises can count the same, since some
                          exercises offer more opportunity to show persistence. Moreover, a limit should be
                          established so that a single exercise cannot bias the indicator.
                             Research in this field has shown that being persistent can lead to higher academic
                          productivity [8] and academic achievement [9]. Muenks et al. [10] conducted several
                          regression analyses and found that persistence was useful to predict grades, although
                          self-regulated learning (SRL) and engagement variables achieved higher predictive
                          power. The authors of [10] also concluded that SRL skills, such as effort regulation (the
                          ability to maintain effort and attention even when tasks are not interesting and there
                          are distractions [11]) and cognitive SRL (which includes planning, monitoring and
                          learning strategies, among others), can be related to persistence. However, high values
                          of persistence do not necessarily mean that SRL skills are good because, if students do
                          not self-reflect on what they are doing after attempting each exercise, their learning
                          might be only superficial [12]. Further research is needed, but these findings suggest
                          that, although high persistence is not necessarily positive, low persistence can be negative
                          since students with low persistence lack the ability to confront their problems with the
                          tasks they solve incorrectly. However, these results may depend on the specific context.
                             In order to alleviate the problem of low persistence, it would be beneficial to be able
                          to measure the level of persistence of students from their events when working in an
                          online environment and to know the prevalence of the corresponding behavior. Prior
                          research work has mainly modelled persistence through self-reported data (e.g., [7-10,
                          13]), but no models have been defined to measure persistence from students’ events in
                          a digital platform. Self-reported data have many issues: for example, students might
                          not be aware of their persistence, or they might lie. Models based on
                          students’ events would be useful because they would provide information to instructors
                          that they could use to modify their materials and/or provide scaffolding questions
                          (similar to some Intelligent Tutoring Systems [14]) in the exercises, making it easier for
                          students to progress through the tasks and increasing their persistence and engagement.
                          Moreover, instructors may take actions to encourage their students to finish their
                          exercises so as to modify students’ behavior. Apart from that, if students are warned
                          about their lack of persistence, they may also take corrective actions.




                             In this context, it is important to not only measure the persistence of each student
                          but also to have aggregated data about the prevalence of persistence (i.e., how
                          persistence is distributed among learners) to define actions. Furthermore, the course
                          context and methodology are important factors because they may affect how persistent
                          students are. For example, students might be more/less persistent depending on the
                          importance/weight of the exercises within the course. Chase [15] also analyzed this
                          issue and concluded that persistence could vary depending on the domain of expertise
                          of the students (e.g., students can be more persistent in courses they find easier).
                          Nevertheless, according to Csikszentmihalyi [16], if exercises are too easy/difficult,
                          students may feel bored/anxious, and that may also affect persistence. Apart from the
                          context, as mentioned above, there can be many other variables (such as academic
                          achievement) that can affect or be affected by the persistence.
                             In this line, the aim of this paper is to determine a model to measure the persistence
                          of students through their events related to exercises in a digital platform and obtain
                          conclusions about students’ persistence. Specifically, the objectives of this work are:
                                  O1. Propose a model to measure students’ persistence based on their
                                        interactions in a digital platform.
                                  O2. Analyze the prevalence of the persistence.
                                  O3. Analyze how the persistence is related to other variables about students’
                                        behavior and performance.
                             The structure of the paper is as follows: Section 2 presents an overview of what has
                          been researched on detection of students’ behaviors and particularly on persistence;
                          Section 3 describes the context and data collection techniques; Section 4 details the
                          model to measure persistence; the analysis and discussion of the results are provided in
                          Section 5; finally, the main conclusions are detailed in Section 6.


                          2         Related Work

                          Many algorithms have identified variables related to students’ personality [17],
                          sentiments [18] and problems [19], such as heavy work load, among others. One of
                          these possible personality features is students’ persistence. Persistence, sometimes also
                          referred to as perseverance, is the students’ ability to keep on working with effort on a
                          task (e.g., an exercise) after facing difficulties (e.g., after getting a wrong answer) [20].
                          Many researchers have explored the persistence of students in different scenarios. For
                          example, authors in [6] carried out a study with 10-to-12-year-old students who used a
                          digital educational game. The game was designed to make students face exercises that
                          were unlikely to be solved, and each time the student failed a question, he/she was
                          presented with different options, which were used to measure students’ persistence
                          (e.g., continue working, get an easier exercise, take a break to play a game, etc.). Their
                          study showed that students with higher persistence managed to solve tasks at higher
                          difficulty levels. Moreover, Eley et al. [7] analyzed personality profiles from medical
                          students and they found that 60% of them had a profile with high persistence and low
                          harm avoidance (personality trait with tendency towards pessimism, anxiety and worry
                          about problems), which can be important to succeed in medicine.




                              Furthermore, research has also focused on the analysis of how persistence can be
                          related to other personality features. For example, Credé et al. [21] found a strong
                          relationship between persistence and conscientiousness. Datu et al. [22] also analyzed
                          persistence of Filipino high school students and found that persistence was a good
                          predictor of behavioral engagement, emotional engagement and flourishing (i.e.,
                          optimal psychological state, characterized by optimism and great purpose in life [23]).
                              Apart from students’ behaviors, researchers have also analyzed the relationship
                          between persistence and academic performance, although results vary depending on the
                          study. For instance, the authors of [13] conducted relative weight analyses to evaluate the
                          relationship between grit (a combination of students’ consistency of interests in the
                          topics and persistence for long-term goals [24]) and academic performance, and found
                          that the two elements of grit (consistency of interests and persistence) had weak
                          predictive power. Similar
                          conclusions were obtained by Bazelais et al. [25], who found persistence was not a
                          significant predictor of course success, unlike prior academic performance. However,
                          Meyers et al. [26] found that persistence was useful to differentiate between learners
                          who drop out or not in secondary school. This finding was supported by Farrington et
                          al. [27], who concluded that persistence has a direct relationship with grades.
                              Previous results suggest that more research is needed to analyze the relationship
                          between persistence and success. Also, one of the limitations of the contributions in the
                          literature is that they usually analyze persistence from self-reported data (e.g., [7-10,
                          13, 15, 20, 22, 25, 26]), and these data may be biased because of learners’ beliefs and
                          motivations. Few contributions, such as [28], measured persistence based on students’
                          events. Particularly, they measured persistence as the time spent on attempts where the
                          exercises were not solved correctly. In this article, we aim to contribute with the
                          analysis of persistence based on students’ interactions in a digital platform, and we
                          focus on the attempts needed until the student correctly solves the exercise.
                              Specifically, this paper innovates with a method to measure persistence from
                          students’ events collected from the digital platform (objective O1). Moreover, the
                          article also presents a novel analysis about the prevalence of persistence (objective O2)
                          and the analysis of the relationship between persistence and other variables, such as
                          dropout and performance (objective O3).


                          3         Description of the context and data collection

                          The analysis of students’ persistence was carried out using data from SPOCs (Small
                          Private Online Courses) [29], offered by Universidad Carlos III de Madrid. The
                          institution has a local instance of Open edX and encourages professors to develop
                          SPOCs as a way to support face-to-face courses. In particular, three models of
                          use are defined for SPOCs: (1) SPOCs needed to pass the course or with an
                          important weight in the final grade, (2) SPOCs that are part of the course (often
                          combined with a flipped classroom) but do not count for the summative evaluation,
                          and (3) SPOCs that are only a recommended support for the course and are not
                          mandatory. In total, there are data available from 38 SPOCs, which cover all the
                          thematic areas of the studies the university offers, mainly Social Sciences, Formal




                          Sciences and Engineering. However, the characteristics of each SPOC (e.g., syllabus,
                          purpose, structure, etc.) are unknown.
                              As the SPOCs are hosted in Open edX, data format is defined by edX [30], and there
                          is information available about activity, videos and exercises (there could be potentially
                          other features such as forum data, but SPOCs are mainly designed with just videos and
                          automatic correction exercises). For this analysis of the persistence, only information
                          about exercises is considered because we want to measure if students continue after
                          having difficulties (i.e., getting a wrong answer) and the feedback about how students
                          are doing is only contained in exercises. Particularly, only the events labelled as
                          "problem_check", which are the events produced when a student sends the answer to
                          an exercise, are considered. In total, there are 270,183 events in the 38 SPOCs from
                          3,598 different students. However, there are students who enrolled in more than one
                          SPOC. In order to have the value of persistence for each course, which can be more
                          relevant for instructors as they may only be interested in the data about their courses,
                          the combination of course-student is considered. With this assumption, there are 4,382
                          pairs of course-student and 210,125 combinations of course-student-exercise. Dividing
                          the total number of events by the number of these combinations gives an average of
                          1.29 attempts per exercise and student.
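As a rough illustration of how these counts relate, the sketch below groups "problem_check" events by course, student and exercise and derives the average number of attempts per combination. The field names (`event_type`, `course`, `student`, `exercise`) and the toy events are hypothetical and do not reflect the actual edX log schema.

```python
from collections import Counter


def attempts_per_combination(events):
    """Count problem_check events for each (course, student, exercise)."""
    counts = Counter()
    for ev in events:
        # Only answer submissions count as attempts; other events are ignored.
        if ev["event_type"] == "problem_check":
            counts[(ev["course"], ev["student"], ev["exercise"])] += 1
    return counts


def mean_attempts(counts):
    """Average number of attempts per course-student-exercise combination."""
    return sum(counts.values()) / len(counts) if counts else 0.0


# Toy events (hypothetical, for illustration only).
events = [
    {"event_type": "problem_check", "course": "c1", "student": "s1", "exercise": "e1"},
    {"event_type": "problem_check", "course": "c1", "student": "s1", "exercise": "e1"},
    {"event_type": "problem_check", "course": "c1", "student": "s2", "exercise": "e1"},
    {"event_type": "play_video", "course": "c1", "student": "s2", "exercise": None},
]
counts = attempts_per_combination(events)
avg = mean_attempts(counts)
```

Applied to the totals reported in this section (roughly 270 thousand events over 210,125 combinations), the same division gives the value of about 1.29.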
                              The exercises can be of many different types, such as multiple
                          choice, checkboxes, dropdown, numerical input and text input problems. All these
                          formats admit automatic grading. The grade of each exercise can be a continuous value
                          between 0 and 100%. However, in most cases (95%), exercises are graded as
                          correct or incorrect, as the most common exercise formats can only accept binary
                          values (e.g., a numerical input exercise or a multiple-choice question can only be right
                          or wrong, while checkboxes can admit partial grades). For the analysis, no information
                          is known about the format of each exercise or the number of allowed attempts. This
                          can be an important limitation because, for example, a student cannot be persistent if
                          the instructor designs the exercises so that only one attempt is allowed (which is typical
                          in summative exercises), and a student is very likely to be persistent if there are only
                          true/false questions, where the student knows the right answer once he/she gets feedback
                          on the initial attempt and sees that the initial answer is wrong.
                              In order to alleviate the aforementioned limitation, the following filtering criterion
                          was applied: exercises where all students who attempted them had two attempts at
                          maximum were removed. Note that if one exercise is not excluded, there might be
                          students who get it right using one/two attempts and the exercise remains valid for these
                          students. This rule served to (1) eliminate true/false exercises (or exercises where only
                          two options were possible), (2) eliminate easy problems that all students get right with
                          few attempts (they are not useful to show persistence), and (3) eliminate exercises
                          where only 1-2 attempts are allowed (a typical policy in the institution is allowing 1
                          attempt for summative exercises and 2 attempts for other exercises). Therefore, after
                          the filtering, all exercises had a format and a maximum number of attempts that allowed
                          students to show persistence. With this criterion, 656 exercises (from 28 SPOCs) were
                          included out of the 3,002 exercises in the SPOCs. In order to justify the threshold of
                          two attempts at maximum, the analysis was repeated with a threshold of three
                          attempts. In this case, only 189 exercises were included, which is considerably
                          fewer than with the threshold of two attempts. Because of




                          that, for the rest of the analysis, the initial filtering is kept: only exercises where
                          at least one student made three or more attempts are considered (i.e., exercises
                          where all students had two attempts at maximum are removed).
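The filtering criterion can be sketched as follows. This is only an illustration under assumptions: the input dictionary, the identifiers and the function name `exercises_to_keep` are hypothetical, and attempts are assumed to have been counted per student and exercise beforehand.

```python
def exercises_to_keep(attempt_counts, min_max_attempts=3):
    """Return the exercises where at least one student made
    `min_max_attempts` or more attempts.

    attempt_counts maps (student, exercise) -> number of attempts.
    """
    max_attempts = {}
    for (_student, exercise), n in attempt_counts.items():
        # Track the largest number of attempts any student made per exercise.
        max_attempts[exercise] = max(max_attempts.get(exercise, 0), n)
    return {ex for ex, m in max_attempts.items() if m >= min_max_attempts}


# Toy data (hypothetical): nobody needs more than two attempts on e1,
# while one student makes four attempts on e2.
attempt_counts = {
    ("s1", "e1"): 1,
    ("s2", "e1"): 2,
    ("s1", "e2"): 1,
    ("s2", "e2"): 4,
}
kept = exercises_to_keep(attempt_counts)
```

Note that e2 stays valid for student s1 as well, even though s1 only used one attempt, because the exercise is kept or removed as a whole.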


                          4         Description of the model to identify persistence

                          Persistence in this analysis is the extent to which students keep on trying an
                          exercise until they get it right after getting it wrong. This section focuses on identifying
                          how to model persistence based on students’ interactions with exercises. The aim is to
                          define an overall indicator of persistence, although it can be based on the persistence
                          shown in individual exercises. A priori, the persistence of a student will
                          be minimum if he/she never tries an exercise again after getting it wrong (assuming there
                          are no limits on attempts). Similarly, persistence will be maximum if the student
                          always ends up getting the right answer after trying the exercise several times. With
                          this idea of persistence, we do not take into account how the student got the correct
                          answer or whether persistence is good or bad for learning. If a student gets the
                          answers using a trial-and-error strategy, it will probably be bad for his/her learning process
                          (learning will probably be superficial), but the student is considered to be persistent
                          because he/she always gets the right answer (which is our definition of persistence).
                             Considering the previous ideas, it is easy to determine when a student is fully
                          persistent or not. However, the difficulty is how to model the overall persistence of a
                          student who may sometimes be persistent and sometimes not. For the definition of
                          persistence, the sequence of attempts and the associated results (grades) are considered.
                          For this model, grades of an exercise can only be 0 or 1. This is because 95% of the
                          course-student-exercise rows in all the SPOCs are only graded with 0 or 1. This can
                          also be generalized to quite a few learning environments. The remaining 5% (which
                          can include, e.g., checkbox exercises) has been discretized to 0 or 1 to be consistent with
                          the rest of the exercises by rounding grades down (e.g., 0.8 is converted to 0). Regarding
                          the sequence of attempts, it contains the grade of each attempt separated by spaces.
                          Table 1 shows some examples of sequences and the idea of associated persistence.

                                                     Table 1. Sequences of attempts for different exercises

                            ID     Sequence       Idea of persistence
                            1      0              The student is not persistent as he/she does not try the exercise again (after getting 0)
                                                  in order to get correct answer.
                            2      01             The student shows persistence as he/she attempts the exercise again to get it right.
                            3      000001         The student shows persistence and he/she shows more persistence than in case 2 as
                                                  he/she needed a lot of attempts until getting the answer right.
                            4      0000           The student shows certain persistence as he/she has tried the exercise several times
                                                  but he/she has not got the correct answer. The persistence should be greater than in
                                                  case 1 but smaller than in cases 2 and 3.
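The discretization of grades and the encoding of attempts as a sequence, described before Table 1, can be sketched as below. This is a minimal sketch that assumes grades are given as fractions between 0 and 1 in chronological order; the function names are hypothetical, and the output follows the compact notation used in Table 1.

```python
import math


def discretize(grade):
    """Round a grade in [0, 1] down to 0 or 1 (e.g., 0.8 becomes 0)."""
    return math.floor(grade)


def to_sequence(grades):
    """Encode the grades of consecutive attempts as a string of 0s and 1s."""
    return "".join(str(discretize(g)) for g in grades)


# Case 2 of Table 1: one wrong attempt followed by a correct one.
case_2 = to_sequence([0.0, 1.0])
# A partially correct attempt (0.8) is treated as wrong.
partial = to_sequence([0.0, 0.8, 1.0])
```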


                             Table 1 shows that, although the final outcome of the exercise can be the same, the
                          persistence the student shows in each case is different because in some cases the student




                          tried the exercise more times and proved to be more persistent despite the difficulties.
                          Considering this fact, the following assumptions have been made for the model:
                           • If students get the answer right in the first attempt, no persistence is shown
                                because there is no situation where an answer is wrong and the student decides
                                whether to attempt the question again (to show persistence). However, this
                                fact does not mean that students are not persistent. Therefore, events where the
                                answer is correct in the first attempt are excluded. Similarly, re-attempts of correct
                                exercises are excluded because the student already got the correct answer.
                           • Students show more persistence if they need more attempts to solve the exercise,
                                but they should not be penalized if they solve the exercise with few attempts.
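The first assumption can be read as a preprocessing step on each sequence of attempts. The sketch below is one possible reading, not the authors' implementation: it assumes sequences are strings of 0s and 1s in the notation of Table 1, and the function name `effective_sequence` is hypothetical.

```python
def effective_sequence(sequence):
    """Return the part of a sequence that counts towards persistence.

    Returns None when the exercise is excluded (no attempts, or correct
    at the first attempt). Otherwise, attempts made after the first
    correct answer are dropped.
    """
    if not sequence or sequence[0] == "1":
        return None  # no wrong answer, so persistence cannot be shown
    first_correct = sequence.find("1")
    if first_correct == -1:
        return sequence  # never solved: every attempt counts
    return sequence[: first_correct + 1]


excluded = effective_sequence("1")    # correct at first attempt
trimmed = effective_sequence("0110")  # re-attempts after success are dropped
unsolved = effective_sequence("0000")
```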
                              With these assumptions, the aim is to define an indicator of learner persistence in
                          the range 0-1. One initial question is how to consider the persistence
                          of each exercise for the global persistence. One possible approach is to compute a value
                          of persistence for each exercise and calculate the average of all of them. The limitation
                          of this approach is that it does not allow weighting the exercises easily so that exercises
                          where the students show more persistence have a higher weight. For example, in Table
                          1, a student will get the maximum value of persistence (1) in both cases
                          2 and 3 because he/she has attempted the exercises until getting the correct answer.
                          However, the student "shows" more persistence in case 3 because he/she needed more
                          attempts. With "shows", we refer to the persistence which is visible or demonstrated
                          with the exercises (by making attempts). In order to weight the exercises depending on
                          the persistence shown, it is proposed to compute the persistence as a single fraction,
                          where the numerator and denominator increase every time the student shows
                          persistence in each exercise. If more persistence is shown (e.g., case 3), more units will
                          be added to both numerator and denominator to give more importance to the exercise.
                              In this way, the model adds one unit to both the numerator and the denominator each
                          time the student attempts an exercise again after having answered it incorrectly at
                          least once. In order to penalize cases where a student does not reach the right answer,
                          a penalty variable is added in the denominator. If a student fails the exercise on the
                          first attempt and never attempts it again, the numerator does not change, but a penalty
                          is added to the denominator, which decreases the overall persistence. Formula 1 shows how
                          to compute the persistence, where n is the number of exercises the student has
                          attempted, and i represents a particular exercise:
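                          \[
                          \mathit{Persistence} = \frac{\sum_{i=0}^{n} \min(\mathit{attempts}_i - 1,\ \mathit{stop})}{\sum_{i=0}^{n} \left[ \min(\mathit{attempts}_i - 1,\ \mathit{stop}) + \mathit{penalty} \cdot \left(1 - \mathrm{floor}(\mathit{grade}_i)\right) \right]} \tag{1}
                          \]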

                             As mentioned, the numerator and denominator increase with
                          the number of attempts. As the first attempt is not counted, the value for exercise i
                          is incremented by its number of attempts minus 1. The formula also includes a parameter
                          stop. The reason is that if a student makes a very large number of attempts in one
                          exercise, its effect could be very strong with respect to the other exercises. For example,
                          a student who never re-attempts exercises after a wrong answer, except for one exercise
                          that he/she attempts many times, could obtain a good persistence because of the weight of
                          that single case. To prevent that, the stop variable defines the maximum that can be added
                          for each exercise. In practice, this parameter seems to be irrelevant in this context because




Copyright © 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.
                                           LASI Spain 2019: Learning Analytics in Higher Education                                               63




                          among the 58,217 exercises where the student does not get the answer right on the first
                          attempt, only 860 (1.5%) have four or more attempts. Nevertheless, it is included as it
                          may be relevant in other courses with more open questions, where students may
                          need more attempts to solve them.
                              In the denominator, there is a penalty for exercises that are not answered correctly.
                          As mentioned, the final grade of the exercise is discretized to 0 or 1 with the floor
                          function, and whenever the exercise is wrong there is a penalty. If the exercise is correct,
                          the grade is 1 and the penalty is avoided. The formula thus has two parameters: stop and
                          penalty, which can be adjusted depending on the context. The idea is that stop
                          represents the number of attempts needed to achieve the maximum persistence that a student
                          can show in one exercise (if the answer is correct). For example, it can be considered that
                          a student who attempts the exercise 10 times after the first incorrect attempt shows the
                          maximum persistence; if there are more than 10 such attempts, only 10 are counted to
                          avoid overweighting the exercise in the formula. In this scenario, stop is set to 10,
                          although, as mentioned, this does not affect the results considerably.
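
                              As an illustration of why the cap matters, consider a hypothetical student (an
                          invented case, not taken from the dataset) who fails five exercises on the first attempt
                          and never retries them, but solves a sixth exercise after 41 attempts. Assuming
                          penalty = 4, Formula 1 gives, without a cap and with stop = 10 respectively:

                          \[
                          \frac{40}{40 + 5 \cdot 4} = \frac{40}{60} \approx 0.67
                          \qquad \text{versus} \qquad
                          \frac{\min(40, 10)}{\min(40, 10) + 5 \cdot 4} = \frac{10}{30} \approx 0.33
                          \]

                          With the cap, the single heavily attempted exercise no longer masks the five
                          non-persistent events.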
                              The idea of the penalty is that it represents the number of re-attempts needed to
                          compensate for one event of non-persistence so that the overall persistence becomes 0.5.
                          For example, with a penalty of 1, a student who gets 0 in one exercise without retrying
                          and, in another exercise, gets 1 on the second attempt (sequence 0-1) would have a
                          persistence of 1/(1+1) = 0.5. We believe that 1
                          is too low a value because little persistence is shown with the pattern 0-1. In contrast,
                          if the penalty is very high, the students would need to demonstrate a lot of persistence to
                          overcome a non-persistent event. In this case, as most questions are answered with 3-4
                          attempts at most (after doing the filtering, 71% of the exercises are answered with 3
                          attempts at most and 82% with 4 attempts at most) and they all have more than two
                          options, we considered a penalty of 4. This way, a non-persistent event can be
                          compensated with two questions with three options at most where the student gets the
                          answer on the last attempt. Nevertheless, this value can be context-dependent and can
                          be adapted in each scenario. Moreover, an instructor may also want to adjust this value
                          according to his/her own criteria depending on the context or methodology. With this
                          approach, the persistence in the example in Table 1 would be
                          (0+1+5+3)/(4+1+5+7) = 9/17 = 0.53, which is reasonable as the student did not get the
                          correct answer in two of the exercises but he/she attempted one of them several times.
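
                              The computation of Formula 1 can be sketched in a few lines of Python. This is a
                          minimal illustration rather than the implementation used in the study: the function name,
                          the (attempts, final grade) input format and the attempt counts of the four cases are
                          assumptions, the latter inferred from the terms of the worked sum above. The attempt
                          counts are assumed to exclude re-attempts made after a correct answer.

```python
import math
from fractions import Fraction


def persistence(exercises, stop=10, penalty=4):
    """Evaluate Formula 1 for one student.

    `exercises` holds one (attempts, final_grade) pair per attempted exercise,
    with final_grade in [0, 1]. Returns None when the denominator is 0, i.e.
    when every exercise was solved on the first attempt.
    """
    numerator = 0
    denominator = 0
    for attempts, grade in exercises:
        shown = min(attempts - 1, stop)    # re-attempts, capped at stop
        failed = 1 - math.floor(grade)     # 1 unless the final grade is 1
        numerator += shown
        denominator += shown + penalty * failed
    if denominator == 0:
        return None                        # persistence is undefined
    return Fraction(numerator, denominator)


# Attempt counts inferred from the worked sum (assumption, not copied from Table 1).
table_1 = [(1, 0), (2, 1), (6, 1), (4, 0)]
result = persistence(table_1)
print(result, round(float(result), 2))     # prints: 9/17 0.53
```

                          Under the same assumptions, persistence([(1, 0), (2, 1)], penalty=1) returns 1/2, the
                          0-1 pattern discussed above, and persistence([(1, 0), (3, 1), (3, 1)]) also returns 1/2,
                          which matches the statement that two questions solved on the third attempt compensate
                          one non-persistent event when the penalty is 4.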


                          5         Results

                          In this section, the analyses to achieve the objectives stated in Section 1 and the
                          discussion of the results are presented. First, the analysis of the prevalence of
                          persistence is discussed. Next, the relationship between the persistence and other
                          variables is detailed.

                          5.1       Analysis of the prevalence of persistence

                          The first question is about how the persistence is distributed among the students in the
                          SPOC (i.e., prevalence). In order to evaluate this, the model to determine the
                          persistence, presented in Section 4 has been used. As a first result, the histogram of the




                          persistence is provided in Fig. 1. This histogram shows that most of the students have
                          either a fair/moderate persistence (between 0.3-0.7) or a very high persistence
                          (above 0.9). Among the 3,062 course-student pairs where persistence
                          was defined (there were 432 cases where the student always got the answer right and
                          thus there is no information about persistence), 1,216 (33%) cases represented students
                          who were persistent occasionally, i.e., their persistence was between 0.3-0.7. Among
                          those learners, most are between 0.4-0.5. Moreover, there are 988 (32%) with
                          very high persistence (above 0.9), while there are only 155 (5%) students with low
                          persistence (below 0.3). The mean persistence is 0.70 and the median is 0.67, which
                          means that many students do not give up when they face difficulties and keep on trying
                          their exercises. The high prevalence of persistence might be due to the context
                          considered: most exercises do not have many possible options, so these types of
                          exercises might encourage students to make repeated attempts until they succeed, since
                          they know they can get the correct answer with relatively low effort.




                          Fig. 1. Histogram with the prevalence of the persistence

                              In order to delve into the prevalence of persistence, several student profiles have
                          been identified for each range of persistence, based on an initial exploratory analysis of
                          the data (considering the ranges from the histogram and the percentages of attempted
                          exercises). For this analysis (and the analysis in the next section), persistence values
                          were merged with other indicators about videos and exercises. In some cases, indicators
                          were not available because, for example, the student did not interact with
                          videos. This resulted in a merged dataset of 2,522 course-student pairs (after
                          removing the 432 cases with undefined persistence). The profiles are described
                          as follows.
                           • Students who were not interested in the exercises and with low persistence: There
                                 were 90 students with both less than 30% of attempted exercises and less than 30%
                                 of persistence (62% of students with persistence below 30%). This was therefore a
                                 group of students who did not really take the exercises and only sampled some of
                                 them without showing much interest. However, there were a few cases of students who
                                 were active in watching videos. For this group, the persistence value can be less
                                 representative, as they put in less effort.
                           • Students with low persistence having done most of the exercises: There were 8
                                 students with more than 80% attempted exercises but with less than 30% of




                                  persistence (6% of students with persistence below 30%). They were students
                                  whose average grade in the exercises they attempted was always above 83%.
                                  Therefore, they were students who usually got the correct answers on the first
                                  attempt and did not care about having some incorrect answers, as they were above the
                                  passing rate (e.g., 50%). Nevertheless, this was not a frequent behavior, as there were
                                  only 8 students in this group.
                            •     Average students with low persistence: There were 47 students with less than 30%
                                  of persistence and between 30-80% of attempted exercises. This group
                                  represented 32% of students with persistence below 30%. They usually
                                  engaged with only part of the SPOC, and their interactions with videos were also low,
                                  as they only watched 27% of the videos on average. The average grade in the
                                  exercises they attempted was above 77% for 75% of the cases (above the first
                                  quartile), but they only attempted 49% of the exercises on average. Like
                                  the previous group, they may not care about doing everything right as long as they
                                  achieve a grade above the passing rate.
                            •     Students with medium persistence: There were 1,192 students with persistence
                                  between 30-70%. These students had, on average, 67% of the exercises right on
                                  their first attempt, and on average they completed 82% of the exercises they
                                  attempted. Therefore, of the 33% of the exercises they did not solve on the
                                  first attempt, they got 15% more correct in successive attempts (15/33, about 45%),
                                  so they were persistent about half of the time on average. Moreover, they watched
                                  on average 38% of the videos.
                            •     Students with high persistence: There were 939 students with persistence above
                                  90%. Although there were 148 students who interacted with less than 20% of the
                                  exercises, so that their persistence was less representative, 48% of the
                                  exercises were done on average, which makes persistence representative enough. These
                                  students were often good students, as their grade on the first attempt was 80%
                                  on average, but they wanted to have their exercises right and tried
                                  them again until they got them right. However, they only watched 31% of the
                                  videos on average.
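
                              The thresholds that separate the profiles listed above can be written as a small
                          rule-based function. This is only a sketch: the function name, the profile labels and the
                          handling of values that fall exactly on a boundary are assumptions, and the range
                          between 0.7 and 0.9, which the profiles do not describe, is left unprofiled.

```python
def assign_profile(persistence, attempted):
    """Map one course-student pair to a profile.

    `persistence` is the 0-1 indicator (None if undefined) and `attempted`
    is the fraction of exercises attempted, also in [0, 1].
    """
    if persistence is None:
        return "undefined persistence"
    if persistence < 0.3:
        if attempted < 0.3:
            return "low persistence, not interested in exercises"
        if attempted > 0.8:
            return "low persistence, most exercises done"
        return "low persistence, average student"
    if persistence <= 0.7:
        return "medium persistence"
    if persistence > 0.9:
        return "high persistence"
    return "unprofiled"  # 0.7 < persistence <= 0.9 is not covered by the profiles


# Hypothetical students, not data from the study.
for p, a in [(0.1, 0.2), (0.2, 0.9), (0.25, 0.5), (0.5, 0.8), (0.95, 0.5)]:
    print(p, a, assign_profile(p, a))
```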

                              These profiles show that there are very different behaviors in relation to
                          persistence. This could motivate adaptive tools that depend on the different students’
                          profiles. Moreover, a considerable portion of the students with very high or very low
                          persistence had few interactions. In those cases, it is important to note that the
                          persistence value is less representative, particularly for learners with low persistence,
                          as they may be just sampling the exercises without intention to complete them. For other
                          profiles with low-medium persistence, further work should be done to analyze what kind of
                          interventions could raise their persistence.

                          5.2       Relationship between persistence and students’ behavior and performance
                          In this section, an analysis of how persistence is related to other variables about
                          students’ behavior and performance is presented. Firstly, one of the variables of
                          great interest in the literature is dropout. Many researchers have analyzed which




                          variables affect dropout [2] and they have proposed predictive models to forecast which
                          learners will drop out of the course (e.g., [31]). This is particularly relevant because
                          of the high dropout rates in the courses [32]. The first part of the analysis aims to
                          discover whether there is a relationship between persistence and dropout. In this case, a
                          student is considered to have dropped out if he/she has not completed at least 75% of
                          the exercises of the SPOC. For this analysis, only the first semester of the academic
                          year 2018/2019 is considered (as the second semester is not finished yet). In order
                          to analyze the relationship between persistence and dropout, a boxplot was made using
                          both variables (see Fig. 2). This figure shows that the persistence of students who do
                          not drop out of the course is similar to that of those who drop out. The mean persistence
                          of the two groups is 0.71 and 0.68, respectively. The difference in persistence, evaluated
                          through the Mann-Whitney test, was not statistically significant (p-value = 0.25). This
                          suggests that persistence is not crucial to complete the course. A possible reason is
                          that persistence is only measured with the attempted exercises, and students may
                          try to complete the exercises they attempt (even using brute force if necessary) but
                          may stop using the SPOC at some point. Another possible reason is that these types of
                          exercises might not discriminate between persistent and non-persistent students: as there
                          are few options to select from, students make many trial attempts. Other reasons may be
                          related to the specific context. Because of that, more research is needed to delve into
                          the reasons.
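
                              The comparison described above can be sketched as follows, using only the Python
                          standard library. The dropout rule (fewer than 75% of the exercises completed) comes from
                          the text; everything else is an assumption of this sketch, including the function names,
                          the toy data and the use of a plain normal approximation for the Mann-Whitney p-value
                          (no tie or continuity correction), which is only reasonable for large groups. In practice
                          a statistics library routine such as scipy.stats.mannwhitneyu would normally be used.

```python
import math


def dropped_out(completed_fraction, threshold=0.75):
    """Dropout rule from the text: fewer than 75% of the exercises completed."""
    return completed_fraction < threshold


def mann_whitney_u(x, y):
    """Return (U, two-sided p-value) for samples x and y.

    U counts the pairs where the x value exceeds the y value (ties count 0.5).
    The p-value uses the normal approximation without corrections.
    """
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Hypothetical (persistence, completed fraction) pairs, not data from the study.
students = [(0.9, 0.95), (0.5, 0.80), (0.7, 0.40), (0.6, 0.10), (1.0, 0.78), (0.4, 0.30)]
stayed = [p for p, c in students if not dropped_out(c)]
left = [p for p, c in students if dropped_out(c)]
u, p_value = mann_whitney_u(stayed, left)
print(f"U = {u}, p = {p_value:.3f}")
```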




                          Fig. 2. Boxplot with the relationship between persistence and dropout

                             After analyzing the relationship between persistence and dropout, the relationship
                          between persistence and the average grade (considering only attempted exercises and all
                          the attempts) is presented. Moreover, the relationship between persistence and the
                          percentage of completed videos is analyzed to discover whether persistent students also
                          complete the videos or not. In order to analyze these variables, plots have been made
                          relating persistence to each of them (see Fig. 3).
                             Fig. 3a illustrates that the average grade has a clear positive relationship with
                          persistence, as the average grade tends to be higher when the persistence is higher. The
                          average grade fluctuates more for students with low persistence, i.e., there can be
                          students with low persistence and high grades, and the variance is larger, whereas students
                          with high persistence usually achieve good grades. Nevertheless, there are some cases
                          of students with high persistence but low grades. As the average grade is only computed
                          with attempted exercises, this means that these students had a very poor performance on
                          exercises where the number of attempts is limited (which are excluded for persistence




                          but not for average grade). This implies that there may be cases where students are
                          persistent by using brute force to solve the exercises but are not actually learning.
                          Thus, persistence does not necessarily mean learning, although it may mean effort.
                          However, the trend is that the average grade is higher as the persistence
                          increases. This suggests that while persistence is not crucial for success, it can be
                          beneficial, and good persistence can lead to high performance, provided that the
                          student reflects on the questions and does not guess the answers by brute force.
                             In contrast, when analyzing the relationship between persistence and percentage of
                          completed videos, results show that there is no relationship between the two variables (see
                          Fig. 3b). This means that although completing videos is an indicator of constancy and
                          work in the SPOC, it is not related to persistence. Some students can be very
                          engaged in watching videos without being engaged with the exercises or
                          persistent enough to complete them (with correct answers), while other
                          students use the SPOC to practice with exercises but are not interested in
                          watching the videos. Thus, engagement can differ across course materials,
                          and an instructor should highlight the importance of all the parts that he/she wants
                          students to actually cover in the SPOC.




                          Fig. 3. Relationship between persistence and (a) average grade, (b) % of completed videos


                          6         Conclusions

                          This work presents a novel method to determine students’ persistence from low-level
                          events. This model measures the persistence of a student on a 0-1 scale, taking into
                          account whether a student keeps on trying an exercise once he/she fails to solve it
                          correctly. The model can be applied to many educational platforms provided data about
                          exercises and attempts are available, although the model could be refined if the context is
                          known. With this model, the persistence of the students in 28 SPOCs is computed. The
                          analysis of the prevalence shows that there are many different profiles depending on
                          the persistence. Most students have either a fair/moderate persistence (between 0.3-0.7)
                          or a high persistence (above 0.9). Results show that there are many students with low
                          and fair/moderate persistence who have an average grade above the passing rate on the
                          exercises they attempt and do not mind having some exercises wrong. In contrast,
                          results show that there are students who always persist until they get their answers right.
                          It is interesting that there are few students with low persistence, but their profiles
                          differ in terms of engagement; some of them are not persistent because they do not




                          keep on trying their exercises until they get the correct answer, while others are not
                          persistent just because they are not trying the exercises and only sample some of them.
                              When persistence was compared with other variables, results also show that there is
                          no statistically significant difference in persistence between students who drop out of
                          the course and those who do not. This means that although a learner seems to be persistent
                          in the exercises he/she does, it does not mean that he/she will complete the course. This
                          can happen, e.g., when students attempt few exercises (regardless of whether they persist
                          in them) and/or use trial and error strategies. Similar conclusions were found with the
                          engagement with videos, as no relationship was found with persistence. In contrast, the
                          average grade was positively related to persistence. However, there were cases where
                          the average grade was low but persistence was high, which suggests that the student was
                          probably not learning and was getting the correct answers by using brute force.
                              Despite the abovementioned findings, there are some limitations that are worth
                          mentioning. Results might be tied to the specific context. In our case study, the
                          typology and the allowed number of attempts of each specific exercise are unknown. This
                          is a very important limitation because, for example, it is easier to be persistent when
                          questions have a limited set of answers (e.g., multiple-choice questions) than in open-
                          ended questions (e.g., write the number of the solution of the exercise). Indeed, most
                          exercises had a limited number of answers, so this could explain higher values of
                          persistence. For this analysis, some filters were included to alleviate this issue, but
                          ideally this information would be available. Furthermore, another limitation is related
                          to the way persistence is computed. As persistence is a subjective characteristic of
                          people, many measures could be proposed, and each one can have its advantages and
                          disadvantages. The measure in this paper can give an idea of how persistent the student
                          is in the exercises he/she has done, but, for example, it does not consider what has not
                          been done, and the model does not take into account how the exercises are solved (e.g.,
                          it would be bad if a trial and error strategy is used). While the latter is not the focus
                          of this article, it could also be interesting to include in future models.
                              As future work, in relation to the aforementioned limitations, it would be
                          relevant to analyze the persistence in educational platforms where information about
                          the context and methodology is known, and to add a corrective factor for non-attempted
                          exercises and exercises where there is no reflection between attempts (e.g., elapsed time
                          between attempts is very short). In addition, it would be relevant to use the model in
                          courses where students may usually need more attempts and grades are not typically
                          binary (0/1). Moreover, it would be interesting to delve into the profiles of persistence.
                          In order to do that, clustering techniques could be used to obtain a detailed picture of
                          how students differ based on their persistence. Furthermore, a more detailed
                          analysis could be done to explore the relationship between persistence and other
                          variables of students’ behaviors and performance, and the types of exercises/resources.
                          In this line, persistence could also be introduced in predictive models, for example, to
                          forecast dropout and/or performance, to analyze its predictive power. Finally, it would
                          also be important to analyze which factors can increase/decrease persistence and
                          provide instructors with information about the persistence so that they can analyze possible
                          interventions that can enhance the persistence of their students and see if that can have
                          a positive effect on their overall learning.




                          Acknowledgements

                             Work partially funded by the LALA project (grant no. 586120-EPP-1-2017-1-ES-
                          EPPKA2-CBHE-JP). The LALA project has been funded with support from the
                          European Commission. This work has also been partially funded by FEDER/Ministerio
                          de Ciencia, Innovación y Universidades – Agencia Estatal de Investigación/ project
                          Smartlet (TIN2017-85179-C3-1-R), and by the Madrid Regional Government, through
                          the project e-Madrid-CM (S2018/TCS-4307). The latter is also co-financed by the
                          Structural Funds (FSE and FEDER). It has also been supported by the Spanish Ministry
                          of Science, Innovation and Universities, under an FPU fellowship (FPU016/00526).
                          This publication reflects the views only of the authors, and funders cannot be held
                          responsible for any use which may be made of the information contained therein.


                          References
                            1. Verbert, K., Duval, E., Klerkx, J., Govaerts, S., & Santos, J.L.: Learning Analytics
                               Dashboard Applications. American Behavioral Scientist 57(10), 1500-1509 (2013).
                            2. Moreno-Marcos, P.M., Alario-Hoyos, C., Muñoz-Merino, P.J., & Delgado Kloos, C.D.:
                               Prediction in MOOCs: A review and future research directions. IEEE Transactions on
                               Learning Technologies (In Press)
                            3. Kuh, G.D., Cruce, T.M., Shoup, R., Kinzie, J., & Gonyea, R.M.: Unmasking the effects of
                               student engagement on first-year college grades and persistence. The Journal of Higher
                               Education 79(5), 540-563 (2008).
                            4. Tinto, V.: Reflections on student persistence. Student Success 8(2), 1-8 (2017).
                            5. Kimbark, K., Peters, M.L, & Richardson, T.: Effectiveness of the student success course on
                               persistence, retention, academic achievement, and student engagement. Community College
                               Journal of Research and Practice 41(2), 124-138 (2017).
                            6. Silvervarg, A., Haake, M., & Gulz, A.: Perseverance Is Crucial for Learning. "OK! But Can
                               I Take a Break?". In: Penstein Rosé, C. et al. (eds.) Artificial Intelligence in Education. AIED
                               2018, LNCS, vol. 10947, pp. 532-544. Springer, Cham (2018).
                            7. Eley, D.S., Leung, J., Hong, B.A., Cloninger, K.M., & Cloninger, C.R.: Identifying the
                               dominant profiles in medical students: implications for their well-being and resilience. PLoS
                               ONE 11(8), 1-16 (2016).
                            8. Hodge, B., Wright, B., & Bennett, P.: The role of grit in determining engagement and
                               academic outcomes for university students. Research Higher Educ. 59(4), 448-460 (2018).
                            9. Wolters, C.A., & Hussain, M.: Investigating grit and its relations with college students’ self-
                               regulated learning and academic achievement. Metacogn. Learning 10(3), 293-311 (2015).
                           10. Muenks, K., Wigfield, A., Yang, J.S., & O’Neal, C.R.: How True is Grit? Assessing Its
                               Relations to High School and College Students’ Personality Characteristics, Self-
                               Regulation, Engagement, and Achievement. J. Educat. Psychology 109(5), 599-620 (2017).
                           11. Pintrich, P.R., Smith, D.A., Garcia, T., & McKeachie, W.J.: A manual for the use of the
                               Motivated Strategies for Learning Questionnaire (MSLQ). Ann Arbor, MI: National Center
                               for Research to Improve Postsecondary Teaching and Learning (1991).
                           12. Young, M.R.: The Motivational Effects of the Classroom Environment in Facilitating Self-
                               Regulated Learning. Journal of Marketing Education 27(1), 25-40 (2005).




Copyright © 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.




                           13. Steinmayr, R., Weidinger, A.F., & Wigfield, A.: Does students’ grit predict their school
                               achievement above and beyond their personality, motivation, and engagement?
                               Contemporary Educational Psychology 53, 106-122 (2018).
                           14. Feng, M., Heffernan, N., & Koedinger, K.: Student modeling in an intelligent tutoring
                               system. In: Stankov, S., Glavinić, V., & Rosić, M. (eds.) Intelligent Tutoring Systems in E-
                               Learning Environment: Design Implementation and Evaluation, pp. 208-236. IGI Global,
                               Pennsylvania (2011).
                           15. Chase, C.C.: Motivating expertise: equipping novices with the motivational tools to move
                               beyond failure. In: Staszewski, J.J. (ed.) Expertise and Skill Acquisition: the Impact of
                               William G. Chase. Psychology Press, New York (2013).
                           16. Csikszentmihalyi, M.: Flow: The Psychology of Optimal Experience: Finding Flow. Harper
                               Collins, New York, NY (1990).
                           17. Chen, G., Davis, D., Hauff, C., & Houben, G.J.: On the Impact of Personality in Massive
                               Open Online Learning. In: Proceedings of the 24th Conference on User Modeling,
                               Adaptation and Personalization, pp. 121-130. ACM, New York (2016).
                           18. Moreno-Marcos, P.M., Alario-Hoyos, C., Muñoz-Merino, P.J., Estévez-Ayres, I., & Delgado
                               Kloos, C.: Sentiment Analysis in MOOCs: A case study. In: Proceedings of the 2018 IEEE
                               Global Engineering Education Conference, pp. 1489-1496. IEEE, Piscataway (2018).
                           19. Chen, X., Vorvoreanu, M., & Madhavan, K.: Mining social media data for understanding
                               students’ learning experiences. IEEE Trans. Learning Technologies 7(3), 246-259 (2014).
                           20. Lufi, D., & Cohen, A.: A scale for measuring persistence in children. Journal of personality
                               assessment 51(2), 178-185 (1987).
                           21. Credé, M., Tynan, M.C., & Harms, P.D.: Much ado about grit: A meta-analytic synthesis of
                               the grit literature. Journal of Personality and Social Psychology 113(3), 492-511 (2017).
                           22. Datu, J.A.D., Valdez, J.P.M., & King, R.B.: The Successful Life of Gritty Students: Grit
                               Leads to Optimal Educational and Well-Being Outcomes in a Collectivist Context. In: King,
                               R., Bernardo, A. (eds.) Psychol. Asian Learners, pp. 503-516. Springer, Singapore (2016).
                           23. Diener, E., Wirtz, D., Tov, W., Kim-Prieto, C., Choi, D.W., Oishi, S., & Biswas-Diener, R.:
                               New well-being measures: Short scales to assess flourishing and positive and negative
                               feelings. Social Indicators Research 97(2), 143-156 (2010).
                           24. Duckworth, A.L., Peterson, C., Matthews, M.D., & Kelly, D.R.: Grit: perseverance and
                               passion for long-term goals. J. Personality and Social Psychology 92(6), 1087-1101 (2007).
                           25. Bazelais, P., Lemay, D.J., & Doleck, T.: How does Grit Impact College Students’ Academic
                               Achievement in Science? Europ. J. Science and Mathematics Education 4(1), 33-43 (2016).
                           26. Meyers, R., Pignault, A., & Houssemand, C.: The role of motivation and self-regulation in
                               dropping out of school. Procedia-Social and Behavioral Sciences 89, 270-275 (2013).
                           27. Farrington, C.A., Roderick, M., Allensworth, E., Nagaoka, J., Keyes, T.S., Johnson, D.W.,
                               & Beechum, N.O.: Teaching Adolescents to Become Learners: The Role of Noncognitive
                               Factors in Shaping School Performance – A Critical Literature Review. Consortium on
                               Chicago School Research, Chicago (2012).
                           28. Ventura, M., Shute, V., & Zhao, W.: The relationship between video game use and a
                               performance-based measure of persistence. Computers & Education 60(1), 52-58 (2013).
                           29. Fox, A.: From MOOCs to SPOCs. Communications of the ACM 56(12), 38-40 (2013).
                           30. edX: EdX Research Guide Release, https://buildmedia.readthedocs.org/media/pdf/devdata/
                               latest/devdata.pdf, last accessed 2019/04/22.
                           31. Feng, W., Tang, J., & Liu, T.X.: Understanding Dropouts in MOOCs. In: Proceedings of the
                               33rd AAAI Conference on Artificial Intelligence. In Press (2019).
                           32. Paura, L., & Arhipova, I.: Cause analysis of students’ dropout rate in higher education study
                               program. Procedia-Social and Behavioral Sciences 109, 1282-1286 (2014).



