Looking for a dropout predictor based on the instructional design of online courses*

Salvador Ros [0000-0001-6330-4958] and Agustín Caminero [0000-0001-9658-9646]

Spanish Distance University (UNED), Spain

* Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. Dropout is probably one of the most critical concerns for higher education institutions. The research literature in this field focuses on asking which reasons for dropping out are the most frequent, but there is a lack of research on the influence of the instructional design as a source of new predictive indicators of dropout. In this work, we analyse four datasets obtained from the results of the different assessments scheduled in the instructional design of the selected courses. An in-depth analysis shows that the dropout-risk group is largely formed by students who decided not to take one of the proposed assessments. This predictive indicator is easy to implement and allows faculty and institutions to design educational and administrative processes to understand and help the students.

Keywords: dropout, instructional design, education.

1 Introduction

Dropout is probably one of the most critical concerns for higher education institutions. It is well established that this effect is especially significant in introductory courses, which have the highest dropout rates [1]. Some works report that only 67% of students complete their first-year courses [2]. This situation is common both for face-to-face institutions and for distance or online ones.

The research literature in this field focuses on asking which reasons for dropping out are the most frequent. The research community agrees that lack of time and lack of motivation are the most significant ones; however, both are affected by many factors [2]. Some researchers focus on building strategies to prevent abandonment (e.g., the inclusion of in-game assessments or activities) [2], while others try to predict dropout. For this purpose, researchers look for different features to which sophisticated algorithms are applied in order to obtain good predictions.

In online education, and especially at the Spanish Distance University (UNED), there is a methodology for improving the learning process. This methodology is based on continuous assessment, which implies an assessment roadmap that students have to follow until the end of the course. Our research question RQ1 is related to this fact: Is it possible, from the continuous assessment roadmap, to detect the different student groups exposed to dropout along the course?

At this moment, we are not interested in detecting the possible motivations of the students for deciding to drop out, but in detecting these students in order to start a process to understand them and help them continue their studies. Therefore, we are looking for an indicator that allows institutions to start these processes and to understand this phenomenon better.

The rest of this paper is organized as follows. Section 2 presents some results related to dropout situations. In Section 3, we describe our experiment to detect dropout. Section 4 presents a discussion of the results. Finally, Section 5 outlines the conclusions.

2 Related works

The dropout phenomenon is a concern in many educational institutions. Some reports warn about high dropout rates, especially in introductory courses [2].
These works try to understand the aspects that push students to decide to drop out or, on the contrary, the factors that influence the success of students. In [3], the influence of twelve success factors in a computer science course was analysed, and the three most relevant ones were comfort level, math, and attribution to luck, whereas in [4] student effort and comfort were detected as good predictors of student success.

Other authors look for different strategies, focusing on preventing dropout instead of looking for relevant factors. In [1], the authors concluded that using activities related to constructivist theories, such as games, improves motivation and, in turn, success. Since motivation is considered a relevant feature in dropout, other researchers have been working on predictive models related to the degree of motivation of the students [5]. The work in [6] uses Social Cognitive Learning Theory as a frame for a better understanding of learner motivation and analyses time as a predictive variable of motivation.

These studies are based on the selection of a set of significant features, most of them specially designed for a traditional classroom and very difficult to observe in an online or distance context, since they take into account information that cannot be obtained from the online experience. In some cases, these studies collect features obtained from the logs of different systems or programs used as a complement in the subject. Once the most relevant features are detected, they try to build a model for predicting success or dropout. Finally, some researchers are involved in educational strategies to avoid dropout; for this purpose, they need an accurate and early enough prediction of dropout to apply the different interventions [7].

On the other hand, there is a lack of research about the influence of the instructional design on predicting or preventing dropout. This influence is not taken into account because most of these studies are not designed for online or distance education. Finally, new techniques based on the deep-learning paradigm are being used for the prediction of learners' engagement based on new features (e.g., facial expression) [8].

In this paper, we present a very simple dropout predictor that uses a student activity feature based on the course instructional design. This predictor allows an early enough prediction of which students are at high risk of dropout, and allows institutions and faculty to implement different educational strategies and to obtain valuable information to understand and help students.

3 Detecting dropout based on instructional design

To answer our research question, we take as a starting point the instructional design of the courses based on the methodology followed at the Spanish Distance University (UNED). The instructional design of the courses allows collecting different data that contain knowledge that could be useful for improving the teaching process. Leaving other considerations aside, if we focus on assessment as part of the intervention in a subject, we realize that the subjects have a scheduled assessment plan that provides information about the performance of the students at different points in time. This instructional design implies a set of assessments that students have to deliver along the course, called Continuous Assessment Tests (PECs), and a final face-to-face assessment (F2F). Each assessment contributes to the final grade, but none of them is compulsory; an example of this assessment design is described in Table 1, which also lists the courses selected for our experiment.
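To make the role of the assessment roadmap concrete, the minimal Python sketch below shows how a final grade could be composed when no assessment is compulsory: a skipped assessment simply contributes zero. The function name and the score format are our own illustrative assumptions (they are not taken from the dataset or code published with this paper); the weights follow the Database course design of Table 1.

# Illustrative weights following the Database course design (Table 1):
# three Continuous Assessment Tests (PECs) plus a final face-to-face exam (F2F).
WEIGHTS = {"PEC1": 0.10, "PEC2": 0.05, "PEC3": 0.15, "F2F": 0.70}

def final_grade(scores):
    """Weighted final grade on a 0-10 scale.

    `scores` maps assessment name -> grade (0-10); assessments the student
    did not take are simply absent and contribute zero.
    """
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())

# Example: a student who skipped PEC2 (a partial dropout at that point).
print(round(final_grade({"PEC1": 8.0, "PEC3": 6.0, "F2F": 5.5}), 2))  # 5.55

Because missing an assessment only lowers the grade instead of blocking the course, attendance to each assessment becomes the natural signal to track.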
Therefore, the time variable appears as a new element to take into account for the purpose of designing a continuous dropout detection system. On the other hand, we are interested in a system that is effective, feasible, useful and very easy to implement. For this purpose, we focus on the students' attendance to the different assessments, and we define the dropout concepts used in this work. We distinguish between two kinds of dropout. Partial dropout includes any student who did not take one of the continuous assessments (PECs) at the moment it had to be taken; in this way, a student can be in partial dropout several times along the course. Final dropout includes any student who did not take the final assessment (F2F), regardless of whether he or she took any PEC before.

3.1 Method

We have collected data from two different Computer Science degree subjects delivered at the Spanish Distance University (UNED), for the years 2019 and 2020. The selected courses are introductory (CS1) subjects: a Database course and a Computer Networks course. Both courses share the same assessment methodology: a scheduled assessment plan based on a continuous assessment framework (see Table 1).

Table 1. Contribution of each assessment to the final grade.

Course              PEC1   PEC2   PEC3   F2F
Database            10%    5%     15%    70%
Computer Networks   10%    10%    -      80%

We collected the academic results of the two subjects for two years at different points in the term, following the assessment schedule. For the Database course, the first assessment was PEC1, with a weight of 10% in the final grade; the second was PEC2, with a weight of 5%; the third, PEC3, was a practical exercise with a weight of 15%; and finally, the face-to-face exercise (F2F) had a weight of 70%. In the case of the Computer Networks course, the first assessment was PEC1 with a weight of 10% in the final grade, the second was PEC2 with a weight of 10%, and the face-to-face exercise (F2F) had a weight of 80%. This structure allows us to calculate the partial and final dropout.

To answer the research question RQ1, we focused mainly on whether a student had taken each assessment rather than on the academic results collected at the different moments along the course. This selection likely hides other factors that could influence the motivation or engagement of students; the analysis of these factors is out of the scope of this work. We focus on tracking student activity based on the instructional design in order to detect the set of students at risk of dropout automatically.

3.2 Results

To analyse the results, we summarized them in four tables (Tables 2-5). Each column codifies the rate (%) of students who followed a given sequence of taking or skipping the assessments along the course, where 1 means taken and 0 means not taken. For example, column 001 for Computer Networks is the rate of students who did not take PEC1, did not take PEC2, but took the F2F exam.
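As an illustration of how this coding and the dropout rates are obtained, the Python sketch below builds the binary pattern for each student and computes the partial dropout rate after each PEC (read here as the share of students who did not take that particular PEC, which is the reading consistent with the rates reported below) and the final dropout rate. The attendance records and field names are hypothetical and do not necessarily match the format of the dataset published with this paper.

from collections import Counter

# Hypothetical attendance records for a course with assessments (PEC1, PEC2, F2F);
# True means the student took the assessment.
students = [
    {"PEC1": True,  "PEC2": True,  "F2F": True},
    {"PEC1": False, "PEC2": False, "F2F": False},
    {"PEC1": True,  "PEC2": False, "F2F": True},
]
ASSESSMENTS = ["PEC1", "PEC2", "F2F"]

def pattern(student):
    # Binary string as used in Tables 2-5, e.g. "101" = took PEC1 and F2F, skipped PEC2.
    return "".join("1" if student[name] else "0" for name in ASSESSMENTS)

def rate(condition):
    # Percentage of students satisfying a condition.
    return 100.0 * sum(condition(s) for s in students) / len(students)

# Partial dropout after each PEC: share of students who did not take that PEC.
for pec in ASSESSMENTS[:-1]:
    missed = rate(lambda s, p=pec: not s[p])
    print(f"Partial dropout after {pec}: {missed:.2f}%")

# Final dropout: share of students who did not take the final F2F exam.
final = rate(lambda s: not s["F2F"])
print(f"Final dropout: {final:.2f}%")

# Rate of each attendance pattern, i.e. the columns of Tables 2-5.
pattern_counts = Counter(pattern(s) for s in students)
for p, n in sorted(pattern_counts.items()):
    print(f"{p}: {100.0 * n / len(students):.2f}%")

Grouping the students by their pattern and computing the corresponding rates yields tables with the same structure as Tables 2-5 below.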
Analysing the students' rates for the 2019 Computer Networks course, collected over the three-assessment schedule, we can distinguish three points of analysis. The first point is after PEC1, where we found a partial dropout rate of 43.16%. After PEC2, this rate increases to 56.8%, and the final dropout rate is 50.66% (see Table 2). For the year 2020, we obtained similar results: after PEC1, the partial dropout rate is 53.56%; after PEC2, it increases to 58.04%; and the final dropout rate is 52.77% (see Table 3).

Table 2. Student rates for the 2019 Computer Networks subject (pattern PEC1, PEC2, F2F).

Pattern  000     001    011    010    100    101    111     110
Rate     33.33%  5.86%  2.91%  1.06%  9.55%  8.00%  32.53%  6.67%

Table 3. Student rates for the 2020 Computer Networks subject (pattern PEC1, PEC2, F2F).

Pattern  000     001    011    010    100    101    111     110
Rate     42.21%  6.86%  3.43%  1.05%  5.54%  3.43%  33.50%  3.95%

Regarding the 2019 Database course (see Table 4), we had a four-assessment schedule. The partial dropout rates for the PECs were, respectively, 28.51%, 39.25%, and 50.82%, and the final dropout rate was 44.21%. The 2020 Database course data (see Table 5) show that the partial dropout rates for the PECs were, respectively, 32.25%, 38.71%, and 62.21%, and the final dropout rate was 41.01%.

Table 4. Student rates for the 2019 Database subject (pattern PEC1, PEC2, PEC3, F2F).

Pattern  0000    0001   0010   0011   0100   0101    0110   0111
Rate     19.00%  1.65%  3.72%  1.65%  1.65%  0%      0.42%  0.42%

Pattern  1000    1001   1010   1011   1100   1101    1110   1111
Rate     6.61%   3.72%  2.48%  0.42%  4.96%  13.22%  5.37%  34.71%

Table 5. Student rates for the 2020 Database subject (pattern PEC1, PEC2, PEC3, F2F).

Pattern  0000    0001   0010   0011   0100   0101    0110   0111
Rate     25.33%  2.80%  0.46%  0.46%  0%     0.92%   0.92%  1.38%

Pattern  1000    1001   1010   1011   1100   1101    1110   1111
Rate     4.14%   3.22%  0.92%  1.38%  6.45%  19.35%  2.80%  29.47%

4 Discussion

The first trend we can observe is that the partial dropout rates increase along the course, while the final dropout rate is lower than the last partial dropout rate. In three of the four datasets, the final dropout rate is also higher than the first partial dropout rate at PEC1, the 2020 Computer Networks course being the exception, where both rates are very close (see Fig. 1). To establish some kind of relationship between the partial dropout rates and the final one, we have to look into the composition of these rates.

[Fig. 1. Evolution of the partial and final dropout rates at PEC1, PEC2, PEC3 and F2F for the 2019 and 2020 Computer Networks and Database courses.]

Based on the 2019 Computer Networks dataset, our initial candidate dropout-risk group had a size of 43.16% of the total number of students. By the time of PEC2, this group had grown to 56.8% of the students: 39.19% of the students had taken neither PEC1 nor PEC2, and another 17.55% had taken PEC1 but decided not to take PEC2. Conversely, a small portion of students that was initially in the dropout-risk group decided to take PEC2 and thus left the group. In the end, 65.79% of the risk group determined by the final dropout rate consisted of students who had taken neither PEC1 nor PEC2 (33.33% of all students). We found this behaviour in the 2020 Computer Networks dataset as well: in this case, 79.98% of the final dropout-risk group were students who had taken neither PEC1 nor PEC2 (42.21% of all students). Finally, we would like to highlight that if we extend this core to the students who took at most one PEC, these students account for 86.67% of the final dropout group in the 2019 Computer Networks dataset and for 92.47% in the 2020 dataset.
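These composition figures can be recomputed directly from the pattern rates in Tables 2 and 3. The short Python sketch below does so for the 2019 Computer Networks data; the variable names are ours, and the small discrepancies with the percentages quoted in the text come from the rounding of the table entries.

# Pattern rates (% of all students) for the 2019 Computer Networks course (Table 2).
# Pattern order is (PEC1, PEC2, F2F); "0" means the assessment was not taken.
rates = {"000": 33.33, "001": 5.86, "011": 2.91, "010": 1.06,
         "100": 9.55, "101": 8.00, "111": 32.53, "110": 6.67}

# Final dropout group: every pattern whose last digit is 0 (F2F not taken).
final_dropout = {p: r for p, r in rates.items() if p.endswith("0")}
group_size = sum(final_dropout.values())  # about 50.6% of all students

# Share of the final dropout group that took neither PEC1 nor PEC2.
neither = final_dropout.get("000", 0.0)
print(f"Neither PEC: {100 * neither / group_size:.2f}%")  # about 66% (65.79% in the text)

# Share of the final dropout group that took at most one of the two PECs.
at_most_one = sum(r for p, r in final_dropout.items() if p[:2].count("1") <= 1)
print(f"At most one PEC: {100 * at_most_one / group_size:.2f}%")  # about 87% (86.67% in the text)

The same computation applied to Tables 4 and 5 gives the Database course figures discussed next.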
After this first analysis, we wanted to check whether an increase in the number of PECs had a significant influence on the composition of the dropout-risk group. For this purpose, we analysed the dataset of the 2019 Database course. We found that 51.39% of the final dropout-risk group was composed of students who had taken neither PEC1 nor PEC2; this rate increases to 76.63% if we add the group of students who took only one of these two PECs. The behaviour of the 2020 Database course dataset was similar: 62.88% of the final dropout-risk group was composed of students who had taken neither PEC1 nor PEC2, and this rate increases to 77.46% if we add the group of students who took only one of these two first PECs.

Continuing the analysis and adding the third PEC, we found that the students who skipped at least one PEC account for 87.85% of the final dropout-risk group of the 2019 Database course, and for 93.19% in the 2020 dataset.

Therefore, tracking the students who do not take one of the proposed PECs is a good predictor of the composition of the dropout group. This predictor could enable instructors to identify students at risk of dropping out and to define different teaching strategies targeting these groups, which are initially formed at the beginning of the course.

5 Conclusions

In this work, we have analysed the dropout phenomenon taking into account the information obtained from the instructional design of an online course. The instructional design allows scheduling some interventions along the course, and assessments are considered a kind of intervention. Analysing the data about how many students take the different assessments, we realized that we had a good predictor of the composition of the dropout-risk group.

This predictor gives us an initial set of students at risk of dropout, which allows faculty and institutions to carry out interventions to understand and help students. This group is redefined each time a student decides not to take an assessment, to the extent that the results point out that students who decide not to take one of the scheduled assessments are candidates to finally drop out. The predictor is effortless and efficient, and allows an easy implementation and tracking.

6 Acknowledgements

The authors acknowledge the support of the SNOLA Thematic Network of Excellence (RED2018-102725-T), funded by the Spanish Ministry of Innovation, Science and Universities, and of the G-Elios UNED research group.

7 Software

The dataset and the code for obtaining these results are available at https://github.com/sros-UNED/LASI2020.

References

[1] A. Yan, M. J. Lee, and A. J. Ko, "Predicting Abandonment in Online Coding Tutorials," in 2017 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Oct. 2017, pp. 191-199, DOI: 10.1109/VLHCC.2017.8103467.
[2] L. Malmi, "Why students drop out CS1 course," in Proc. Second International Workshop on Computing Education Research, 2006.
[3] B. Wilson and S. Shrock, "Contributing to success in an introductory computer science course: A study of twelve factors," ACM SIGCSE Bulletin, vol. 33, pp. 184-188, Mar. 2001, DOI: 10.1145/366413.364581.
[4] P. R. Ventura Jr., "Identifying predictors of success for an objects-first CS1," Computer Science Education, vol. 15, no. 3, pp. 223-243, Sep. 2005, DOI: 10.1080/08993400500224419.
[5] L. Qu and W. L. Johnson, "Detecting the Learner's Motivational States in an Interactive Learning Environment," in Proceedings of the 2005 Conference on Artificial Intelligence in Education: Supporting Learning through Intelligent and Socially Informed Technology, May 2005, pp. 547-554.
[6] M. Cocea, "Assessment of Motivation in Online Learning Environments," in Adaptive Hypermedia and Adaptive Web-Based Systems, vol. 4018, V. P. Wade, H. Ashman, and B. Smyth, Eds. Berlin, Heidelberg: Springer, 2006, pp. 414-418.
[7] S. Halawa, D. Greene, and P. J. Mitchell, "Dropout Prediction in MOOCs using Learner Activity Features," p. 21, 2014.
[8] M. A. A. Dewan, F. Lin, D. Wen, M. Murshed, and Z. Uddin, "A Deep Learning Approach to Detecting Engagement of Online Learners," 2018, pp. 1895-1902, DOI: 10.1109/SmartWorld.2018.00318.