<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Modeling Trajectories to Understand the Delayed Completion of Sequential Curricula Undergraduate Programs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <email>renato.boegeholz@uach.cl</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>jguerra</string-name>
        </contrib>
        <contrib contrib-type="author">
          <email>escheihig@inf.uach.cl</email>
        </contrib>
        <aff>Instituto de Informatica, Universidad Austral de Chile, Chile</aff>
      </contrib-group>
      <abstract>
        <p>Taking more time than expected to complete university degree programs is a well-known global problem, and it is especially relevant in Chile, where there is pressure to complete degrees on time. In this work, we explore academic delay in higher education programs, in particular an engineering program, and its relation to academic information summarizing the trajectory of students through the program. Academic information is represented by semester-by-semester features that reflect different aspects such as performance, workload, and difficulty. Exploratory analyses of these variables reveal two orthogonal groups, performance and workload, which are then used to build models predicting a student's delay at her 8th term relative to the progress expected by that term. To further explore the trajectory of delay and analyze how delay relates to other academic aspects such as term-by-term performance or workload, a sequential model was built. Results show different patterns of behavior across different levels of delay in the 8th term. The methods and results of this research can be used by educational institutions or the government to support decisions about the use of resources and the reduction of attrition rates.</p>
      </abstract>
      <kwd-group>
        <kwd>academic analytics</kwd>
        <kwd>curricular analytics</kwd>
        <kwd>learning analytics</kwd>
        <kwd>educational data mining</kwd>
        <kwd>academic trajectories</kwd>
        <kwd>time to degree</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Taking more time than expected to complete university degree programs is a
well-known global problem. The average time to degree (the ratio between the average
time students take to graduate and the theoretical duration of the study program) is 1.36 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] in Latin
America and 1.31 in Chile, a number that has not varied significantly
in the last 10 years. This statistic means not only that obtaining a degree
takes on average about 31% more time than expected, but also that degree
delays are a serious problem with economic consequences: in Chile, this situation causes
an additional expense for families and the government estimated at US$
500 million [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ]. Delay in completing a degree program may have multiple causes,
such as the failure and repetition of subjects, temporary suspension of studies,
or taking a lighter course load than the curricular plan expects [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Academic reasons behind delay receive special attention because
there is pressure to complete degrees on time: higher education in Chile is financed
by the student (or her family) with the help of scholarships or other funding
benefits that require good performance and usually do not tolerate delays [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        In this work, we explore academic delay in higher education programs, in
particular an engineering program, and its relation to academic information
summarizing the trajectory of students through the academic program. While
similar work focused on performance variables such as grades [
        <xref ref-type="bibr" rid="ref17 ref19 ref6">6, 17, 19</xref>
        ], we
include other relevant aspects of the academic information, such as the course load
taken by the student each term, the difficulty of the courses taken,
and the repetition of courses, which may put the student at risk of
punitive actions (for example, elimination from the study program for failing a
subject more than twice, or for failing more than two subjects in the same semester),
among others. The first research question of this work is:
      </p>
      <p>1) What is the relation between delay in obtaining the degree and academic
information along the trajectory of the student?</p>
      <p>We build prediction models of academic delay considering different features
that can describe academic trajectories. As mentioned before, we seek to
represent such trajectories in terms of different academic information spanning
performance, course workload, and difficulty. These academic factors are relevant
not only because they could predict academic delay, but also because they could
characterize the trajectories in an actionable manner; that is, further analysis
of such trajectories could bring insights that support counseling practices
and curricular redesign. Thus, we state a second research question:</p>
      <p>2) Can the academic trajectories related to delay be characterized in a manner
that provides information about student behavior?</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Researchers have used different approaches to analyze academic information,
typically centered on performance measures. Based on grading data and
setting aside the background characteristics of the students, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] identified curriculum
subjects that can serve as effective indicators of academic performance. Using
X-means [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to group students yearly, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] discovered typical progress patterns
and evaluated the predictive capacity of the explanatory subjects. Considering
the results of an entrance self-assessment test and the academic performance of
the first year, [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] used K-means [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to group students and follow their
performance trajectories in the 2nd and 3rd years, measuring the influence
of first-year behavior on progression through the curriculum [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] explored
multiple personal and social factors that can affect the academic performance of
university students and, using the Grade Point Average (GPA) as the explained
variable in decision trees [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], proposed a qualitative model to classify and
predict it [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Dropout has also been investigated. Aiming to reduce dropout, [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] used
recommender-system techniques to predict the grades that students will obtain
in future subjects. The predictions were made using personalized multiple linear
regressions (PLMR) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] on student participation data in both traditional classes
and Massive Open Online Courses (MOOC) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] examined demographic
variables such as family characteristics; pre-university and university academic
performance factors; and participation in remedial courses, to predict
the persistence of students in their study programs. Using as explanatory
variables the scores on the ACT standardized test for college admission, the
high school grade average, and the grade average of the first semester of
university, and applying analysis of variance (ANOVA), Pearson product-moment
correlations, and multiple regression analysis [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], they showed that students who were
academically prepared to take college-level courses were more likely to persist
than students assigned to mandatory remedial courses [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Researchers have also analyzed students' trajectories
to inform the design of curricula. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] used student performance data from a
specific program to analyze the curriculum design of that program.
In particular, they modeled the difficulty of each subject as its contribution
(negative or positive) to the students' GPA and then contrasted this measure
with a survey of student perception. Using the same performance data, they also
performed a dropout and enrollment path analysis [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. It is important to note
that most previous work has been carried out in flexible course-credit
systems, where degrees are obtained as the sum of passed core and optional
subjects [
        <xref ref-type="bibr" rid="ref11 ref13 ref19 ref8">8, 11, 13, 19</xref>
        ]. This context differs from our sequential, non-flexible
curricula, where the flow of subjects to take is pre-defined for all the terms of the
program.
      </p>
      <p>
        The representation of the academic trajectories of the students is not
trivial, and it is necessary to consider the temporal dependence of the variables
under study. Following this idea, [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] used frequent pattern mining [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] to
reveal academic trajectories to understand the sequences of subjects to take that
could improve student performance. In [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], a sequential data model is proposed
that explicitly captures the temporal dependencies of academic performance
characteristics to form fingerprints or signatures, which allow
different analytical interpretations and the development of predictive models for
the risk of academic delay.
      </p>
      <p>As presented, several of the proposed models, all of the regression type,
considered the performance of the students as explanatory variables without taking
into account the dependence that exists in the performance of a student across
various subjects within the same semester, as well as across successive semesters. Our
work differs from previous work in several aspects: it is multivariate,
considering academic workload and the difficulty of the subjects in each semester
in addition to academic performance; it incorporates the
temporal dimension in the modeling of the students' trajectories; and
it explores such trajectories in the context of a curriculum with a sequential structure,
where about 90% of the courses to be completed are mandatory.</p>
    </sec>
    <sec id="sec-3">
      <title>Methods</title>
      <sec id="sec-3-1">
        <title>Academic Features and Data Descriptions</title>
        <p>Academic information at the level of the degree program includes data on courses
taken, passed, failed (with their grades), and dropped in each term of the
academic life of the student. We combine these records with historic data and the
expected curricular progress<sup>2</sup> to generate a series of academic features for each
term of the student's academic life. These features represent different dimensions
related to academic performance, academic workload (course load), the relative
difficulty of the courses, and the consistency between the courses taken and their
theoretical order in the curriculum. Nine features were computed for each term of each
student; they are defined in Table 1. More details about the definition of
the variables can be found in Appendix A.</p>
        <p>By academic trajectory we mean all this information organized in a
term-by-term sequence. To allow comparisons between trajectories, we considered
only the activity of the first 8 semesters (8 terms) of each student, counted
from their first enrollment. These 8 semesters also represent a milestone of the
study program because they contain the subjects required to obtain the Licenciatura
degree<sup>3</sup>. Accordingly, the explained variable is the delay in the 8th
term (DELAY8), which measures how far the student is from having completed all
courses of the first 8 terms of the plan in her first 8 terms of academic life. A
student who passed all planned courses of the first 8 semesters of the program
in her first 8 semesters of academic life has zero delay.</p>
        <p>To reduce inconsistencies in the comparison of trajectories (because of
the dependency between the features and the structure of the program
curricula), we analyzed only one study program: Engineering in Computer
Science. For the period of available data, this program implemented three
curriculum versions (2008, 2010, and 2015), each lasting 11 semesters with an
average of 6 subjects per semester. To study the performance of students
during their first 8 semesters, only those admitted between 2008 and 2015 were
considered. The resulting data set was composed of 14,199 records of academic
activity of 365 different students.</p>
        <p><sup>2</sup> In Chile, most higher education programs have a semi-flexible curricular plan,
where the study sequence is pre-defined term by term.</p>
        <p><sup>3</sup> In Chile, "Licenciatura" is similar to a bachelor's degree. Obtaining this degree
requires completing 8 semesters of subjects that are part of an academic major. This
degree allows the holder to continue an academic career. Qualifying for professional
practice requires between 2 and 4 additional semesters of subjects.</p>
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption><p>Academic features computed for each term of each student.</p></caption>
          <table>
            <thead>
              <tr><th>Feature</th><th>Description</th><th>Possible values</th></tr>
            </thead>
            <tbody>
              <tr><td>GPA</td><td>Non-cumulative weighted grade average for the semester.</td><td>[1.0, 7.0], with 4.0 the passing grade</td></tr>
              <tr><td>PASSRATE</td><td>Passing rate of the semester (the ratio of passed subjects to passed plus failed subjects).</td><td>[0, 1]</td></tr>
              <tr><td>FIRSTIME</td><td>The proportion of subjects enrolled for the very first time in the semester.</td><td>[0, 1]</td></tr>
              <tr><td>PROGRE</td><td>Contribution of subjects passed in the semester to the total of subjects required to obtain the degree.</td><td>[0, 1]</td></tr>
              <tr><td>WKLOAD</td><td>Academic workload rate for the semester, measured in CTS credits relative to the average semester CTS credits of the program.</td><td>[0, LEN<sub>Prog</sub>]</td></tr>
              <tr><td>DIFFIC A</td><td>Difficulty of the semester, as an additive measure ("Alpha difficulty").</td><td>[0, max(HFR<sub>j</sub>) · SUB<sub>i</sub>]</td></tr>
              <tr><td>DIFFIC B</td><td>Difficulty of the semester, as a geometric measure ("Beta difficulty").</td><td>[0, 1]</td></tr>
              <tr><td>DISPAR</td><td>The disparity of subjects enrolled in the semester: the difference, in semesters, between the highest-level subject and the lowest-level subject.</td><td>[0, 1]</td></tr>
              <tr><td>DELAY</td><td>Measurement of the delay between the theoretical and actual (average) semester of the student given their date of admission.</td><td>[0, 1]</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-2">
        <title>Data Modeling</title>
        <p>We are interested in modeling the 8th-semester delay (DELAY8)
as a function of the other eight features defined for each semester. We limit the
independent variables (the features) to the first 4 semesters for
two reasons. First, predicting delay (like predicting dropout)
is more useful when the prediction can be made early, so it seems reasonable to
predict the delay in the 8th semester with information from the first four terms.
Second, the first four semesters correspond to the "Bachillerato" milestone
in the engineering programs of the university, which concentrates the foundational
courses in math and physics; these courses have the highest failure rates
and are therefore strongly related to academic failure or success. The analyses are
performed in three steps.</p>
        <p>The first step is an exploratory data analysis (EDA) of all the
variables. We summarize the main characteristics of each variable (mean,
median, quantiles, and range) together with their box plots, followed by a
correlation matrix and a principal component analysis (PCA), to understand
the structure of the set of variables and identify the most significant variables
for explaining academic delay.</p>
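        <p>As an illustration of the correlation step, the following minimal Python sketch computes a Pearson correlation matrix for a few per-semester features. The feature values are invented for the example, and the helper names pearson and correlation_matrix are hypothetical; this is not the code used in the study.</p>
        <preformat>
```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(features):
    """Return {(a, b): r} for every pair of named feature columns."""
    names = list(features)
    return {(a, b): pearson(features[a], features[b]) for a in names for b in names}


# Invented values for five students in one semester (illustration only).
semester = {
    "GPA": [5.8, 4.9, 4.1, 3.5, 3.0],
    "PASSRATE": [1.0, 0.8, 0.6, 0.4, 0.2],
    "DELAY": [0.0, 0.1, 0.2, 0.4, 0.5],
}

corr = correlation_matrix(semester)
print(round(corr[("GPA", "DELAY")], 3))
```
        </preformat>
        <p>In this toy data the delay falls as GPA rises, so the printed coefficient is negative. A constant column would make the denominator zero, so a real pipeline would need to guard against that case.</p>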
        <p>The second step builds predictive models of the delay at the 8th semester
using two supervised algorithms, linear regression (LR) and support vector machine
(SVM), with features of the first 4 semesters, such as difficulty
and workload, as predictors. These analyses target research question 1.</p>
        <p>
          The third step is to characterize academic trajectories in terms of the relation
of the term features and delay. An adaptation of the sequential model proposed
by [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] is implemented. In our case, the model is built as follows. Each student
trajectory is represented as a sequence of 4 nodes (semesters 1 to 4), where each
node is a single value representing a delay score. To compute the delay score of
semester i, all students are first clustered using k-means on their features for
semester i. The score for semester i is then the average delay of all students
in the same cluster (the cluster members).
        </p>
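        <p>The scoring step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the cluster labels are taken as given (for example, from a prior k-means run), the averaged quantity is assumed to be DELAY8, and the function names and data are hypothetical.</p>
        <preformat>
```python
from collections import defaultdict


def cluster_delay_scores(labels, delays):
    """Map each cluster label to the mean delay of its members."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for label, delay in zip(labels, delays):
        totals[label] += delay
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


def build_trajectories(labels_by_semester, delays):
    """Return one list of delay scores (one per semester) for each student."""
    n_students = len(delays)
    trajectories = [[] for _ in range(n_students)]
    for labels in labels_by_semester:
        scores = cluster_delay_scores(labels, delays)
        for student, label in enumerate(labels):
            trajectories[student].append(scores[label])
    return trajectories


# Invented example: 4 students, cluster labels for semesters 1 and 2
# (assumed to come from a prior k-means step), and their DELAY8 values.
delay8 = [0.0, 0.2, 0.4, 0.6]
labels_by_semester = [
    [0, 0, 1, 1],  # semester 1
    [0, 1, 1, 1],  # semester 2
]
trajectories = build_trajectories(labels_by_semester, delay8)
print(trajectories[0])
```
        </preformat>
        <p>Each student's trajectory is therefore not her own delay but the mean delay of the students she resembles in that semester, which is what lets the trajectory be read as a term-by-term forecast.</p>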
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results and Discussion</title>
      <sec id="sec-4-1">
        <title>Exploratory Data Analysis</title>
        <p>We explored the behavior of all the features in their semester-by-semester
progression. Box plots and summary statistics for the variables can be found in
Appendix B. As a sample, Figure 2 shows box plots of the
features GPA (non-cumulative weighted grade average) and DIFFIC A (an additive
measure of difficulty) for the first 8 semesters.</p>
        <p>The analysis shows that the DELAY variable has
distributions with increasing medians and variability through the terms, as students on
average accumulate more delay the more terms they stay. The
distributions of the average grade (GPA) have medians that increase slightly
but consistently through the terms. PASSRATE has its lowest median in the
2nd semester, with a value of 0.5, and the
medians increase progressively in the following semesters. The median passing rate for
the 1st semester, with a value of 0.7, is much higher than the rest. Dispersion is similar
in all semesters.</p>
        <p>For the academic workload (WKLOAD), the median
in semesters 2 to 4 is below the workload defined by the
curriculum, and it approaches that value in semester 5. The dispersion of these
distributions increases between the 6th and 8th semesters. The relative difficulty
measures, DIFFIC A and DIFFIC B, show medians that decrease as students
progress through the semesters, with quite similar dispersion. The disparity
(DISPAR) shows only two median values: 0.143 for semesters 2 through 5 and 0.286
for semesters 6 through 8. The distributions appear skewed (positively or negatively)
in all semesters except semester 8, where they are symmetric.</p>
        <p>Correlation matrices and principal component analysis (PCA) were computed
to explore options for reducing the set of variables. The results of the analyses for
semester 3 are shown in Figure 4. Biplots and correlograms for all the features
can be found in Appendix B. In all semesters, the delay is
negatively correlated with the variables of performance, curricular progress, and
academic load, namely GPA, PASSRATE, PROGRE, AVG, and WKLOAD.
On the other hand, it has a very weak positive correlation with the measures
of subject difficulty (DIFFIC A and DIFFIC B) and a weak negative correlation with
the measure of disparity (DISPAR). The first two principal components show two
groups of variables: i) those more related to the individual performance
of the students (AVG, PASSRATE, PROGRE, and GPA), and ii) those
related to the characteristics of the study program (WKLOAD, DIFFIC A, and
DIFFIC B). Notably, performance and workload are represented by distinct
components. To represent each group in the subsequent analyses of this
work, we selected GPA and DIFFIC A, respectively, because they are easier to
interpret than the other features.</p>
        <p>[...] difference with the lower value was computed. We can observe that this
indicator declines, as expected, with the inclusion of the variables of consecutive
semesters. An important finding is that the predictive power in the first two
semesters is quite similar between the model with two variables and the model
with all variables. In contrast, when semesters 3 and 4 are included, the predictive
power of the complete model is much greater. In the case of SVM, the RMSE,
R-squared (RSq), and MAE indicators were calculated and confirm what was observed
with AICc. In particular, the complete model up to the fourth semester achieves an
RSq of 0.805, which is high.</p>
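        <p>The surviving text names AICc but does not show how it was computed. As a hedged illustration only, the sketch below uses the common least-squares form of AIC with the standard small-sample correction, AICc = AIC + 2k(k + 1) / (n - k - 1), and then takes the difference with the lower value. The function name, the parameter counts, and all numbers are assumptions made for the example.</p>
        <preformat>
```python
import math


def aicc_least_squares(rss, n, k):
    """AICc for a least-squares fit.

    rss: residual sum of squares; n: number of observations;
    k: number of estimated parameters (assumption: counted by the caller).
    """
    if not n > k + 1:
        raise ValueError("AICc needs n > k + 1")
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Invented numbers: two candidate models fitted to the same 100 students.
two_variable_model = aicc_least_squares(rss=4.0, n=100, k=3)
full_model = aicc_least_squares(rss=3.9, n=100, k=9)

# Difference with the lower (better) value, as in a delta-AICc comparison.
delta = full_model - min(two_variable_model, full_model)
print(delta)
```
        </preformat>
        <p>With these invented residuals the larger model fits only slightly better, so its extra parameters are penalized and the two-variable model obtains the lower AICc.</p>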
      </sec>
      <sec id="sec-4-2">
        <title>Characterization of the Delay Trajectories</title>
        <p>Trajectories were built for each student as a sequence of 4 values, one for each
of the first 4 terms, each representing the average delay of all students who have
similar academic features in that semester. To do this, we clustered the students at
each semester using all its features. In each term, the student falls into one cluster,
whose average delay marks that point of the student's delay trajectory. The number of
clusters obtained varied from 3 to 4.</p>
        <p>Figure 5a shows the trajectories of all students who completed 8 semesters.
To understand how delay behaves over time, the trajectories are presented
in 3 groups: students who reached their 8th semester with 2 semesters
of delay or less (DELAY8 ≤ 0.29), students who reached the 8th semester with
a delay between 2 and 4 semesters (0.29 &lt; DELAY8 ≤ 0.57), and students who
have a delay of more than 4 semesters (DELAY8 &gt; 0.57).</p>
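        <p>The grouping above amounts to binning DELAY8 at the two cut points given in the text (0.29 and 0.57). A minimal sketch, in which the function name and sample values are hypothetical and the boundary values are assumed to belong to the lower group:</p>
        <preformat>
```python
def delay_group(delay8):
    """Assign a DELAY8 value to one of the three groups described in the text."""
    if delay8 > 0.57:
        return "more than 4 semesters"
    if delay8 > 0.29:
        return "between 2 and 4 semesters"
    return "2 semesters or less"


# Invented DELAY8 values for illustration.
for value in (0.0, 0.29, 0.40, 0.57, 0.80):
    print(value, delay_group(value))
```
        </preformat>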
        <p>Figures 5b to 5d show the trajectories for each of those groups. It can
be observed, as in Figure 5a, that for the first semester the predicted delay
for all students is 2 or 4 semesters.</p>
        <p>Most of the students who completed their 8th semester with a delay lower than or
equal to 2 semesters (Figure 5b) were projected to have 2 semesters or less of
delay throughout the entire program.</p>
        <p>On the other hand, students who finished the 8th semester with a delay
greater than 4 semesters (Figure 5d) maintained similar forecasts during the 4
semesters under study.</p>
        <p>In both groups, the less and the more delayed, the proportion
of "good" and "bad" results for the 1st semester is quite similar, which
makes the 1st semester an unreliable indicator of the final delay.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper, we applied statistical and data mining methods to
understand the different behaviors in the progression of students through a sequential
curriculum program in relation to delay in obtaining a degree. First, features that
summarize different aspects of the academic records, such as
performance, workload, and course difficulty, were built for each term a student stays
in the academic program. A custom measure of delay was defined in
relation to the expected progress at the 8th term. Second, we performed an exploratory
data analysis of the features, which revealed two different sets of variables that
appeared orthogonal within the first two principal components of a PCA: a group
with all performance variables, and a group with workload and difficulty. The
weighted average grade (GPA) and one measure of difficulty (DIFFIC A) were
selected to represent these groups. Third, the predictive capacity of the
features was explored, revealing that the two selected variables can predict the
8th-semester delay almost as closely as a model using all predictors. Fourth, delay
sequences were modeled and represented as trajectories showing distinguishable
groups of student behavior. The evidence shows that delay is not
strictly determined by the students' performance during their first semester. Students
with low initial performance can follow a path of progressive improvement and reduce
their potential delay by the end of the program. In conclusion, this work provides
valuable insight into the dynamics of progress in
a sequential curricular program, potentially contributing to the decision-making
of institutions, directors, and students.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Work funded by Universidad Austral de Chile and the LALA project (grant no.
586120-EPP-1-2017-1-ES-EPPKA2-CBHE-JP). This project has been funded
with support from the European Commission. This publication reflects only the
views of the authors, and the Commission cannot be held responsible for any
use which may be made of the information contained therein.</p>
    </sec>
    <sec id="sec-7">
      <title>Appendix A</title>
    </sec>
    <sec id="sec-8">
      <title>Definition of Explanatory Variables</title>
      <p>The following equations describe how the features were built for each semester
i of the student's stay and every subject j enrolled in that semester:</p>
      <p>
        <disp-formula><tex-math>\mathrm{GPA}_i = \frac{\sum_{j=1}^{\mathrm{SUB}_i} \left(\mathrm{GRA}_j \cdot \mathrm{CTS}_j\right)}{\sum_{j=1}^{\mathrm{SUB}_i} \mathrm{CTS}_j}</tex-math></disp-formula>
      </p>
      <p>LEN<sub>Prog</sub> = The total number of semesters of the study program.</p>
      <p>SUB<sub>Prog</sub> = The total number of subjects to be completed in the program to obtain the degree.</p>
      <p>CTS<sub>Prog</sub> = The total number of CTS credits of the study program.</p>
      <p>SUB<sub>i</sub> = The number of subjects enrolled in semester i.</p>
      <p>SUB1T<sub>i</sub> = The number of subjects enrolled in semester i for the very first time.</p>
      <p>PASS<sub>i</sub> = The number of passed subjects in semester i.</p>
      <p>FAIL<sub>i</sub> = The number of failed subjects in semester i.</p>
      <p>GRA<sub>j</sub> = The final grade obtained by the student in subject j.</p>
      <p>SEM<sub>j</sub> = Semester in which subject j is located within the study program.</p>
      <p>AVG<sub>i</sub> = Average semester in which the student is, with
        <disp-formula><tex-math>\mathrm{AVG}_i = \frac{\sum_{j=1}^{\mathrm{SUB}_i} \mathrm{SEM}_j}{\mathrm{SUB}_i}</tex-math></disp-formula>
      </p>
      <p>CTS<sub>j</sub> = The number of CTS credits of subject j.</p>
      <p>CTS<sub>Avg</sub> = The average number of CTS credits per semester of the study program.</p>
      <p>HFR<sub>j</sub> = The historical failure rate of subject j.</p>
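      <p>To make the definitions concrete, the following sketch computes three of the per-semester features (GPA, PASSRATE, and FIRSTIME) from a list of subject records. The record layout and function name are hypothetical, the data are invented, and every subject is assumed to be either passed or failed (dropped subjects are not handled).</p>
      <preformat>
```python
def semester_features(subjects):
    """Compute GPA, PASSRATE and FIRSTIME for one semester.

    subjects: list of dicts with keys "grade" (GRA_j), "credits" (CTS_j),
    "passed" (bool) and "first_time" (bool); this layout is hypothetical.
    """
    total_credits = sum(s["credits"] for s in subjects)
    # Credit-weighted grade average, as in the GPA formula.
    gpa = sum(s["grade"] * s["credits"] for s in subjects) / total_credits
    passed = sum(1 for s in subjects if s["passed"])
    failed = len(subjects) - passed  # assumption: no dropped subjects
    pass_rate = passed / (passed + failed)
    # Share of subjects taken for the first time (SUB1T_i over SUB_i).
    first_time = sum(1 for s in subjects if s["first_time"]) / len(subjects)
    return {"GPA": gpa, "PASSRATE": pass_rate, "FIRSTIME": first_time}


# Invented semester with three subjects on the 1.0 to 7.0 grade scale.
example = [
    {"grade": 6.0, "credits": 4, "passed": True, "first_time": True},
    {"grade": 4.0, "credits": 4, "passed": True, "first_time": True},
    {"grade": 3.0, "credits": 2, "passed": False, "first_time": False},
]
features = semester_features(example)
print(features)
```
      </preformat>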
    </sec>
    <sec id="sec-9">
      <title>Appendix B</title>
    </sec>
    <sec id="sec-10">
      <title>Exploratory Data Visualization</title>
      <p>Table of Fig. B.6: Difficulty (beta) (DIFFIC B)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ferreyra</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Avitabile</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Botero</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paz</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Urzua</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>At a Crossroads: Higher Education in Latin America and the Caribbean</article-title>
          . Directions in Development. Washington, DC: World Bank.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>SIES</given-names>
            <surname>Servicio de Informacion de Educacion Superior del Ministerio de Educacion de Chile.</surname>
          </string-name>
          (
          <year>2018</year>
          ). Informe Duracion Real y Sobreduracion de las carreras de Educacion Superior (
          <year>2013</year>
          -2017)
          <article-title>(Spanish).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Aequalis Foro de Educacion Superior</surname>
          </string-name>
          . (
          <year>2019</year>
          ).
          <article-title>Estimacion del gasto fiscal y familiar para financiar la sobre-duracion de los estudiantes en las carreras: caso chileno (Spanish).</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Himmel</surname>
          </string-name>
          , E. Modelo de analisis de
          <article-title>la desercion estudiantil en la educacion superior (Spanish)</article-title>
          . (
          <year>2002</year>
          ). Facultad de Educacion/ Pontificia Universidad Catolica de Chile. (
          <year>2018</year>
          ).
          <source>Calidad en la Educacion</source>
          , (
          <volume>17</volume>
          ),
          <fpage>91</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Trevino</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valdes</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Costilla</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pardo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Donoso Rivas</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Factores asociados al logro cognitivo de los estudiantes de America Latina y el Caribe (Spanish)</article-title>
          .
          <source>OREALC/UNESCO.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Saa</surname>
            ,
            <given-names>A. A.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Educational data mining and students' performance prediction</article-title>
          .
          <source>International Journal of Advanced Computer Science and Applications</source>
          ,
          <volume>7</volume>
          (
          <issue>5</issue>
          ),
          <fpage>212</fpage>
          -
          <lpage>220</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pelleg</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>A. W.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>X-means: Extending k-means with efficient estimation of the number of clusters</article-title>
          .
          <source>In ICML</source>
          (Vol.
          <volume>1</volume>
          , pp.
          <fpage>727</fpage>
          -
          <lpage>734</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Robinson</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>Pathways to completion: Patterns of progression through a university degree</article-title>
          .
          <source>Higher Education</source>
          ,
          <volume>47</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          . doi:10.1023/B:HIGH.0000009803.70418.9c
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Asif</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Merceron</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>S. A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Haider</surname>
            ,
            <given-names>N. G.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Analyzing undergraduate students' performance using educational data mining</article-title>
          .
          <source>Computers &amp; Education</source>
          ,
          <volume>113</volume>
          ,
          <fpage>177</fpage>
          -
          <lpage>194</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>MacQueen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>1967</year>
          ).
          <article-title>Some methods for classification and analysis of multivariate observations</article-title>
          .
          <source>In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability</source>
          (Vol.
          <volume>1</volume>
          , No.
          <issue>14</issue>
          , pp.
          <fpage>281</fpage>
          -
          <lpage>297</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Campagni</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Merlini</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Verri</surname>
            ,
            <given-names>M. C.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>The Influence of First Year Behaviour in the Progressions of University Students</article-title>
          .
          <source>Computers Supported Education. CSEDU 2017. Communications in Computer and Information Science</source>
          , vol
          <volume>865</volume>
          . Springer, Cham.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Maxwell</surname>
            ,
            <given-names>S. E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Delaney</surname>
            ,
            <given-names>H. D.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>Designing experiments and analyzing data: a model comparison perspective</article-title>
          . Lawrence Erlbaum Associates, Inc.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Stewart</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>D. H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Factors influencing college persistence for first-time students</article-title>
          .
          <source>Journal of Developmental Education</source>
          ,
          <fpage>12</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Quinlan</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          (
          <year>1986</year>
          ).
          <article-title>Induction of decision trees</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <fpage>81</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kutner</surname>
            ,
            <given-names>M. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nachtsheim</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neter</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>Applied linear statistical models</article-title>
          (Vol.
          <volume>5</volume>
          ). Boston: McGraw-Hill Irwin.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Elbadrawy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polyzou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sweeney</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karypis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Rangwala</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Predicting student performance using personalized analytics</article-title>
          .
          <source>Computer</source>
          ,
          <volume>49</volume>
          (
          <issue>4</issue>
          ),
          <fpage>61</fpage>
          -
          <lpage>69</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Mendez</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ochoa</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiluiza</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>de Wever</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Curricular Design Analysis: A Data-Driven Perspective</article-title>
          .
          <source>Journal of Learning Analytics</source>
          ,
          <volume>1</volume>
          (
          <issue>3</issue>
          ),
          <fpage>84</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Imielinski</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Swami</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>1993</year>
          ).
          <article-title>Mining association rules between sets of items in large databases</article-title>
          .
          <source>In ACM SIGMOD Record</source>
          (Vol.
          <volume>22</volume>
          , No.
          <issue>2</issue>
          , pp.
          <fpage>207</fpage>
          -
          <lpage>216</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Almatrafi</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangwala</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lester</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Identifying Course Trajectories of High Achieving Engineering Students through Data Analytics</article-title>
          .
          <source>ASEE Annual Conference and Exposition</source>
          , New Orleans, Louisiana. doi:10.18260/p.25519
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Mahzoon</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maher</surname>
            ,
            <given-names>M. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eltayeby</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dou</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Grace</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>A Sequence Data Model for Analyzing Temporal Patterns of Student Data</article-title>
          .
          <source>Journal of Learning Analytics</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <fpage>55</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>van Eck</surname>
            ,
            <given-names>M. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leemans</surname>
            ,
            <given-names>S. J. J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W. M. P.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>PM2: a Process Mining Project Methodology</article-title>
          .
          <source>CAiSE 2015</source>
          , LNCS 9097, pp.
          <fpage>297</fpage>
          -
          <lpage>313</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>