Modeling Trajectories to Understand the Delayed Completion of Sequential Curricula Undergraduate Programs Renato Boegeholz[0000−0002−6826−1210] , Julio Guerra[0000−0002−8296−9848] , and Eliana Scheihing[0000−0003−1801−9167] Instituto de Informatica, Universidad Austral de Chile, Chile renato.boegeholz@uach.cl,{jguerra,escheihi}@inf.uach.cl Abstract. Taking more time than expected to complete university de- gree programs is a global and known problem, and in Chile, has relevance because pressure exists to complete degrees on time. In this work, we explore academic delay in higher education programs, in particular an engineering program, and its relation with academic information sum- marizing the trajectory of students along with the academic program. Academic information is represented by semester-by-semester features that reflect different aspects such as performance, workload, and dif- ficulty. Exploratory analyses of these variables reveal two orthogonal groups: performance and workload; then used to build models predicting the relative delay of a student at her 8th term at the program relative to the expected completion at 8th term. To further explore the trajectory of delay and analyze how the delay relates to other academic aspects such as term by term performance or workload, a sequential model was built. Results show different patterns of behaviors across different lev- els of delay in the 8th term. The methods and results of this research can be used by educational institutions or the government to support its decisions about the use of resources and attrition rates reduction. Keywords: academic analytics · curricular analytics · learning analyt- ics, · educational data mining · academic trajectories · time to degree. 1 Introduction Taking more time than expected to complete university degree programs is a global and known problem. The average time to degree1 is 1.36 [1] in Latin America, and 1.31 in Chile, a number that has not experienced a relevant vari- ation in the last 10 years. This statistic not only means that obtaining a degree 1 The ratio between the average time it takes for students to graduate; and the theo- retical duration of the study program Copyright c 2020 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0). 2 R. Boegeholz, J. Guerra, E. Scheihing takes on average about 31% more time than expected but also reveals that career delays are a serious problem with economic considerations: this situation causes in Chile an additional expense for families and the government estimated at US$ 500 million [2,3]. Delay in completing a degree program may have multiple causes such as the failure and repetition of subjects, temporary suspension of studies, or assuming less course load than expected according to the curricular plan [4]. Special importance is given to academic reasons behind academic delay because pressure exists to complete degrees on time: high education in Chile is financed by the student (or her family) with the help of scholarships or other funding benefits that require good performance and usually do not tolerate delays [5]. In this work, we explore academic delay in higher education programs, in particular an engineering program, and its relation with academic information summarizing the trajectory of students along with the academic program. While similar work focused in performance variables such as grades [6, 17, 19], we in- clude other relevant aspects of the academic information such as the course load taken by the student each term, the difficulty associated with the courses taken, and repetition of courses which academic situation may determine the risk of punitive actions (for example, elimination of the study program by failing a sub- ject more than twice, or failing, in the same semester, more than two subjects), among others. The first research question of this work is: 1) What is the relation between delay in obtaining the degree and academic information along the trajectory of the student? We build prediction models on academic delay considering different features that can describe academic trajectories. As mentioned before, we seek to repre- sent such trajectories in terms of different academic information spanning per- formance, course workload, and difficulty. These academic factors are relevant not only because they could predict academic delay, but because they could characterize the trajectories in an actionable manner, that is, further analysis of such trajectories could bring insights that could support counseling practices and curricular re-design actions. Thus, we state a second research question: 2) Can the academic trajectories relate to delay be characterized in a manner that provides information about student behavior? 2 Related Work Researchers have used different approaches to analyze academic information and typically centered around performance measures. Based on grading data and dis- missing the background characteristics of the students, [9] identified curriculum subjects that can serve as effective indicators of academic performance. Using X-means [7] to group students yearly, [9] discovered typical progress patterns and evaluated the predictive capacity of the explanatory subjects. Considering the results in an entrance self-assessment test and the academic performance of Modeling Trajectories of Delayed Degree Completion 3 the first year, [11] used K-means [10] to group students and follow their perfor- mance trajectories in the following 2nd and 3rd year, measuring the influence of the first year behavior in the progression of the curriculum [11]. [6] explored multiple personal and social factors that can affect the academic performance of university students and, using the Grade Point Average (GPA) as an explained variable through decision trees [14] proposed a qualitative model to classify and predict it [6]. Dropout has also been investigated. Aiming to relieve dropout, [16] used rec- ommendation systems techniques to predict the grades that students will obtain in future subjects. The predictions were made using personalized multiple linear regressions (PLMR) [15] on student participation data in both traditional classes and Massive Open Online Courses (MOOC) [16]. [13] examined demographic variables as family characteristics; pre-university and university academic per- formance factors; and the participation or not in recovery courses, to predict the persistence of the students in the study programs. Using as explanatory variables the scores of the ACT standardized test for college admission, the av- erage of grades in high school, the average grades of the first semester of the university; and using analysis of variance (ANOVA), Pearson product-moment correlations, and multiple regression analysis [12] showed that students who were academically prepared to take college-level courses were more likely to persist than students assigned to mandatory recovery courses [13]. Researchers have also focused on performing analyses of students’trajectories to inform the design of curricula. [17] used student performance data from a spe- cific program to perform an analysis of the curriculum design of the program. In particular, they modeled the difficulty of each subject as its contribution (negative or positive) to the students’ GPA and then contrasting this measure to a survey of student perception. Using the same performance data they also performed a dropout and enrollment path analysis [17]. Important to consider is that most of the previous work has been carried related to a flexible course–credit systems where the degrees are obtained as the sum of core and optional approved subjects [8, 11, 13, 19]. This context differs from our sequential-non-flexible cur- ricula, where the flow of subjects to take is pre-defined for all the terms of the program. The representation of the academic trajectories of the students is not triv- ial, and it is necessary to consider the temporal dependence of the variables under study. Following this idea, [19] used frequent pattern mining [18] to re- veal academic trajectories to understand the sequences of subjects to take that could improve student performance. In [20], a sequential data model is proposed that explicitly captures the temporal dependencies of the academic performance characteristics to form fingerprints or signatures that are constructed allowing different analytical interpretations and the development of predictive models for the risk of academic delay. As presented, several of the proposed models -all of the regression type- con- sidered the performance of the students as explanatory variables, without taking into account the dependence that exists in the performance of a student in var- 4 R. Boegeholz, J. Guerra, E. Scheihing ious subjects within the same semester, as well as in successive semesters. Our work distinguished from previous work in several aspects: in its multivariate na- ture considers in addition to academic performance aspects such as the academic workload and the difficulty of the subjects in each semester; incorporating the temporal dimension in the modeling of the students’ trajectories; and explor- ing such trajectories in the context of a curriculum with a sequential structure, where about 90% of the courses to be completed are mandatory. 3 Methods 3.1 Academic Features and Data Descriptions Academic information at the level of the degree program includes data of courses taken, passed, failed (with their grades), and dropped in each term of the aca- demic life of the student. We combine these records with historic data and the expected curricular progress 2 to generate a series of academic features in each term of the student academic life. These features represent different dimensions related to academic performance, academic workload (courseload), the relative difficulty of the courses, and the consistency between the courses taken and their theoretical order in the curriculum. Nine features for each term of each student were computed and are defined in Table 1. More details about the definition of the variables can be found in Appendix A. We understand by academic trajectory to all this information organized in a term by term sequence. To allow comparisons between trajectories, we considered only the activity of the first 8 semesters (8 terms) for each student, counted from their first enrollment. These 8 semesters also represent a milestone of the study program because contain the required subjects to obtain the Licenciatura degree3 . Considering this, the explained variable will be the delay in the 8th term (DELAY8), which measures how far is the student of having completed all courses of the first 8 terms of the plan in her first 8 terms of academic life. A student who passed all planned courses of the first 8 semesters of the program in her first 8 semesters of academic life, has delay zero. To reduce inconsistencies in the comparisons of the trajectories (because of the dependency between the features and the structure of the program curric- ula), it was decided to analyze only one study program: Engineering in Computer Science. For the period of available data, this program implements three curricu- lum versions (2008, 2010, and 2015) each with 11 semesters of duration and an average of 6 subjects per semester. To study the performance of the students during their first 8 semesters, only whose admitted between 2008 and 2015 were 2 In Chile most of higher education programs have a semi-flexible curricular plan, where the study sequence in pre-defined term by term. 3 In Chile, “Licenciatura” is similar to a bachelor’s degree. To obtain this grade is nec- essary to complete 8 semesters of subjects that are part of an academic major. This degree allows you to continue an academic career. To be qualified for professional practice there are necessary between 2 and 4 semesters of additional subjects. Modeling Trajectories of Delayed Degree Completion 5 Table 1: Description and possible values of the features built for this study. Definition of the variables can be found in Appendix A. Feature Description Possible val- ues GP A Not cumulative weighted grade average for the semester. [1.0, 7.0] with 4.0 the passing grade P ASSRAT E Passing rate of the semester (the ratio of passed subjects [0, 1] to passed plus failed subjects). F IRST IM E The proportion of subjects enrolled for the very first time [0, 1] in the semester. P ROGRE Contribution of subjects passed in the semester to the [0, 1] total of subjects required to obtain the degree. W KLOAD Academic workload rate for the semester measured in [0, LENP rog ] CST to the average semester CST of the program. DIF F IC A Difficulty of the semester, as an additive measure (“Al- [0, max(HF Rj )SU Bi ] pha difficulty”). DIF F IC B Difficulty of the semester, as a geometric measure (“Beta [0, 1] difficulty”). DISP AR The disparity of subjects enrolled in the semester: the [0, 1] difference, in semesters, between the highest level subject and the lowest level subject. DELAY Measurement of the delay between the theoretical and [0, 1] actual (average) semester of the student given their date of admission. considered. The resulting data set was composed of 14,199 records of academic activity of 365 different students. 3.2 Data Modeling We are interested in modeling the behavior of the 8th-semester delay (DELAY8) as a function of the other eight features defined for each semester. We limit the scope of the independent variables (the features) to the first 4 semesters because of two reasons. First, the idea of predicting delay (and also predicting dropout) gain relevance if a prediction can be done early, thus it seems reasonable to predict a delay in the 8th semester with information from the four first terms. Second, the four initial semesters correspond to the “Bachillerato” milestone 6 R. Boegeholz, J. Guerra, E. Scheihing in the engineering programs of the university, where the foundational courses of math and physics are concentrated and which have the higher failure rates, thus are strongly related with academic failure or success. The analyses will be performed in three steps. The first step is to perform an exploratory data analysis (EDA) for all the variables. We summarize the main characteristics of each variable (mean, me- dian, quantiles, and range) together with their box plots, following with correla- tion matrix and principal component analysis (PCA) to gain an understanding of the structure of the set of variables and identify the most significant variables which can explain the academic delay. The second step includes building predictive models using two supervised algorithms, linear regression (LR) and support vector machine (SVM) on the delay at 8th semester with other features of the first 4 semesters, such as difficulty and workload as predictors. These analyses target research question 1. The third step is to characterize academic trajectories in terms of the relation of the term features and delay. An adaptation of the sequential model proposed by [20] is implemented. In our case, this model is built as follows. Each student trajectory is represented as a sequence of 4 nodes (semesters 1 to 4), where each node is a single value representing a delay score. To compute the i semester’s delay score, first all students are clustered using k-means on their i semester’s features. Then the score of the i semesters is the average delay of all students in the cluster (cluster members). 4 Results and Discussion 4.1 Exploratory Data Analysis We explored the behavior of all the features in their semester-by-semester pro- gression. Box plots and summary statistics for the variables can be found in Appendix B. Figure 2 presented here as a sample shows box plots of the fea- tures GPA (not cumulative weighted grade average) and DIFFIC A (an additive measure of difficulty) for the first 8 semesters. From the analysis can be observed that the DELAY variable shows distribu- tions with increasing medians and variability through the terms, as students on average accumulate more delay as they stay more terms. It can be observed that the distributions of the average grade (GPA) have medians that increase slightly but consistently through the terms. PASSRATE shows the lowest median in the 2nd semester with a value of 0.5. In the following semesters, the values of the medians increase progressively meanwhile the median passing rate for the 1st semester is much higher than the rest, with a value of 0.7. Dispersion is similar in all semesters. In the case of the academic workload (WKLOAD), the median distribution in semesters 2nd to 4th is below the academic workload defined by the cur- riculum and it approaches that value in semester 5. The dispersion of these distributions increases between the 6th and 8th semesters. The relative diffi- culty, both DIFFIC A, and DIFFIC B, show medians that descend as students Modeling Trajectories of Delayed Degree Completion 7 Fig. 2: Variability in the samples across the 8 semesters for (a) The not cumulative weighted grade average (GPA) and (b) The alpha difficulty (DIFFIC A). progress in their semesters, with quite similar dispersion. The disparity (DIS- PAR) shows only two median values: 0.143 for semesters 2 through 5; and 0.286, for semesters 6 through 8. Dispersion appears biased (positively or negatively) for all semesters, except for semester 8 in which symmetry is appreciated. Correlation matrices and principal component analysis (PCA) were obtained to explore options for reducing the set of variables. The results of the analyzes for semester 3 are described in Figure 4. Biplots and correlograms for all the features can be found in Appendix B. In all semesters it is observed that the delay is negatively correlated with the variables of performance, curricular progress, and academic load, namely: GPA, PASSRATE, PROGRE, AVG, and WKLOAD. On the other hand, it has a very weak positive correlation with the measures of difficulty of the subjects (DIFFIC A and DIFFIC B) and weak negative with the measure of disparity (DIS). Two main components of the PCA shows two Fig. 4: (a) Biplot and (b) correlogram of the 3rd-semester variables. groups of variables: i) those that are more related to the individual performance 8 R. Boegeholz, J. Guerra, E. Scheihing of the students: AVG, PASSRATE, PROGRE and GPA, and ii) those which are related to the characteristics of the study program: WKLOAD, DIFFIC A, and DIFFIC B. It is interesting the distinct components represented by performance and workload. To represent each of the groups in the subsequent analyzes of this work, we selected GPA and DIFFIC A respectively based on a better degree of interpretation than they may have compared to the other features. Table 2: Summary measures of the predictive capacity of the models. K indicates the number of predictors in the model. Regression SVM Model K AICc ∆AICc RMSE RSq MAE SGD1 2 -123.430 98.913 0.154 0.588 0.127 SGD2 4 -146.596 75.746 0.142 0.645 0.114 SGD3 6 -161.323 61.020 0.139 0.679 0.116 SGD4 8 -177.462 44.881 0.137 0.688 0.115 SA1 8 -118.020 104.322 0.153 0.562 0.127 SA2 16 -153.047 69.296 0.140 0.657 0.117 SA3 24 -180.135 42.208 0.122 0.738 0.102 SA4 32 -222.343 0 0.106 0.805 0.088 4.2 Prediction Models To analyze the predictive capacity of the explanatory variables, two families of models where built. The SGDi models use predictors GP Ai and DIF F IC Ai adding them incrementally from first to fourth semester. In this way, SGD1 implements: DELAY8 ∼ GP A1 + DIF F IC A1 SGD2 implements: DELAY8 ∼ GP A1 + DIF F IC A1 + GP A2 + DIF F IC A2 and so on. The SAi models use as predictors all the features in each semester in the same incremental way. Thus, SA1 implements: DELAY8 ∼ GP A1 +P ASSRAT E1 +W KLOAD1 +DIF F IC A1 + DIF F IC B1 + DISP AR1 + F IRST IM E1 + P ROGRE1 SA2 implements: DELAY8 ∼ GP A1 +P ASSRAT E1 +W KLOAD1 +DIF F IC A1 + DIF F IC B1 +DISP AR1 +F IRST IM E1 +P ROGRE1 +GP A2 +P ASSRAT E2 + W KLOAD2 + DIF F IC A2 + DIF F IC B2 + DISP AR2 + F IRST IM E2 + P ROGRE2 and so forth. Table 2 summarizes the results obtained. Regressions were built using a Gen- eralized Linear Model (GLM) with Gaussian errors and an identity link func- tion. The corrected Akaike Information Criterion (AICc) of every model and the Modeling Trajectories of Delayed Degree Completion 9 difference with the lower value was computed. We can observe that this indica- tor declines, as expected, with the inclusion of the variables of the consecutive semesters. An important finding is that the predictive power in the first two semesters is quite similar between the model with two variables and the model with all variables. In contrast, by including semesters 3 and 4, the predictive power of the complete model is much greater. In the case of SVM, the RSME, RSq, and MAE indicators are calculated and ratify the observed with AICc. In particular, with the complete model until the fourth semester, an RSq of 0.805 is obtained which is high. 4.3 Characterization of the Delay Trajectories Trajectories were built for each student as a sequence of 4 values, one per each of the 4 first terms, each representing the average delay of all students who have similar academic features in a semester. To do this, we performed a clustering at each semester with all its features. Each term, the student falls into one cluster, from which the average delay marks his/her delay trajectory. The number of clusters obtained varied from 3 to 4. Figure 5a shows the trajectories for all students who completed 8 semesters. To understand the delay behavior along time, the trajectories were presented into 3 groups: those students who reached their 8th semester with 2 semesters of delay or less (DELAY 8 ≤ 0.29), students who reached the 8th semester with a delay between 2 and 4 semesters (0.29 < DELAY 8 ≤ 0.57), and students who have a delay of more than 4 semesters (DELAY 8 > 0.57). Figures 5b to 5d shows the trajectories for every one of those groups. It can be observed -as in Figure 5a- that for the first semester, the prediction of delay for all students is 2 or 4 semesters. Most of the students who complete their 8th semester with a delay lower or equal than 2 semesters (Figure 5b), were projected with 2 semesters or less of delay throughout the entire program. On the other hand, students who finished the 8th semester with a delay greater than 4 semesters (Figure 5d), maintained similar forecasts during the 4 semesters under study. It can be seen that in both groups, the less and more delayed, the proportion of “good” and “bad” results for their 1st semester is quite similar, which would make the 1st semester a not reliable indicator of the final delay result. 5 Conclusions In this research paper, we applied statistical and data mining methods to under- stand the different behaviors in the progression of students across a sequential curriculum program related to delay in obtaining a degree. First, features that summarize different aspects of the academic records information such as perfor- mance, workload, and course difficulty were built for each term a student stays 10 R. Boegeholz, J. Guerra, E. Scheihing Fig. 6: Delay trajectories for (a) All the students. (b) Students with an 8th-semester delay less or equal than 2 semesters. (c) Students with an 8th-semester delay between 2 and 4 semesters. (d) Students with an 8th-semester delay greater than 4 semesters. in the academic program. A measure of the delay was custom-made in rela- tion to the expected progress at term 8th. Second, we performed an exploratory data analysis of the features which revealed two different sets of variables that appeared orthogonal within the two principal components of a PCA: a group with all performance variables, and a group with workload and difficulty. The weighted average grade (GPA) and one measure of difficulty (DIFFIC A) were selected to represent these groups. Third, the predictive capacity of the fea- tures was explored, revealing that the selected two variables can predict the 8th-semester delay close enough as a model using all predictors. Fourth, delay sequences were modeled and represented as trajectories showing distinguishable groups of students’behavior. The evidence provided shows that the delay is not strictly determined by the students’performance during their first semester. Low initial performances can follow a path of progressive improvement and reduce their potential delay while ends the program. In conclusion, this work provides valuable insight into a better understanding of the dynamics of the progress in Modeling Trajectories of Delayed Degree Completion 11 a sequential curricular program, potentially contributing to the decision-making of institutions, directors, and students. 6 Acknowledgments Work funded by Universidad Austral de Chile and the LALA project (grant no. 586120-EPP-1-2017-1-ES-EPPKA2-CBHE-JP). This project has been funded with support from the European Commission. This publication reflects only the views of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein. References 1. Ferreyra, M., Avitabile C., Botero, J., Paz, F., & Urzúa, S. (2017). At a Crossroads: Higher Education in Latin America and the Caribbean. Directions in Development. Washington, DC: World Bank. 2. SIES Servicio de Información de Educacion Superior del Ministerio de Educación de Chile. (2018). Informe Duracion Real y Sobreduracion de las carreras de Educación Superior (2013-2017) (Spanish). 3. Aequalis Foro de Educación Superior. (2019). Estimación del gasto fiscal y familiar para financiar la sobre-duración de los estudiantes en las carreras: caso chileno (Spanish). 4. Himmel, E. Modelo de análisis de la deserción estudiantil en la educación superior (Spanish). (2002). Facultad de Educación/ Pontificia Universidad Católica de Chile. (2018). Calidad en la Educación, (17), 91-108. 5. Trevino, E., Valdes, H., Castro, M., Costilla, R., Pardo, C., & Donoso Rivas, F. (2010). Factores asociados al logro cognitivo de los estudiantes de America Latina y el Caribe (Spanish). OREALC/UNESCO. 6. Saa, A. A. (2016). Educational data mining and students’ performance prediction. International Journal of Advanced Computer Science and Applications, 7(5), 212- 220. 7. Pelleg, D., & Moore, A. W. (2000). X-means: Extending k-means with efficient estimation of the number of clusters. In Icml (Vol. 1, pp. 727-734). 8. Robinson, R. (2004). Pathways to completion: Patterns of progres- sion through a university degree. Higher Education, 47(1), 1-20. doi:10.1023/B:HIGH.0000009803.70418.9c 9. Asif, R., Merceron, A., Ali, S. A., & Haider, N. G. (2017). Analyzing undergraduate students’ performance using educational data mining. Computers & Education, 113, 177-194. 10. MacQueen, J. (1967). Some methods for classification and analysis of multivari- ate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, No. 14, pp. 281-297). 11. Campagni R., Merlini D., Verri M.C. (2018) The Influence of First Year Be- haviour in the Progressions of University Students. Computers Supported Edu- cation. CSEDU 2017. Communications in Computer and Information Science, vol 865. Springer, Cham. 12. Maxwell, S. E., and Delaney, H. D. (2004). Designing experiments and analyzing data: a model comparison perspective. Lawrence Erlbaum Associates, Inc. 12 R. Boegeholz, J. Guerra, E. Scheihing 13. Stewart, S., Lim, D. H., & Kim, J. (2015). Factors influencing college persistence for first-time students. Journal of Developmental Education, 12-20. 14. Quinlan, J. R. (1986). Induction of decision trees. Machine learning, 1(1), 81-106. 15. Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2005). Applied linear statistical models (Vol. 5). Boston: McGraw-Hill Irwin. 16. Elbadrawy, A., Polyzou, A., Ren, Z., Sweeney, M., Karypis, G., & Rangwala, H. (2016). Predicting student performance using personalized analytics. Computer, 49(4), 61-69. 17. Mendez, G., Ochoa, X., Chiluiza, K., & de Wever, B. (2014). Curricular Design Analysis: A Data-Driven Perspective. Journal of Learning Analytics, 1(3), 84-119. 18. Agrawal, R., Imieliński, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. In Acm sigmod record (Vol. 22, No. 2, pp. 207-216). ACM. 19. Almatrafi, O., Johri, A., Rangwala, H., & Lester, J. (2016), Identifying Course Trajectories of High Achieving Engineering Students through Data Analytics. ASEE Annual Conference and Exposition, New Orleans, Louisiana. doi:10.18260/p.25519 20. Mahzoon, M. J., Maher, M. L., Eltayeby, O., Dou, W., & Grace, K. (2018). A Sequence Data Model for Analyzing Temporal Patterns of Student Data. Journal of Learning Analytics, 5(1), 55-74. 21. van Eck, M.L., Lu, X., Leemans, S.J.J., van der Aalst, W.M.P. (2015). PM2: a Process Mining Project Methodology. CAiSE 2015, LNCS 9097, pp. 297-313. Modeling Trajectories of Delayed Degree Completion A1 Appendix A Definition of Explanatory Variables The following equations describe how the features were built for each semester i of the student’s stay and every subject j enrolled in that semester: PSU Bi j=1 (GRAj · CT Sj ) GP Ai = PSU Bi ∈ [1.0, 7.0] (1) j=1 CT Sj P ASSi P ASSRAT Ei = ∈ [0, 1] (2) P ASSi + F AILi SU B1Ti F IRST IM Ei = ∈ [0, 1] (3) SU Bi P ASSi P ROGREi = ∈ [0, 1] (4) SU BP rog PSU Bi j=1 CT Sj W KLOADi = ∈ [0, LENP rog ] (5) CT SAvg SU X Bi DIF F IC Ai = HF Rj ∈ [0, SU Bi · max(HF Rj )] (6) j=1 SU Y Bi DIF F IC Bi = 1 − (1 − HF Rj ) ∈ [0, 1] (7) j=1 max(SEMj ) − min(SEMj ) DISP ARi = ∈ [0, 1] where j = 1..SU Bi (8) LENP rog − 1 i ≤ AV Gi ( 0 DELAYi = i−AV Gi ∈ [0, 1] (9) LENP rog −1 i > AV Gi Where: LENP rog = The total number of semesters of the study program. SU BP rog = The total number of subjects to be completed in the program to obtain the degree. CT SP rog = The total number of CTS credits of the study program. SU Bi = The number of subjects enrolled in semester i. SU B1Ti = The number of subjects enrolled in the semester i for the very first time. P ASSi = The number of passed subjects in semester i. F AILi = The number of failed subjects in semester i. GRAj = The final grade obtained by the student in the subject j. SEMj = Semester in which the subject j is located within the study program. PSU Bi j=1 SEMj AV Gi = Average semester in which the student is, with AV Gi = SU Bi ∈ [1, LENP rog ] A2 R. Boegeholz, J. Guerra, E. Scheihing CT Sj = The number of CTS credits of the subject j. CT SAvg = The average number of CTS credits per semester of the study program. HF Rj = The historical failure rate of the subject j. Modeling Trajectories of Delayed Degree Completion B1 Appendix B Exploratory Data Visualization Fig. B.1: Variability in the samples for the weighted semiannual average (GPA), throughout the 8 semesters. Table of Fig. B.1: Not cumulative weighted grade average (GPA) Statistic N Mean St. Dev. Min Pctl(25) Median Pctl(75) Max 1 365 4.087 1.099 1.000 3.440 4.120 4.870 6.630 2 324 3.718 1.106 1.000 3.173 3.870 4.442 6.350 3 277 3.818 0.997 1.000 3.250 3.910 4.560 6.070 4 234 3.896 1.041 1.000 3.482 4.060 4.545 6.360 5 200 4.110 0.981 1.300 3.470 4.120 4.680 6.400 6 184 4.143 1.158 1.000 3.500 4.225 4.942 6.510 7 163 4.305 1.028 1.000 3.630 4.260 4.955 6.490 8 154 4.531 1.030 1.000 3.883 4.570 5.255 6.410 B2 R. Boegeholz, J. Guerra, E. Scheihing Fig. B.2: Variability in samples for the passing rate (PASSRATE), over the 8 semesters. Table of Fig. B.2: Passing rate (PASSRATE) Semester N Mean St. Dev. Min Pctl(25) Median Pctl(75) Max 1 365 0.672 0.295 0 0.4 0.7 1 1 2 337 0.526 0.321 0.000 0.250 0.500 0.750 1.000 3 280 0.554 0.338 0.000 0.333 0.585 0.833 1.000 4 241 0.558 0.331 0.000 0.250 0.600 0.800 1.000 5 204 0.618 0.336 0.000 0.333 0.600 1.000 1.000 6 191 0.593 0.354 0.000 0.333 0.600 1.000 1.000 7 164 0.661 0.338 0.000 0.333 0.750 1.000 1.000 8 158 0.693 0.314 0.000 0.500 0.775 1.000 1.000 Table of Fig. B.3: First-time-enrolled subjects rate (FIRSTIME) Semester N Mean St. Dev. Min Pctl(25) Median Pctl(75) Max 1 365 0.672 0.295 0 0.4 0.7 1 1 2 337 0.526 0.321 0.000 0.250 0.500 0.750 1.000 3 280 0.554 0.338 0.000 0.333 0.585 0.833 1.000 4 241 0.558 0.331 0.000 0.250 0.600 0.800 1.000 5 204 0.618 0.336 0.000 0.333 0.600 1.000 1.000 6 191 0.593 0.354 0.000 0.333 0.600 1.000 1.000 7 164 0.661 0.338 0.000 0.333 0.750 1.000 1.000 8 158 0.693 0.314 0.000 0.500 0.775 1.000 1.000 Modeling Trajectories of Delayed Degree Completion B3 Fig. B.3: Variability in samples for the first-time-enrolled subjects rate (FIRSTIME), over the 8 semesters. Fig. B.4: Variability in the samples for the academic workload (WKLOAD), throughout the 8 semesters. B4 R. Boegeholz, J. Guerra, E. Scheihing Table of Fig. B.4: Academic workload (WKLOAD) Semester N Mean St. Dev. Min Pctl(25) Median Pctl(75) Max 1 365 1.036 0.108 0.131 1.067 1.067 1.067 1.067 2 337 0.770 0.226 0.000 0.667 0.800 0.984 1.133 3 280 0.754 0.196 0.000 0.656 0.767 0.875 1.167 4 241 0.713 0.251 0.000 0.600 0.733 0.885 1.148 5 204 0.797 0.253 0.000 0.667 0.867 0.942 1.333 6 191 0.832 0.316 0.000 0.633 0.918 1.049 1.333 7 164 0.827 0.203 0.000 0.700 0.900 0.938 1.167 8 158 0.926 0.289 0.000 0.738 0.967 1.167 1.500 Fig. B.5: Variability in the samples for the alpha difficulty (DIFFIC A), throughout the 8 semesters. Table of Fig. B.5: Difficulty (alpha) (DIFFIC A) Semester N Mean St. Dev. Min Pctl(25) Median Pctl(75) Max 1 365 2.009 0.331 0.053 1.822 2.248 2.248 2.248 2 324 1.914 0.386 0.465 1.737 2.005 2.209 2.615 3 277 1.741 0.340 0.742 1.488 1.830 2.030 2.610 4 235 1.552 0.490 0.155 1.357 1.627 1.876 2.484 5 201 1.490 0.472 0.096 1.151 1.552 1.846 2.484 6 184 1.336 0.487 0.107 1.018 1.285 1.747 2.450 7 163 1.307 0.509 0.450 0.805 1.297 1.772 2.415 8 154 1.068 0.436 0.400 0.747 1.014 1.299 2.443 Modeling Trajectories of Delayed Degree Completion B5 Fig. B.6: Variability in samples for beta difficulty (DIFFIC B), over the 8 semesters. Table of Fig. B.6: Difficulty (beta) (DIFFIC B) Semester N Mean St. Dev. Min Pctl(25) Median Pctl(75) Max 1 365 0.922 0.089 0.053 0.909 0.956 0.956 0.956 2 324 0.913 0.084 0.402 0.908 0.938 0.952 0.973 3 277 0.897 0.058 0.615 0.862 0.921 0.939 0.976 4 235 0.882 0.067 0.470 0.854 0.900 0.926 0.969 5 201 0.853 0.096 0.452 0.814 0.890 0.923 0.962 6 184 0.812 0.121 0.107 0.727 0.841 0.914 0.962 7 163 0.773 0.156 0.397 0.611 0.841 0.911 0.958 8 154 0.709 0.175 0.347 0.567 0.742 0.865 0.959 Table of Fig. B.7: Disparity in the semester (DISPAR) Semester N Mean St. Dev. Min Pctl(25) Median Pctl(75) Max 1 365 0.000 0.000 0 0 0 0 0 2 337 0.110 0.112 0.000 0.000 0.143 0.143 0.429 3 280 0.132 0.106 0.000 0.000 0.143 0.143 0.429 4 241 0.190 0.136 0.000 0.143 0.143 0.286 0.571 5 204 0.196 0.124 0.000 0.143 0.143 0.286 0.571 6 191 0.237 0.161 0.000 0.143 0.286 0.286 0.714 7 164 0.257 0.150 0.000 0.143 0.286 0.286 0.857 8 158 0.302 0.189 0.000 0.143 0.286 0.429 0.714 B6 R. Boegeholz, J. Guerra, E. Scheihing Fig. B.7: Variability in the samples for the disparity in the semester (DISPAR), throughout the 8 semesters. Fig. B.8: Variability in the samples for the delay related to the curriculum (DELAY), throughout the 8 semesters. Modeling Trajectories of Delayed Degree Completion B7 Table of Fig. B.8: Delay related to curriculum (DELAY) Statistic N Mean St. Dev. Min Pctl(25) Median Pctl(75) Max 1 365 0.000 0.000 0 0 0 0 0 2 337 0.052 0.055 −0.048 0.000 0.048 0.107 0.143 3 280 0.131 0.085 −0.020 0.057 0.143 0.190 0.286 4 241 0.173 0.105 −0.036 0.086 0.179 0.257 0.429 5 204 0.232 0.134 0.000 0.114 0.286 0.333 0.476 6 191 0.275 0.165 0.000 0.120 0.314 0.387 0.643 7 164 0.316 0.214 −0.048 0.114 0.371 0.486 0.714 8 158 0.362 0.234 0.000 0.107 0.371 0.571 0.857 Fig. B.9: Variability in the samples for the curricular progress (PROGRE), throughout the 8 semesters. Table of Fig. B.9: Curricular progress (PROGRE) Semester N Mean St. Dev. Min Pctl(25) Median Pctl(75) Max 1 365 0.094 0.041 0.000 0.061 0.100 0.140 0.143 2 337 0.055 0.037 0.000 0.020 0.061 0.082 0.143 3 280 0.053 0.037 0.000 0.020 0.041 0.082 0.143 4 241 0.053 0.037 0.000 0.020 0.061 0.082 0.122 5 204 0.062 0.039 0.000 0.020 0.061 0.102 0.143 6 191 0.065 0.047 0.000 0.020 0.061 0.102 0.163 7 164 0.074 0.046 0.000 0.041 0.080 0.122 0.143 8 158 0.082 0.046 0.000 0.041 0.082 0.122 0.163 B8 R. Boegeholz, J. Guerra, E. Scheihing Fig. B.11: Biplot and correlogram for semester 1. Fig. B.13: Biplot and correlogram for semester 2. Fig. B.15: Biplot and correlogram for semester 3. Modeling Trajectories of Delayed Degree Completion B9 Fig. B.17: Biplot and correlogram for semester 4. Fig. B.19: Biplot and correlogram for semester 5. Fig. B.21: Biplot and correlogram for semester 6. B10 R. Boegeholz, J. Guerra, E. Scheihing Fig. B.23: Biplot and correlogram for semester 7. Fig. B.25: Biplot and correlogram for semester 8.