=Paper= {{Paper |id=Vol-1183/bkt20y_paper9 |storemode=property |title= Prediction of Student Success Using Enrollment Data |pdfUrl=https://ceur-ws.org/Vol-1183/bkt20y_paper09.pdf |volume=Vol-1183 |dblpUrl=https://dblp.org/rec/conf/edm/CengizU14 }} == Prediction of Student Success Using Enrollment Data== https://ceur-ws.org/Vol-1183/bkt20y_paper09.pdf
Prediction of Student Success Using Enrolment Data

Nihat Cengiz (ncengiz@epoka.edu.al)
Arban Uka (auka@epoka.edu.al)
Epoka University, Department of Computer Engineering
Rr. Tiranë-Rinas, Km. 12, 1039 Tirana, Albania

ABSTRACT
Predicting student success from different predictors has been a topic of
investigation for many years. This paper explores socio-demographic variables
such as gender, region lived and studied, nationality and high school degree
that may influence the success of students. We examine to what extent these
factors help to predict students’ academic achievement and to identify
vulnerable students and their need for extra tutoring or similar supportive
services at an early stage.

We analyzed the data of the Epoka University students enrolled from 2007 to
2013. The sample includes 1211 undergraduate students, of whom 716 completed,
or were supposed to complete, the three-year bachelor studies in the past six
semesters.

Based on the data mining techniques, the most important predictors of student
success were the students’ high school GPA and gender. Among students with
below-average high school grades, females had a higher percentage of success
than males. No significant correlation was found between the students’
success and the demographic information.

Keywords
Academic achievement, influence, classification tree, outcome

1. INTRODUCTION
Increasing the student graduation rate and decreasing the dropout rate is a
long-term goal of higher education institutions. From the students’
perspective, a timely and successful graduation is vital, as these two
factors strongly affect their employability. The employability rate has
become an indicator in the ranking of higher education institutions (HEIs),
so HEIs are focusing more on increasing this rate [2].

Many students face several difficulties during their first year at
university, and first-year performance has therefore been identified as an
important predictor of timely graduation. In terms of keeping students in
the university, the retention rate is a factor that has been studied
extensively. Mallinckrodt and Sedlacek (1987) found that freshman class
attrition rates were greater than those of the other academic years, with
numbers running up to 30% [3]. Therefore most researchers have targeted
first-year students. An early identification of the students at high risk of
failing enables educators to intervene in time with the necessary measures,
which would increase the graduation rate. Preventing students' failure
depends on the identification of the factors affecting success.

In this work we analyze whether background information has any effect on the
success rate of regular students. The only data we used were collected
during the registration period of Epoka University, based on the
registration form. The content of this form is determined by the local
authorities and the University Administration. In this study we ask whether
these data can be used to predict student success. The main objective of our
study is to determine the factors that may affect the study outcomes at
Epoka University.

2. DATA AND METHODOLOGY
The Epoka University student management system does not provide data in a
format ready for direct statistical analysis and modeling. Therefore data
preparation and cleaning were undertaken to prepare the database for
modeling.

Table: Descriptive statistics – Study outcome (716 students)

Domain                 FAIL (n)  PASS (n)  FAIL (%)  PASS (%) Total (%)
GENDER       M              221       189      53.9      46.1      57.3
             F               78       228      25.5      74.5      42.7
COUNTRY      ALB            238       372      39.0      61.0      85.2
             TUR             35        14      71.4      28.6       6.8
             KOS             14        17      45.2      54.8       4.3
             OTH             12        14      46.2      53.8       3.6
NATIONALITY  ALB            256       382      40.1      59.9      89.1
             OTH             43        35      55.1      44.9      10.9
REGION       CITY           262       372      41.3      58.7      88.5
             VILL.           37        44      45.7      54.3      11.3
HS_GPA       UPPER           48       224      17.6      82.4      38.0
             INTER.          89       113      44.1      55.9      28.2
             LOWER          160        77      67.5      32.5      33.1

2.1. Data and Methodology
The outcome used in our analysis is the student's result at the end of the
three-year study. We measured only two outcomes, labeled Pass and Fail.
Students labeled ‘Pass’ successfully completed the program at the end of
three years. Students labeled ‘Fail’ include the students withdrawn from the
program voluntarily or by the academic registry for not fulfilling the
regulations. Students who stayed on the program until the end of the study
but scored less than the graduation grade (2.00) were also allocated to this
category.

The numeric continuous variable in the data set, the secondary school grade
(HS GPA), was converted into a categorical variable with only three levels,
A (UPPER), B (INTERMEDIATE) and C (LOWER), denoting grades above 9 out of
10, grades between 8 and 9, and grades less than 8 respectively. The other
variables (nationality, citizenship, and region) were classified into major
groups.

In this study we conducted three main types of data mining approaches: a
descriptive approach, which concerns the nature of the dataset, such as the
frequency table and the relationship between the attributes obtained using
cross-tabulation analysis; a predictive approach, which uses four different
classification trees; and a comparison between these trees and logistic
regression to confirm the accuracy of the predictors.

Classification tree models can handle a large number of predictor variables,
are non-parametric, and can capture nonlinear relationships and complex
interactions between the predictors and the dependent variable [1].

Before generating the classification trees we classified the variables
according to the study outcome, i.e. whether students are eligible to
graduate or not. We used attribute selection to rank the variables by their
importance for further analysis. Then we generated the classification trees
with four different growing methods.

2.2. Summary Data Description
After cleaning the data, we carried out a cross-tabulation of each variable
against the study outcome, as shown in the table above. The table shows that
the majority of the successful students are female (228 of 417, about 55%),
although women make up only 42.7% of the sample, because 74.5% of the female
students successfully completed the study. This suggests that female
students are more likely to succeed than their male classmates. In terms of
country and nationality, Albanian students clearly form the largest group.

An expected result was observed for secondary school degrees: the university
graduation ratio increases with the high school grade. While 82% of the
upper group students were able to complete the study on time, only 56% of
the intermediate group and 32% of the lower group students were able to do
so.

2.3. Decision Trees
Although the results of the attribute selection suggest continuing the
analysis with only a subset of the predictors, we included all available
predictors in our classification trees. Only two variables were used in the
diagrams: HS_GPA and GENDER. Even though some variables may have little
significance for the overall prediction outcome, they can be essential to a
specific record [1].

Almost all growing methods (CHAID, exhaustive CHAID, CRT and QUEST)
generated exactly the same trees. The largest successful group consists of
272 students (38%). The HS_GPA of this group is over 90%. The largest
unsuccessful group contains 237 students (33% of all participants). They
have a HS_GPA of less than 80%. The next largest group also classified as
unsuccessful consists of male students having lower HS_GPA.

The cross-validation estimate of the risk (0.309) indicates that the
predicted outcome is wrong in 30.9% of the cases, i.e. the risk of
misclassifying a student is approximately 31%. This result is consistent
with the CHAID classification matrix, whose overall percentage shows that
the model classified only 70% of students correctly. The classification
table, however, reveals one potential problem with this model: for
unsuccessful students it predicts the correct outcome for only 65.9% of
them, which means that about 34% of failing students are incorrectly
classified as passing students.

2.4. Logistic regression
The Variables not in the Equation table in block 0 shows that four of the
five variables are individually significant predictors of whether a student
is successful or not. Region is not a significant predictor. The Variables
not in the Equation table in block 1 shows that only high school grade point
average and gender are significant predictors, but not the other variables.
This result also confirms why these two were the only variables used in the
decision trees.

3. CONCLUSIONS
This study examines which background information from enrolment data affects
the study outcome of programs at Epoka University. Based on the results, the
classification accuracy of the classification trees was high (71%) in all
tree methods. Although all the variables except region are individually
significant predictors, as described in the attribute selection, the trees
used only two variables: gender and secondary school degree. This outcome is
also confirmed by the logistic regression. The block 0 classification
implied that all variables except region were good predictors (p < 0.001),
but the block 1 classification highlighted that only gender and secondary
school degree were significant.

4. REFERENCES
[1] Kovačić, Z. J. (2010). Early Prediction of Student Success: Mining
    Students Enrolment Data. Proceedings of Informing Science & IT Education
    Conference (InSITE) 2010, Open Polytechnic, Wellington, New Zealand.

[2] Bratti, M., McKnight, A., Naylor, R., & Smith, J. (2004). Higher
    Education Outcomes, Graduate Employment and University Performance
    Indicators. Journal of the Royal Statistical Society, 167(3), 475-496.

[3] Mallinckrodt, B., & Sedlacek, W. E. (1987). Student retention and the
    use of campus facilities by race. NASPA Journal, 24, 28-32.
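
The arithmetic behind Sections 2.1 to 2.3 can be checked with a short Python sketch. It is only an illustration, not the authors' procedure: the function and variable names are hypothetical, the counts are copied from the descriptive statistics table (gender and HS_GPA rows only), and the handling of grades exactly equal to 8 or 9 is an assumption, since the text only says "above 9", "between 8 and 9" and "less than 8".

```python
TOTAL_STUDENTS = 716  # sample size stated in the table title

# (FAIL count, PASS count) per level, copied from the table.
COUNTS = {
    "GENDER": {"M": (221, 189), "F": (78, 228)},
    "HS_GPA": {"UPPER": (48, 224), "INTER.": (89, 113), "LOWER": (160, 77)},
}


def hs_gpa_level(grade: float) -> str:
    """Map a 0-10 secondary school grade to the three levels of Section 2.1.

    Assumption: a grade of exactly 9 or exactly 8 falls into INTER.
    """
    if grade > 9:
        return "UPPER"
    if grade >= 8:
        return "INTER."
    return "LOWER"


def row_percentages(fail: int, passed: int) -> tuple:
    """Return (FAIL %, PASS %, share of all students %) for one table row."""
    n = fail + passed
    return (
        round(100 * fail / n, 1),
        round(100 * passed / n, 1),
        round(100 * n / TOTAL_STUDENTS, 1),
    )


def share_of_passes(domain: str, level: str) -> float:
    """Percentage of all passing students that belong to the given level."""
    all_passes = sum(passed for _, passed in COUNTS[domain].values())
    return round(100 * COUNTS[domain][level][1] / all_passes, 1)


def accuracy_from_risk(risk: float) -> float:
    """Convert a misclassification risk estimate into an accuracy in percent."""
    return round(100 * (1 - risk), 1)
```

Under these assumptions, row_percentages(78, 228) reproduces the female row of the table (25.5, 74.5, 42.7), share_of_passes("GENDER", "F") gives 54.7, and accuracy_from_risk(0.309) gives 69.1, in line with the roughly 70% overall percentage reported for the trees.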