<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>W. E. (1987). Student
retention and the use of campus facilities by race. NASPA
Journal</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Prediction of Student Success Using Enrolment Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arban Uka</string-name>
          <email>auka@epoka.edu.al</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nihat Cengiz</string-name>
          <email>ncengiz@epoka.edu.al</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Epoka University, Department of Computer Engineering</institution>
          ,
          <addr-line>Rr. Tiranë-Rinas,Km. 12, 1039 Tirana</addr-line>
          ,
          <country country="AL">Albania</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Epoka University, Department of Computer Engineering</institution>
          ,
          <addr-line>Rr. Tiranë-Rinas,Km. 12, 1039 Tirana</addr-line>
          ,
          <country country="AL">Albania</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1987</year>
      </pub-date>
      <abstract>
        <p>Predicting student success from different predictors has been investigated for many years. This paper explores socio-demographic variables such as gender, region lived and studied in, nationality, and high school degree that may influence student success. We examine to what extent these factors help predict students' academic achievement and identify vulnerable students and their need for extra tutoring or similar support services at an early stage. We analyzed data on Epoka University students enrolled from 2007 to 2013. The sample includes 1211 undergraduate students, of whom 716 completed or were due to complete the three-year bachelor studies within the past six semesters. Based on data mining techniques, the most important predictors of student success were high school GPA and gender. Among students with below-average high school grades, females had a higher success rate than males. No significant correlation was found between student success and the demographic information.</p>
      </abstract>
      <kwd-group>
        <kwd>Academic achievement</kwd>
        <kwd>influence</kwd>
        <kwd>classification tree</kwd>
        <kwd>outcome</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Increasing student graduation rates and decreasing dropout rates
is a long-term goal of higher education institutions (HEIs). From the
students’ perspective, timely and successful graduation is vital,
as both factors strongly affect their employability.
The employability rate has become an indicator in the ranking of HEIs,
so HEIs are focusing more on increasing this rate [2].</p>
      <p>Many university students face difficulties during the first year,
and first-year performance has therefore been identified as an
important predictor of timely graduation. In terms of keeping students
at the university, the retention rate has been studied extensively.
Mallinckrodt and Sedlacek (1987) found that attrition rates in the
freshman class were greater than in the other academic years,
running up to 30% [3]. Most researchers have therefore
targeted first-year students. Early identification of
students at high risk of failing enables educators to intervene in
time with the necessary measures, which would increase
the graduation rate. Preventing student failure depends on
identifying the factors that affect success.</p>
      <p>In this work we analyze whether background
information has any effect on the success rate of regular students.
The only data we used were collected during the registration period at
Epoka University by means of the registration form. The content of this
form is determined by the local authorities and the University
Administration. We ask whether these data can be used to predict
student success. The main objective of our
study is to determine the factors that may affect study
outcomes at Epoka University.</p>
      <p>2. DATA AND METHODOLOGY</p>
      <p>The Epoka University student management system does not provide
data in a format ready for direct statistical analysis and
modeling. Data preparation and cleaning were therefore
undertaken to prepare the database for modeling.</p>
      <table-wrap>
        <caption>
          <p>Table. Descriptive statistics – Study outcome (716 students). Variables and their domains; descriptive counts are broken down by outcome (Fail, Pass, Total).</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Variable</th>
              <th>Domain</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>GENDER</td>
              <td>M, F</td>
            </tr>
            <tr>
              <td>COUNTRY</td>
              <td>ALB, TUR, KOS, OTH</td>
            </tr>
            <tr>
              <td>NATIONALITY</td>
              <td>ALB, OTH</td>
            </tr>
            <tr>
              <td>REGION</td>
              <td>CITY, VILL.</td>
            </tr>
            <tr>
              <td>HS_GPA</td>
              <td>UPPER, INTER., LOWER</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-2">
      <title>2.1. Data and Methodology</title>
      <p>The outcome used in our analysis is the student's status
at the end of the three-year study. We measured only
two outcomes, labeled Pass and Fail. Students labeled ‘Pass’
successfully completed the program at the end of three years.
Students labeled ‘Fail’ include those who withdrew from the
program voluntarily or were withdrawn by the academic registry for
not fulfilling the regulations. Students who stayed on the program
until the end of the study but scored less than the graduation grade
(2.00) were also allocated to this category.</p>
      <p>The numeric continuous variable secondary
school grade (HS GPA) was converted into a categorical variable
with only three levels, A (UPPER), B (INTERMEDIATE) or C
(LOWER), denoting grades above 9 out of 10, grades between 8
and 9, and grades less than 8, respectively. The other variables
(nationality, citizenship, and region) were classified into major
groups.</p>
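      <p>As a minimal sketch, the three-level recoding of HS GPA could be expressed in Python as follows. The function name and the sample grades are hypothetical, and the treatment of grades exactly equal to 8 or 9 is an assumption, since the text does not state which band they fall into.</p>
      <preformat>
```python
def hs_gpa_level(grade):
    """Map a secondary school grade on a 10-point scale to a level."""
    # Assumption: grades of exactly 9 and exactly 8 are placed in the
    # intermediate band; the paper does not specify the boundaries.
    if grade > 9:
        return "UPPER"
    if grade >= 8:
        return "INTER."
    return "LOWER"


# Hypothetical grades, for illustration only (not the study data).
grades = [9.6, 8.4, 7.2, 9.0, 8.0]
levels = [hs_gpa_level(g) for g in grades]
print(levels)
```
</preformat>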
      <p>In this study we combined descriptive and predictive data mining
approaches. The descriptive approach concerns the nature of
the dataset, such as the frequency table and the relationships
between the attributes obtained using cross-tabulation analysis.
The predictive approach uses four different
classification trees, followed by a comparison of these with logistic
regression to confirm the accuracy of the predictors.</p>
      <p>Classification tree models can handle a large number of predictor
variables, are non-parametric, and can capture nonlinear relationships
and complex interactions between the predictors and the dependent
variable [1].</p>
      <p>Before generating the classification trees we classified the
variables according to the study outcome, i.e. whether students are
eligible to graduate or not. We used attribute selection to rank
the variables by their importance for further analysis. We then
generated the classification trees using four different growing
methods.</p>
    </sec>
    <sec id="sec-3">
      <title>2.2. Summary Data Description</title>
      <p>After cleaning the data, we carried out a cross-tabulation of each
variable against the study outcome, as shown in the table above. The table
shows that the majority of the successful students are female (over
57%), which follows from the fact that 74.5% of the female
students successfully completed the study. This suggests that
female students are more likely to succeed than their male
classmates. In terms of country and nationality, Albanian students
clearly form the largest group.</p>
      <p>An expected result was observed for secondary school
grades: the university completion rate rises with the high school
grade level. While 82% of upper-group students completed the study
on time, only 56% of the intermediate group and 32% of the lower
group did so.</p>
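      <p>The two percentages quoted for female students are different readings of the same cross-tabulation: one is a row percentage (share of females who passed) and the other a column percentage (share of passing students who are female). The following sketch illustrates the distinction on a small set of hypothetical records; the data and variable names are assumptions for illustration and are not the study data.</p>
      <preformat>
```python
from collections import Counter

# Hypothetical (gender, outcome) records; not the study data.
records = [
    ("F", "Pass"), ("F", "Pass"), ("F", "Pass"), ("F", "Fail"),
    ("M", "Pass"), ("M", "Pass"), ("M", "Fail"), ("M", "Fail"),
]

counts = Counter(records)                    # cell counts of the cross-tabulation
row_totals = Counter(g for g, _ in records)  # number of students per gender
col_totals = Counter(o for _, o in records)  # number of students per outcome

# Row percentage: share of female students who passed.
female_pass_rate = counts[("F", "Pass")] / row_totals["F"]
# Column percentage: share of passing students who are female.
female_share_of_pass = counts[("F", "Pass")] / col_totals["Pass"]

print(f"{female_pass_rate:.1%} of female students passed")
print(f"{female_share_of_pass:.1%} of passing students are female")
```
</preformat>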
    </sec>
    <sec id="sec-4">
      <title>2.3. Decision Trees</title>
      <p>Although the results of the attribute selection suggest continuing
the analysis with only a subset of predictors, we included all
available predictors in our classification trees; only two variables
were used in the resulting diagrams: HS_GPA and GENDER. Even though
some variables may have little significance for the overall
prediction outcome, they can be essential to a specific record [1].
Almost all growing methods (CHAID, exhaustive CHAID, CRT
and QUEST) generated exactly the same trees. The largest
successful group consists of 272 students (38%), whose HS_GPA
is over 90%. The largest unsuccessful group contains 237
students (33% of all participants), whose HS_GPA is below
80%. The next largest group also classified as unsuccessful
consists of male students with lower HS_GPA.</p>
      <p>The cross-validation estimate of the risk (0.309) indicates that
the model mispredicts whether a student is successful in 30.9% of
cases; that is, the risk of misclassifying a
student is approximately 31%. This result is consistent with the
CHAID classification matrix, where the overall
percentage shows that the model classified only 70% of
students correctly. The classification table, however, reveals one potential
problem with this model: it predicts the outcome correctly for only
65.9% of unsuccessful students, which means that about 34% of
failing students are incorrectly classified as passing.</p>
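      <p>The relation between the risk estimate, the overall percentage and the per-class percentage can be sketched as below. The counts are hypothetical and chosen only to make the arithmetic easy to follow; the function name is an assumption. Note that this computes the risk on a single classification table, whereas the value reported in the text is a cross-validated estimate.</p>
      <preformat>
```python
def classification_summary(matrix):
    """Summarise a 2x2 classification matrix keyed by (observed, predicted)."""
    total = sum(matrix.values())
    correct = matrix[("Pass", "Pass")] + matrix[("Fail", "Fail")]
    observed_fail = matrix[("Fail", "Fail")] + matrix[("Fail", "Pass")]
    return {
        "accuracy": correct / total,      # overall percentage correct
        "risk": 1 - correct / total,      # share of misclassified students
        "fail_correct": matrix[("Fail", "Fail")] / observed_fail,
    }


# Hypothetical counts; not the classification table of the study.
matrix = {
    ("Pass", "Pass"): 60, ("Pass", "Fail"): 20,
    ("Fail", "Pass"): 10, ("Fail", "Fail"): 10,
}
summary = classification_summary(matrix)
print(summary)
```
</preformat>
      <p>In this toy table the overall accuracy looks acceptable while half of the failing students are still predicted to pass, which is the kind of imbalance the classification table exposes.</p>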
    </sec>
    <sec id="sec-5">
      <title>2.4. Logistic regression</title>
      <p>The Variables not in the Equation table in block 0 shows that four
of the five variables are individually significant predictors of
whether a student is successful; region is not a significant
predictor. The Variables not in the Equation table in block 1 shows
that only high school grade point average and gender remain
significant predictors. This result also
explains why these two were the only variables used in the decision
trees.</p>
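      <p>For readers unfamiliar with the model, the following is a minimal sketch of fitting a logistic regression with two binary predictors (upper HS GPA and female gender) by plain gradient ascent. The data, function names, learning rate and number of iterations are all assumptions made for illustration; the significance tests reported in the block 0 and block 1 tables are not reproduced here.</p>
      <preformat>
```python
import math


def pass_probability(weights, x):
    """Predicted probability of Pass for one feature vector."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient ascent on the log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)   # weights[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = y - pass_probability(weights, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Hypothetical data: (upper_gpa, female) -> (n_pass, n_fail); not the study data.
cells = {(1, 1): (4, 1), (1, 0): (3, 1), (0, 1): (2, 2), (0, 0): (1, 3)}
rows, labels = [], []
for features, (n_pass, n_fail) in cells.items():
    rows += [features] * (n_pass + n_fail)
    labels += [1] * n_pass + [0] * n_fail

weights = fit_logistic(rows, labels)
intercept, b_gpa, b_female = weights
print(b_gpa, b_female)
```
</preformat>
      <p>A positive coefficient means the predictor raises the estimated odds of passing in this toy data set; whether a coefficient is statistically significant requires a separate test.</p>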
    </sec>
    <sec id="sec-6">
      <title>3. CONCLUSIONS</title>
      <p>This study examines which background information from enrolment
data affects the study outcome of programs at Epoka
University. The classification accuracy of the
classification trees was high, at 71% for all tree methods.
Although all the variables except region were individually
significant predictors, as described in the attribute selection, the trees
displayed only two variables: gender and secondary school
degree. This outcome is also confirmed by the logistic regression.
The block 0 classification implied that all variables except region were good
predictors (p&lt;0.001), but the block 1 classification highlighted that only
gender and secondary school degree were significant.</p>
    </sec>
    <sec id="sec-7">
      <title>4. REFERENCES</title>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>