=Paper=
{{Paper
|id=Vol-1183/ncfpal_paper06
|storemode=property
|title=Non-cognitive Factors of Learning as Predictors of Academic Performance in Tertiary Education
|pdfUrl=https://ceur-ws.org/Vol-1183/ncfpal_paper06.pdf
|volume=Vol-1183
|dblpUrl=https://dblp.org/rec/conf/edm/GrayMO14
}}
==Non-cognitive Factors of Learning as Predictors of Academic Performance in Tertiary Education==
Geraldine Gray, Colm McGuinness, Philip Owende
Institute of Technology Blanchardstown
Blanchardstown Road North
Dublin 15, Ireland
geraldine.gray@itb.ie
ABSTRACT
This paper reports on an application of classification and regression models to identify college students at risk of failing in first year of study. Data was gathered from three student cohorts in the academic years 2010 through 2012 (n=1207). Students were sampled from fourteen academic courses in five disciplines, and were diverse in their academic backgrounds and abilities. Metrics used included non-cognitive psychometric indicators that can be assessed in the early stages after enrolment, specifically factors of personality, motivation, self regulation and approaches to learning. Models were trained on students from the 2010 and 2011 cohorts, and tested on students from the 2012 cohort. It was found that classification models identifying students at risk of failing had good predictive accuracy (> 79%) on courses that had a significant proportion of high risk students (over 30%).

Keywords
Educational data mining, learning analytics, academic performance, non cognitive factors of learning, personality, motivation, learning style, learning approach, self-regulation

1. INTRODUCTION AND LITERATURE REVIEW
Learning is a latent variable, typically measured as academic performance in continuous assessment and end of term examinations [33]. Identifying predictors of academic performance has been the focus of research for many years [20, 34], and continues as an active research topic [6, 8], indicating the inherent difficulty in generating models of learning [29, 46]. More recently, the application of data mining to educational settings is emerging as an evolving and growing research discipline [40, 43]. Educational Data Mining (EDM) aims to better understand students and how they learn through the use of data analytics on educational data [42, 10]. Much of the published work to date is based on ever-increasing volumes of data systematically gathered by education providers, particularly log data from Virtual Learning Environments and Intelligent tutoring systems [16, 2]. Further work is needed to determine if gathering additional predictors of academic performance can add value to existing models of learning.

Research from educational psychology has identified a range of non-cognitive psychometric factors that are directly or indirectly related to academic performance in tertiary education, particularly factors of personality, motivation, self regulation and approaches to learning [8, 9, 35, 39, 44, 25]. Personality based studies have focused on the Big-5 personality dimensions of conscientiousness, openness, extroversion, stability and agreeableness [9, 22, 27]. There is broad agreement that conscientiousness is the best personality based predictor of academic performance [44]. For example, Chamorro et al. [9] reported a correlation of r=0.37 (p<0.01, n=158) between conscientiousness and academic performance. Correlations between academic performance and openness to new ideas, feelings and imagination are weaker. Chamorro et al. [9] reported a correlation of r=0.21 (p<0.01, n=158) but lower correlations were reported in other studies (see Table 1) which may be explained by variations in assessment type. Open personalities tend to do better when assessment methods are unconstrained by submission rules and deadlines [27]. Studies are inconclusive on the predictive validity of other personality factors [44].

A meta-analysis of 109 studies analysing psychosocial and study skill factors found two factors of motivation, namely self-efficacy (90% CI [0.444, 0.548]) and achievement motivation (90% CI [0.353, 0.424]), had the highest correlations with academic performance [39]. Distinguishing between learning (intrinsic) achievement and performance (extrinsic) achievement goals, Eppler and Harju [19] found learning goals (r=0.3, p<0.001, n=212) were more strongly correlated with academic performance than performance goals (r=0.13, p>0.05, n=212). Covington [13] however argues that setting goals in itself is not enough, as ability to self-regulate learning can be the difference between achieving, or not achieving, goals set. Self-regulated learning is recognised
as a complex concept to define as it overlaps with a number of other concepts including personality, self-efficacy and goal setting [4]. Ning and Downing [35] reported high correlations between self regulation and academic performance, specifically self-testing (r=0.48, p<0.001) and monitoring understanding (r=0.42, p<0.001). On the other hand, Komarraju and Nadler [31] found effort management, including persistence, had higher correlation with academic performance (r=0.39, p<0.01) than other factors of self-regulation and found that self-regulation (monitoring and evaluating learning) did not account for any additional variance in academic performance over and above self-efficacy, but study effort and study time did account for additional variance.

Research into approaches to learning has its foundations in the work of Marton & Säljö [32] who classified learners as shallow or deep. Deep learners aim to understand content, while shallow learners aim to memorise content regardless of their level of understanding. Later studies added strategic learners [18, pg. 19], whose priority is to do well, and will adopt either a shallow or deep learning approach depending on the requisites for academic success. Comparing the influence of approaches to learning on academic performance, Chamorro et al. [9] reported a deep learning approach (r=0.33, p<0.01) had higher correlations with academic performance than a strategic learning approach (r=0.18, p<0.05). Cassidy [8] on the other hand found correlations with a deep learning approach (r=0.31, p<0.01) were marginally lower than with a strategic learning approach (r=0.32, p<0.01). Differences found have been explained, in part, by assessment type [49], highlighting the importance of assessment design in encouraging appropriate learning strategies.

Knight, Buckingham Shum and Littleton argued learning measurement should go beyond measures of academic performance [29], promoting greater focus on learning environment and encouragement of malleable, effective learning dispositions. Disposition relates to a tendency to behave in a certain way [6]. An effective learning disposition describes attributes and behaviour characteristic of a good learner [6]. A range of non-cognitive psychometric factors have been associated with an effective learning disposition such as a deep learning approach, ability to self-regulate, setting learning goals, persistence, conscientiousness and sub-factors of openness, namely intellectual curiosity, creativity and open-mindedness [6, 29, 47]. A lack of correlation between such non-cognitive factors and academic performance is in itself insightful, suggesting assessment design that fails to reward important learning dispositions. It has been argued that effective learning dispositions are as important as discipline specific knowledge [6, 29].

Statistical models have dominated data analysis in educational psychology [15], particularly correlation and regression [25]. Relatively high levels of accuracy were reported in regression models of academic performance that included cognitive and non-cognitive factors. For example, Chamorro-Premuzic et al. [9] reported a coefficient of determination (R²) of 0.4 when predicting 2nd year GPA (based on essay type examinations) in a regression model that included prior academic ability, personality factors and a deep learning approach. Robbins [39] reported similar results (R²=0.34) in a meta-analysis of models of cognitive ability, motivation factors and socio-economic status. Models of non-standard students were less accurate, for example Swanberg & Martinsen [44] reported R²=0.21 in models of older students (age: m=24.8) based on prior academic performance, personality, learning strategy, age and gender. Lower accuracies were also reported in studies not including cognitive ability. Robbins [39] reported R²=0.27 in a meta-analysis of models of factors of motivation. Komarraju et al. [30] predicted GPA (R²=0.15) from variables of personality and learning approach, while Bidjerano & Dai [4] had similar results (R²=0.11) with factors of personality and self-regulation.

Linear regression assumes constant variance and linearity between independent and dependent attributes. There is evidence to suggest variance is not constant for some non-cognitive factors. For example, De Feyter et al. [14] found low levels of self-efficacy had a positive, direct effect on academic performance for neurotic students, and for stable students, average or higher levels of self-efficacy only had a direct effect on academic performance. In addition, Vancouver & Kendall [48] found evidence that high levels of self-efficacy can lead to overconfidence regarding exam preparedness, which in turn can have a negative impact on academic performance. Similarly, Poropat [38] cites evidence of non-linear relationships between factors of personality and academic performance, including conscientiousness and openness. It is therefore pertinent to ask if data mining's empirical modelling approach is more appropriate for models based on non-cognitive factors of learning.

A growing number of educational data mining studies have investigated the role of non-cognitive factors in models of learning [6, 41, 36]. Bergin [3] cited an accuracy of 82% using an ensemble model based on prior academic achievement, self-efficacy and study hours, but due to the small sample size (n=58) could not draw reliable conclusions from the findings. The class label distinguished strong (grade>55%) versus weak (grade<55%) academic performance based on end of term results in a single module. Gray et al. [23] cited similar accuracies (81%, n=350) with a Support Vector Machine model using cognitive and non cognitive attributes to distinguish high risk (GPA<2.0) from low risk (GPA≥2.5) students based on first year GPA. Model accuracy was contingent on modelling younger students (under 21) and older students (over 21) separately.

The focus of this study was to investigate if non-cognitive factors of learning, measured during first year student induction, were predictive of academic performance at the end of first year of study. We evaluated both regression models of GPA and classification models that predicted first year students at risk of failing. Participants were from a diverse student population that included mature students, students with disabilities, and students from disadvantaged socio-economic backgrounds.

2. METHODOLOGY
The following sections report on study participants and the study dataset. Data analysis was conducted following the CRoss Industry Standard Process for Data Mining (CRISP-DM) using RapidMiner V5.3 and R V3.0.2.

2.1 Description of the study participants
The participants were first year students at the Institute of Technology Blanchardstown (ITB), Ireland. The admission policy at ITB supports the integration of a diverse student population in terms of age, disability and socio-economic background. Each September 2010 to 2012, all full-time, first-year students at ITB were invited to participate in the study by completing an online questionnaire administered
Table 1: Correlations with Academic Performance in Tertiary Education
Study N age AP Temperament Motivation Learning Approach Learning Strategy
Concient- Open Self Effi- Intrinsic Extrinsic Deep Shallow Strategic Self Reg- Study Study
ious cacy Goal Goal ulation Time Effort
[4] 217 m=22 self reported GPA 0.33** 0.0.23**
[8] 97 m=23.5 GPA 0.397*** 0.398** -0.013 0.316**
[9] 158 18-21 GPA 0.37** 0.21** 0.398* -0.15 0.18*
[17] 146 17-52 GPA 0.21 0.06 0.097 -0.054 0.153
[19] 212 m=19.2 GPA 0.3*** 0.13
[27] 133 18-22 GPA 0.46** -0.08
[30] 308 18-24 self reported GPA 0.29** 0.13*
[31] 257 m=20.5 GPA 0.3** 0.14* 0.31** 0.39**
[35] 581 20.48 GPA 0.0.24**
[39] meta analysis, 18+ GPA 0.496 0.179
[44] 687 m=24.5 single exam 0.16 -0.25
*p < .05, **p < .01, ***p < 0.001
during first year student induction. A total of 1,376 (52%)
full-time, first year students completed the online question-
naire. Eliminating students who did not give permission to
be included in the study (35) and invalid data (134) resulted
in 45% of first year full time students participating in the
study (n=1207).
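
The participant count above follows from simple subtraction, which the short sketch below reproduces. The cohort size is not stated in the text, so it is estimated here from the rounded 52% response rate; that estimate, and the variable names, are assumptions made for illustration only.

```python
# Figures quoted in the paragraph above.
completed = 1376      # students who completed the online questionnaire
no_permission = 35    # students who did not give permission to be included
invalid = 134         # records removed as invalid data

participants = completed - no_permission - invalid
print(participants)  # 1207

# Assumption: 52% is the (rounded) share of all full-time first years who
# completed the questionnaire, so the cohort size is only an estimate.
cohort_estimate = completed / 0.52
participation_rate = participants / cohort_estimate
print(f"estimated cohort: {cohort_estimate:.0f}, participation: {participation_rate:.1%}")
```

Because the 52% figure is rounded, the derived participation rate is approximate and lands close to the 45% reported.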
Participants ranged in age from 18 to 60, with an average age of 23.27. Of the participants, 355 (29%) were mature students (over 23), 713 (59%) were male and 494 (41%) were female. There were 32 (3%) participants registered with a disability. Students were enrolled on fourteen courses across five academic disciplines: Business (n=402, 33%), Humanities (n=353, 29%), Computing (n=239, 20%), Engineering (n=172, 14%) and Horticulture (n=41, 3%).
Academic performance was measured as GPA, an aggregate score of between 10 and 12 first year modules, range 0 to 4, and was calculated on first exam sitting only. The GPA distribution (profiled sample) was compared with the GPA distribution of the full cohort of students for that year (reference sample) using a Kolmogorov-Smirnov non-parametric test. The recorded differences in the distribution for 2010 (D=0.032, p=0.93), 2011 (D=0.036, p=0.90) and 2012 (D=0.042, p=0.69) were not statistically significant. The distribution of GPA was also similar across the three years of study. The largest difference was between the 2010 and 2012 profiled samples (D=0.063, p=0.37) and was not significant. To pass overall, a student must achieve a GPA ≥ 2.0 and pass each first year module. 89% of students with GPA > 2.5 passed all modules, indicating a low risk group that can progress to year two. 84% of students with a GPA < 2 failed three or more modules, indicating a high risk group falling well short of progression requirements. Of the students in GPA range [2.0, 2.49], 39% passed all modules, 36% failed one module, 18% failed two modules, and 7% failed more than two modules. This is a less homogeneous group in terms of academic profile, but could be generally regarded as borderline, either progressing on low grades or required to repeat one or two modules in the repeat exam sittings. Figure 1 and Table 2 illustrate GPA distribution by course.

Figure 1: Notched box plots for GPA by course

2.2 The Study Dataset
Table 3 lists the psychometric factors included in the dataset, collected using an online questionnaire developed for the study (www.howilearn.ie). With the exception of learning modality, questions were taken from openly available, validated instruments, with some changes to wording to suit the context. Where two questions were similar on the published instrument, only one was included. This choice was made to reduce the overall size of the questionnaire, despite the likely negative impact on internal reliability statistics. Questionnaire validity and internal reliability were assessed using a paper-based questionnaire that included both the revised wording of questions used on the online questionnaire (reduced scale), and the original questions from the published instruments (original scale). The paper questionnaire was administered during scheduled first year lectures across all academic disciplines. Pearson correlations between scores calculated from the reduced scale and scores calculated from the original scale were high (≥0.9) for all factors except intrinsic goal orientation and study time and environment, confirming the validity of the study instrument for the highly correlated factors. Internal reliability was assessed using Cronbach's alpha. All factors had acceptable reliability (>0.7; see footnote 1) given the small number of questions per scale (between 3 and 6), with the exception again of intrinsic goal orientation and study time and environment. Learner modality data (Visual, Auditory, Kinaesthetic (VAK) [21]) was based on an instrument developed by the National Learning Network Assessment Services (NLN) (www.nln.ie).

Footnote 1: While generally a Cronbach alpha of > 0.8 indicates good internal consistency, Cronbach alpha closer to 0.7 can be regarded as acceptable for scales with fewer items [12, 45].
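
For readers unfamiliar with the reliability statistic used above, the following is a minimal sketch of Cronbach's alpha for a single scale. The function name, the data layout (one list of scores per question) and the sample responses are assumptions for illustration; they are not taken from the study questionnaire.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    items: one list of scores per question, all lists covering the same
    respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two questions")
    # Total score of each respondent across all questions in the scale.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(pvariance(question) for question in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))

# Hypothetical three-question scale answered by five respondents (0 to 10).
scale = [
    [7, 5, 8, 3, 6],
    [6, 5, 9, 2, 7],
    [8, 4, 7, 3, 5],
]
print(f"alpha = {cronbach_alpha(scale):.2f}")
```

With only three to six questions per scale, as in the reduced questionnaire, alpha tends to be lower than for longer scales, which is the reason the text accepts values near 0.7.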
Table 2: Academic profile by course
Course Name | n | GPA* | high risk | borderline | low risk
all participants | 1207 | 2.1±1.1 | 28% | 16% | 46%
Computing (IT) | 137 | 2.0±1.2 | 47% | 11% | 42%
Creative Digital Media | 102 | 2.6±1.0 | 20% | 8% | 72%
Engineering common | 73 | 1.1±0.9 | 79% | 8% | 13%
Electronic & computer eng. | 52 | 1.8±1.2 | 52% | 10% | 38%
Mechatronics | 27 | 1.6±1.2 | 63% | 7% | 30%
Sustainable Electrical & Control Technology | 20 | 2.8±1.1 | 30% | 5% | 65%
Horticulture | 41 | 2.4±1.1 | 27% | 2% | 71%
Business General | 183 | 1.7±1.1 | 56% | 15% | 29%
Business with IT | 60 | 1.8±1.2 | 46% | 22% | 32%
Business International | 64 | 2.2±1.1 | 41% | 14% | 45%
Sports Management | 95 | 2.3±0.9 | 22% | 24% | 54%
Applied Social Care | 146 | 2.5±0.7 | 15% | 16% | 69%
Early Childcare | 80 | 2.4±0.6 | 20% | 28% | 52%
Social & Community Development | 127 | 2.2±0.9 | 30% | 27% | 43%
* GPA mean and standard deviation.

Table 3: Study factors, mean and standard deviation
Category & Instrument | Study Factor (mean±standard deviation)
Personality: IPIP scales (ipip.ori.org) [22] | Conscientiousness (5.9±1.5); Openness (6.1±1.3)
Motivation: MSLQ [37] | Intrinsic Goal Orientation (7.1±1.4); Self Efficacy (6.9±1.4); Extrinsic Goal Orientation (7.8±1.4)
Learning approach: R-SPQ-2F [5] | Deep Learner (5.4±2.9); Shallow Learner (1.3±1.9); Strategic Learner (3.4±2.5)
Self-regulation: MSLQ [37] | Self Regulation (5.9±1.4); Study Effort (5.9±1.8); Study Time & Environment (6.2±2.3)
Learner modality: NLN profiler | Visual (7.2±2.1); Auditory (3.3±2.2); Kinaesthetic (4.5±2.4)
Other factors | Preference for group work (6.5±3.4); Age (23.27±7.3); Male=713 (59%), Female=494 (41%)
Note: All ranges are 0 to 10 apart from age.
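
The three risk bands used in Table 2 can be expressed as a small function. The sketch below assumes the thresholds given in Section 3.3 (high risk below 2.0, low risk at 2.5 or above) and derives the overall band shares from the class sizes reported there (n=459 and n=558 out of 1207); the function and variable names are illustrative.

```python
def risk_band(gpa):
    """Map a first-year GPA (0 to 4) to the risk band used in the study."""
    if gpa < 2.0:
        return "high risk"
    if gpa < 2.5:
        return "borderline"
    return "low risk"

# Class sizes as reported in Section 3.3; borderline is the remainder.
TOTAL = 1207
counts = {"high risk": 459, "low risk": 558}
counts["borderline"] = TOTAL - sum(counts.values())

shares = {band: round(100 * n / TOTAL) for band, n in counts.items()}
print(counts["borderline"])  # 190
print(shares)  # {'high risk': 38, 'low risk': 46, 'borderline': 16}
```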
Prior knowledge of the student available to the college at registration, namely age, gender and prior academic performance, was also available to the study. Access to full time college courses in Ireland is based on academic achievement in the Leaving Certificate, a set of state exams at the end of secondary school. College places are offered based on CAO points (see footnote 2), an aggregate score of grades achieved in a student's top six leaving certificate subjects, range 0 to 600. Table 4 summarises participant profile by course.

Footnote 2: CAO refers to the Central Applications Office with responsibility for processing applications for undergraduate courses in the Higher Education Institutes in Ireland.

Table 4: Participant profile based on prior knowledge, means and standard deviation
Course Name | n | CAO points | age | % male | Z*
Computing (IT) | 137 | 232±67 | 24±8 | 91% | 9
Creative Digital Media | 102 | 305±79 | 23±7 | 68% | 7
Engineering common | 73 | 220±61 | 20±3 | 92% | 8
Electronic & computer eng | 52 | 232±53 | 22±7 | 92% | 3
Mechatronics | 27 | 238±46 | 21±3 | 85% | 1
Sustainable Electrical & Control Technology | 20 | 199±97 | 27±7 | 95% | 0
Horticulture | 41 | 273±66 | 28±11 | 8% | 4
Business General | 183 | 256±57 | 21±5 | 54% | 10
Business with IT | 60 | 229±75 | 22±5 | 60% | 6
Business International | 64 | 248±51 | 21±5 | 24% | 6
Sports Management | 95 | 306±86 | 23±6 | 84% | 8
Applied Social Care | 146 | 259±84 | 28±9 | 32% | 10
Early Childcare | 80 | 308±78 | 22±5 | 6% | 7
Social & Community Development | 127 | 266±78 | 25±8 | 29% | 9
* Number of study factors differing significantly from a normal distribution (p<<0.001).

3. RESULTS
Correlation and regression were used to analyse relationships between study factors and GPA. Subsequent analysis used classification techniques to identify students at risk of failing. Unless otherwise stated, models are based on age, gender and non-cognitive factors of learning as listed in Table 3.

All non-cognitive factors of learning failed the Shapiro-Wilk normality test, which is common in data relating to education and psychology [26]. However factors of personality were normally distributed within each discipline except for business. Intrinsic motivation and study effort were also normally distributed for engineering and computing students. There were further improvements when analysing subgroups by academic course. Factors of personality, self regulation and intrinsic motivation were normally distributed for all courses. With the exception of approaches to learning, learner modality, preference for group work and GPA, other factors were normally distributed for most courses. Table 4 illustrates the number of attributes that differed significantly from a normal distribution by course. Larger groups were more likely to fail tests of normality.

3.1 Correlations with Academic Performance
Correlations between study factors and GPA were assessed using Pearson's product-moment correlation coefficient (PPMCC). As some attributes violated the assumption of normal distribution, significance was verified with bootstrapped 95% confidence intervals using the bias corrected and accelerated method [7] on 1999 bootstrap iterations.

Bootstrap correlation coefficients are given in Table 5. With the exception of learning modality, all non-cognitive factors were significantly correlated with GPA. The highest correlations with GPA were found for approaches to learning, specifically deep learning approach (r=0.23, bootstrap 95% CI [0.18, 0.29]), and study effort (r=0.19, bootstrap 95% CI [0.13, 0.24]). Age also had a relatively high correlation with GPA (r=0.25, bootstrap 95% CI [0.19, 0.3]). A shallow learning approach (r=-0.15, bootstrap 95% CI [-0.21, -0.09]) and preference for group work (r=-0.076, bootstrap 95% CI [-0.14, -0.02]) were negatively correlated with GPA. Openness had one of the weakest significant correlations with GPA (r=0.08, bootstrap 95% CI [0.03, 0.14]). Correlations were comparable with other studies that included a diverse student population [4, 9, 28] with the exception of self efficacy (r=0.12, bootstrap 95% CI [0.06, 0.17]) which was lower than expected. This may be reflective of the low entry requirements for some courses.
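
To make the bootstrap step in Section 3.1 concrete, the sketch below resamples (factor, GPA) pairs with replacement 1999 times and reads a 95% interval off the sorted correlations. It deliberately uses the simpler percentile interval, not the bias corrected and accelerated (BCa) interval used in the study, and the data are synthetic; both the function names and the generated values are assumptions for illustration.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def percentile_bootstrap_ci(x, y, iterations=1999, level=0.95, seed=0):
    """Percentile bootstrap interval for r (a simpler stand-in for BCa)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(iterations):
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs with replacement
        try:
            estimates.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip a degenerate resample with no variance
    estimates.sort()
    tail = (1 - level) / 2
    m = len(estimates)
    return estimates[int(tail * m)], estimates[min(m - 1, int((1 - tail) * m))]

# Synthetic data: a weak positive relationship between study effort and GPA.
rng = random.Random(42)
effort = [rng.uniform(0, 10) for _ in range(200)]
gpa = [min(4.0, max(0.0, 1.5 + 0.1 * e + rng.gauss(0, 0.9))) for e in effort]

r = pearson_r(effort, gpa)
low, high = percentile_bootstrap_ci(effort, gpa)
print(f"r={r:.2f}, bootstrap 95% CI [{low:.2f}, {high:.2f}]")
```

A correlation is treated as significant at the 5% level when its interval excludes zero, which is how the intervals quoted above can be read.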
3.2 Regression models
Regression models predicting GPA from non-cognitive variables were run for the full dataset and for subgroups by disciplines and by course. The coefficient of determination (R²) is reported to facilitate comparison with other studies. However R² is influenced by the variability of the underlying independent variables. Consequently Achen [1, pg 58-61] argued that prediction error is a more appropriate fitness measure for psychometric data. Therefore absolute error mean and standard deviation is also reported.

A regression model for all participants (R² = 0.14) was comparable with other reported models of non-cognitive factors [4, 30]. However when modelling students by discipline and by course, there were significant differences in model performance. A Chow test [11] comparing the residual error in a regression model of all participants (full model) with the residual errors of models by discipline (restricted models) showed significant differences between the full and restricted models (F(17,1098)=22.02, p=0). There were also significant differences between models based on a particular discipline (full model) and models of courses within that discipline (restricted models). In computing, significant differences of F(17,205)=2.22 (p=0.005) were found between the full model and the two restricted models. Within engineering, a model combining mechatronics with electronic & computing engineering was not significantly different from a model of those two courses individually (F(17,79)=0.58, p=0.89), but including either common entry students and/or sustainable electrical & control technology resulted in significant differences between the full and restricted models. Sustainable electrical & control technology was therefore excluded from further consideration because of the small sample size (n=20). Significant differences were also found in models of each of the three humanities courses compared with those courses combined (F(17,302)=2.22, p=0.004). The least significant differences were found in models of business students provided sports management was excluded (F(17,307)=1.95, p=0.015). Adding sports management further increased the difference in model residual errors (F(17,334)=8.36, p=0). Table 6 gives model details by course and factors used in each model. Electronic & computer engineering students and mechatronic students were combined.

In general, models based on technical courses had a higher R² than models for non technical courses. For example, engineering courses, computing (IT) and business with IT all had R² > 0.3. Absolute error for these courses was in the range [0.63, 0.8]. The difference between the highest absolute error (m=0.8, s=0.56; see footnote 3) and the lowest absolute error (m=0.63, s=0.54) was not significant (t(15)=1.74, p=0.1). Regression results for International Business were also relatively good (R²=0.27). For the remaining non-technical disciplines R² was lower (range [0.12, 0.17]) but the absolute error was more varied. Early childcare had the lowest absolute error (m=0.37, s=0.34) while general business had the highest absolute error (m=0.9, s=0.53). The difference was significant (t(15)=10.3, p<0.001) and may be explained by the greater distribution of GPA scores in general business.

There was little agreement across models on which study factors were most predictive of GPA. Approaches to learning and age were significant for models of all participants, computing students and engineering students, but motivation and learning strategy were more significant for Business with IT. Factors of motivation, learning strategy and approaches to learning were also relevant to models in the humanities courses. All regression models improved when prior academic performance was included in the model. The most significant increase was for sports management, where R² increased from 0.16 to 0.30. Business with IT and applied social care also increased by more than 0.1. For all other regression models, R² increased by between 0.05 and 0.09.

Footnote 3: m=mean, s=standard deviation

3.3 Classification models
Classification models were generated using four classification algorithms, namely Naïve Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM), and k-Nearest Neighbour (k-NN). A binary class label was used based on end of year GPA score, range [0-4]. The two classes were: high risk students (GPA<2, n=459); and low risk students (GPA≥2.5, n=558) giving a dataset of n=1017. Borderline students (2.0 ≤ GPA ≤ 2.49) have not been considered to date. Gray et al. [24] found that cross validation over-estimated model accuracy compared to models applied to a different student cohort. Therefore models were trained on participants from 2010 and 2011 and tested on participants from 2012. All datasets were balanced by over sampling the minority class, and attributes were scaled to have a mean of 0 and standard deviation of 1. Significant attributes were identified by finding the optimal threshold for selecting attributes by weight. Attributes were weighted based on uncertainty (see footnote 4) for DT, k-NN and Naïve Bayes models, and based on SVM weights for SVM models. Table 6 shows the accuracies achieved and factors used in each model.

k-NN had the highest accuracy for models of all students (66%). Accuracies for DT (61%), SVM (62%) and Naïve Bayes (62%) were similar. The most significant attributes by weight were age, deep learning approach and study effort. Including factors of prior academic performance improved model accuracy marginally to 72%.

Model accuracy improved when modelling each course separately. In general, k-NN had either the highest accuracy, or close to the highest accuracy, for all groups with the exception of two courses, international business and early childcare & education. Naïve Bayes had the highest accuracy for both those courses and their attributes of significance were normally distributed. Five courses had accuracies marginally higher than the model for all students: social & community development (70%), applied social care (68%), early childcare & education (69%), creative digital media (67%) and sports management (70%). As illustrated in Table 2, these courses were distinguished by a high average GPA and a low failure rate. Consequently, patterns identifying high risk students may be under represented in these groups. Accuracies for other courses were significantly higher (≥ 79%). For example the difference between sports management (70%) and the next highest accuracy (Engineering other, 79%) was significant (Z=5.86, p<0.001; see footnote 5).

Footnote 4: Symmetrical uncertainty with respect to the class label.
Footnote 5: Accuracy comparisons were based on the mean accuracy of
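
The data preparation described in Section 3.3 (binary risk label, training on the 2010 and 2011 cohorts, testing on 2012, scaling and over sampling) can be sketched as follows. The study itself used RapidMiner and R, so this Python outline is only an illustration. The record fields, the helper names and the sample rows are hypothetical, and two choices are assumptions not confirmed by the text: scaling parameters are estimated on the training cohorts only, and only the training set is over sampled.

```python
import random
from statistics import mean, pstdev

FEATURES = ["age", "study_effort"]  # hypothetical subset of the study factors

def risk_label(gpa):
    """High risk below 2.0, low risk at 2.5 or above; borderline is excluded."""
    if gpa < 2.0:
        return "high"
    if gpa >= 2.5:
        return "low"
    return None

def zscore_params(rows):
    """Mean and standard deviation of each feature, estimated on the given rows."""
    return {f: (mean(r[f] for r in rows), pstdev(r[f] for r in rows)) for f in FEATURES}

def apply_zscore(rows, params):
    """Scale features to mean 0 and standard deviation 1 using the given parameters."""
    return [{**r, **{f: (r[f] - m) / (sd or 1.0) for f, (m, sd) in params.items()}}
            for r in rows]

def oversample_minority(rows, rng):
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    by_class = {}
    for r in rows:
        by_class.setdefault(r["label"], []).append(r)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced += members + [rng.choice(members) for _ in range(target - len(members))]
    return balanced

# Hypothetical records, one per student.
records = [
    {"year": 2010, "gpa": 1.2, "age": 19, "study_effort": 4.0},
    {"year": 2010, "gpa": 3.1, "age": 25, "study_effort": 7.5},
    {"year": 2011, "gpa": 2.8, "age": 31, "study_effort": 6.0},
    {"year": 2011, "gpa": 2.2, "age": 20, "study_effort": 5.0},  # borderline, dropped
    {"year": 2011, "gpa": 3.4, "age": 22, "study_effort": 8.0},
    {"year": 2012, "gpa": 1.5, "age": 18, "study_effort": 3.0},
    {"year": 2012, "gpa": 2.9, "age": 40, "study_effort": 9.0},
]
labelled = [{**r, "label": risk_label(r["gpa"])} for r in records if risk_label(r["gpa"])]
train_raw = [r for r in labelled if r["year"] in (2010, 2011)]
test_raw = [r for r in labelled if r["year"] == 2012]

params = zscore_params(train_raw)
train_scaled = apply_zscore(train_raw, params)
train = oversample_minority(train_scaled, random.Random(0))
test = apply_zscore(test_raw, params)
print(len(train), len(test))  # 6 2
```

Scaling before over sampling keeps the duplicated rows from shifting the estimated means, and holding out the 2012 cohort mirrors the motivation given in the text for avoiding cross validation.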
Table 5: Bootstrap correlations of non-cognitive factors with GPA (n=1207)
Category | Study factor | Correlation with GPA
Temperament | C | 0.15***
Temperament | O | 0.08**
Motivation | SE | 0.12***
Motivation | IM | 0.15***
Motivation | EM | 0.12***
Learning Approach | De | 0.23***
Learning Approach | Sh | -0.15***
Learning Approach | St | -0.16***
Learning Strategy | SR | 0.13***
Learning Strategy | ST | 0.1**
Learning Strategy | StE | 0.19***
Other | Group | -0.08**
Other | Age | 0.25***
Other | Gen | 0.09**
Modality | V | 0.06
Modality | A | 0.02
Modality | K | 0.06
*p < .05, **p < .01, ***p < 0.001; C: Conscientiousness; O: Openness; SE: Self Efficacy; IM: Intrinsic Goal Orientation; EM: Extrinsic Goal Orientation; De: Deep Learner; Sh: Shallow Learner; St: Strategic Learner; SR: Self Regulation; ST: Study Time; StE: Study Effort; Group: Likes to work in groups; Gen: Gender; V: Visual Learner; A: Auditory Learner; K: Kinaesthetic Learner.
Table 6: Regression and classification models by discipline, using non-cognitive factors only

Regression models
Factor groups: Temperament, Motivation, Approach, Strategy, Other, Modality
Factor columns: C O SE IM EM De Sh St SR ST StE G age In V A K

Course                  N     Absolute error  R2     Coefficient significance
All                     1207  0.83±0.56       0.125  + + + + *** **** *** *** ** *** *** **** * +
Computing               137   0.8±0.56        0.34   + + + ** + * **** *
Creative Dig Media      103   0.68±0.58       0.11   + + **** **** **** + + + + ***
Eng Common Entry        73    0.67±0.53       0.34   * + + + + + *** *** + +
Engineering other       99    0.72±0.5        0.43   + *** + + + ** ** ** * + * **** +
Horticulture            41    0.63±0.54       0.34   + + + + + **** **** **** + + + + * **** **
General Business        183   0.9±0.53        0.13   + + + + + + + + + ** +
Business With IT        60    0.67±0.52       0.48   + ** ** * + *** ** ** ** **
International Business  64    0.78±0.5        0.27   *** + + * + * **** +
Sports Management       95    0.64±0.53       0.16   + + + ** + ***
Applied Social Care     146   0.5±0.5         0.08   + + + + + + * * + + + + ****
Early childcare         80    0.37±0.34       0.17   + + + * ** + + + + +
Social & Comm Dev       127   0.63±0.5        0.12   + + + + ** + +

Classification models
Factor groups: Temperament, Motivation, Approach, Strategy, Other, Modality
Factor columns: C O SE IM EM De Sh St SR ST StE G age gen V A K

Course                  N     Learner  Accuracy  Kappa  Factors included
All                     1017  11-NN    66%       0.33   X X X X X X X X X X X
Computing               122   SVM      81%       0.62   X X X X X X X X
Creative Dig Media      94    2-NN     67%       0.35   X X X X X X X
Eng Common Entry        73    SVM      94%       0.88   X X X X X X X X X
Engineering other       72    DT       79%       0.58   X X X
Horticulture            40    7-NN     86%       0.71   X X X X X X X X X X
Business General        156   5-NN     85%       0.69   X X
Business With IT        47    7-NN     83%       0.67   X X X X X X X X X
International Business  55    NB       80%       0.6    X X
Sports Mgmt             72    SVM      70%       0.39   X X X X X
Applied Social Care     122   4-NN     68%       0.37   X X X X X X X
Early childcare         58    NB       69%       0.38   X X X X
Community dev           93    2-NN     70%       0.39   X X X

Significant model coefficients: + p > .05; * p < .05; ** p < .01; *** p < .001; **** p << .001. X: factors included in the classification model.
C: Conscientiousness; O: Openness; SE: Self Efficacy; IM: Intrinsic Goal Orientation; EM: Extrinsic Goal Orientation; De: Deep Learner; Sh: Shallow Learner; St: Strategic Learner; SR: Self Regulation; ST: Study Time; StE: Study Effort; G: Likes to work in groups; In: Regression model intercept; gen: Gender; V: Visual Learner; A: Auditory Learner; K: Kinaesthetic Learner. Engineering other: Mechatronics and Electrical & Computer Engineering.
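
The four metrics reported in Table 6 (absolute error, R2, accuracy and kappa) can be illustrated with a minimal sketch. It assumes the standard textbook definitions, since the formulas are not restated alongside the table; the function names and the toy GPA and risk labels below are hypothetical and are not taken from the study data.

```python
from collections import Counter


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def accuracy(labels_true, labels_pred):
    """Proportion of cases where the predicted label matches the true label."""
    hits = sum(t == p for t, p in zip(labels_true, labels_pred))
    return hits / len(labels_true)


def cohen_kappa(labels_true, labels_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(labels_true)
    p_o = accuracy(labels_true, labels_pred)
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    # Expected agreement if predictions were independent of the true labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data (not from the paper).
gpa_true = [1.0, 2.0, 3.0, 4.0]
gpa_pred = [1.5, 2.5, 2.5, 3.5]

risk_true = ["high"] * 5 + ["low"] * 5
risk_pred = ["high", "high", "high", "high", "low",
             "low", "low", "low", "low", "high"]

if __name__ == "__main__":
    print("absolute error:", mean_absolute_error(gpa_true, gpa_pred))
    print("R2:", r_squared(gpa_true, gpa_pred))
    print("accuracy:", accuracy(risk_true, risk_pred))
    print("kappa:", cohen_kappa(risk_true, risk_pred))
```

Under these definitions, R2 is scaled by the variance of the observed values while absolute error is not, so two groups can share a similar R2 and still differ in absolute error. When the true and predicted labels are both split evenly between two classes, expected agreement is 0.5 and kappa reduces to 2 × accuracy − 1; several accuracy and kappa pairs in Table 6 (for example 94% and 0.88) are consistent with that special case, although the class balance is not stated in the table.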
It could be argued that the smaller sample size of course groups overestimated model accuracy, as smaller samples may under-represent the complexity of patterns predictive of academic achievement. Therefore, 30 samples randomly generated from the full dataset (n=100) were also modelled. Model accuracy for the random samples was normally distributed, with mean=63.12% (s=11%), which was marginally lower than the model of all students (Z=2.68, p=0.017).

There was little agreement across models on which study factors were most predictive of high risk and low risk students. Conscientiousness, study effort and a shallow learning approach were used most frequently, followed by openness, intrinsic motivation and age. There was no significant improvement in model accuracy when prior academic performance was included in each model. For example, the largest increase in accuracy was from 79% to 82% in a model of Engineering students.

4. CONCLUSIONS
Results from this study suggest that models of academic performance, based on non-cognitive psychometric factors measured during first year student induction, can achieve good predictive accuracy, particularly when individual courses are modelled separately. A deep learning approach, study effort and age had the highest correlations with GPA across all disciplines. These factors were also significant in both the regression model and classification model of all students. Extrinsic motivation, preference for working alone and self regulation were also significant in the regression model, while all factors except extrinsic motivation, preference for working alone and study time were significant in a classification model of all students. Models of individual courses also differed in the range of factors used. The lack of consensus in identification of significant factors may be explained by an overlap in the constructs measured by each [24]. Openness appeared frequently in both classification and regression models despite its relatively low correlation with GPA.

In general, regression models for students in technical disciplines, such as engineering, computing and business with IT, had a higher coefficient of determination (R2) than models of non technical disciplines. However, the coefficient of determination did not reflect prediction error, highlighting the underlying variability in independent variables. For example, early childcare (R2=0.17) and sports management (R2=0.16) had almost the same R2, but sports management had a higher absolute error (0.64±0.53) than early childcare (0.37±0.34). The difference was significant (t(15)=3.996, p=0.001). Prediction error was reflective of the GPA distribution for each course regardless of discipline.

Classification models that distinguished between high and low risk students based on GPA had good accuracy for both technical and non technical disciplines, particularly for courses with a significant proportion (>30%) of high risk students.
As with regression, models of individual courses outperformed both models of the full dataset and models of random samples taken from the full dataset. This would suggest models trained for specific courses can outperform models generalising patterns for all students. k-NN, a non-linear classification algorithm, gave optimal or near optimal accuracies for most course groups. This may be reflective of non-linear patterns in the dataset.

Including a cognitive factor of prior academic performance did not improve the accuracy of classification models significantly. On the other hand, Gray et al. [23] reported that predictive accuracy of models based on cognitive factors only (prior academic performance) increased marginally when non-cognitive factors were included in the model. This would suggest a high overlap in constructs captured by both cognitive and non-cognitive factors of learning.

Model accuracies are based on a heuristic search of attribute subsets. A more exhaustive search is needed to verify optimal attribute subsets. Further work is also required to investigate principal components amongst non-cognitive factors. In addition, results are based on full time students in a traditional classroom setting at one college. Further work is needed to determine if these results generalise to students in other colleges, and other delivery modes.

100 bootstrap samples from each group.

5. ACKNOWLEDGMENTS
The authors would like to thank Institute of Technology Blanchardstown for their support in facilitating this research, and staff at the National Learning Network for assistance administering questionnaires during student induction.

6. REFERENCES
[1] Achen, C. Interpreting and Using Regression. Number 07-029 in Quantitative Applications in the Social Sciences. Sage Publications, Inc, 1982.
[2] Baker, R. S. J. D. and Yacef, K. The state of educational data mining in 2009: A review and future visions. Journal of Educational Data Mining, 1(1):3–17, 2010.
[3] Bergin, S. Statistical and machine learning models to predict programming performance. PhD thesis, Computer Science, NUI Maynooth, 2006.
[4] Bidjerano, T. and Dai, D. Y. The relationship between the big-five model of personality and self-regulated learning strategies. Learning and Individual Differences, 17:69–81, 2007.
[5] Biggs, J., Kember, D., and Leung, D. The revised two-factor study process questionnaire: R-SPQ-2F. British Journal of Educational Psychology, 71:133–149, 2001.
[6] Buckingham Shum, S. and Deakin Crick, R. Learning dispositions and transferable competencies. Pedagogy, modelling and learning analytics. In 2nd International Conference on Learning Analytics and Knowledge, pages 92–101, Vancouver, BC, Canada, 2012.
[7] Carpenter, J. and Bithell, J. Bootstrap confidence intervals - when, which, what? A practical guide for medical statisticians. Statistics in Medicine, 19:1141–1164, 2000.
[8] Cassidy, S. Exploring individual differences as determining factors in student academic achievement in higher education. Studies in Higher Education, 37(7):1–18, 2011.
[9] Chamorro-Premuzic, T. and Furnham, A. Personality, intelligence and approaches to learning as predictors of academic performance. Personality and Individual Differences, 44:1596–1603, 2008.
[10] Chatti, M. A., Dyckhoff, A. L., Schroeder, U., and Thüs, H. A reference model for learning analytics. International Journal of Technology Enhanced Learning. Special Issue on State of the Art in TEL, pages 318–331, 2012.
[11] Chow, G. C. Tests of equality between sets of coefficients in two linear regressions. Econometrica, 28(3):591–605, 1960.
[12] Cooper, A. J., Smillie, L. D., and Corr, P. J. A confirmatory factor analysis of the mini-IPIP five-factor model personality scale. Personality and Individual Differences, 48(5):688–691, 2010.
[13] Covington, M. V. Goal theory, motivation, and school achievement: An integrative review. Annual Review of Psychology, 51:171–200, 2000.
[14] De Feyter, T., Caers, R., Vigna, C., and Berings, D. Unraveling the impact of the big five personality traits on academic performance. The moderating and mediating effects of self-efficacy and academic motivation. Learning and Individual Differences, 22:439–448, 2012.
[15] Dekker, G., Pechenizkiy, M., and Vleeshouwers, J. Predicting students drop out: a case study. In Barnes, T., Desmarais, M. C., Romero, C., and Ventura, S., editors, Proceedings of the 2nd International Conference on Educational Data Mining, pages 41–50, Cordoba, Spain, 2009.
[16] Drachsler, H. and Greller, W. The pulse of learning analytics. Understandings and expectations from the stakeholders. In 2nd International Conference on Learning Analytics and Knowledge, pages 120–129, Vancouver, BC, Canada, 29 April–2 May 2012. ACM.
[17] Duff, A., Boyle, E., Dunleavy, K., and Ferguson, J. The relationship between personality, approach to learning and academic performance. Personality and Individual Differences, 36:1907–1920, 2004.
[18] Entwistle, N. Contrasting perspectives in learning. In Marton, F., Hounsell, D., and Entwistle, N., editors, The Experience of Learning, pages 3–22. Edinburgh: University of Edinburgh, Centre for Teaching, Learning and Assessment, 2005.
[19] Eppler, M. A. and Harju, B. L. Achievement motivation goals in relation to academic performance in traditional and nontraditional college students. Research in Higher Education, 38(5):557–573, 1997.
[20] Farsides, T. and Woodfield, R. Individual differences and undergraduate academic success: The roles of personality, intelligence, and application. Personality and Individual Differences, 34:1225–1243, 2003.
[21] Fleming, N. D. I’m different, not dumb. Modes of presentation (VARK) in the tertiary classroom. Research and Development in Higher Education, Proceedings of the 1995 Annual Conference of the Higher Education and Research Development Society of Australasia, 18:308–313, 1995.
[22] Goldberg, L. R. The development of markers for the big-five factor structure. Psychological Assessment, 4(1):26–42, 1992.
[23] Gray, G., McGuinness, C., and Owende, P. An investigation of psychometric measures for modelling academic performance in tertiary education. In D’Mello, S. K., Calvo, R. A., and Olney, A., editors, Sixth International Conference on Educational Data Mining, pages 240–243, Memphis, Tennessee, July 6-9 2013.
[24] Gray, G., McGuinness, C., and Owende, P. An application of classification models to predict learner progression in tertiary education. 4th IEEE International Advanced Computing Conference, pages 549–554, February 2014.
[25] Gray, G., McGuinness, C., Owende, P., and Carthy, A. A review of psychometric data analysis and applications in modelling of academic achievement in tertiary education. Journal of Learning Analytics, 1(1):75–106, 2014.
[26] Kang, Y. and Harring, J. R. Reexamining the impact of non-normality in two-group comparison procedures. Journal of Experimental Education, in press.
[27] Kappe, R. and van der Flier, H. Using multiple and specific criteria to assess the predictive validity of the big five personality factors on academic performance. Journal of Research in Personality, 44:142–145, 2010.
[28] Kaufman, J. C., Agars, M. D., and Lopez-Wagner, M. C. The role of personality and motivation in predicting early college academic success in non-traditional students at a hispanic-serving institution. Learning and Individual Differences, 18:492–496, 2008.
[29] Knight, S., Buckingham Shum, S., and Littleton, K. Epistemology, pedagogy, assessment and learning analytics. In Third Conference on Learning Analytics and Knowledge (LAK 2013), pages 75–84, Leuven, Belgium, April 2013.
[30] Komarraju, M., Karau, S. J., Schmeck, R. R., and Avdic, A. The big five personality traits, learning styles, and academic achievement. Personality and Individual Differences, 51:472–477, 2011.
[31] Komarraju, M. and Nadler, D. Self-efficacy and academic achievement. Why do implicit beliefs, goals, and effort regulation matter? Learning and Individual Differences, 25:67–72, 2013.
[32] Marton, F. and Säljö, R. Approaches to learning. In Marton, F., Hounsell, D., and Entwistle, N., editors, The Experience of Learning, pages 36–58. Edinburgh: University of Edinburgh, Centre for Teaching, Learning and Assessment, 2005.
[33] Mislevy, R. J., Behrens, J. T., and Dicerbo, K. E. Design and discovery in educational assessment: Evidence-centered design, psychometrics, and educational data mining. Journal of Educational Data Mining, 4(1):11–48, 2012.
[34] Moran, M. A. and Crowley, M. J. The leaving certificate and first year university performance. Journal of Statistical and Social Enquiry in Ireland, XXIV, part 1:231–266, 1979.
[35] Ning, H. K. and Downing, K. The reciprocal relationship between motivation and self-regulation: A longitudinal study on academic performance. Learning and Individual Differences, 20:682–686, 2010.
[36] Pardos, Z. A., Baker, R. S. J. D., San Pedro, M. O. C. A., Gowda, S. M., and Gowda, S. M. Affective states and state test. Investigating how affect throughout the school year predicts end of year learning. In Proceedings of the Third International Conference on Learning Analytics and Knowledge (LAK ’13), pages 117–124, Leuven, Belgium, April 2013. ACM.
[37] Pintrich, P., Smith, D., Garcia, T., and McKeachie, W. A manual for the use of the motivated strategies for learning questionnaire. Technical Report 91-B-004, The Regents of the University of Michigan, 1991.
[38] Poropat, A. E. A meta-analysis of the five-factor model of personality and academic performance. Psychological Bulletin, 135(2):322–338, 2009.
[39] Robbins, S. B., Lauver, K., Le, H., Davis, D., and Langley, R. Do psychosocial and study skill factors predict college outcomes? A meta analysis. Psychological Bulletin, 130(2):261–288, 2004.
[40] Sachin, B. R. and Vijay, S. M. A survey and future vision of data mining in educational field. In Advanced Computing Communication Technologies (ACCT), 2012 Second International Conference on, pages 96–100, Jan 2012.
[41] Shute, V. and Ventura, M. Stealth Assessment. Measuring and Supporting Learning in Video Games. The John D. and Catherine T. MacArthur Foundation Reports on Digital Media and Learning. MIT Press, 2013.
[42] Siemens, G. Learning analytics. Envisioning a research discipline and a domain of practice. Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, pages 4–8, 2012.
[43] Siemens, G. and Baker, R. S. J. D. Learning analytics and educational data mining. Towards communication and collaboration. Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, pages 252–254, 2012.
[44] Swanberg, A. B. and Martinsen, Ø. L. Personality, approaches to learning and achievement. Educational Psychology, 30(1):75–88, 2010.
[45] Tavakol, M. and Dennick, R. Making sense of Cronbach’s alpha. International Journal of Medical Education, 2:53–55, 2011.
[46] Tempelaar, D. T., Cuypers, H., van de Vrie, E., Heck, A., and van der Kooij, H. Formative assessment and learning analytics. In Proceedings of the Third International Conference on Learning Analytics and Knowledge (LAK ’13), pages 205–209, New York, NY, USA, 2013. ACM.
[47] Tishman, S., Jay, E., and Perkins, D. N. Teaching thinking disposition: From transmission to enculturation. Theory into Practice, 32:147–153, 1993.
[48] Vancouver, J. B. and Kendall, L. N. When self-efficacy negatively relates to motivation and performance in a learning context. Journal of Applied Psychology, 91(5):1146–53, 2006.
[49] Volet, S. E. Cognitive and affective variables in academic learning: the significance of direction and effort in students’ goals. Learning and Instruction, 7(3):235–254, 1996.