=Paper= {{Paper |id=Vol-1183/ncfpal_paper06 |storemode=property |title=Non-cognitive Factors of Learning as Predictors of Academic Performance in Tertiary Education |pdfUrl=https://ceur-ws.org/Vol-1183/ncfpal_paper06.pdf |volume=Vol-1183 |dblpUrl=https://dblp.org/rec/conf/edm/GrayMO14 }} ==Non-cognitive Factors of Learning as Predictors of Academic Performance in Tertiary Education== https://ceur-ws.org/Vol-1183/ncfpal_paper06.pdf
          Non-cognitive factors of learning as predictors of
            academic performance in tertiary education

                              Geraldine Gray, Colm McGuinness, Philip Owende
                                           Institute of Technology Blanchardstown
                                                  Blanchardstown Road North
                                                       Dublin 15, Ireland
                                                   geraldine.gray@itb.ie

ABSTRACT                                                          cation providers, particularly log data from Virtual Learn-
This paper reports on an application of classification and        ing Environments and Intelligent tutoring systems [16, 2].
regression models to identify college students at risk of fail-   Further work is needed to determine if gathering additional
ing in first year of study. Data was gathered from three          predictors of academic performance can add value to exist-
student cohorts in the academic years 2010 through 2012           ing models of learning.
(n=1207). Students were sampled from fourteen academic
courses in five disciplines, and were diverse in their aca-       Research from educational psychology has identified a range
demic backgrounds and abilities. Metrics used included non-       of non-cognitive psychometric factors that are directly or
cognitive psychometric indicators that can be assessed in the     indirectly related to academic performance in tertiary ed-
early stages after enrolment, specifically factors of personal-   ucation, particularly factors of personality, motivation, self
ity, motivation, self regulation and approaches to learning.      regulation and approaches to learning [8, 9, 35, 39, 44, 25].
Models were trained on students from the 2010 and 2011 co-        Personality based studies have focused on the Big-5 per-
horts, and tested on students from the 2012 cohort. It was        sonality dimensions of conscientiousness, openness, extro-
found that classification models identifying students at risk     version, stability and agreeableness [9, 22, 27]. There is
of failing had good predictive accuracy (> 79%) on courses        broad agreement that conscientiousness is the best person-
that had a significant proportion of high risk students (over     ality based predictor of academic performance [44]. For ex-
30%).                                                             ample, Chamorro et al. [9] reported a correlation of r=0.37
                                                                  (p<0.01, n=158) between conscientiousness and academic
                                                                  performance. Correlations between academic performance
Keywords                                                          and openness to new ideas, feelings and imagination are
Educational data mining, learning analytics, academic per-
                                                                  weaker. Chamorro et al. [9] reported a correlation of r=0.21
formance, non cognitive factors of learning, personality, mo-
                                                                  (p<0.01, n=158) but lower correlations were reported in
tivation, learning style, learning approach, self-regulation
                                                                  other studies (see Table 1) which may be explained by vari-
                                                                  ations in assessment type. Open personalities tend to do
1.   INTRODUCTION AND LITERATURE RE-                              better when assessment methods are unconstrained by sub-
     VIEW                                                         mission rules and deadlines [27]. Studies are inconclusive on
Learning is a latent variable, typically measured as academic     the predictive validity of other personality factors [44].
performance in continuous assessment and end of term ex-
aminations [33]. Identifying predictors of academic perfor-       A meta-analysis of 109 studies analysing psychosocial and
mance has been the focus of research for many years [20,          study skill factors found two factors of motivation, namely
34], and continues as an active research topic [6, 8], indicat-   self-efficacy (90% CI [0.444,0.548]) and achievement motiva-
ing the inherent difficulty in generating models of learning      tion (90% CI [0.353, 0.424]), had the highest correlations
[29, 46]. More recently, the application of data mining to        with academic performance [39]. Distinguishing between
educational settings is emerging as an evolving and grow-         learning (intrinsic) achievement and performance (extrin-
ing research discipline [40, 43]. Educational Data Mining         sic) achievement goals, Eppler and Harju [19] found learn-
(EDM) aims to better understand students and how they             ing goals (r=0.3, p<0.001, n=212) were more strongly cor-
learn through the use of data analytics on educational data       related with academic performance than performance goals
[42, 10]. Much of the published work to date is based on ever-    (r=0.13, p> 0.05, n=212). Covington [13] however argues
increasing volumes of data systematically gathered by edu-        that setting goals in itself is not enough, as ability to self-
                                                                  regulate learning can be the difference between achieving, or
                                                                  not achieving, goals set. Self-regulated learning is recognised
                                                                  as a complex concept to define as it overlaps with a num-
                                                                  ber of other concepts including personality, self-efficacy and
                                                                  goal setting [4]. Ning and Downing [35] reported high corre-
                                                                  lations between self regulation and academic performance,
                                                                  specifically self-testing (r=0.48, p<0.001) and monitoring
                                                                  understanding (r= 0.42, p<0.001). On the other hand, Ko-
                                                                  marraju and Nadler [31] found effort management, includ-
ing persistence, had higher correlation with academic perfor-      of factors of motivation. Komarraju et al. [30] predicted
mance (r=0.39, p<0.01) than other factors of self-regulation       GPA (R2 =0.15) from variables of personality and learn-
and found that self-regulation (monitoring and evaluating          ing approach, while Bidjerano & Dai [4] had similar results
learning) did not account for any additional variance in aca-      (R2 =0.11) with factors of personality and self-regulation.
demic performance over and above self-efficacy, but study
effort and study time did account for additional variance.         Linear regression assumes constant variance and linearity
                                                                   between independent and dependent attributes. There is
Research into approaches to learning has its foundations in        evidence to suggest variance is not constant for some non-
the work of Marton & Säljö [32] who classified learners as       cognitive factors. For example, De Feyter et al. [14] found
shallow or deep. Deep learners aim to understand content,          low levels of self-efficacy had a positive, direct effect on aca-
while shallow learners aim to memorise content regardless          demic performance for neurotic students, and for stable stu-
of their level of understanding. Later studies added strate-       dents, average or higher levels of self-efficacy only had a
gic learners [18, pg. 19], whose priority is to do well, and       direct effect on academic performance. In addition, Van-
will adopt either a shallow or deep learning approach de-          couver & Kendall [48] found evidence that high levels of
pending on the requisites for academic success. Comparing          self-efficacy can lead to overconfidence regarding exam pre-
the influence of approaches to learning on academic perfor-        paredness, which in turn can have a negative impact on aca-
mance, Chamorro et al [9] reported a deep learning approach        demic performance. Similarly, Poropat [38] cites evidence
(r=0.33, p<0.01) had higher correlations with academic per-        of non-linear relationships between factors of personality
formance than a strategic learning approach (r=0.18, p<0.05).      and academic performance, including conscientiousness and
Cassidy [8] on the other hand found correlations with a deep       openness. It is therefore pertinent to ask if data mining’s
learning approach (r=0.31, p<0.01) were marginally lower           empirical modelling approach is more appropriate for models
than with a strategic learning approach (r=0.32, p<0.01).          based on non-cognitive factors of learning.
Differences found have been explained, in part, by assess-
ment type [49], highlighting the importance of assessment          A growing number of educational data mining studies have
design in encouraging appropriate learning strategies.             investigated the role of non-cognitive factors in models of
                                                                   learning [6, 41, 36]. Bergin [3] cited an accuracy of 82% us-
Knight, Buckingham Shum and Littleton argued learning              ing an ensemble model based on prior academic achievement,
measurement should go beyond measures of academic per-             self-efficacy and study hours, but due to the small sample
formance [29], promoting greater focus on learning envi-           size (n=58) could not draw reliable conclusions from the
ronment and encouragement of malleable, effective learn-           findings. The class label distinguished strong (grade>55%)
ing dispositions. Disposition relates to a tendency to be-         versus weak (grade<55%) academic performance based on
have in a certain way [6]. An effective learning disposition       end of term results in a single module. Gray et al. [23] cited
describes attributes and behaviour characteristic of a good        similar accuracies (81%, n=350) with a Support Vector Ma-
learner [6]. A range of non-cognitive psychometric factors         chine model using cognitive and non cognitive attributes to
have been associated with an effective learning disposition        distinguish high risk (GPA<2.0) from low risk (GPA≥2.5)
such as a deep learning approach, ability to self-regulate, set-   students based on first year GPA. Model accuracy was con-
ting learning goals, persistence, conscientiousness and sub-       tingent on modelling younger students (under 21) and older
factors of openness, namely intellectual curiosity, creativity     students (over 21) separately.
and open-mindedness [6, 29, 47]. A lack of correlation be-
tween such non-cognitive factors and academic performance          The focus of this study was to investigate if non-cognitive
is in itself insightful, suggesting assessment design that fails   factors of learning, measured during first year student in-
to reward important learning dispositions. It has been ar-         duction, were predictive of academic performance at the
gued that effective learning dispositions are as important as      end of first year of study. We evaluated both regression
discipline specific knowledge [6, 29].                             models of GPA and classification models that predicted first
                                                                   year students at risk of failing. Participants were from a
Statistical models have dominated data analysis in educa-          diverse student population that included mature students,
tional psychology [15], particularly correlation and regres-       students with disabilities, and students from disadvantaged
sion [25]. Relatively high levels of accuracy were reported        socio-economic backgrounds.
in regression models of academic performance that included
cognitive and non-cognitive factors. For example, Chamorro-        2.    METHODOLOGY
Premuzic et al [9] reported a coefficient of determination         The following sections report on study participants and the
(R2 ) of 0.4 when predicting 2nd year GPA (based on essay          study dataset. Data analysis was conducted following the
type examinations) in a regression model that included prior       CRoss Industry Standard Process for Data Mining (CRISP-DM) us-
academic ability, personality factors and a deep learning ap-      ing RapidMiner V5.3 and R V3.0.2.
proach. Robbins [39] reported similar results (R2 =0.34) in
a meta-analysis of models of cognitive ability, motivation         2.1    Description of the study participants
factors and socio-economic status. Models of non-standard
                                                                   The participants were first year students at the Institute of
students were less accurate, for example Swanberg & Mar-
                                                                   Technology Blanchardstown (ITB), Ireland. The admission
tinsen [44] reported R2 =0.21 in models of older students
                                                                   policy at ITB supports the integration of a diverse student
(age: m=24.8) based on prior academic performance, per-
                                                                   population in terms of age, disability and socio-economic
sonality, learning strategy, age and gender. Lower accuracies
                                                                   background. Each September 2010 to 2012, all full-time,
were also reported in studies not including cognitive ability.
                                                                   first-year students at ITB were invited to participate in the
Robbins [39] reported R2 =0.27 in a meta-analysis of models
                                                                   study by completing an online questionnaire administered
                         Table 1: Correlations with Academic Performance in Tertiary Education
 Study    N    age             AP                     Temperament                  Motivation                    Learning Approach               Learning Strategy
                                                    Conscient- Open   Self Effi-     Intrinsic    Extrinsic   Deep    Shallow Strategic    Self Reg-    Study   Study
                                                    ious              cacy           Goal         Goal                                     ulation      Time    Effort
 [4]      217   m=22            self reported GPA                                                                                                       0.33** 0.0.23**
 [8]      97    m=23.5          GPA                                   0.397***                                0.398**   -0.013   0.316**
 [9]      158   18-21           GPA                 0.37**   0.21**                                           0.398*    -0.15    0.18*
 [17]     146   17-52           GPA                 0.21     0.06                                             0.097     -0.054   0.153
 [19]     212   m=19.2          GPA                                                  0.3***       0.13
 [27]     133   18-22           GPA                 0.46**   -0.08
 [30]     308   18-24           self reported GPA   0.29**   0.13*
 [31]     257   m=20.5          GPA                                   0.3**                                                                0.14*      0.31**   0.39**
 [35]     581   20.48           GPA                                                                                                                            0.0.24**
 [39]     meta analysis, 18+    GPA                                   0.496                   0.179
 [44]     687   m=24.5          single exam                                                                   0.16      -0.25
 *p < .05, **p < .01, ***p < 0.001




during first year student induction. A total of 1,376 (52%)
full-time, first year students completed the online question-
naire. Eliminating students who did not give permission to
be included in the study (35) and invalid data (134) resulted
in 45% of first year full time students participating in the
study (n=1207).

Participants ranged in age from 18 to 60, with an average age
of 23.27. Of these, 355 (29%) were mature students (over 23),
713 (59%) were male and 494 (41%) were female. There were
32 (3%) participants registered with a disability. Students
were enrolled on fourteen courses across five academic dis-
ciplines, Business (n=402, 33%), Humanities (n=353, 29%),
Computing (n=239, 20%), Engineering (n=172, 14%) and
Horticulture (n=41, 3%).

Academic performance was measured as GPA, an aggre-
gate score of between 10 and 12 first year modules, range
0 to 4, and was calculated on first exam sitting only. The
GPA distribution (profiled sample) was compared with the
GPA distribution of the full cohort of students for that                                Figure 1: Notched box plots for GPA by course
year (reference sample) using a Kolmogorov-Smirnov non-
parametric test. The recorded differences in the distribution
for 2010 (D=0.032, p=0.93), 2011 (D=0.036, p=0.90) and                              the context. Where two questions were similar on the pub-
2012 (D=0.042, p=0.69) were not statistically significant.                          lished instrument, only one was included. This choice was
The distribution of GPA was also similar across the three                           made to reduce the overall size of the questionnaire, despite
years of study. The largest difference was between the 2010                         the likely negative impact on internal reliability statistics.
and 2012 profiled samples (D=0.063, p=0.37) and was not                             Questionnaire validity and internal reliability were assessed
significant. To pass overall, a student must achieve a GPA                          using a paper-based questionnaire that included both the re-
≥ 2.0 and pass each first year module. 89% of students with                         vised wording of questions used on the online questionnaire
GPA > 2.5 passed all modules, indicating a low risk group                           (reduced scale), and the original questions from the pub-
that can progress to year two. 84% of students with a GPA                           lished instruments (original scale). The paper questionnaire
< 2 failed three or more modules, indicating a high risk                            was administered during scheduled first year lectures across
group falling well short of progression requirements. Of the                        all academic disciplines. Pearson correlations between scores
students in GPA range [2.0, 2.49], 39% passed all modules,                          calculated from the reduced scale, and scores calculated from
36% failed one module, 18% failed two modules, and 7%                               the original scale, were high for all factors (>=0.9) except
failed more than two modules. This is a less homogeneous                            intrinsic goal orientation and study time and environment,
group in terms of academic profile, but could be generally                          confirming the validity of the study instrument for those fac-
regarded as borderline, either progressing on low grades or                         tors. Internal reliability was assessed using Cronbach’s al-
required to repeat one or two modules in the repeat exam                            pha. All factors had acceptable reliability (>0.7)1 given the
sittings. Figure 1 and Table 2 illustrate GPA distribution                          small number of questions per scale (between 3 and 6), with
by course.                                                                          the exception again of intrinsic goal orientation and study
                                                                                    time and environment. Learner modality data (Visual, Au-
                                                                                    ditory, Kinaesthetic (VAK) [21]) was based on an instrument
2.2      The Study Dataset                                                          developed by the National Learning Network Assessment Ser-
Table 3 lists the psychometric factors included in the dataset,                     vices (NLN) (www.nln.ie).
collected using an online questionnaire developed for the
study (www.howilearn.ie). With the exception of learning                            1
                                                                                     While generally a Cronbach alpha of > 0.8 indicates good
modality, questions were taken from openly available, val-                          internal consistency, Cronbach alpha closer to 0.7 can be
idated instruments, with some changes to wording to suit                            regarded as acceptable for scales with fewer items [12, 45].
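
The internal reliability check described above rests on Cronbach's alpha, which for a scale of k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the respondents' total scores. The following is a minimal Python sketch of that formula, offered only as an illustration. The study itself used RapidMiner and R; the function name `cronbach_alpha` and the response matrix are hypothetical and are not study data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("a scale needs at least two items")
    items = list(zip(*responses))                      # one tuple per item (column)
    item_var_sum = sum(variance(col) for col in items) # sample variance of each item
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Invented responses from five people to a three-item scale (not study data).
example = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(example)
```

With only three to six items per scale, as in the reduced questionnaire, alpha tends to be lower than for longer scales, which is why the footnote treats values near 0.7 as acceptable.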
            Table 2: Academic profile by course                        Table 3: Study factors, mean and standard deviation
    Course Name                n    GPA∗       high   border-   low     Category & Instrument        Study Factor
                                               risk   line      risk    Personality: IPIP scales     Conscientiousness (5.9±1.5)
    all participants           1207 2.1±1.1    28%    16%       46%     (ipip.ori.org) [22]          Openness (6.1±1.3)
    Computing (IT)             137 2.0±1.2     47%    11%       42%     Motivation:                  Intrinsic Goal Orientation (7.1±1.4)
    Creative Digital Media     102 2.6±1.0     20%    8%        72%     MSLQ [37]                    Self Efficacy (6.9±1.4)
    Engineering common         73    1.1±0.9   79%    8%        13%                                  Extrinsic Goal Orientation (7.8±1.4)
    Electronic & computer eng. 52    1.8±1.2   52%    10%       38%     Learning approach:           Deep Learner (5.4±2.9)
    Mechatronics               27    1.6±1.2   63%    7%        30%     R-SPQ-2F [5]                 Shallow Learner (1.3±1.9)
    Sustainable Electrical &   20    2.8±1.1   30%    5%        65%                                  Strategic Learner (3.4±2.5)
    Control Technology                                                  Self-regulation:             Self Regulation (5.9±1.4)
    Horticulture               41    2.4±1.1   27%    2%        71%     MSLQ [37]                    Study Effort (5.9±1.8)
    Business General           183 1.7±1.1     56%    15%       29%                                  Study Time & Environment (6.2±2.3)
    Business with IT           60    1.8±1.2   46%    22%       32%     Learner modality:            Visual (7.2±2.1)
    Business International     64    2.2±1.1   41%    14%       45%     NLN profiler                 Auditory(3.3±2.2)
    Sports Management          95    2.3±0.9   22%    24%       54%                                  Kinaesthetic(4.5±2.4)
    Applied Social Care        146 2.5±0.7     15%    16%       69%     Other factors:               Preference for group work (6.5±3.4)
    Early Childcare            80    2.4±0.6   20%    28%       52%                                  Age (23.27±7.3)
    Social & Community De-     127 2.2±0.9     30%    27%       43%                                  Male=713 (59%), Female=494 (41%)
    velopment                                                           Note: All ranges are 0 to 10 apart from age.
    ∗
      GPA mean and standard deviation.
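
The risk bands reported in Table 2 follow from the GPA thresholds given in Section 2.1: high risk below 2.0, borderline from 2.0 to 2.49, and low risk from 2.5 upwards. A minimal Python sketch of that banding is shown below. The function names and the sample GPAs are hypothetical, and treating any value between 2.49 and 2.5 as borderline is an assumption, since the text does not say how such values were handled.

```python
from collections import Counter

BANDS = ("high risk", "borderline", "low risk")

def risk_band(gpa):
    """Map a first-year GPA (range 0 to 4) to a risk band."""
    if not 0.0 <= gpa <= 4.0:
        raise ValueError(f"GPA out of range: {gpa}")
    if gpa < 2.0:
        return "high risk"
    if gpa < 2.5:  # assumption: values between 2.49 and 2.5 count as borderline
        return "borderline"
    return "low risk"

def band_profile(gpas):
    """Percentage of students in each band, rounded to a whole percent."""
    counts = Counter(risk_band(g) for g in gpas)
    return {band: round(100 * counts[band] / len(gpas)) for band in BANDS}

# Invented GPAs for a hypothetical course (not study data).
sample_gpas = [0.8, 1.5, 1.9, 2.0, 2.3, 2.5, 2.9, 3.1, 3.4, 3.8]
profile = band_profile(sample_gpas)
```

Applying `band_profile` to each course's GPAs would give one row of percentages per course in the layout of Table 2.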




Prior knowledge of the student available to the college at             Table 4: Participant profile based on prior knowl-
registration, namely age, gender and prior academic perfor-            edge, means and standard deviation
mance, was also available to the study. Access to full time
                                                                        Course Name                 n     CAO       age     %age   Z∗
college courses in Ireland is based on academic achievement                                               points            male
in the Leaving Certificate, a set of state exams at the end of          Computing (IT)               137  232±67      24±8  91%    9
secondary school. College places are offered based on CAO2              Creative Digital Media       102  305±79      23±7  68%    7
points, an aggregate score of grades achieved in a student’s            Engineering common           73   220±61      20±3  92%    8
                                                                        Electronic & computer eng    52   232±53      22±7  92%    3
top six leaving certificate subjects, range 0 to 600. Table 4           Mechatronics                 27   238±46      21±3  85%    1
summarises participant profile by course.                               Sustainable Electrical &     20   199±97      27±7  95%    0
                                                                        Control Technology
                                                                        Horticulture                 41    273±66 28±11 8%         4
3.      RESULTS                                                         Business General             183 256±57 21±5 54%           10
                                                                        Business with IT             60    229±75 22±5 60%         6
Correlation and regression were used to analyse relationships           Business International       64    248±51 21±5 24%         6
between study factors and GPA. Subsequent analysis used                 Sports Management            95    306±86 23±6 84%         8
classification techniques to identify students at risk of failing.      Applied Social Care          146 259±84 28±9 32%           10
                                                                        Early Childcare              80    308±78 22±5 6%          7
Unless otherwise stated, models are based on age, gender and            Social & Community De-       127 266±78 25±8 29%           9
non-cognitive factors of learning as listed in Table 3.                 velopment
                                                                        ∗
                                                                          Number of study factors differing significantly from a
All non-cognitive factors of learning failed the Shapiro−Wilk           normal distribution (p<<0.001).
normality test which is common in data relating to educa-
tion and psychology [26]. However factors of personality
were normally distributed within each discipline except for
business. Intrinsic motivation and study effort were also nor-         95% confidence intervals using the bias corrected and accel-
mally distributed for engineering and computing students.              erated method [7] on 1999 bootstrap iterations.
There were further improvements when analysing subgroups
by academic course. Factors of personality, self regulation            Bootstrap correlation coefficients are given in Table 5. With
and intrinsic motivation were normally distributed for all             the exception of learning modality, all non-cognitive factors
courses. With the exception of approaches to learning, learner         were significantly correlated with GPA. The highest corre-
modality, preference for group work and GPA, other factors             lations with GPA were found for approaches to learning,
were normally distributed for most courses. Table 4 illus-             specifically deep learning approach (r=0.23, bootstrap 95%
trates the number of attributes that differed significantly            CI[0.18, 0.29]), and study effort (r=0.19, bootstrap 95% CI
from a normal distribution by course. Larger groups were               [0.13, 0.24]). Age also had a relatively high correlation
more likely to fail tests of normality.                                with GPA (r=0.25, bootstrap 95% CI [0.19, 0.3]). A shallow
                                                                       learning approach (r=-0.15, bootstrap 95% CI[-0.21, -0.09])
                                                                       and preference for group work (r=-0.076, bootstrap 95% CI
3.1      Correlations with Academic Performance                        [-0.14, -0.02]) were negatively correlated with GPA. Open-
Correlations between study factors and GPA were assessed
                                                                       ness had one of the weakest significant correlations with
using Pearson’s product-moment correlation coefficient (PP-
                                                                       GPA (r=0.08, bootstrap 95% CI [0.03, 0.14]). Correlations
MCC). As some attributes violated the assumption of nor-
                                                                       were comparable with other studies that included a diverse
mal distribution, significance was verified with bootstrapped
                                                                       student population [4, 9, 28] with the exception of self ef-
2
 CAO refers to the Central Applications Office with respon-            ficacy (r=0.12, bootstrap 95% CI [0.06, 0.17]) which was
sibility for processing applications for undergraduate courses         lower than expected. This may be reflective of the low entry
in the Higher Education Institutes in Ireland.                         requirements for some courses.
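
The bootstrapped confidence intervals in Section 3.1 use the bias corrected and accelerated (BCa) method on 1999 resamples. The sketch below shows one way such an interval for Pearson's r could be computed in plain Python. It is an illustration under stated assumptions rather than the study's procedure, which was run in R: the function names are hypothetical, the percentile lookup uses a simple truncated index, and the data are synthetic.

```python
import math
import random
from statistics import NormalDist

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def bca_ci(xs, ys, n_boot=1999, level=0.95, seed=0):
    """Return (r, lower, upper) using a BCa bootstrap interval for r."""
    rng = random.Random(seed)
    n = len(xs)
    norm = NormalDist()
    r_hat = pearson_r(xs, ys)

    # Resample (x, y) pairs with replacement and recompute r each time.
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boots.append(pearson_r([xs[i] for i in idx], [ys[i] for i in idx]))
    boots.sort()

    # Bias correction: how far the bootstrap median sits from the estimate.
    below = sum(b < r_hat for b in boots) / n_boot
    below = min(max(below, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = norm.inv_cdf(below)

    # Acceleration: skewness of the leave-one-out (jackknife) estimates.
    jack = [pearson_r(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) for i in range(n)]
    j_mean = sum(jack) / n
    num = sum((j_mean - j) ** 3 for j in jack)
    den = 6 * sum((j_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den else 0.0

    def adjusted_index(q):
        # Shift the nominal quantile q by the bias and acceleration terms.
        z = norm.inv_cdf(q)
        adj_q = norm.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))
        return min(n_boot - 1, max(0, int(adj_q * n_boot)))

    tail = (1 - level) / 2
    return r_hat, boots[adjusted_index(tail)], boots[adjusted_index(1 - tail)]

# Synthetic data with a weak positive relationship (not study data).
data_rng = random.Random(42)
x = [data_rng.gauss(0, 1) for _ in range(200)]
y = [0.25 * xi + data_rng.gauss(0, 1) for xi in x]
r, lo, hi = bca_ci(x, y)
```

An interval that excludes zero corresponds to a correlation reported as significant in Table 5. The inputs must be Python lists, because the jackknife step relies on list slicing.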
3.2     Regression models                                        factors were most predictive of GPA. Approaches to learn-
Regression models predicting GPA from non-cognitive vari-        ing and age were significant for models of all participants,
ables were run for the full dataset and for subgroups by         computing students and engineering students, but motiva-
disciplines and by course. The coefficient of determination      tion and learning strategy were more significant for Busi-
(R2 ) is reported to facilitate comparison with other stud-      ness with IT. Factors of motivation, learning strategy and
ies. However R2 is influenced by the variability of the un-      approaches to learning were also relevant to models in the
derlying independent variables. Consequently Achen [1, pg        humanities courses. All regression models improved when
58-61] argued that prediction error is a more appropriate        prior academic performance was included in the model. The
fitness measure for psychometric data. Therefore absolute        largest increase was for sports management, where R2 in-
error mean and standard deviation are also reported.             creased from 0.16 to 0.30. Business with IT and applied
                                                                 social care also increased by more than 0.1. For all other
A regression model for all participants (R2 = 0.14) was com-     regression models, R2 increased by between 0.05 and 0.09.
parable with other reported models of non-cognitive factors
[4, 30]. However when modelling students by discipline and
by course, there were significant differences in model per-
                                                                 3.3      Classification models
formance. A Chow test [11] comparing the residual error in       Classification models were generated using four classification
a regression model of all participants (full model) with the     algorithms, namely Naı̈ve Bayes (NB), Decision Tree (DT),
residual errors of models by discipline (restricted models)      Support Vector Machine (SVM), and k-Nearest Neighbour
showed significant differences between the full and restricted   (k-NN). A binary class label was used based on end of year
models (F(17,1098)=22.02, p=0). There was also significant       GPA score, range [0-4]. The two classes were: high risk stu-
differences between models based on a particular discipline      dents (GPA<2, n=459); and low risk students (GPA≥2.5,
(full model) and models of courses within that discipline        n=558) giving a dataset of n=1017. Borderline students (2.0
(restricted models). In computing, significant differences       ≤ GPA ≤ 2.49) have not been considered to date. Gray et
of F(17,205)=2.22 (p=0.005) were found between the full          al. [24] found that cross validation over-estimated model
model and the two restricted models. Within engineering,         accuracy compared to models applied to a different student
a model combining mechatronics with electronic & comput-         cohort. Therefore models were trained on participants from
ing engineering was not significantly different from a model     2010 and 2011 and tested on participants from 2012. All
of those two courses individually (F(17,79)=0.58, p=0.89),       datasets were balanced by over sampling the minority class,
but including either common entry students and/or sustain-       and attributes were scaled to have a mean of 0 and standard
able electrical & control technology resulted in significant     deviation of 1. Significant attributes were identified by find-
differences between the full and restricted models. Sustain-     ing the optimal threshold for selecting attributes by weight.
able electrical & control technology was therefore excluded      Attributes were weighted based on uncertainty4 for DT, k-
from further consideration because of the small sample size      NN and Naı̈ve Bayes models, and based on SVM weights
(n=20). Significant differences were also found in models of     for SVM models. Table 6 shows the accuracies achieved and
each of the three humanities courses compared with those         factors used in each model.
courses combined (F(17,302)=2.22, p=0.004). The least sig-
nificant differences were found in models of business students   k-NN had the highest accuracy for models of all students
provided sport management was excluded (F(17, 307)=1.95,         (66%). Accuracies for DT (61%), SVM (62%) and Naı̈ve
p=0.015). Adding sports management further increased the         Bayes (62%) were similar. The most significance attributes
difference in model residual errors (F(17,334)=8.36, p=0).       by weight were age, deep learning approach and study effort.
Table 6 gives model details by course and factors used in        Including factors of prior academic performance improved
each model. Electronic & computer engineering students           model accuracy marginally to 72%.
and mechatronic students were combined.
                                                                 Model accuracy improved when modelling each course sepa-
In general, models based on technical courses had a higher       rately. In general, k-NN had either the highest accuracy,
R2 than models for non technical courses. For example, en-       or close to the highest accuracy, for all groups with the
gineering courses, computing (IT) and business with IT all       exception of two courses, international business and early
had R2 > 0.3. Absolute error for these courses was in the        childcare & education. Naı̈ve Bayes had the highest accu-
range [0.63,0.8]. The difference between the highest abso-       racy for both those courses and their attributes of signif-
lute error (m=0.8, s=0.563 ) and the lowest absolute error       icance were normally distributed. Five courses had accu-
(m=0.63, s=0.54) was not significant (t(15)=1.74, p=0.1).        racies marginally higher than the model for all students,
Regression results for International Business was also rel-      social & community development (70%), applied social care
atively good (R2 =0.27). For the remaining non-technical         (68%), early childcare & education (69%), creative digital
disciplines R2 was lower (range [0.12,0.17]) but the absolute    media (67%) and sports management (70%). As illustrated
error was more varied. Early childcare had the lowest abso-      in Table 1, these courses were distinguished by a high av-
lute error (m=0.37, s=0.34) while general business had the       erage GPA and a low failure rate. Consequently, patterns
highest absolute error (m=0.9, s=0.53). The difference was       identifying high risk students may be under represented in
significant (t(15)=10.3, p<0.001) and may be explained by        these groups. Accuracies for other courses were significantly
the greater distribution of GPA scores in general business.      higher (≥ 79%). For example the difference between sports
                                                                 management (70%) and the next highest accuracy (Engi-
There was little agreement across models on which study          neering other, 79%) was significant (Z=5.86, p<0.001)5 .
                                                                 4
                                                                     Symmetrical uncertainty with respect to the class label.
3                                                                5
    m=mean, s=standard deviation                                     Accuracy comparisons were based on the mean accuracy of
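
The Chow test [11] used in Section 3.2 compares the residual error of one pooled regression with the combined residual error of separate subgroup regressions. The sketch below illustrates the classic two-group form of the statistic with a single predictor, so k=2 coefficients per model. It is an illustration under stated assumptions, not the computation used in the study: the helper names (ols_rss, chow_f) and the toy data are hypothetical, and the study's models used many predictors and, in some comparisons, more than two subgroups.

```python
def ols_rss(x, y):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def chow_f(group1, group2, k=2):
    """Two-group Chow statistic.

    F = ((RSS_pooled - (RSS_1 + RSS_2)) / k) / ((RSS_1 + RSS_2) / (n - 2k)),
    with (k, n - 2k) degrees of freedom, where k is the number of
    coefficients per model (intercept and slope here).
    """
    (x1, y1), (x2, y2) = group1, group2
    rss_pooled = ols_rss(x1 + x2, y1 + y2)      # one model for everyone
    rss_split = ols_rss(x1, y1) + ols_rss(x2, y2)  # one model per subgroup
    n = len(x1) + len(x2)
    df1, df2 = k, n - 2 * k
    f = ((rss_pooled - rss_split) / df1) / (rss_split / df2)
    return f, df1, df2


# Hypothetical toy data: groups a and b follow the same line, group c does not.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
group_a = (x, [0.1, 0.9, 2.1, 2.9, 4.1])
group_b = (x, [-0.1, 1.1, 1.9, 3.1, 3.9])
group_c = (x, [4.1, 2.9, 2.1, 0.9, 0.1])

f_same, df1, df2 = chow_f(group_a, group_b)
f_diff, _, _ = chow_f(group_a, group_c)
print(f"similar groups:   F({df1},{df2}) = {f_same:.3f}")
print(f"different groups: F({df1},{df2}) = {f_diff:.3f}")
```

A large F indicates that separate subgroup models fit substantially better than the pooled model, which is the pattern reported above for most discipline and course groupings.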
                               Table 5: Bootstrap correlations of non-cognitive factors with GPA
 Study Factors:        Temperament          Motivation          Learning Approach        Learning Strategy          Other             Modality
                       C      O         SE    IM       EM     De     Sh        St        SR     ST     StE   Group Age      Gen V        A     K
 Correlation   with    0.15   0.08      0.12  0.15     0.12   0.23   -0.15     -0.16     0.13   0.1    0.19  -0.08    0.25 0.09 0.06 0.02 0.06
 GPA (n=1207):         ***    **        ***   ***      ***    ***    ***       ***       ***    **     ***   **       ***   **
 *p < .05, **p < .01, ***p < 0.001; C:Conscientiousness; O:Openness; SE:Self Efficacy; IM:Intrinsic Goal Orientation; EM:Extrinsic Goal
 Orientation; De:Deep Learner; Sh: Shallow Learner; St: Strategic Learner; SR: Self Regulation; ST:Study Time; StE: Study Effort; Group:Likes
 to work in groups; Gen=Gender; V:Visual Learner; A:Auditory Learner; K:Kinaesthetic Learner.
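
As an illustration of how a bootstrap confidence interval for a correlation such as those in Table 5 might be obtained, the sketch below resamples (factor, GPA) pairs with replacement and takes percentiles of the resampled Pearson correlations. The percentile method, the number of resamples and the synthetic data are all assumptions made for this example; the source does not state which bootstrap variant was used (see [7] for alternatives), and the function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns None if either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the Pearson correlation of x and y."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        # Resample pairs with replacement so each (x, y) pair stays together.
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # skip degenerate resamples with no variance
            stats.append(r)
    stats.sort()
    m = len(stats)
    lo = stats[int(alpha / 2 * m)]
    hi = stats[min(m - 1, int((1 - alpha / 2) * m))]
    return lo, hi


# Synthetic stand-in data (not from the study): a weakly predictive factor.
data_rng = random.Random(1)
factor = [data_rng.gauss(0, 1) for _ in range(200)]
gpa = [0.3 * f + data_rng.gauss(0, 1) for f in factor]

r_hat = pearson_r(factor, gpa)
ci_low, ci_high = bootstrap_ci(factor, gpa)
print(f"r = {r_hat:.2f}, bootstrap 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

An interval that excludes zero corresponds to a correlation that would be flagged as significant at the chosen level.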




           Table 6: Regression and classification models by discipline, using non-cognitive factors only
 Regression models:                                          Temperament      Motivation      Approach          Strategy           Other         Modality
 Course                     N     Absolute error     R2      C      O        SE   IM   EM De     Sh    St     SR   ST    StE G       age In    V   A      K
 All                        1207    0.83±0.56        0.125   +      +        +    +    *** **** *** ***       **         *** *** **** *        +
 Computing                  137     0.8 ±0.56        0.34    +      +             +              **    +                       *     ****      *
 Creative Dig Media         103     0.68±0.58        0.11                    +         +    **** **** **** +       +     +     +          ***
 Eng Common Entry           73      0.67±0.53        0.34           *                  +    +    +     +           +                 *** ***   +          +
 Engineering other          99       0.72±0.5        0.43           +        *** +     +    +    **    **     **   *     +     *     ****          +
 Horticulture               41      0.63±0.54        0.34    +      +        +    +    +    **** **** **** +       +     +     +     *    ****     **
 General Business           183      0.9±0.53        0.13    +               +         +    +    +     +                 +     +     +    **              +
 Business With IT           60      0.67±0.52        0.48    +                    **   **   *                 +          *** **      **   **              **
 International Business     64       0.78±0.5        0.27           ***      +    +         *                                  +     *    ****            +
 Sports Management          95      0.64±0.53        0.16    +      +             +              **           +    ***
 Applied Social Care        146       0.5±0.5        0.08    +      +        +    +    +    +    *     *      +    +     +     +          ****
 Early childcare            80      0.37±0.34        0.17                         +    +    +    *     **     +    +     +           +         +
 Social & Comm Dev          127      0.63±0.5        0.12                         +    +    +    +                       **    +                          +
 Classification models:                                      Temperament      Motivation      Approach          Strategy           Other         Modality
 Course                     N    Learner Accuracy Kappa C           O        SE   IM   EM De     Sh    St     SR   ST    StE G       age gen   V   A      K
 All                        1017 11-NN     66%       0.33    X      X        X    X         X    X     X      X          X           X    X
 Computing                  122 SVM        81%       0.62           X             X    X         X                       X     X               X          X
 Creative Dig Media         94   2-NN      67%       0.35           X             X    X    X          X                 X                         X
 Eng Common Entry           73   SVM       94%       0.88    X      X                       X    X     X                 X           X         X          X
 Engineering other          72   DT        79%       0.58                                   X                            X           X
 Horticulture               40   7-NN      86%       0.71    X      X        X    X    X         X     X                 X                         X      X
 Business General           156 5-NN       85%       0.69                                                          X     X
 Business With IT           47   7-NN      83%       0.67           X        X    X              X     X      X          X           X                    X
 International Business     55   NB        80%       0.6            X        X
 Sports Mgmt                72   SVM       70%       0.39    X                         X         X            X    X
 Applied Social Care        122 4-NN       68%       0.37    X               X    X         X                 X    X                                      X
 Early childcare            58   NB        69%       0.38    X                         X         X                                   X
 Community dev              93   2-NN      70%       0.39    X                                   X                                   X
 Significant model coefficients: +p > .05, *p < .05, **p < .01, ***p < 0.001, ****p << 0.001; X: factors included in the classification model
 C:Conscientiousness; O:Openness; SE:Self Efficacy; IM:Intrinsic Goal Orientation; EM:Extrinsic Goal Orientation; De:Deep Learner; Sh: Shallow
 Learner; St: Strategic Learner; SR: Self Regulation; ST:Study Time; StE: Study Effort; G:Likes to work in groups; IN:Regression model intercept;
 gen=Gender; V:Visual Learner; A:Auditory Learner; K:Kinaesthetic Learner; Engineering others: Mechatronics and Electrical & Computer Engineering.
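
The evaluation procedure described in Section 3.3 (train on earlier cohorts, test on a later cohort, balance classes by oversampling, standardise attributes, then report accuracy and kappa as in Table 6) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two attributes, the toy cohorts, the choice of k=3, and the decisions to oversample only the training cohort and to fit the scaler on it are all assumptions, and every function name is hypothetical.

```python
import math
import random
from collections import Counter


def oversample(rows, labels, rng):
    """Duplicate random minority-class rows until both classes are equal in size.
    Assumes exactly two classes are present."""
    (_, n_major), (minority, n_minor) = Counter(labels).most_common(2)
    pool = [i for i, lab in enumerate(labels) if lab == minority]
    idx = list(range(len(rows))) + [rng.choice(pool) for _ in range(n_major - n_minor)]
    return [rows[i] for i in idx], [labels[i] for i in idx]


def fit_scaler(rows):
    """Per-attribute mean and standard deviation of the training rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(rows, means, stds):
    """Scale attributes to mean 0 and standard deviation 1."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip((math.dist(t, x) for t in train_x), train_y))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


def cohen_kappa(y_true, y_pred):
    """Agreement between predictions and labels, corrected for chance."""
    n = len(y_true)
    p_obs = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_n, pred_n = Counter(y_true), Counter(y_pred)
    p_exp = sum(true_n[c] * pred_n[c] for c in true_n) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical toy cohorts: two attribute scores per student.
train_x = [[7, 8], [8, 7], [7.5, 7.5], [8, 8], [6.5, 7], [7, 6.5],  # low risk
           [2, 3], [3, 2], [2.5, 2.5]]                               # high risk
train_y = ["low"] * 6 + ["high"] * 3
test_x = [[7, 7], [8, 7.5], [2, 2], [3, 3]]  # later cohort, held out
test_y = ["low", "low", "high", "high"]

bal_x, bal_y = oversample(train_x, train_y, random.Random(0))
means, stds = fit_scaler(bal_x)
bal_x, test_scaled = scale(bal_x, means, stds), scale(test_x, means, stds)

predictions = [knn_predict(bal_x, bal_y, row, k=3) for row in test_scaled]
accuracy = sum(t == p for t, p in zip(test_y, predictions)) / len(test_y)
kappa = cohen_kappa(test_y, predictions)
print(f"accuracy = {accuracy:.0%}, kappa = {kappa:.2f}")
```

Holding out a whole cohort, rather than cross validating within one, follows the motivation given in Section 3.3: cross validation was found to over-estimate accuracy relative to a different student cohort [24].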




100 bootstrap samples from each group.

It could be argued that the smaller sample size of course groups overestimated model accuracy, as
smaller samples may under-represent the complexity of patterns predictive of academic
achievement. Therefore 30 random samples (n=100) drawn from the full dataset were also modelled.
Model accuracy for the random samples was normally distributed, with mean=63.12% (s=11%), which
was marginally lower than the model of all students (Z=2.68, p=0.017).

There was little agreement across models on which study factors were most predictive of high risk
and low risk students. Conscientiousness, study effort and a shallow learning approach were used
most frequently, followed by openness, intrinsic motivation and age. There was no significant
improvement in model accuracy when prior academic performance was included in each model. For
example, the largest increase in accuracy was from 79% to 82% in a model of Engineering students.

4.    CONCLUSIONS
Results from this study suggest that models of academic performance, based on non-cognitive
psychometric factors measured during first year student induction, can achieve good predictive
accuracy, particularly when individual courses are modelled separately. A deep learning approach,
study effort and age had the highest correlations with GPA across all disciplines. These factors
were also significant in both the regression model and classification model of all students.
Extrinsic motivation, preference for working alone and self regulation were also significant in
the regression model, while all factors except extrinsic motivation, preference for working alone
and study time were significant in a classification model of all students. Models of individual
courses also differed in the range of factors used. The lack of consensus in identification of
significant factors may be explained by an overlap in the constructs measured by each [24].
Openness appeared frequently in both classification and regression models despite its relatively
low correlation with GPA.

In general, regression models for students in technical disciplines, such as engineering,
computing and business with IT, had a higher coefficient of determination (R²) than models of
non-technical disciplines. However, the coefficient of determination did not reflect prediction
error, highlighting the underlying variability in independent variables. For example, early
childcare (R²=0.17) and sports management (R²=0.16) had nearly the same R², but sports management
had a higher absolute error (0.64±0.53) than early childcare (0.37±0.34). The difference was
significant (t(15)=3.996, p=0.001). Prediction error was reflective of the GPA distribution for
each course regardless of discipline.

Classification models that distinguished between high and low risk students based on GPA had good
accuracy for both technical and non-technical disciplines, particularly for courses with a
significant proportion (>30%) of high risk students. As with regression, models of individual
courses outperformed both models of the full dataset and models of random samples taken from the
full dataset. This would suggest models trained for specific courses can outperform models
generalising patterns for all students. k-NN, a non-linear classification algorithm, gave optimal
or near optimal accuracies for most course groups. This may be reflective of non-linear patterns
in the dataset.

Including a cognitive factor of prior academic performance did not improve the accuracy of
classification models significantly. On the other hand, Gray et al. [23] reported that predictive
accuracy of models based on cognitive factors only (prior academic performance) increased
marginally when non-cognitive factors were included in the model. This would suggest a high
overlap in constructs captured by both cognitive and non-cognitive factors of learning.

Model accuracies are based on a heuristic search of attribute subsets. A more exhaustive search is
needed to verify optimal attribute subsets. Further work is also required to investigate principal
components amongst non-cognitive factors. In addition, results are based on full time students in
a traditional classroom setting at one college. Further work is needed to determine if these
results generalise to students in other colleges, and other delivery modes.

5.   ACKNOWLEDGMENTS
The authors would like to thank the Institute of Technology Blanchardstown for their support in
facilitating this research, and staff at the National Learning Network for assistance
administering questionnaires during student induction.

6.   REFERENCES
 [1] Achen, C. Interpreting and Using Regression. Number 07-029 in Quantitative Applications in
     the Social Sciences. Sage Publications, Inc, 1982.
 [2] Baker, R. S. J. D. and Yacef, K. The state of educational data mining in 2009: A review and
     future visions. Journal of Educational Data Mining, 1(1):3–17, 2010.
 [3] Bergin, S. Statistical and machine learning models to predict programming performance. PhD
     thesis, Computer Science, NUI Maynooth, 2006.
 [4] Bidjerano, T. and Dai, D. Y. The relationship between the big-five model of personality and
     self-regulated learning strategies. Learning and Individual Differences, 17:69–81, 2007.
 [5] Biggs, J., Kember, D., and Leung, D. The revised two-factor study process questionnaire:
     R-SPQ-2F. British Journal of Educational Psychology, 71:133–149, 2001.
 [6] Buckingham Shum, S. and Deakin Crick, R. Learning dispositions and transferable
     competencies. pedagogy, modelling and learning analytics. In 2nd International Conference on
     Learning Analytics and Knowledge, pages 92–101, Vancouver, BC, Canada, 2012.
 [7] Carpenter, J. and Bithell, J. Bootstrap confidence intervals - when, which, what? A practical
     guide for medical statisticians. Statistics in Medicine, 19:1141–1164, 2000.
 [8] Cassidy, S. Exploring individual differences as determining factors in student academic
     achievement in higher education. Studies in Higher Education, 37(7):1–18, 2011.
 [9] Chamorro-Premuzic, T. and Furnham, A. Personality, intelligence and approaches to learning as
     predictors of academic performance. Personality and Individual Differences, 44:1596–1603,
     2008.
[10] Chatti, M. A., Dyckhoff, A. L., Schroeder, U., and Thüs, H. A reference model for learning
     analytics. International Journal of Technology Enhanced Learning. Special Issue on State of
     the Art in TEL, pages 318–331, 2012.
[11] Chow, G. C. Tests of equality between sets of coefficients in two linear regressions.
     Econometrica, 28(3):591–605, 1960.
[12] Cooper, A. J., Smillie, L. D., and Corr, P. J. A confirmatory factor analysis of the
     mini-IPIP five-factor model personality scale. Personality and Individual Differences,
     48(5):688–691, 2010.
[13] Covington, M. V. Goal theory, motivation, and school achievement: An integrative review.
     Annual Review of Psychology, 51:171–200, 2000.
[14] De Feyter, T., Caers, R., Vigna, C., and Berings, D. Unraveling the impact of the big five
     personality traits on academic performance. The moderating and mediating effects of
     self-efficacy and academic motivation. Learning and Individual Differences, 22:439–448, 2012.
[15] Dekker, G., Pechenizkiy, M., and Vleeshouwers, J. Predicting students drop out: a case study.
     In Barnes, T., Desmarais, M. C., Romero, C., and Ventura, S., editors, Proceedings of the 2nd
     International Conference on Educational Data Mining, pages 41–50, Cordoba, Spain, 2009.
[16] Drachsler, H. and Greller, W. The pulse of learning analytics. Understandings and
     expectations from the stakeholders. In 2nd International Conference on Learning Analytics and
     Knowledge, pages 120–129, Vancouver, BC, Canada, 29 April–2 May 2012. ACM.
[17] Duff, A., Boyle, E., Dunleavy, K., and Ferguson, J. The relationship between personality,
     approach to learning and academic performance. Personality and Individual Differences,
     36:1907–1920, 2004.
[18] Entwistle, N. Contrasting perspectives in learning. In Marton, F., Hounsell, D., and
     Entwistle, N., editors, The Experience of Learning, pages 3–22. Edinburgh: University of
     Edinburgh, Centre for Teaching, Learning and Assessment, 2005.
[19] Eppler, M. A. and Harju, B. L. Achievement motivation goals in relation to academic
     performance in traditional and nontraditional college students. Research in Higher Education,
     38(5):557–573, 1997.
[20] Farsides, T. and Woodfield, R. Individual differences and undergraduate academic success: The
     roles of personality, intelligence, and application. Personality and Individual Differences,
     34:1225–1243, 2003.
[21] Fleming, N. D. I’m different, not dumb. Modes of presentation (VARK) in the tertiary
     classroom. Research and Development in Higher Education, Proceedings of the 1995 Annual
     Conference of the Higher Education and Research Development Society of Australasia,
     18:308–313, 1995.
[22] Goldberg, L. R. The development of markers for the
     big-five factor structure. Psychological Assessment, 4(1):26–42, 1992.
[23] Gray, G., McGuinness, C., and Owende, P. An investigation of psychometric measures for
     modelling academic performance in tertiary education. In D’Mello, S. K., Calvo, R. A., and
     Olney, A., editors, Sixth International Conference on Educational Data Mining, pages
     240–243, Memphis, Tennessee, July 6-9 2013.
[24] Gray, G., McGuinness, C., and Owende, P. An application of classification models to predict
     learner progression in tertiary education. 4th IEEE International Advanced Computing
     Conference, pages 549–554, February 2014.
[25] Gray, G., McGuinness, C., Owende, P., and Carthy, A. A review of psychometric data analysis
     and applications in modelling of academic achievement in tertiary education. Journal of
     Learning Analytics, 1(1):75–106, 2014.
[26] Kang, Y. and Harring, J. R. Reexamining the impact of non-normality in two-group comparison
     procedures. Journal of Experimental Education, in press.
[27] Kappe, R. and van der Flier, H. Using multiple and specific criteria to assess the predictive
     validity of the big five personality factors on academic performance. Journal of Research in
     Personality, 44:142–145, 2010.
[28] Kaufman, J. C., Agars, M. D., and Lopez-Wagner, M. C. The role of personality and motivation
     in predicting early college academic success in non-traditional students at a
     hispanic-serving institution. Learning and Individual Differences, 18:492–496, 2008.
[29] Knight, S., Buckingham Shum, S., and Littleton, K. Epistemology, pedagogy, assessment and
     learning analytics. In Third Conference on Learning Analytics and Knowledge (LAK 2013), pages
     75–84, Leuven, Belgium, April 2013.
[30] Komarraju, M., Karau, S. J., Schmeck, R. R., and Avdic, A. The big five personality traits,
     learning styles, and academic achievement. Personality and Individual Differences,
     51:472–477, 2011.
[31] Komarraju, M. and Nadler, D. Self-efficacy and academic achievement. Why do implicit beliefs,
     goals, and effort regulation matter? Learning and Individual Differences, 25:67–72, 2013.
[32] Marton, F. and Säljö, R. Approaches to learning. In Marton, F., Hounsell, D., and Entwistle,
     N., editors, The Experience of Learning, pages 36–58. Edinburgh: University of Edinburgh,
     Centre for Teaching, Learning and Assessment, 2005.
[33] Mislevy, R. J., Behrens, J. T., and Dicerbo, K. E. Design and discovery in educational
     assessment: Evidence-centered design, psychometrics, and educational data mining. Journal of
     Educational Data Mining, 4(1):11–48, 2012.
[34] Moran, M. A. and Crowley, M. J. The leaving certificate and first year university
     performance. Journal of Statistical and Social Enquiry in Ireland, XXIV, part 1:231–266,
     1979.
[35] Ning, H. K. and Downing, K. The reciprocal relationship between motivation and
     self-regulation: A longitudinal study on academic performance. Learning and Individual
     Differences, 20:682–686, 2010.
[36] Pardos, Z. A., Baker, R. S. J. D., San Pedro, M. O. C. A., Gowda, S. M., and Gowda, S. M.
     Affective states and state test. Investigating how affect throughout the school year predicts
     end of year learning. In Proceedings of the Third International Conference on Learning
     Analytics and Knowledge (LAK ’13), pages 117–124, Leuven, Belgium, April 2013. ACM.
[37] Pintrich, P., Smith, D., Garcia, T., and McKeachie, W. A manual for the use of the motivated
     strategies for learning questionnaire. Technical Report 91-B-004, The Regents of the
     University of Michigan, 1991.
[38] Poropat, A. E. A meta-analysis of the five-factor model of personality and academic
     performance. Psychological Bulletin, 135(2):322–338, 2009.
[39] Robbins, S. B., Lauver, K., Le, H., Davis, D., and Langley, R. Do psychosocial and study
     skill factors predict college outcomes? A meta analysis. Psychological Bulletin,
     130(2):261–288, 2004.
[40] Sachin, B. R. and Vijay, S. M. A survey and future vision of data mining in educational
     field. In Advanced Computing Communication Technologies (ACCT), 2012 Second International
     Conference on, pages 96–100, Jan 2012.
[41] Shute, V. and Ventura, M. Stealth Assessment. Measuring and Supporting Learning in Video
     Games. The John D. and Catherine T. MacArthur Foundation Reports on Digital Media and
     Learning. MIT Press, 2013.
[42] Siemens, G. Learning analytics. Envisioning a research discipline and a domain of practice.
     Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, pages
     4–8, 2012.
[43] Siemens, G. and Baker, R. S. J. D. Learning analytics and educational data mining. Towards
     communication and collaboration. Proceedings of the 2nd International Conference on Learning
     Analytics and Knowledge, pages 252–254, 2012.
[44] Swanberg, A. B. and Martinsen, Ø. L. Personality, approaches to learning and achievement.
     Educational Psychology, 30(1):75–88, 2010.
[45] Tavakol, M. and Dennick, R. Making sense of Cronbach’s alpha. International Journal of
     Medical Education, 2:53–55, 2011.
[46] Tempelaar, D. T., Cuypers, H., van de Vrie, E., Heck, A., and van der Kooij, H. Formative
     assessment and learning analytics. In Proceedings of the Third International Conference on
     Learning Analytics and Knowledge (LAK ’13), pages 205–209, New York, NY, USA, 2013. ACM.
[47] Tishman, S., Jay, E., and Perkins, D. N. Teaching thinking disposition: From transmission to
     enculturation. Theory into Practice, 32:147–153, 1993.
[48] Vancouver, J. B. and Kendall, L. N. When self-efficacy negatively relates to motivation and
     performance in a learning context. Journal of Applied Psychology, 91(5):1146–53, 2006.
[49] Volet, S. E. Cognitive and affective variables in academic learning: the significance of
     direction and effort in students’ goals. Learning and Instruction, 7(3):235–254, 1996.