Learning Activity Features of High Performance Students

                   Fumiya Okubo, Kyushu University, Japan, fokubo@artsci.kyushu-u.ac.jp
                   Sachio Hirokawa, Kyushu University, Japan, hirokawa@cc.kyushu-u.ac.jp
                          Misato Oi, Kyushu University, Japan, oimisato@gmail.com
                  Atsushi Shimada, Kyushu University, Japan, atsushi@limu.ait.kyushu-u.ac.jp
                   Kentaro Kojima, Kyushu University, Japan, kojima@artsci.kyushu-u.ac.jp
                       Masanori Yamada, Kyushu University, Japan, mark@mark-lab.net
                      Hiroaki Ogata, Kyushu University, Japan, hiroaki.ogata@gmail.com

         Abstract: In this paper, we present a method of identifying learning activities that are
         important for students to achieve good grades. For this purpose, the data of 99 students were
         collected from a learning management system and an e-book system, including attendance,
         time on preparation and review, submission of reports, and quiz scores. We applied a support
         vector machine to these data to calculate a score of importance for each learning activity
         reflecting its contribution to the attainment of an A grade. By selecting important learning
         activities according to several evaluation measures, we verified that these activities
         play a crucial role in predicting students’ final achievement. One of the obtained results
         implies that time on preparation and review in the middle part of a course influences a
         student’s final achievement.

         Keywords: learning analytics, data of LMS and e-book system, learning activity feature, support vector
         machine

Introduction
In recent years, intensive data mining in the field of education research has become possible owing to the widespread
use of ICT-based educational systems, such as learning management systems (LMSs) and e-book systems.
These systems enable us to automatically collect many kinds of log data corresponding to students’ online
learning activities. Such data can be analyzed to identify students’ learning activities and the
typical learning patterns of particular students or groups, for example, those who are likely to fail or drop
out of a class, commonly referred to as “at-risk” students (Baradwaj & Pal (2011)).
          In October 2014, the well-known LMS Moodle and the e-book system BookLooper(1), provided by
KYOCERA MARUZEN Systems Integration Co., Ltd., were introduced at Kyushu University in Japan in order
to facilitate the collection and analysis of educational data. The e-book system records detailed action logs, each with
a user ID and a timestamp, such as moves back and forth between pages, the contents of memos, and the type of
access device used (PC or smartphone). These records enable us to investigate a range of learning activities,
both inside and outside of class. Utilizing the log data stored in Moodle and BookLooper, a number of
investigations have been conducted at Kyushu University (Ogata et al. (2015)).
          Several studies on educational data mining have specified measures of learning activities that are
utilized for visualizing and analyzing students’ learning behaviors. For example, Okubo et al. (2015)
visualized four types of learning logs stored in an LMS and an e-book system, namely attendance,
time spent browsing slides in the e-book system, submission of reports, and quiz scores, by using a discrete
graph for each academic achievement level, following the method proposed in Hlosta et al. (2014). On the other
hand, You (2016) argued that researchers ought to identify meaningful learning activities to predict
students’ achievement, and verified the significance of particular learning activities using statistical analysis
methods. Focusing on methods of analysis, Ifenthaler and Widanapathirana (2014) presented case studies of
educational data mining utilizing a well-known machine learning method, namely support vector machines
(SVMs), introduced in Cortes & Vapnik (1995). Goda et al. (2013) applied an SVM to students’ comments and
showed the relationships between self-evaluation comments and the final grades of the students.
          In this paper, we propose a method of discovering learning activities that are important for students to
achieve good grades, by applying an SVM to log data stored in an LMS and an e-book system. For this purpose, we
use four types of learning logs, namely attendance, slide browsing time, submission of reports, and quiz scores.
We verify how well students’ final achievement can be predicted from only the learning activities selected
as important by our method. We also discuss how these selected activities can be interpreted. Finally, we
outline future research plans along the same lines as the research presented in this paper.

                     Copyright © 2016 for this paper by its authors. Copying permitted for private and academic purposes.
Method
Data collection
We collected the learning logs of 99 students attending an “Information Science” course that started in October
2014. The course was scheduled over 14 weeks, but the lecture in the 9th week was cancelled; thus, 13 lectures were given. Each
of these lectures was presented using several slides in the e-book system, with each slide associated with only
one lecture. The students used the slides for preparation before each lecture and/or review after it.
Furthermore, each week the students were required to submit a report and answer a quiz of three to five
questions related to that week’s lecture through the LMS.
          As mentioned above, we refer to four kinds of data stored in the LMS and the e-book system in this
paper, namely
      attendance or absence (represented by p),
      submission of a report or failure to do so (r),
      a sum of the time spent browsing slides for preparation and/or review (b), and
      quiz score (t),
of each student participating in each week of the course. For each of the four items, we consider only whether or not
it was achieved. Thus, for the ith week, a particular student’s lecture attendance or absence is coded by the word
ip:o or ip:x, respectively. Each student’s report submission was coded as ir:o if a report was submitted
and ir:x if not. Slide browsing time was also transformed into a binary category, with a browsing
time of 600 seconds or longer coded as ib:o, and anything shorter as ib:x. Similarly, a quiz score of 70% or
above was coded as it:o, and anything lower as it:x. For example, if a student in the 7th week attended a lecture,
did not submit a report, browsed slides for 600 seconds or longer, and scored below 70% on the quiz, the
words 7p:o, 7r:x, 7b:o, and 7t:x would represent this student’s activities in the 7th week. We note that
the data on the activities of each student can be represented by an 8 × 14 = 112-dimensional vector, with one
element for each of the eight words per week (ip:o, ip:x, ir:o, ir:x, ib:o, ib:x, it:o, and it:x); an element equals 1
if the corresponding word occurs in the student’s data and 0 otherwise.
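The coding scheme above can be sketched in Python as follows. This is an illustration under our own assumptions, not the authors' code: the names `week_words`, `to_vector`, and `VOCAB` are hypothetical, the order of the 112 dimensions is arbitrary, and the paper does not state how the cancelled 9th week was coded, so the sketch leaves that to the caller.

```python
from typing import List

WEEKS = 14
ITEMS = ("p", "r", "b", "t")   # attendance, report, slide browsing, quiz
BROWSE_MIN_SEC = 600           # browsing time >= 600 s -> ib:o
QUIZ_MIN_RATIO = 0.70          # quiz score >= 70 %     -> it:o

# One dimension per word: 14 weeks x 4 items x {o, x} = 112 words.
# The dimension order (week, then item, then o/x) is an assumption.
VOCAB: List[str] = [f"{week}{item}:{mark}"
                    for week in range(1, WEEKS + 1)
                    for item in ITEMS
                    for mark in ("o", "x")]
INDEX = {word: k for k, word in enumerate(VOCAB)}


def week_words(week: int, attended: bool, submitted: bool,
               browse_sec: float, quiz_ratio: float) -> List[str]:
    """Return the four words coding one student's activities in one week."""
    def mark(achieved: bool) -> str:
        return "o" if achieved else "x"
    return [f"{week}p:{mark(attended)}",
            f"{week}r:{mark(submitted)}",
            f"{week}b:{mark(browse_sec >= BROWSE_MIN_SEC)}",
            f"{week}t:{mark(quiz_ratio >= QUIZ_MIN_RATIO)}"]


def to_vector(words: List[str]) -> List[int]:
    """Binary 112-dimensional vector: 1 if the word occurs in the document."""
    vec = [0] * len(VOCAB)
    for word in words:
        vec[INDEX[word]] = 1
    return vec


# The 7th-week example from the text.
doc = week_words(7, attended=True, submitted=False, browse_sec=900, quiz_ratio=0.6)
print(doc)  # ['7p:o', '7r:x', '7b:o', '7t:x']
```

A full student document would concatenate the words of all weeks before calling `to_vector`.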
          The students in the course were assigned one of the grades A, B, C, D, and F according to their
grade score out of 100, which reflected all their activities. The relationship between grade and grade score is
indicated in Table 1, which also shows the number of the 99 students who attained each grade. The words s:A,
s:B, s:C, s:D, and s:F were used to represent these grades in the data.
          By registering the data of the 99 students as documents, we constructed a search engine by means of
GETA(2) (Generic Engine for Transposable Association), provided by the National Institute of Informatics.

Table 1: Frequency of grades and grade scores.

                          Grade        Grade score range           Number of students
                            A               90-100                         37
                            B               80-89                          30
                            C               70-79                          13
                            D               60-69                           9
                            F               0-59                           10


Classification based on SVM and feature selection
The aim of this study is to discover important learning activities that distinguish students achieving A grades
from students achieving lower grades. For this purpose, we utilized an SVM in which the documents of A grade
students are positive instances and the documents of students with other grades are negative instances.
Following the method proposed in Sakai & Hirokawa (2012), we applied a machine learning method based on
SVM and feature selection to classify the students’ learning activity data.
          The classification performance of the proposed method is evaluated by means of 5-fold
cross validation. That is, the data are separated into five parts, of which four parts are used as training data and the
remaining part as test data. There are five ways to choose which part serves as the test data, and the
final result of the 5-fold cross validation is the average of the results over these five choices.
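A minimal sketch of the 5-fold split, assuming the 99 students are shuffled at random and dealt into five parts of 19 or 20 students; the paper does not say whether the split was random or stratified by grade. The helpers `k_fold_splits` and `cross_validate` are hypothetical names, and `evaluate` stands for whatever per-fold score (e.g. accuracy) is being averaged.

```python
import random
from typing import Callable, List, Tuple


def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 (assumed random split) and return k (train, test) pairs."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[j::k] for j in range(k)]          # k roughly equal parts
    splits = []
    for j in range(k):
        test = folds[j]                                # one part as test data
        train = [i for m, fold in enumerate(folds) if m != j for i in fold]
        splits.append((train, test))
    return splits


def cross_validate(n: int, evaluate: Callable[[List[int], List[int]], float],
                   k: int = 5, seed: int = 0) -> float:
    """Average of evaluate(train, test) over the k choices of test part."""
    results = [evaluate(train, test) for train, test in k_fold_splits(n, k, seed)]
    return sum(results) / len(results)
```

For the 99 students, `k_fold_splits(99)` yields four test parts of 20 students and one of 19.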
          Specifically, our method consists of the following steps.

Data generation by multiple instance method
The collected data consisted of 37 positive instances and 62 negative instances. As the number of instances in
the training data is rather small, we apply a restricted version of multiple instance learning
(Dietterich et al. (1997)). Specifically, from the training data, we randomly choose a pair of positive instances
and regard this pair as a new positive instance (called a “bag” in Dietterich et al. (1997)). In this way, we
construct 100 new positive instances. Similarly, 100 new negative instances are constructed. By adding these
new instances to the original training data, we increase its volume.
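The pairing step might look like the following sketch. It is applied to the positive and negative training instances of one fold only, never to the test part. How a pair is merged into one instance is not specified in the text; purely for illustration, we assume the union of the two students' words (element-wise maximum of the 0/1 vectors). The function names are hypothetical.

```python
import random
from typing import List, Tuple


def make_pair_instances(instances: List[List[int]], n_new: int = 100,
                        seed: int = 0) -> List[List[int]]:
    """Build n_new pseudo-instances from random pairs of same-class instances.

    ASSUMPTION: a pair is merged by element-wise maximum (union of the two
    students' words, so tf stays 0/1); the paper does not give the merge rule.
    """
    if len(instances) < 2:
        raise ValueError("need at least two instances to form a pair")
    rng = random.Random(seed)
    new_instances = []
    for _ in range(n_new):
        a, b = rng.sample(instances, 2)                 # two distinct instances
        new_instances.append([max(x, y) for x, y in zip(a, b)])
    return new_instances


def augment(pos: List[List[int]], neg: List[List[int]], n_new: int = 100,
            seed: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Training data of one fold plus n_new new positive and n_new new negative instances."""
    new_pos = make_pair_instances(pos, n_new, seed)
    new_neg = make_pair_instances(neg, n_new, seed + 1)
    return pos + new_pos, neg + new_neg
```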

Application of linear SVM
Applying linear SVM to the training data containing all words, we construct a model that classifies the
documents of A grade students. We used SVM-light(3) as the learning tool.
          Recall that a document of a student is represented by a 112-dimensional vector. For a word w and a
document d, the number of occurrences of w in d is denoted by tf(w, d), which equals either 0 or 1. The
classifier (or model) f of the linear SVM learned from the training data is of the form
$$f(d) = \sum_{w_i \in d} \mathrm{weight}(w_i) \times \mathrm{tf}(w_i, d) + b,$$
where b is a constant term. For a document d of test data, if f(d) is greater than 0, then d is classified as the
document of a grade A student. Conversely, if f(d) is less than 0, then d is classified as the document of a student
with another grade.
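The decision rule can be written directly from the formula. In this sketch (hypothetical function names) a document is a list of words and the model is a dictionary of word weights plus the bias b. The paper does not say how f(d) = 0 is treated, so here it is assumed to count as not A. The example reuses three scores from Table 4 as weights together with a made-up bias.

```python
from typing import Dict, Iterable


def decision_value(doc_words: Iterable[str], weight: Dict[str, float], b: float) -> float:
    """f(d) = sum over words w_i in d of weight(w_i) * tf(w_i, d), plus b.

    tf(w_i, d) is 0 or 1, so each distinct word contributes its weight once;
    words without a weight (e.g. not selected as features) contribute 0.
    """
    return sum(weight.get(w, 0.0) for w in set(doc_words)) + b


def is_grade_a(doc_words: Iterable[str], weight: Dict[str, float], b: float) -> bool:
    """f(d) > 0 -> A grade; ASSUMPTION: f(d) == 0 is treated as not A."""
    return decision_value(doc_words, weight, b) > 0


# Three scores from Table 4 used as weights; b = 0.2 is a made-up bias.
weights = {"12p:o": 0.4554, "11b:o": 0.4480, "6b:x": -1.0110}
doc = ["12p:o", "11b:o", "6b:x", "3t:o"]
print(round(decision_value(doc, weights, b=0.2), 4))  # 0.0924
```

Evaluating `decision_value([w], weights, b) - b` returns `weight(w)`, which is the score of importance described in the next subsection.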

Score of word and feature selection
For a given word wi, weight(wi) can be regarded as the score of importance of wi in the classifier f. Equivalently, this
score can be obtained by applying f to a document containing only wi and subtracting the constant b.
          Feature selection of words is conducted using the six evaluation measures shown in Table
2, in which the number of documents containing w is denoted by df(w) and the absolute value of x is denoted by
abs(x). For N = 1, 2, …, 10, 20, …, 70, we choose the top N positive words and the top N negative words with
respect to each of the measures w.o, d.o, and l.o. Similarly, we choose the top 2N words with respect to each of
the measures w.a, d.a, and l.a. These 2N words are called the feature words of f.
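A sketch of the six measures of Table 2 and of the selection of 2N feature words. Two points are our own interpretation: "top N positive/negative words" is read as the N words with the largest positive scores and the N with the most negative scores, and log is taken as the natural logarithm (the ranking does not depend on the base). Function names are hypothetical.

```python
import math
from typing import Dict, List


def measure(kind: str, weight: float, df: int) -> float:
    """The six measures of Table 2 (ASSUMPTION: natural log)."""
    base = kind.split(".")[0]
    if base == "w":
        value = weight
    elif base == "d":
        value = weight * df
    elif base == "l":
        value = weight * math.log(df)
    else:
        raise ValueError(f"unknown measure: {kind}")
    return abs(value) if kind.endswith(".a") else value


def select_feature_words(weight: Dict[str, float], df: Dict[str, int],
                         kind: str, n: int) -> List[str]:
    """Return up to 2N feature words for one measure.

    *.o measures: N words with the largest positive scores plus N with the
    most negative scores (our reading). *.a measures: the 2N largest scores.
    Words with df = 0 are skipped because log(0) is undefined.
    """
    scores = {w: measure(kind, weight[w], df[w]) for w in weight if df.get(w, 0) > 0}
    ranked = sorted(scores, key=lambda w: scores[w], reverse=True)
    if kind.endswith(".o"):
        positives = [w for w in ranked if scores[w] > 0][:n]
        negatives = [w for w in reversed(ranked) if scores[w] < 0][:n]
        return positives + negatives
    return ranked[: 2 * n]
```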
          We apply linear SVM both to the training data restricted to the 2N words selected by each of the six types of
feature selection and to the training data with all words. Then, using the test data, we evaluate the prediction
performance of each model obtained by the linear SVM by means of 5-fold cross validation.
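The paper trains its classifiers with SVM-light. To keep the example self-contained, the sketch below swaps in a different solver, Pegasos-style stochastic sub-gradient descent on the regularized hinge loss, which fits a linear classifier of the same form f(d); it is not the authors' tool, and the hyperparameters `lam` and `epochs` are arbitrary choices. The hypothetical helper `restrict` keeps only the columns of the selected feature words before training.

```python
import random
from typing import List, Sequence, Tuple


def restrict(X: Sequence[Sequence[float]], columns: Sequence[int]) -> List[List[float]]:
    """Keep only the columns of the selected feature words."""
    return [[row[c] for c in columns] for row in X]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> Tuple[List[float], float]:
    """Pegasos-style linear SVM (stand-in for SVM-light, not the authors' solver).

    Labels y must be +1 (A grade) or -1 (other grades). The bias is learned as
    the weight of an appended constant feature, so it is regularized too
    (a simplification). Returns (weights, b).
    """
    rng = random.Random(seed)
    data = [list(row) + [1.0] for row in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                          # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]       # shrink (regularizer)
            if margin < 1.0:                               # hinge-loss violation
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w[:-1], w[-1]
```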

Table 2: Measures for feature selection.

                                     Type             Measure for evaluation
                                      w.o                   weight(wi)
                                      d.o               weight(wi)*df(wi)
                                      l.o             weight(wi)*log(df(wi))
                                      w.a                abs(weight(wi))
                                      d.a             abs(weight(wi)*df(wi))
                                      l.a           abs(weight(wi)*log(df(wi)))


Experimental results
Accuracy
We conducted experiments to obtain the accuracy, precision, recall, and F-measure as evaluation
metrics for each model obtained by the linear SVM for
     all words, and 2N selected words with N=1, 2, …, 10, 20, …, 70, and
     the six types (w.o, d.o, l.o, w.a, d.a and l.a) of feature selection.
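The four evaluation metrics can be computed per fold as below, treating the A grade (+1) as the positive class, and then averaged over the five folds. The function names are hypothetical.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F-measure, and accuracy with A grade (+1) as positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t != 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": (tp + tn) / len(pairs)}


def average_over_folds(per_fold: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over the folds of the cross validation."""
    return {k: sum(m[k] for m in per_fold) / len(per_fold) for k in per_fold[0]}
```

Because each reported value is an average over folds, an averaged F-measure need not equal the harmonic mean of the averaged precision and recall, which may explain why the F-measures in Table 3 differ from that harmonic mean.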
Figure 1 illustrates the relationship between the accuracy of each model obtained by the linear SVM and the
value of N. The vertical axis represents accuracy, and the horizontal axis the number of selected words
N. The baseline represents the accuracy of the model using all words.


                                                 Figure 1. Accuracy

In the case of the model using all words, the accuracy was 0.7358. In contrast, with N=6 and
feature selection w.o, the accuracy was 0.8853, the best score among all models. Thus, appropriate feature
selection based on linear SVM appears to be effective in improving the prediction performance.
         The precision, recall, F-measure, and accuracy scores of the five models with the highest accuracy are
summarized in Table 3.

            Table 3. The top five models and their precision, recall, F-measure, and accuracy scores.

               Type of FS     N      precision          recall         F-measure          accuracy
                  w.o         6       0.7000           1.0000            0.7976            0.8853
                  d.a         4       0.7033           1.0000            0.8148            0.8470
                  w.o         8       0.7286           1.0000            0.8359            0.8300
                  l.a         9       0.6451           1.0000            0.7641            0.8224
                  w.o         9       0.6803           1.0000            0.8092            0.8139


Feature word
For the case of N=6 with feature selection w.o, Table 4 summarizes the top six positive feature words, the
top six negative feature words, and their scores of importance.
          For example, attendance in the 12th week (12p:o) was the most influential learning activity for obtaining an A
grade, and insufficient preparation and/or review in the 6th week (6b:x) was the most influential for failing to achieve an A grade.
          We notice that 11b:o is the second positive feature word, while 11b:x appears as the third negative
feature word. Thus, it can be suggested that preparation and/or review in the 11th week significantly
distinguished A grade students from other students. In this “Information Science” course, the learning contents
for the 11th week were bucket sort and binary search. It may therefore be supposed that
      these contents were the basis of other contents in the following weeks, or
      these contents were included in the final examination and were sufficiently important to classify the
       students’ grades.
Focusing on negative feature words, the top three words were of the form ib:x, reflecting less than 600 seconds
of slide browsing time for preparation and/or review. These top three negative words (6b:x, 8b:x, and 11b:x) indicate
that, in the middle part of the course, students who neglected preparation and/or review tended not to achieve an A grade.
This result suggests that it may be important for a teacher to guide students to continue preparing for and reviewing
lectures throughout the course, until the last week, in order to maximize their achievement.
          These feature words may be used for two purposes. First, after finishing the course and grading the
students, a teacher can tell the students which learning activities were not sufficient to obtain a good grade.
Second, if a teacher is to conduct a similar course in the future, he/she can call students’ attention to the learning
activities indicated by positive feature words, and advise them to avoid learning activities indicated by negative
feature words.

                        Table 4. The top six positive and negative feature words for N=6.

                                 Positive                                  Negative
                      Word            Score of word            Word            Score of word
                      12p:o               0.4554                6b:x               -1.0110
                      11b:o               0.4480                8b:x               -0.9019
                      10r:o               0.3223               11b:x               -0.8511
                      11r:x               0.2871                8r:x               -0.6693
                      5b:o                0.2686                3t:x               -0.6227
                       8p:x               0.2415               13t:x               -0.5108


Course grade scores vs. predicted A-score
For the case of N=6 with feature selection w.o, let f1, f2, …, f5 be the linear SVM classifiers
learned during the 5-fold cross validation. We then define the predicted A-score pr(d) of a document d of a
student as the average of f1(d), f2(d), …, f5(d).
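pr(d) is simply the mean of the five fold classifiers' decision values. A sketch, assuming each classifier is stored as a word-weight dictionary plus a bias (a hypothetical representation; the function names are ours):

```python
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: one fold classifier = (weight per word, bias b).
Model = Tuple[Dict[str, float], float]


def f_value(doc_words: Iterable[str], model: Model) -> float:
    """Linear decision value f(d) with binary tf."""
    weight, b = model
    return sum(weight.get(w, 0.0) for w in set(doc_words)) + b


def predicted_a_score(doc_words: Iterable[str], models: List[Model]) -> float:
    """pr(d): average of f1(d), ..., f5(d) over the fold classifiers."""
    words = list(doc_words)   # materialize once so it can be reused for every model
    return sum(f_value(words, m) for m in models) / len(models)
```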
          For each student, we compared the grade score with the predicted A-score. The results are
summarized in Figure 2, in which the vertical axis shows the predicted A-score, and the horizontal axis the
grade score.
          We can observe a positive correlation between grade score and predicted A-score; specifically, their
correlation coefficient is 0.6333. Thus, it can be said that this model of whether or not students obtain an
A grade is also appropriate for discussing students’ grade scores.
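The text does not state which correlation coefficient was used. Assuming Pearson's coefficient, it can be computed as follows; on Python 3.10 or later, `statistics.correlation` gives the same value. The function name is hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient (ASSUMED to be the coefficient used)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold the students' grade scores and `ys` their predicted A-scores pr(d).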


                                Figure 2. Course grade score vs. predicted A-score

Conclusion
In this paper, we proposed a method by which learning activities important for attaining high achievement in a
course can be identified using learning logs stored in an LMS and an e-book system. These logs contain four
items, namely attendance, slide browsing time for preparation and/or review, submission of reports, and quiz
scores, together with the final grades. The learning activities of the students in a course can be represented by a vector
that reflects achievement or non-achievement of each of the above four items in each week. In our method, first,
linear SVM is applied to these vectors, and a score of importance for the contribution of each learning activity to
students’ attainment of an A grade is calculated. Next, we select the N activities with the best and the N with the
worst scores of importance (or, for the absolute-value measures, the top 2N activities) according to six evaluation
measures. We then apply linear SVM to the data consisting of only the selected activities to verify that these
activities are sufficient to infer a student’s final achievement.
          We applied this method to the data from 99 students attending an “Information Science” course. In the
case of N=6 with feature selection w.o, the accuracy of prediction using linear SVM was 0.8853,
which was the best score of all constructed models. In contrast, when all learning activities were used,
the accuracy was 0.7358. Hence, feature selection based on linear SVM appears to be effective in
improving the prediction performance. The selected activities are shown in Table 4. From these
results, we can observe that (i) preparation and/or review in the 11th week significantly distinguished A grade
students from other students, and (ii) students who neglected preparation and/or review in the middle part of the
course were unlikely to obtain an A grade. Furthermore, for each student, we compared the grade score with the
predicted A-score by linear SVM with N=6 and feature selection w.o. Their correlation coefficient
was 0.6333. Thus, it seems that the model of whether or not students obtain an A grade is also appropriate for
discussing the grade scores of students.
          These results can be informative in telling students which learning activities were insufficient to obtain
a good grade, and in advising students in following years of the same course on the important learning activities.
          A number of issues remain to be investigated. Points of particular importance include the following:
                  More data from additional courses is required to support the present conclusions. It may
         also be interesting to compare the results of this study with data from other courses.
                  In our method, the thresholds for the achievement of slide browsing time for preparation
         and/or review and of quiz scores were set manually by the authors without formal justification. It is
         important to determine automatically the thresholds best suited to identifying the specific features of
         the learning activities of high-achieving students.
                  Our method can discover learning activities that are important for good achievement.
         However, the reasons why these learning activities are selected by the model are not always easy to
         understand. Analysis of the relationships among learning activities, such as association analysis, may
         help to interpret the present results further.

Endnotes
(1)      http://booklooper.jp
(2)      http://geta.nii.ac.jp
(3)      http://svmlight.joachims.org/

References
Baradwaj, B. & Pal, S. (2011) Mining Educational Data to Analyze Student’s Performance, International
         Journal of Advanced Computer Science and Applications, vol. 2, no. 6, pp. 63-69.
Cortes, C. & Vapnik, V. (1995) Support-Vector Networks, Machine Learning, vol.20, pp.273-297.
Dietterich, T.G., Lathrop, R.H. & Lozano-Perez, T. (1997) Solving the multiple instance problem with axis-
         parallel rectangles, Artificial Intelligence, Vol. 89, No. 1-2, pp.31-71.
Goda, K., Hirokawa, S., & Mine, T. (2013) Correlation of grade prediction performance and validity of self-
         evaluation comments, Proc. SIGITE’13, pp. 35-42.
Hlosta, M., Herrmannová, D., Váchová, L., Kužílek, J., Zdrahal, Z. & Wolff, A. (2014) Modelling student
         online behaviour in a virtual learning environment, Workshop Proc. LAK 2014, 4 pages.
Ifenthaler, D. & Widanapathirana, C. (2014) Development and Validation of a Learning Analytics Framework:
         Two Case Studies Using Support Vector Machines, Technology, Knowledge and Learning, vol.19,
         pp.221-240.
Ogata, H., Yin, C., Oi, M., Okubo, F., Shimada, A., Kojima, K. & Yamada, M. (2015) E-Book-based Learning
         Analytics in University Education, Proc. ICCE2015, pp.401-406.
Okubo, F., Shimada, A., Yin, C. & Ogata, H. (2015) Visualization and Prediction of Learning Activities by
         Using Discrete Graphs, Proc. ICCE2015, pp.739-744.
Sakai, T. & Hirokawa, S. (2012) Feature Words that Classify Problem Sentence in Scientific Article, Proc.
         iiWAS2012, pp.360-367.
You, J. W. (2016) Identifying significant indicators using LMS data to predict course achievement in online
         learning, Internet and Higher Education, vol.29, pp.23-30.

Acknowledgments
The research results have been achieved by “Research and Development on Fundamental and Utilization
Technologies for Social Big Data”, the Commissioned Research of National Institute of Information and
Communications Technology (NICT), Japan. The work of F. Okubo was in part supported by Kyushu
University Interdisciplinary Programs in Education and Projects in Research Development No.27115.

