<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards A General Method for Building Predictive Models of Learner Success using Educational Time Series Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christopher A. Brooks</string-name>
          <email>brooksch@umich.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Craig Thompson</string-name>
          <email>craig.thompson@usask.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephanie Teasley</string-name>
          <email>steasley@umich.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer</institution>
          ,
          <addr-line>Science</addr-line>
          ,
          <institution>University of Saskatchewan</institution>
          ,
          <addr-line>Saskatchewan</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Information, University of Michigan</institution>
          ,
          <addr-line>Michigan</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents a general method, agnostic to pedagogy and instructional technology, for building predictive models for education from time series log data. While it is common for models of learner achievement to include cognitive features, we instead data mine only resource accesses in the learning environment. This has benefits in that the approach is inherently scalable to new contexts due to its data-driven nature. While we have only just begun to apply these methods to our institutional Massive Open Online Course (MOOC) data, the approach shows promise both as a descriptive modeling technique and as an engine for creating predictive early alerts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Predictive models in education generally require intimate
knowledge of the domain being taught, the objectives
being learned, and the pedagogical circumstances under which
the instruction takes place. While there is some work that
removes some of these constraints by focusing
instead on specific tools or pedagogies (e.g. analysis of
discussion forum communication), this limits techniques to
only those courses which use a particular technology or
pedagogical approach.</p>
      <p>In this paper we present our initial work towards a general
method of building predictive models for educational data.
Unlike existing work in the area, we aim to build models
solely from coarse-grained observations of interactions
between a student and course resources over time. Our goal
is not to build the most predictive model for a particular
course, though predictive accuracy is an important aspect
of our work. Instead, we aim to enable "one click modelling"
of a large variety of educational data systems without the
need to involve instructors, pedagogical experts, or learning
technologists. These models can then be used to gain insight
into how a course operates, build early-warning systems for
student success, or characterise how courses relate to one
another.</p>
      <p>LAK '14, March 24–28, 2014, Indianapolis, IN, USA.
Copyright 2014 ACM 978-1-4503-2664-3/14/03 ...$15.00.</p>
      <p>A strong motivation for this approach comes from the
growing list of educational software systems that collect
so-called "clickstream" data about learners. For instance, the
Blackboard and Sakai learning content management
systems both collect data on the accesses learners make to
various tools and content, the Opencast lecture capture
system collects fine-grained data on access to lecture video and
configuration of the playback environment, and the
Coursera massive open online course platform collects web logs of
how users have navigated through the course website. All of
these systems do this educational data logging in addition
to maintaining traditional operations data based on the
features available to learners.</p>
      <p>This paper proceeds as follows: in section 2 we provide
a more formal definition for our characterisation of
educational log data. This is followed by section 3, where we focus
on demonstrating how time series data from the Coursera
platform can be used to generate predictive models with
little effort. We provide discussion of a novel method of
mining time series data based on n-gram techniques used in
text mining, as well as details on how accurate and reusable
models might be for MOOC environments. We conclude
the work in section 4 with a discussion of impact and future
directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. EDUCATIONAL DATA</title>
      <p>
        Much of the attention in the technology enhanced learning
field has been paid to understanding how people learn from
a cognitive perspective. For instance, Anderson's ACT-R
theory of skill knowledge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which is used as a basis for
many intelligent tutoring systems (see [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]), suggests that
cognitive skills can be described as production rules: small
operations of data manipulation organized around atomic
goals. Correct rules fire repeatedly against the facts available
to a learner, causing the learner to demonstrate a
particular higher-level cognitive skill. Inability to fire correct
rules in such a way that a skill is demonstrated indicates
a lack of the correct rules, and suggests a need for
educational intervention (learning) or improvement of the
rule-matching mechanism.
      </p>
      <p>
        An alternative to this is Ohlsson's theory of learning based
on performance errors, where he argues that it is through
making mistakes and correcting them that we demonstrate
learning [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Providing a correct answer does not signify that the
learner understands; instead, the learner may simply not yet
have made a mistake, and may have inadvertently answered
correctly. It is the occasions on which the learner makes and
corrects mistakes that indicate learning is happening. This
approach is core to the constraint-based modeling family of
intelligent tutoring systems such as [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Theories of learning are not limited to interactions with
content and problems; learning through communication with
other individuals has been well explored under the theory of
social constructivism [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. While the majority of work in
this area has been on peer-to-peer learning through chat or
discussion forums, some have also applied intelligent systems
in the form of peer matching [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or tutors based on dialogue
systems [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>In this paper, we refrain from trying to understand,
apply, and model learning processes directly, and instead aim
to build learning systems that observe patterns of interaction
students have with resources. This is a data-driven
perspective on the learning process, and we aim to recognize
successful patterns of achievement by virtue of their existence
in the learning environment. This has both advantages and
disadvantages relative to traditional methods of learner
modeling, both of which are explored in section 4.</p>
      <p>We view the learning system as being made up of five
pieces: students, resources, interactions, events, and some
measurement of outcome. The first of these, students, is a set
of individuals who interact with some learning environment.
These individuals have characteristics that are known when
they first begin interacting with the environment and, to
simplify modelling, these characteristics do
not change. For example, demographic variables (e.g. age,
gender, ethnicity) as well as prior knowledge (e.g. previous
grades or other measures of evaluation) can be associated
with an individual, and may be a direct influence on their
outcomes. In the results described in the next section we
omit student characteristics from our modeling, but we note
here that they may be useful (and readily accessible) when
creating predictive models.</p>
      <p>Students interact with a learning system through resources.
These resources may be web content, discussion forums,
lecture video, or even intelligent tutoring systems. Resources
may be described at different levels of generalization.
For instance, the coarse-grained "lecture" resource may be made
up of individual "lectures", each of which may be made up of
"segments". An important distinction between this view of
resources and others is that we intentionally conflate
pedagogy, technology, and content into a single item, and do not
attempt to disambiguate resources by defining them to be
about concepts, methods, or delivery mechanisms.</p>
      <p>An interaction denotes a singular circumstance in which
a student uses a resource, and represents a temporal
relationship between the student and resource. For instance, an
interaction may be viewing a lecture, submitting a quiz, or
reading a discussion forum post. It is expected that
individual interactions will be manipulated through aggregation,
summation, scaling, or other mathematical functions in
order to describe different levels of granularity that may be
useful in the modelling process. This manipulation is to be
applied in an automated manner, and not require a priori
hypotheses based on the content, concepts, or individuals
involved.</p>
      <p>Each interaction exists between two events. Events are
demarcations of the beginning and end of time-frames of
interest. Conceptually, events can be hierarchically arranged,
and a given set of data might have a start and end time
which encompass other events such as assignment deadlines,
examinations, or course beginnings and endings. In the
investigation section to follow we will focus only on a single
set of events that note the beginning and end of a course,
but one can readily imagine how it may be useful to predict
outcomes for other pairs of events.</p>
      <p>
        Educational outcomes can be measured in various ways,
including through taxonomies of skill acquisition (e.g.
Bloom's taxonomy [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or the like), grades (which may be
content-based or a comparison between students in a
cohort), or student satisfaction (which may be measured through
self-reports or through proxy variables such as retention in a
program). In our characterization of educational data
modelling we make no attempt to link specific interactions to
outcomes in a theoretical manner. Instead, we argue that
correlations found through the data mining process will
either support or fail to support linkages between interaction
patterns and educational theory. Thus, evidence for
learning theory is an output of the modelling process, which can
be reflected upon by practitioners, but theory is not
necessarily an input to the process. The only constraint we place
on the educational outcome is that it be well-defined and
measurable, so that it can serve as the class variable to be
predicted in the data mining process.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. INITIAL INVESTIGATION</title>
      <p>For an approach to be considered a strong contribution to
the fields of learning analytics and educational data mining,
we outline three criteria. First, the approach must be able
to produce accurate descriptive models for different
circumstances (which may include different outcomes and/or
different kinds of interactions). While there is no clear cut-off as
to how accurate a model must be to be useful, we find this
discussion one of growing importance, and refer to this
criterion as descriptive validity. Second, the models generated
must have some level of intra-course validity. We recognize that
variance exists between courses (or course offerings), and that
population changes can have a significant impact on the validity
of models. We have no clear cut-off as to how applicable a
given model must be in new circumstances in order to be
valuable for the field. Nonetheless, this is an important
issue to consider when building predictive models. Finally, in
addition to descriptive validity and intra-course validity, it
is important to recognize the predictive validity of a given
technique: how does the passage of time affect the accuracy
of a model trained from previous circumstances? This issue is
not well understood in the field, yet it is a critical one for
comparing the results of various techniques.</p>
      <p>In this section we describe our initial investigations using
the aforementioned characterization of educational data. As
our work is ongoing, we have not completely addressed how
our approach meets these three criteria. Instead, we
report work in progress on our initial methods, results, and
validation efforts.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1 Methodology</title>
      <p>In our first application of this approach we have chosen
two offerings of a Massive Open Online Course (MOOC)
that was delivered through the Coursera platform.
Coursera stores individual page requests in a JSON-encoded
clickstream file, which we transformed into a comma-separated
list of values¹. The results are log files where interactions
take the form (username, timestamp, resource), where the
username is a uniquely identifying hashed value for the
learner interacting with the system, the timestamp is the
server time when a resource was accessed², and the resource is
one of lecture view, forum thread, or quiz attempt, based on
the URL path being accessed. While more details as to
which resources were being viewed are available (e.g. the
specific lecture, forum thread, or quiz), we began our
investigation with only the coarse-grained description of resources
being used.</p>
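      <p>The clickstream-to-CSV transformation just described can be sketched as follows. This is an illustrative sketch only: the JSON field names ("username", "timestamp", "page_url") and the URL substrings used to label resources are assumptions, not the actual Coursera schema (the authors' real scripts are the ones referenced in footnote 1).</p>

```python
import csv
import json

def classify(url):
    """Map a request URL path to one of the three coarse resource types.
    The substrings below are hypothetical placeholders, not Coursera's paths."""
    if "/lecture" in url:
        return "lecture_view"
    if "/forum" in url:
        return "forum_thread"
    if "/quiz" in url:
        return "quiz_attempt"
    return None  # requests outside the three resource types are dropped

def transform(clickstream_path, csv_path):
    """Read one JSON object per line and emit (username, timestamp, resource) rows."""
    with open(clickstream_path) as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            event = json.loads(line)
            resource = classify(event["page_url"])
            if resource is not None:
                writer.writerow([event["username"], event["timestamp"], resource])
```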
      <p>In this investigation we have three research questions we
want to answer with this data:</p>
      <p>R1 Can we create an explanatory model that describes the
patterns of interaction that lead to learners achieving a
distinction (85% or higher) in final course grade?

R2 Can we create a predictive model of learner
distinction (85% or higher in course grade) from interactions
in one course that has validity in a second course
offering?

R3 How accurate is a predictive model of learner
distinction (85% or higher in course grade) when applied with
limited data (e.g. for the formation of an early alert
system)?</p>
      <p>
        To address these questions, we formed predictive
models with J48 decision trees using the Weka toolkit [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. For
each model, we performed a number of automated
transformations to extract features from the set of interactions, as
described in the next section. We have made the software
for creating these features freely available at URL.
      </p>
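      <p>Weka's J48 (a C4.5 implementation) grows its tree by greedily choosing, at each node, the split with the highest information gain. As a minimal, stdlib-only sketch of that core computation (not Weka itself, and not the authors' code), the choice of a single split looks like:</p>

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(X, y):
    """Pick the (feature index, threshold) pair with the highest information
    gain, mirroring the greedy choice a C4.5/J48 decision tree makes at a node."""
    base = entropy(y)
    best_j, best_t, best_gain = None, None, -1.0
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            # left branch: feature value at most t; right branch: greater than t
            left = [y[i] for i, row in enumerate(X) if t >= row[j]]
            right = [y[i] for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if gain > best_gain:
                best_j, best_t, best_gain = j, t, gain
    return best_j, best_t
```

A full J48 run applies this recursively and then prunes; the paper's models additionally used a confidence of 0.25 and a minimum leaf size of 50 (see the Weka parameters noted in footnote 3).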
    </sec>
    <sec id="sec-5">
      <title>3.2 Creating Features from Time Series Data</title>
      <p>All of the features described here are binary: either an
access for a particular time period existed (feature = 1) or did not
(feature = 0).</p>
      <sec id="sec-5-1">
        <title>3.2.1 Relative Offsets</title>
        <p>As we were interested in comparing two courses offered
in different calendar months, we changed all accesses to be
relative to the start of the course. We also pruned the course
interactions to ten weeks (the listed length of the course)
from the first day the course was made available to students.
Using a single day as our smallest level of granularity, this
provided us with 71 attributes for each learner.</p>
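        <p>Converting raw interaction timestamps into these relative, binary day-of-course features is mechanical. A minimal sketch, assuming second-based timestamps and a simple fixed-length window (the exact day-indexing convention behind the paper's 71 daily attributes is glossed over here):</p>

```python
SECONDS_PER_DAY = 86400

def daily_features(rows, course_start, n_days=71):
    """Map each user to a binary vector: element d is 1 if the user accessed
    any resource on day d relative to the course start, else 0.
    rows: iterable of (username, timestamp, resource); timestamps in seconds."""
    features = {}
    for user, ts, _resource in rows:
        day = (ts - course_start) // SECONDS_PER_DAY
        if day >= 0 and n_days > day:  # prune accesses outside the course window
            features.setdefault(user, [0] * n_days)[day] = 1
    return features
```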
      </sec>
      <sec id="sec-5-2">
        <title>3.2.2 Varying Degrees of Granularity</title>
        <p>It is difficult to know at what granularity one should
consider educational time series data. Some applications may
generate very fine-grained data, such as millisecond
resolution for kinesthetic learning tasks (e.g. learning to play
a musical instrument), or second and minute resolution for
atomic learning tasks (e.g. those used by ACT-R inspired
tutoring systems). Given the sparsity of our data, we
aggregated accesses into three-day-long, week-long, and month-long
values for each learner. Thus the feature vector for each
learner included 71 daily accesses, 25 three-day accesses, 11
week accesses, and 3 month accesses, all values relative to
the start of the course. We also included counts of the
number of accesses on different days of the calendar week (i.e.
Sunday through Saturday), adding another 7 attributes.
¹See https://bitbucket.org/umuselab/mooc-scripts for
the open source scripts used for this process.
²We did no modification of these values for the time zone
the learner happened to be in.</p>
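        <p>The coarser-grained features can be derived directly from the daily binary vector. A sketch, with the simplifying assumption that day-of-week counts are taken relative to the course start (the paper used calendar weekdays, Sunday through Saturday):</p>

```python
def aggregate(daily, span):
    """Collapse a binary daily access vector into coarser binary values:
    span=3 for three-day periods, 7 for weeks, 30 for months."""
    return [1 if any(daily[start:start + span]) else 0
            for start in range(0, len(daily), span)]

def weekday_counts(daily):
    """Count access days falling on each of the 7 days of the (relative) week."""
    counts = [0] * 7
    for day, hit in enumerate(daily):
        counts[day % 7] += hit
    return counts
```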
      </sec>
      <sec id="sec-5-3">
        <title>3.2.3 Applying N-Grams to Temporal Accesses</title>
        <p>The co-occurrence of features based on the time series
data may represent patterns that describe success (or the lack
thereof). For instance, if all students who watch lectures
on the sixth, seventh, and eighth days of the course end up
with distinction in the course, while those who do not watch
lectures on those days fail to get distinction, then this pattern
of behavior is valuable (and would be captured by our existing
transformations). If, however, a successful pattern of
interaction was watching consecutive lectures on any three
days, this pattern may be missed by our existing non-pattern
features.</p>
        <p>To capture these kinds of patterns, we apply the well-used
n-gram technique from text mining to interactions. An
n-gram is a sequence of n words, and n-gram features are
often used as counts of particular n-grams. For instance, if the
phrase "quick brown fox" occurs twice in a given document,
the n-gram (in this case a 3-gram) feature "quick brown fox"
would have a value of two. In our data we are dealing with
accesses to resources such as lecture videos, so an n-gram
with the pattern (0, 1, 0), the label of Week, and a count of 2
would indicate that a student had two occurrences of the
pattern of not watching lectures in one week, watching in
the next week, and then not watching again in the third
week.</p>
        <p>We generate the set of n-grams ranging from 2-grams to
5-grams, covering all binary patterns from (0, 0) to
(1, 1, 1, 1, 1). We repeat this process for features of days,
three-day spans, weeks, and months. The n-gram feature counts
for a given course dataset were normalized to values
between 0 and 1. Together with the features described in
sections 3.2.1 and 3.2.2, we had a total of 1,071 features for
training.</p>
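        <p>Counting these binary n-gram patterns over an access sequence can be sketched as below; the per-course normalization to the 0–1 range described above is left out:</p>

```python
from itertools import product

def ngram_counts(seq, n_min=2, n_max=5):
    """Count every contiguous binary pattern of length n_min..n_max occurring
    as a sliding window over the binary access sequence seq. All possible
    patterns are initialized to zero so absent patterns still become features."""
    counts = {}
    for n in range(n_min, n_max + 1):
        for pattern in product((0, 1), repeat=n):
            counts[pattern] = 0
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts
```

For 2-grams through 5-grams this yields 4 + 8 + 16 + 32 = 60 pattern features per time scale; these combine with the direct access features of sections 3.2.1 and 3.2.2 to form the full feature vector.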
      </sec>
    </sec>
    <sec id="sec-6">
      <title>3.3 Results</title>
      <p>Our dataset was made up of interactions including 87K
accesses to the discussion forums, 130K accesses to the quiz
system, and 2.8M accesses to the lecture videos. It is well
recognized that the vast majority of users who sign up for
a MOOC do not participate in evaluation mechanisms. Our
educational outcome of interest was whether learners who
were actively involved in the course achieved a distinction
or not; we split our dataset on the 85% grade for the
course (pruning learners who received a grade of zero), and
balanced the two halves through random subsampling. Our
final dataset size was 5,118 users.</p>
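      <p>The dataset preparation just described — pruning zero grades, splitting at the 85% distinction threshold, and randomly subsampling the larger class — can be sketched as follows (the grade mapping and fixed random seed are illustrative assumptions):</p>

```python
import random

def balanced_split(grades, threshold=85.0, seed=0):
    """grades: mapping of user id to final course grade. Returns a list of
    (user, is_distinction) pairs with the two classes balanced by randomly
    subsampling the larger class; zero grades are pruned first."""
    active = {u: g for u, g in grades.items() if g > 0}
    high = [u for u, g in active.items() if g >= threshold]
    low = [u for u, g in active.items() if threshold > g]
    rng = random.Random(seed)
    big, small = (high, low) if len(high) > len(low) else (low, high)
    sampled = small + rng.sample(big, len(small))
    return [(u, active[u] >= threshold) for u in sampled]
```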
      <sec id="sec-6-1">
        <title>3.3.1 An Internal Descriptive Model</title>
        <p>Our first interest was in building a descriptive model of the
two cohorts (hereafter called low achieving and high
achieving respectively). Such a model could be used by instructors
or instructional designers to help guide the development of
future courses by identifying the correlations between access
patterns and success. After building the model in Weka³
using the features described and ten-fold cross validation, we
were able to correctly classify 91% of students, attaining a
kappa of 0.8199. Table 1 shows the confusion matrix for this
model.</p>
        <p>The rules created for this decision tree are fairly simple
(Figure 1). The first decision is based on the three-day quiz
access pattern (0, 0, 0, 0, 0), which represents the number
of times a given student has not accessed quizzes in a 15 day
period (i.e. 5 consecutive three-day periods where quizzes
were not accessed). This value is normalized to the dataset⁴,
and those students with a value above 0.2 were
largely unable to achieve distinction (2,200 students had this
pattern in the training set). Students with a value less than or
equal to 0.2 for this attribute were next distinguished
by whether they had a high (0.62963 or greater) value for the
single-day quiz pattern (0, 0, 0), with 2,526 students being
classified as high achievement on this alone. The last two
patterns look at whether the students viewed lectures in
the second month of the course offering and, if not, further
patterns related to quiz usage. While we are not learning
designers, one might infer from this that attempting the
quizzes is perhaps sufficient in order to gain distinction in
this course.</p>
        <p>³All models described in this paper were built with Weka
version 3.6 and J48 classifier parameters having a confidence
of 0.25 and a minimum leaf node size of 50.</p>
        <p>Figure 1:
(0, 0, 0, 0, 0) 3 Day Quiz Pattern &lt;= 0.2
| (0, 0, 0) Day Quiz Pattern &lt;= 0.62963: high (2526/275)
| (0, 0, 0) Day Quiz Pattern &gt; 0.62963
| | Month 2 Lecture = 0
| | | (0, 0) Day Quiz Pattern &lt;= 0.711111: high (50/17)
| | | (0, 0) Day Quiz Pattern &gt; 0.711111: low (120/37)
| | Month 2 Lecture = 1: high (200.0/46.0)
(0, 0, 0, 0, 0) 3 Day Quiz Pattern &gt; 0.2: low (2200/73)</p>
      </sec>
      <sec id="sec-6-2">
        <title>3.3.2 Intra-Course Predictive Validity</title>
        <p>We were interested in testing how valid the model
described in Figure 1 would be at predicting distinction
achievement in subsequent offerings of the same course. This is a
challenging issue for predictive analytics, as changes in the
population, or in the circumstances by which they interact with
course resources, will reduce the efficacy of the model. We
naively applied our previously trained model to a subsequent
course offering with 4,776 users, and correctly classified 65%
of the students, achieving a kappa of 0.307. An investigation
of resource utilization revealed that accesses to the quiz and
forum resources in the second course offering were quite
different from the first offering, with zero accesses to quiz
content after the third week of the course. Figures 4a through
4f show histograms of the accesses to resources in the
two courses.</p>
        <p>While the details as to why the second offering of this
course showed different accesses were not available in time
for workshop publication (e.g. system log failure, dramatic
change in pedagogy, etc.), this does demonstrate an
important issue when building automated predictive models:
namely, that there should be some metric by which the time
series data of two courses can be compared in order to
determine the appropriateness of applying a particular model. In
this case, access to lecture videos (Figures 4e and 4f) in the
two courses appears roughly similar, while the access to quizzes
and forum messages does not.⁵</p>
        <p>⁴The student with the greatest number of (0, 0, 0, 0, 0) quiz
attempt patterns would have a count of 1, and the student
with the fewest would have a count of 0. Thus the closer a
student's count is to zero, the rarer this pattern is in their
interaction history.</p>
        <p>We retrained the predictive model for the first offering of
the course using only lecture view resource events. We omit
the confusion matrix for brevity, and show several of the
rules that were generated in Figure 2. We applied this model
to the second course offering data, and were able to correctly
classify 78.1% of instances, achieving a kappa of 0.563. Not
only was this significantly better than the application of the
original model trained on all of the resources, but a kappa
of this magnitude is reasonable when developing low-risk
interventions.</p>
      </sec>
      <sec id="sec-6-3">
        <title>3.3.3 Midterm Intra-Course Predictive Validity</title>
        <p>
          Reflecting on patterns of success for a course after it has
finished can be a useful endeavor for course design, and the
patterns of success generated for one course may be
indicators of success for similar courses (as shown in the previous
section). However, there is much interest within the
learning analytics community in building models that can be used to
predict academic risk so that automated interventions can
take place while the course is being offered (e.g. [
          <xref ref-type="bibr" rid="ref2 ref9">2, 9</xref>
          ]). To
investigate the suitability of a time series analysis approach
to the task of early warning, we trained a predictive model
from the first course offering based on five weeks' worth (half)
of interaction data with lectures, using the same notion of
success (85% or higher).
        </p>
        <p>When applied to the first five weeks of the second course
offering, we were able to correctly classify 68.69% of the
students, with a kappa of 0.374. Table 2 shows the confusion
matrix for this prediction; note the roughly balanced level
of misclassification, suggesting the model is roughly equally
good (or bad) at predicting whether people will fall below
or above the 85% mark. The rules for this model, given in
Figure 3, show a variety of decisions over patterns of smaller
length, suggesting that larger patterns may be more useful
as the time frame increases.</p>
        <p>⁵Despite the apparent similarity of these histograms, a
two-sample Kolmogorov–Smirnov test of goodness of fit
between the samples did not suggest that they were drawn
from the same population. It may be that this test is too
sensitive with this many data points, or that the samples
were indeed from different populations at a p = 0.01 level.
Regardless, the demonstrated value of the model as described
in the remainder of this section suggests that other measures
of similarity may be needed.</p>
        <p>Figure 2:
(0, 0, 0, 0, 0) 3 Day Lecture Pattern &lt;= 0.15
| Month 2 Lecture = 0
| | 3 Day Lecture starting on Day 19 = 0
| | | (0, 0, 0, 0, 1) 1 Day Lecture Pattern &lt;= 0.4: low (50/18)
| | | (0, 0, 0, 0, 1) 1 Day Lecture Pattern &gt; 0.4: high (134/51)
| | 3 Day Lecture starting on Day 19 = 1: 85 (201/32)
| Month 2 Lecture = 1: high (1832/239)
(0, 0, 0, 0, 0) 3 Day Lecture Pattern &gt; 0.15
| Month 2 Lecture = 0
| | Week 8 Lecture = 0: 0 (2471/412)
| | Week 8 Lecture = 1
| | | (1, 1, 0, 0) 1 Day Lecture Pattern &lt;= 0: high (59/16)
| | | (1, 1, 0, 0) 1 Day Lecture Pattern &gt; 0: low (55/23)
| Month 2 Lecture = 1: high (294/87)</p>
        <p>Table 2. Confusion matrix (rows: actual class; columns: predicted class):
                    low achievement   high achievement
low achievement     1,672             716
high achievement    779               1,609</p>
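        <p>The two-sample Kolmogorov–Smirnov comparison mentioned in footnote 5 rests on a simple statistic: the maximum gap between the two empirical distribution functions. A stdlib-only sketch of that statistic (in practice one would use a statistics package that also supplies the p-value):</p>

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of the two samples (e.g. the distributions
    of daily access counts from two course offerings)."""
    def ecdf(sample, x):
        # fraction of the sample with values at most x
        return sum(1 for v in sample if x >= v) / len(sample)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in points)
```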
      </sec>
    </sec>
    <sec id="sec-7">
      <title>4. CONCLUSIONS AND FUTURE WORK</title>
      <p>In this paper we have framed the activity of creating
predictive educational models as one of modelling the time series
events inherent in educational log data. This contrasts
significantly with theory-driven methods of modelling learners,
in that we consider no cognitive processes explicitly and
analyze only the observed interactions that learners have
with learning resources. Our approach is largely enabled by
the near-ubiquitous interaction-level logs kept by modern
educational technology environments, and the growing size
of the educational datasets available.</p>
      <p>A significant cost in learner modeling is the amount of
time and sophistication required to map both the
cognitive and subject domains onto the learning tools being made
available. We aim to ease this by requiring no explicit
knowledge of the learning process in order to form predictive models.
These models are based solely on the interactions learners
have with resources in the learning environment. Our end
goal is to enable course-specific predictive modeling based on
historic data without requiring the input of subject matter
experts or learning designers.</p>
      <p>While no trained educator is required to apply this
technique, historical data is needed. Thus in situations where
historical data is not available (e.g. a new course offering),
other forms of modeling learners must be used. Further, we
know of no clear measure by which two courses (or, more
properly, two sets of learner interactions with resources) can
be compared to determine their similarity. Thus it is
unclear how one might determine whether it is appropriate to
apply an existing model to a new circumstance. We point
to this as a significant issue in moving forward with
this approach.</p>
      <p>This work is in its infancy, and we have presented here
only a basic investigation of how educational time series data
can be used to predict student success. There are a
number of compelling questions which we are considering going
forward, including:</p>
      <p>How much data is required in order to build robust
predictive models? In this paper we used data from a
MOOC offered on the Coursera platform. Is this
technique only appropriate for extremely large datasets, or
is the data available from traditional course
management systems suitable as well?

Can more sophisticated temporal manipulations increase
the accuracy of models? For instance, does describing
a time period as if it were a continuous distribution
with a given skew and kurtosis create a useful
interaction pattern?

Can date patterns be generated from the underlying
data instead of through the top-down direction we have
taken? We chose combinations of days, three-day
sequences, weeks, and months as levels of granularity for
feature extraction, but it does not seem unreasonable
that other segments may also be useful. Is it
possible to derive these from the interaction data directly,
leading to less arbitrary time divisions?
Figure 4 shows histograms of daily resource accesses for the two course offerings:
(a) Quiz accesses by day for the first course offering.
(b) Quiz accesses by day for the second course offering; note the lack of data starting around day 17, leading to inaccurate predictions from the original trained model.
(c) Forum accesses by day for the first course offering.
(d) Forum accesses by day for the second course offering; note the lack of data starting around day 17, leading to inaccurate predictions from the original trained model.
(e) Lecture video accesses by day for the first course offering.
(f) Lecture video accesses by day for the second course offering; note the rough similarity in shape to Figure 4e, suggesting feature extraction for this resource may be appropriate.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Anderson</surname>
          </string-name>
          .
          <article-title>Rules of the mind</article-title>
          .
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Arnold</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Pistilli</surname>
          </string-name>
          .
          <article-title>Course signals at Purdue: Using learning analytics to increase student success</article-title>
          .
          <source>In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge</source>
          ,
          <source>LAK '12</source>
          , pages
          <fpage>267</fpage>
          –
          <lpage>270</lpage>
          , New York, NY, USA,
          <year>2012</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Bloom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Engelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Furst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. H.</given-names>
            <surname>Hill</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Krathwohl</surname>
          </string-name>
          .
          <article-title>Taxonomy of educational objectives: Handbook I: Cognitive domain</article-title>
          . New York: David McKay,
          <volume>19</volume>
          :
          <fpage>56</fpage>
          ,
          <year>1956</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Greer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. I.</given-names>
            <surname>McCalla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kettel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Bowes</surname>
          </string-name>
          .
          <article-title>User modelling in i-help: What, why, when and how</article-title>
          .
          <source>In User Modeling</source>
          , pages
          <fpage>117</fpage>
          –
          <lpage>126</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <collab>Carnegie Learning</collab>
          .
          <source>The Cognitive Tutor: Applying Cognitive Science to Education</source>
          .
          <source>Technical report</source>
          , Carnegie Learning, Inc., Pittsburgh, PA, USA,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K. J.</given-names>
            <surname>Gergen</surname>
          </string-name>
          .
          <article-title>The social constructionist movement in modern psychology</article-title>
          .
          <source>American Psychologist</source>
          ,
          <volume>40</volume>
          (
          <issue>3</issue>
          ):
          <fpage>266</fpage>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Graesser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chipman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Haynes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Olney</surname>
          </string-name>
          .
          <article-title>AutoTutor: An intelligent tutoring system with mixed-initiative dialogue</article-title>
          .
          <source>IEEE Transactions on Education</source>
          ,
          <volume>48</volume>
          (
          <issue>4</issue>
          ):
          <fpage>612</fpage>
          –
          <lpage>618</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Holmes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Reutemann</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          .
          <article-title>The WEKA data mining software: An update</article-title>
          .
          <source>SIGKDD Explor. Newsl.</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <fpage>10</fpage>
          –
          <lpage>18</lpage>
          ,
          Nov.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E. J. M.</given-names>
            <surname>Lauría</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. W.</given-names>
            <surname>Moody</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jayaprakash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jonnalagadda</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Baron</surname>
          </string-name>
          .
          <article-title>Open Academic Analytics Initiative: Initial research findings</article-title>
          .
          <source>In Proceedings of the Third International Conference on Learning Analytics and Knowledge</source>
          ,
          <source>LAK '13</source>
          , pages
          <fpage>150</fpage>
          –
          <lpage>154</lpage>
          , New York, NY, USA,
          <year>2013</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Martin</surname>
          </string-name>
          .
          <article-title>Constraint-based modelling: Representing student knowledge</article-title>
          .
          <source>New Zealand Journal of Computing</source>
          ,
          <volume>7</volume>
          (
          <issue>2</issue>
          ):
          <fpage>30</fpage>
          –
          <lpage>38</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ohlsson</surname>
          </string-name>
          .
          <article-title>Learning from performance errors</article-title>
          .
          <source>Psychological Review</source>
          ,
          <volume>103</volume>
          (
          <issue>2</issue>
          ):
          <fpage>241</fpage>
          –
          <lpage>262</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>